Malware Analyzer is an enterprise-grade, open-source static analysis framework designed for researchers, SOC analysts, and cybersecurity enthusiasts. It automates the dissection of suspicious binaries, extracting Critical Indicators of Compromise (IOCs) and leveraging Generative AI (Google Gemini) to produce human-readable threat reports.
🔍 Why this tool? Analyzing malware manually requires distinct tools for Windows (PEStudio), Linux (readelf), and Android (jadx). Malware Analyzer unifies these into a single, automated pipeline with a modern web interface.
| Command Center | Threat Intelligence |
|---|---|
| Centralized Dashboard for Analysis | High-Level Verdict & Risk Scoring |

| AI Detective | Code Inspection |
|---|---|
| Gemini AI Explaining Attack Vectors | Assembly Code & String Extraction |

We don't just "read" files; we dissect their internal organs. Here is exactly what we analyze for each format:
Extensions: .exe, .dll, .sys
- What it is: PE (Portable Executable) is the standard format for Windows programs. It contains code, data, and resources wrapped in specific "Headers".
- What we analyze:
  - DOS Header & NT Header: Checked for validity and machine type (x86 vs x64).
  - Timestamp: Detects "TimeStomping" (when attackers fake the compilation date).
  - Imports (IAT): We list every function the malware borrows from Windows.
    - Suspicious: `WriteProcessMemory` (injecting code), `SetWindowsHookEx` (keylogging), `InternetOpen` (C2 communication).
  - Sections: We look for `.text` (code) and `.data` (variables). A non-standard section name (e.g., `UPX0`) usually indicates packing.
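The header checks above can be sketched with nothing but the standard library. This is a minimal illustration, not the project's parser: `parse_pe_header` and `MACHINE_TYPES` are hypothetical names, and the "compile date in the future" flag is just one simple heuristic for a faked timestamp.

```python
import struct
from datetime import datetime, timezone

# Machine values from the PE/COFF file header (only the two mentioned above).
MACHINE_TYPES = {0x014C: "x86", 0x8664: "x64"}


def parse_pe_header(data: bytes) -> dict:
    """Validate the DOS/NT headers and read machine type and compile time."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS header (MZ)")
    # e_lfanew (offset 0x3C) holds the file offset of the NT header.
    (nt_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < nt_offset + 12 or data[nt_offset:nt_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing NT header signature")
    # COFF header after the signature: Machine, NumberOfSections, TimeDateStamp.
    machine, sections, stamp = struct.unpack_from("<HHI", data, nt_offset + 4)
    compiled = datetime.fromtimestamp(stamp, tz=timezone.utc)
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "sections": sections,
        "compiled": compiled,
        # Assumed heuristic: a compile date in the future was likely faked.
        "timestamp_in_future": compiled > datetime.now(timezone.utc),
    }
```

A real analyzer would go on to walk the section table and the import directory; a dedicated PE parsing library is the usual choice for that.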
Extensions: Binary files (no extension), .so
- What it is: ELF (Executable and Linkable Format) is the standard binary format for Unix, Linux, and many IoT devices.
- Why it matters: Most server-side malware and IoT Botnets (like Mirai) are ELFs.
- What we analyze:
  - Program Headers: Describe how the OS should create the process.
  - Section Headers: Contain linking information.
  - Dynamic Tags: List external libraries (`libc.so`, `libssl.so`). Malware often statically links libraries to avoid dependencies.
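Before the program and section headers can be read, the ELF identification bytes say whether the file is 32-bit or 64-bit and which byte order it uses. The sketch below reads only those fields plus the file type and target machine; `parse_elf_header` and the two lookup tables are illustrative names, and the tables cover just a few common values.

```python
import struct

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINES = {0x03: "x86", 0x08: "MIPS", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Read class, byte order, file type and machine from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endianness": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
    }
```

The machine field is especially useful for IoT samples, where the same botnet is often built for MIPS, ARM, and x86 at once.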
Extensions: .apk
- What it is: An APK (Android Package) is a zip archive containing `classes.dex` (Dalvik Executable) code and an `AndroidManifest.xml` file.
- What we analyze:
  - Permissions: We scan `AndroidManifest.xml` for dangerous requests.
    - Critical: `RECEIVE_SMS` (stealing OTPs), `READ_CONTACTS`, `ACCESS_FINE_LOCATION`.
  - Secrets: We scan for hardcoded API keys (AWS, Google Maps) often left by developers.
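Because an APK is a zip archive, the secrets scan can be sketched with `zipfile` and two regular expressions. The patterns below are the commonly cited shapes of AWS access key IDs and Google API keys, not the project's actual rule set, and `scan_apk_for_secrets` is a hypothetical name. The sketch only finds keys stored as plain ASCII and reads each entry fully into memory, so a production scanner would also cap entry sizes.

```python
import re
import zipfile

# Assumed patterns: widely used key shapes, not an exhaustive list.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "google_api_key": re.compile(rb"AIza[0-9A-Za-z_\-]{35}"),
}


def scan_apk_for_secrets(apk) -> list:
    """Return (entry name, secret type, value) for every match in the archive."""
    findings = []
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            data = archive.read(name)
            for label, pattern in SECRET_PATTERNS.items():
                for match in pattern.finditer(data):
                    findings.append((name, label, match.group().decode("ascii")))
    return findings
```

`apk` can be a path or a file-like object. Note that `AndroidManifest.xml` inside a built APK is stored as binary XML, so permission extraction needs a dedicated decoder rather than a plain text search.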
This application uses a multi-layered approach to determine if a file is malicious.
Before looking inside, we calculate the file's Fingerprint.
- MD5 & SHA256: Digests of the full file content. Changing a single byte of the file produces a different hash.
- ImpHash (Import Hash): A hash calculated from only the imported functions.
  - Significance: If a hacker recompiles their malware with minor changes, the SHA256 changes, but the ImpHash often remains the same, which helps us link the samples to the same threat actor.
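The difference between the two kinds of fingerprint is easy to see in code. The sketch below hashes file content with `hashlib`, and computes an import hash from an already-extracted list of `(library, function)` pairs following the commonly described ImpHash convention (lowercase names, library extension stripped, `lib.func` entries joined by commas, then MD5). The function names are illustrative; extracting the import list from a real PE file, including ordinal imports, is left to a PE parser.

```python
import hashlib


def file_hashes(data: bytes) -> dict:
    """Content fingerprints: any change to the bytes changes both values."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def imphash_from_imports(imports) -> str:
    """Hash an ordered list of (library, function) pairs from the import table."""
    parts = []
    for library, function in imports:
        lib = library.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()
```

Two builds of the same malware with different padding or strings get different SHA256 values, but as long as they import the same functions in the same order, `imphash_from_imports` returns the same value for both.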
- The Math: Shannon entropy, $H = -\sum_{x} P(x) \log_2 P(x)$, where $P(x)$ is the fraction of bytes in the file with value $x$.
- The Logic: Measures the randomness of data in the file on a scale of 0 to 8 (bits per byte).
- The Verdict:
  - 0 - 5.5: Normal Code (Structured).
  - 6.0 - 6.8: Suspicious (Possibly mild obfuscation).
  - 7.0 - 8.0: CRITICAL. The data is close to random, which usually means it is Packed (compressed) or Encrypted. Legitimate software rarely has entropy this high in its code section.
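A minimal sketch of the entropy calculation and the verdict bands follows. The function names are illustrative. The listed bands leave small gaps (5.5 to 6.0 and 6.8 to 7.0); assigning those values to the lower band is an assumption made here, not something the thresholds above specify.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 (one repeated byte) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    # -sum(p * log2(p)) rewritten as sum(p * log2(1/p)) with p = count / total.
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def entropy_verdict(entropy: float) -> str:
    """Map an entropy value to a band (gap handling is an assumption)."""
    if entropy >= 7.0:
        return "critical"
    if entropy >= 6.0:
        return "suspicious"
    return "normal"
```

In practice the value is most informative per section: a packed executable often has one very high-entropy section next to a tiny unpacking stub.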
- What it is: YARA is a pattern-matching engine for malware researchers.
- How we use it: We compile a database of pattern-based rules (text strings, hex byte patterns, and regular expressions).
  - Example: If we see the byte sequence `E8 ?? ?? ?? ?? 8B 45 08` near text saying "WannaDecryptor", YARA flags it as Ransomware.
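To show what a hex pattern with `??` wildcards means, here is a plain-Python stand-in that translates the pattern into a bytes regular expression. This is not the YARA engine and not how the project evaluates rules; it only mirrors the example above, and it simplifies "near" to "anywhere in the same file". Both function names are hypothetical.

```python
import re


def hex_pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn 'E8 ?? 8B' style hex into a regex; '??' matches any one byte."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that '.' also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def matches_example_rule(data: bytes) -> bool:
    """Both conditions from the example: the byte pattern and the marker text."""
    code = hex_pattern_to_regex("E8 ?? ?? ?? ?? 8B 45 08")
    return bool(code.search(data)) and b"WannaDecryptor" in data
```

The wildcards matter because `E8` is a relative call instruction whose four operand bytes change between builds, while the instructions around it stay the same.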
- Google Gemini (GenAI): We construct a JSON prompt containing the Entropy, Top Strings, Imports, and YARA matches.
  - The Prompt: "Act as a Level 3 Security Analyst. Analyze these technical artifacts and explain the attack chain."
  - The Output: A natural language explanation of the threat, bridging the gap between raw data and human understanding.
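Assembling that prompt might look like the sketch below. The key names in the JSON payload, the `build_prompt` function, and the cap on the number of strings are assumptions for illustration; only the instruction sentence and the four artifact types come from the description above. Sending the prompt to Gemini is deliberately left out.

```python
import json

INSTRUCTION = (
    "Act as a Level 3 Security Analyst. "
    "Analyze these technical artifacts and explain the attack chain."
)


def build_prompt(entropy, top_strings, imports, yara_matches, max_strings=20) -> str:
    """Combine the instruction with a JSON block of extracted artifacts."""
    artifacts = {
        "entropy": round(entropy, 2),
        # Cap the list so a huge string dump does not blow up the prompt.
        "top_strings": list(top_strings)[:max_strings],
        "imports": list(imports),
        "yara_matches": list(yara_matches),
    }
    return INSTRUCTION + "\n\n" + json.dumps(artifacts, indent=2)
```

Keeping the artifacts as structured JSON rather than free text makes it easier to log exactly what was sent and to compare model outputs across samples.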
For those new to malware analysis, here is exactly what happens when you click "Scan":
1. The Upload
You upload a file (e.g., suspicious_invoice.exe). The server instantly saves it to a secure, isolated folder and renames it to a random ID to prevent it from accidentally running.
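The save-and-rename step can be sketched as follows. This is an assumption about how such a step could work, not the project's code: `quarantine_upload` is a hypothetical name, the random ID is a UUID, and removing write and execute permission only has full effect on POSIX systems.

```python
import os
import uuid
from pathlib import Path


def quarantine_upload(data: bytes, quarantine_dir) -> Path:
    """Store an upload under a random, extension-free name and lock it down."""
    directory = Path(quarantine_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # No original name and no ".exe": a double-click will not launch the file.
    target = directory / uuid.uuid4().hex
    # Mode "xb" refuses to overwrite an existing file.
    with open(target, "xb") as handle:
        handle.write(data)
    os.chmod(target, 0o400)  # owner read-only, never executable
    return target
```

Discarding the user-supplied filename also avoids path traversal tricks such as `../../app.py` in the upload name.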
2. The Identification
The tool looks at the file's "Magic Bytes" (the first few bytes of the file).
- If it sees `4D 5A` ("MZ"), it knows it's a Windows App.
- If it sees `7F 45 4C 46` (0x7F followed by "ELF"), it knows it's a Linux App.

This ensures we don't try to read a PDF like it's a program.
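The identification step amounts to a prefix lookup. In this sketch the `PK\x03\x04` (zip, which covers APK) and `%PDF` entries are additions beyond the two signatures named above, and `identify_file_type` is an illustrative name.

```python
# (signature bytes, label) pairs checked against the start of the file.
MAGIC_SIGNATURES = [
    (b"MZ", "PE"),             # 4D 5A
    (b"\x7fELF", "ELF"),       # 7F 45 4C 46
    (b"PK\x03\x04", "ZIP/APK"),  # assumption: APKs are detected as zip archives
    (b"%PDF", "PDF"),          # assumption: recognized only to be rejected
]


def identify_file_type(data: bytes) -> str:
    """Return a label based on the leading bytes, or 'unknown'."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"
```

Checking content rather than the filename means `suspicious_invoice.pdf` that actually starts with `MZ` is still routed to the PE parser.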
3. The Extraction
We pull out the "Metadata". Think of this like reading the nutrition label on a cereal box.
- Imports: What ingredients does it use? (Does it use "Internet" functions? Does it use "Keyboard" functions?).
- Strings: We dump all text. If we see "192.168.1.10" or "wallet.dat", that's a clue.
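String extraction can be sketched as a search for runs of printable ASCII, similar in spirit to the Unix `strings` utility, followed by a filter for clues such as IPv4 addresses. The function names and the minimum length of 4 are assumptions; wide (UTF-16) strings, which are common in Windows binaries, are not handled here.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII characters of at least min_length."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


def find_ipv4(strings) -> list:
    """Pick out dotted-quad candidates whose octets are all in 0-255."""
    hits = []
    for text in strings:
        for candidate in IPV4.findall(text):
            if all(int(octet) <= 255 for octet in candidate.split(".")):
                hits.append(candidate)
    return hits
```

Version numbers like `1.2.3.4` also match the IPv4 shape, so results from `find_ipv4` are leads to review rather than confirmed indicators.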
4. The Verdict (Risk Scoring)
We calculate a score (0-100).
- Is it packed? (+40 points)
- Does YARA say it's ransomware? (+50 points)
- Final Score: 90/100 (Malicious).
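The scoring arithmetic in this example can be written out directly. Only the two weights (+40 and +50) and the 0-100 range come from the text; the function names and the cut-off of 70 for a "Malicious" verdict are assumptions made for the sketch.

```python
def risk_score(is_packed: bool, yara_ransomware: bool) -> int:
    """Add up the weighted findings and clamp the result to 0-100."""
    score = 0
    if is_packed:
        score += 40
    if yara_ransomware:
        score += 50
    return min(score, 100)


def score_verdict(score: int, malicious_threshold: int = 70) -> str:
    """Label a score; the threshold is a hypothetical default."""
    return "Malicious" if score >= malicious_threshold else "Inconclusive"
```

With both findings present, `risk_score(True, True)` gives 40 + 50 = 90, matching the 90/100 in the walkthrough.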
5. The AI Expert Opinion (Google Gemini)
Finally, we send all these clues to Google Gemini AI. It acts as a virtual senior analyst, synthesizing the data to write a detailed report: "This file appears to be a Keylogger. It hooks the keyboard API and tries to send captured keystrokes to an external IP address."
| MITRE ATT&CK Mapping | AI Threat Summary |
|---|---|
| Detects specific hacker techniques like 'Input Capture' or 'Defense Evasion' | Detailed explanation of capabilities generated by GenAI |

graph TD
User["User / Client"] -->|Uploads File| Web["Flask Web Server (app.py)"]
Web -->|Checks Cache| DB[("Report Storage (JSON)")]
subgraph "Core Analysis Engine (analyze.py)"
Orchestrator["Analysis Orchestrator"]
Orchestrator -->|Feature Extraction| Hashing["Hashing (SHA256/ImpHash)"]
Orchestrator -->|Feature Extraction| Entropy["Shannon Entropy Calc"]
Orchestrator -->|Feature Extraction| Strings["String Extraction"]
Orchestrator -->|Dispatch| Type{"File Type?"}
Type -->|PE/Windows| PE["PE Header Parser"]
Type -->|ELF/Linux| ELF["ELF Segment Parser"]
Type -->|APK/Android| APK["Manifest Analyzer"]
end
subgraph "Threat Intelligence"
YARA["YARA Rule Engine"]
VT["VirusTotal API v3"]
GenAI["Google Gemini Flash"]
end
Orchestrator --> YARA
Orchestrator --> VT
Orchestrator --> GenAI
PE & ELF & APK & YARA & VT & GenAI --> Report["Final Report Object"]
Report -->|Save| DB
Report -->|Render| UI["Web Interface (HTML/JS)"]
- Python 3.10 or higher
- Internet Connection (for VirusTotal/Gemini APIs)
# 1. Clone the repository
git clone https://github.com/souravkr529/Malware-Analyzer.git
cd Malware-Analyzer
# 2. Install Python Dependencies
pip install -r requirements.txt
# Start the Web Server
python run.py
- Access the dashboard at: http://127.0.0.1:5000
📌 Full English walkthrough explaining features, architecture, and AI-based malware analysis.
📌 Step-by-step Hindi explanation of the Malware Analyzer tool for beginners and students.
Safety First:
- ❌ NEVER run this tool on your host operating system if you are handling live malware.
- ✅ ALWAYS use a Virtual Machine (VM) (VMware, VirtualBox) or a Sandbox.
- ✅ This tool performs Static Analysis, which is generally safe because it does not execute the code. However, a malformed file can occasionally trigger a vulnerability in a parsing library, which is another reason to work inside a VM.
Legal Disclaimer: This tool is intended for Educational Purposes and defensive security research. The author is not liable for any misuse of this software or damage caused by analyzing malicious files.
Sourav Kumar
Cybersecurity Researcher & Developer
📧 Email: souravkr529@gmail.com
🔗 GitHub: souravkr529
🔗 LinkedIn: Sourav Kumar
Keywords: AI Malware Analysis, Generative AI Security, LLM for Cybersecurity, Google Gemini Integration, Static Analysis, Reverse Engineering, Threat Intelligence, VirusTotal, Python Security, PE Analysis, ELF Analysis, APK Analysis, YARA, Entropy, Ransomware Detection.







