Project Documentation
A comprehensive guide to understanding, deploying, and using the Malware Detection System powered by Multi-Channel PE Image Analysis.
Overview
This system detects malware in Windows PE executables (.exe) by converting them into multi-channel visual representations and classifying them with our custom ResNet50 model, without relying on disassembly or signature databases.
99.57% Accuracy
Evaluated on a balanced dataset of benign and malicious PE files.
Visual Pipeline
Grayscale, Markov Transition Matrix, and Shannon Entropy channels.
REST API
Integrate automated scanning directly into your CI/CD pipeline.
Installation
Clone the repository to deploy and run the system on your local machine. The architecture is designed to be flexible: you can use the provided model or swap in your own custom model. Note: any custom model must be trained on 3-channel 256 × 256 image inputs, with the channels in the order (Grayscale, Markov, Entropy).
Model available at:
https://www.kaggle.com/code/phucmaihuu/resnet50-sigmoid
Next steps: once you have a model, place it in the src/models/ directory and update the MODEL_PATH variable in config.py.
```
git clone https://github.com/VnGirl17yrs/Web-Malware-Detection
cd Web\Web_Malware\
pip install -r requirements.txt
python run.py
```
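After placing the model in src/models/, the MODEL_PATH entry in config.py might look like the sketch below. The filename `my_model.keras` is hypothetical; use the actual name of your model file, and check how the existing config.py builds its paths.

```python
import os

# Hypothetical filename: replace "my_model.keras" with your own model file's name.
MODEL_PATH = os.path.join("src", "models", "my_model.keras")
```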
How It Works
Data Acquisition
We read the raw byte stream of each suspected executable and convert it into a normalized 1D numerical array.
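A minimal sketch of this step with NumPy. The function names are hypothetical, and the normalization (dividing by 255 to reach [0, 1]) is an assumption, since the exact scheme is not stated here.

```python
import numpy as np

def bytes_to_array(data: bytes) -> np.ndarray:
    """Convert raw bytes into a 1D uint8 array with values in [0, 255]."""
    # np.frombuffer returns a read-only view; copy so later stages can modify it.
    return np.frombuffer(data, dtype=np.uint8).copy()

def read_pe_bytes(path: str) -> np.ndarray:
    """Read an executable's raw byte stream from disk."""
    with open(path, "rb") as f:
        return bytes_to_array(f.read())

def normalize(arr: np.ndarray) -> np.ndarray:
    """Assumed normalization: scale byte values to [0.0, 1.0]."""
    return arr.astype(np.float32) / 255.0
```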
Multi-Channel Feature Generation
Three parallel branches transform the byte array into three image channels, each highlighting a different property of the binary.
Grayscale
Byte values are mapped linearly to pixel intensities, and the resulting image is resized to 256 × 256 with bilinear interpolation.
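The sketch below shows one way to build this channel. The near-square layout (width = ceil(sqrt(n)), zero-padded) and the corner-aligned bilinear variant are assumptions for illustration; the project may use a fixed width or a library resize such as TensorFlow's instead.

```python
import math
import numpy as np

def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear interpolation with corner-aligned sampling (an assumed variant)."""
    in_h, in_w = img.shape
    # Map each output coordinate to a fractional source coordinate.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    img = img.astype(np.float32)
    # Interpolate along x on the upper and lower rows, then blend along y.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def grayscale_channel(byte_arr: np.ndarray, size: int = 256) -> np.ndarray:
    """Lay bytes out row by row as pixels, then resize to size x size in [0, 1]."""
    n = max(len(byte_arr), 1)
    width = math.ceil(math.sqrt(n))  # assumed near-square layout
    padded = np.zeros(width * width, dtype=np.uint8)  # zero-pad the last row
    padded[: len(byte_arr)] = byte_arr
    img = padded.reshape(width, width)
    return bilinear_resize(img, size, size) / 255.0
```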
Markov Chain
A 256 × 256 matrix counts transitions between consecutive byte values; the counts are scaled logarithmically with log(Freq + 1) and then min-max normalized.
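A minimal NumPy sketch of this channel, assuming Freq is the raw count of each (byte i, next byte j) pair with no per-row probability normalization before the log step. The function name is hypothetical.

```python
import numpy as np

def markov_channel(byte_arr: np.ndarray) -> np.ndarray:
    """256x256 byte-transition matrix, log(Freq + 1) scaled, min-max normalized."""
    counts = np.zeros((256, 256), dtype=np.float64)
    if len(byte_arr) >= 2:
        src = byte_arr[:-1].astype(np.intp)
        dst = byte_arr[1:].astype(np.intp)
        # Count how often byte value i is immediately followed by byte value j.
        # np.add.at accumulates correctly when the same (i, j) pair repeats.
        np.add.at(counts, (src, dst), 1)
    scaled = np.log(counts + 1)  # log(Freq + 1): zero counts stay at 0
    lo, hi = scaled.min(), scaled.max()
    if hi == lo:  # empty or single-byte input; avoid division by zero
        return np.zeros_like(scaled)
    return (scaled - lo) / (hi - lo)
```

Because the matrix is indexed by byte value rather than by file offset, this channel is 256 × 256 for any file size and needs no resizing.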
Shannon Entropy
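Shannon entropy is computed over a sliding window of 256 bytes with a 128-byte overlap, visualizing local information density (high-entropy regions often correspond to compressed or encrypted data).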
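For each window, the entropy is H = -Σ p_i log2 p_i over the byte-value frequencies p_i, which ranges from 0 bits (one repeated byte) to 8 bits (all 256 values equally frequent). The sketch below computes the per-window values with a stride of 128 bytes. How these values are arranged into a 256 × 256 channel is not described here, so the sketch stops at the 1D sequence; dropping a trailing partial window is also an assumption.

```python
import numpy as np

def sliding_entropy(byte_arr: np.ndarray, window: int = 256, step: int = 128) -> np.ndarray:
    """Shannon entropy (bits) of each 256-byte window, advancing 128 bytes at a time."""
    values = []
    # Windows start at 0, 128, 256, ...; a trailing partial window is dropped
    # unless the whole input is shorter than one window.
    for start in range(0, max(len(byte_arr) - window, 0) + 1, step):
        chunk = byte_arr[start:start + window]
        if chunk.size == 0:
            break
        counts = np.bincount(chunk, minlength=256)
        p = counts[counts > 0] / chunk.size  # frequencies of byte values present
        values.append(float(-(p * np.log2(p)).sum()))
    return np.array(values)
```

Dividing the result by 8 would scale it to [0, 1] before it is placed into an image channel.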
Fusion & CNN Inference
The three channels are stacked into a 256 × 256 × 3 tensor and passed through a ResNet50 model trained for malware detection.
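A minimal sketch of the fusion step, assuming each channel has already been produced as a 256 × 256 array scaled to a common range. The function name is hypothetical; the channel order follows the (Grayscale, Markov, Entropy) requirement stated in the Installation section.

```python
import numpy as np

def fuse_channels(gray: np.ndarray, markov: np.ndarray, entropy: np.ndarray) -> np.ndarray:
    """Stack three 256x256 channels into a (1, 256, 256, 3) batch for the CNN."""
    for name, ch in (("gray", gray), ("markov", markov), ("entropy", entropy)):
        if ch.shape != (256, 256):
            raise ValueError(f"{name} channel must be 256x256, got {ch.shape}")
    # Order matters: the model expects (Grayscale, Markov, Entropy) on the last axis.
    image = np.stack([gray, markov, entropy], axis=-1).astype(np.float32)
    return image[np.newaxis, ...]  # add a batch dimension: (1, 256, 256, 3)
```

With a trained Keras model loaded, the batch could then be scored with `model.predict(batch)`, which returns one sigmoid output per image.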
Classification Decision
A sigmoid output produces a confidence score between 0 and 1. The score is compared against a 0.5 threshold to label the file as malware or benign.
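The decision rule can be sketched as follows. Two assumptions are made here: the score represents the probability of malware, and a score exactly equal to 0.5 counts as malware. In the Keras model the sigmoid is part of the final layer, so `sigmoid` below only illustrates what that layer computes.

```python
import math

THRESHOLD = 0.5

def sigmoid(logit: float) -> float:
    """Map a raw output (logit) to a score in (0, 1) without overflowing math.exp."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # logit < 0, so z is small and cannot overflow
    return z / (1.0 + z)

def classify(score: float, threshold: float = THRESHOLD) -> str:
    """Assumed convention: scores at or above the threshold are labeled malware."""
    return "malware" if score >= threshold else "benign"
```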
Model Details
Architecture
| Property | Value |
|---|---|
| Base Model | ResNet50 (trained from scratch) |
| Input Shape | 256 × 256 × 3 |
| Output | Sigmoid confidence score (0 to 1) |
| Framework | TensorFlow / Keras |
Performance Metrics
| Metric | Value |
|---|---|
| Accuracy | 99.57% |
| F1-Score | 0.9954 |
| Precision | 99.61% |
| Recall | 99.47% |
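The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported value can be checked against the precision and recall in the table:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall from the table, expressed as fractions.
print(round(f1_score(0.9961, 0.9947), 4))  # 0.9954
```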
FAQ
What file types are supported?
Currently, only Windows PE executables (.exe) are supported. The MZ header is validated server-side regardless of the file's extension.
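A minimal sketch of a content-based check, assuming it only inspects the first two bytes. The function names are hypothetical and this is not the project's actual server code.

```python
def has_mz_header(data: bytes) -> bool:
    """True if the data starts with the DOS 'MZ' signature used by Windows PE files."""
    return data[:2] == b"MZ"

def is_pe_file(path: str) -> bool:
    # This sketch inspects file content only and ignores the file extension.
    with open(path, "rb") as f:
        return has_mz_header(f.read(2))
```

"MZ" alone only identifies a DOS executable header; a stricter check would also follow the offset stored at 0x3C (`e_lfanew`) to the `PE\0\0` signature.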
Is my file stored permanently?
No. Uploaded files are saved temporarily under a unique session directory and are deleted automatically after processing completes.
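One way to get this behavior in Python is `tempfile.TemporaryDirectory`, which creates a uniquely named directory and deletes it, with its contents, when the `with` block exits, even if analysis raises an error. The sketch below is an illustration under that assumption; `process_upload` and the `analyze` callback are hypothetical names, not the project's API.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

def process_upload(filename: str, data: bytes, analyze: Callable[[Path], T]) -> T:
    """Save an upload in a unique temporary directory, analyze it, then delete it."""
    with tempfile.TemporaryDirectory(prefix="scan-") as session_dir:
        # Keep only the base name so a client cannot write outside the session dir.
        path = Path(session_dir) / Path(filename).name
        path.write_bytes(data)
        return analyze(path)
    # Leaving the with-block removes session_dir and everything in it.
```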
What is the maximum file size?
The total batch size per request is limited to 200 MB. Individual files that exceed the per-file limit are skipped. Because this is the initial release (v1.0.0), the system is not yet optimized for analyzing very large binaries.
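The two limits can be sketched as follows. The per-file limit is not stated in this documentation, so it is a required parameter here, and treating 200 MB as 200 × 1024 × 1024 bytes is an assumption. The function name is hypothetical.

```python
from typing import List, Tuple

def select_files(
    files: List[Tuple[str, int]],
    max_file_bytes: int,                     # per-file limit (value not documented)
    max_batch_bytes: int = 200 * 1024 * 1024,  # 200 MB per request, assumed binary MB
) -> List[str]:
    """Return names of files to analyze, given (name, size_in_bytes) pairs."""
    total = sum(size for _, size in files)
    if total > max_batch_bytes:
        raise ValueError(f"batch of {total} bytes exceeds the 200 MB request limit")
    # Oversized individual files are skipped rather than failing the whole batch.
    return [name for name, size in files if size <= max_file_bytes]
```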
How accurate is the detection?
The model achieves 99.57% accuracy and an F1-score of 0.9954 on the evaluation dataset, which indicates reliable predictions for typical executable files. However, performance may degrade on zero-day threats or novel malware families absent from the training set. Detection is also less effective against variants protected by aggressive multi-layer packing or strong encryption. Finally, larger files are more likely to be misclassified, mainly because fine-grained spatial features are lost when the PE image is resized.
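Some rough arithmetic shows why size matters for the grayscale channel: a 256 × 256 channel has 65,536 pixels, so any file larger than 64 KiB must be squeezed into fewer pixels than it has bytes. The helper below is a hypothetical illustration of that ratio, not part of the project.

```python
def bytes_per_pixel(file_size: int, side: int = 256) -> float:
    """Average number of input bytes represented by each pixel of a side x side channel."""
    return file_size / (side * side)

print(bytes_per_pixel(64 * 1024))         # 1.0
print(bytes_per_pixel(1024 * 1024))       # 16.0
print(bytes_per_pixel(10 * 1024 * 1024))  # 160.0
```

At 10 MiB, each pixel stands in for about 160 bytes, so small byte-level patterns can no longer be resolved in that channel.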