Project Documentation

A comprehensive guide to understanding, deploying, and using the Malware Detection System powered by Multi-Channel PE Image Analysis.

01

Overview

This system detects malware in Windows PE executables (.exe) by converting them into multi-channel visual representations and classifying them with our custom ResNet50 model, without relying on disassembly or signature databases.

99.57% Accuracy

Evaluated on a balanced dataset of benign and malicious PE files.

Visual Pipeline

Grayscale, Markov Transition Matrix, and Shannon Entropy channels.

REST API

Integrate automated scanning directly into your CI/CD pipeline.


02

Installation

Clone the repository to deploy and run the system on your local machine. The architecture is flexible: you can use the provided model or swap in your own. Note: any custom model must be trained on 3-channel image input with the channels in the order (Grayscale, Markov, Entropy).

Model available at:
https://www.kaggle.com/code/phucmaihuu/resnet50-sigmoid

Next Steps: Once you have a model, place it in the src/models/ directory and update the MODEL_PATH variable in config.py.

bash
git clone https://github.com/VnGirl17yrs/Web-Malware-Detection
cd Web/Web_Malware/
pip install -r requirements.txt
python run.py

03

How It Works

Data Acquisition

0-255 Byte Sequence

We read the raw byte stream of each suspected executable into a 1D numerical array of values from 0 to 255. No disassembly is required.
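A minimal sketch of this step, assuming NumPy is available. The function name `bytes_to_array` is hypothetical and does not come from the repository.

```python
import numpy as np

def bytes_to_array(data: bytes) -> np.ndarray:
    """Interpret raw file bytes as a 1D array of integers in [0, 255]."""
    return np.frombuffer(data, dtype=np.uint8)

# In practice `data` would come from something like open(path, "rb").read().
sample = bytes_to_array(b"MZ\x90\x00")
```

Using `uint8` keeps one array element per file byte, so the array length equals the file size.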

Multi-Channel Feature Generation

Three parallel branches each transform the byte array into one 256 x 256 image channel.

Grayscale

Linear mapping followed by Bilinear Interpolation.
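One way the grayscale branch could look, as a sketch only. It assumes that "linear mapping" means laying the bytes out row by row in a near-square grid (zero-padded), and it uses a hand-written corner-aligned bilinear resize so that only NumPy is needed. The names `bilinear_resize` and `grayscale_channel` and the near-square layout are assumptions, not the project's actual code.

```python
import math
import numpy as np

def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2D array with bilinear interpolation (corner-aligned sampling)."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)   # source row coordinate per output row
    xs = np.linspace(0, in_w - 1, out_w)   # source column coordinate per output column
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                # vertical blend weights
    wx = (xs - x0)[None, :]                # horizontal blend weights
    img = img.astype(np.float64)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def grayscale_channel(data: bytes, size: int = 256) -> np.ndarray:
    """Map bytes row by row onto a 2D grid, then resize to size x size in [0, 1]."""
    arr = np.frombuffer(data, dtype=np.uint8)
    if arr.size == 0:
        raise ValueError("cannot build an image from an empty file")
    width = max(1, math.isqrt(arr.size))   # assumed near-square layout
    height = math.ceil(arr.size / width)
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:arr.size] = arr                # zero-pad the final row
    return bilinear_resize(padded.reshape(height, width), size, size) / 255.0
```

A production pipeline would more likely call an image library's resize routine; the manual version here only makes the interpolation explicit.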

Markov Chain

Logarithmic scaling log(Freq+1) and Min-Max normalization.
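The Markov branch can be sketched as follows, assuming that "Freq" is the raw count of each byte-to-byte transition (the original does not say whether counts or row-normalized probabilities are used). The function name `markov_channel` is hypothetical.

```python
import numpy as np

def markov_channel(data: bytes) -> np.ndarray:
    """Build a 256 x 256 byte-transition image scaled to [0, 1]."""
    arr = np.frombuffer(data, dtype=np.uint8).astype(np.intp)
    counts = np.zeros((256, 256), dtype=np.float64)
    # counts[i, j] = number of times byte value i is immediately followed by j.
    np.add.at(counts, (arr[:-1], arr[1:]), 1)
    scaled = np.log(counts + 1.0)          # log(Freq + 1) compresses large counts
    lo, hi = scaled.min(), scaled.max()
    if hi == lo:                           # no transitions at all, or a flat matrix
        return np.zeros_like(scaled)
    return (scaled - lo) / (hi - lo)       # min-max normalization
```

Because there are exactly 256 possible byte values, this channel is 256 x 256 by construction and needs no resizing.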

Shannon Entropy

Sliding window (256-byte window, 128-byte overlap) that visualizes information density.
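A sketch of the windowed entropy computation using only the standard library. A 128-byte overlap on a 256-byte window implies a 128-byte step. How the resulting series is arranged into a 256 x 256 channel is not described in this guide, so that part is left out; the function names are hypothetical.

```python
import math
from collections import Counter

def window_entropy(window: bytes) -> float:
    """Shannon entropy of a byte window, in bits per byte (0.0 to 8.0)."""
    if not window:
        return 0.0
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())

def entropy_series(data: bytes, window: int = 256, overlap: int = 128) -> list[float]:
    """Entropy of each sliding window; consecutive windows share `overlap` bytes."""
    step = window - overlap
    last_start = max(len(data) - window, 0)
    return [window_entropy(data[i:i + window]) for i in range(0, last_start + 1, step)]
```

Values near 8 bits per byte suggest compressed or encrypted regions, while values near 0 suggest padding or highly repetitive data.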

Fused Tensor: 256 x 256 x 3

Fusion & CNN Inference

Channels are stacked into a 256x256x3 tensor and passed through a ResNet50 model trained for malware detection.
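The fusion step might look like the sketch below, assuming each branch has already produced a 256 x 256 array. The channel order follows the (Grayscale, Markov, Entropy) requirement stated in the Installation section; the function name `fuse_channels` and the added batch dimension are assumptions.

```python
import numpy as np

def fuse_channels(gray, markov, entropy) -> np.ndarray:
    """Stack three 256 x 256 channels into a (1, 256, 256, 3) float32 batch."""
    channels = [np.asarray(c, dtype=np.float32) for c in (gray, markov, entropy)]
    for c in channels:
        if c.shape != (256, 256):
            raise ValueError(f"expected a (256, 256) channel, got {c.shape}")
    tensor = np.stack(channels, axis=-1)   # shape (256, 256, 3), channels last
    return tensor[np.newaxis, ...]         # leading batch axis for model input
```

The resulting batch would then be handed to the loaded Keras model for inference.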

Classification Decision

A sigmoid output gives a confidence score between 0 and 1. The system applies a 0.5 threshold to that score to reach a malware or benign verdict.

MALWARE (≥ 0.5)
BENIGN (< 0.5)
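The decision rule above can be sketched in a few lines. The function names are hypothetical, and the `sigmoid` helper is included only to show how a raw model output maps to a score; a Keras model with a sigmoid output layer already returns the score directly.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def classify(score: float, threshold: float = 0.5) -> str:
    """Scores at or above the threshold are MALWARE; everything else is BENIGN."""
    return "MALWARE" if score >= threshold else "BENIGN"
```

Note that a score of exactly 0.5 falls on the malware side, matching the ≥ 0.5 rule.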

04

Model Details

Architecture

Base Model: ResNet50 (trained from scratch)
Input Shape: 256 × 256 × 3
Output: Sigmoid
Framework: TensorFlow / Keras

Performance Metrics

Accuracy: 99.57%
F1-Score: 0.9954
Precision: 99.61%
Recall: 99.47%
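As a quick consistency check, the F1-score is the harmonic mean of precision and recall, and the reported figures agree with one another. The snippet below is plain arithmetic on the numbers above, not output from the project's evaluation code.

```python
precision = 0.9961
recall = 0.9947

# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
f1_rounded = round(f1, 4)
```

Rounded to four decimal places, the result matches the reported F1-score of 0.9954.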

05

FAQ

What file types are supported?

Currently only Windows PE executables (.exe) are supported. The MZ header is validated server-side regardless of file extension.
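A minimal sketch of such a check: every PE file begins with the two-byte DOS signature "MZ", so the test looks at content rather than the file name. The function name `has_mz_header` is hypothetical, and a real validator might also follow the header to the "PE" signature.

```python
def has_mz_header(data: bytes) -> bool:
    """Return True if the data starts with the DOS 'MZ' signature."""
    return data[:2] == b"MZ"
```

For example, a renamed ELF binary or an empty upload fails this check even if it carries a .exe extension.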

Is my file stored permanently?

No. Uploaded files are saved temporarily under a unique session directory and are deleted automatically after processing completes.
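This lifecycle can be illustrated with the standard library's temporary directory support. The sketch below is an assumption about how such cleanup could be done, not the project's actual implementation; `scan_in_session` is a hypothetical name and the file-size lookup stands in for the real analysis.

```python
import tempfile
from pathlib import Path

def scan_in_session(filename: str, data: bytes) -> tuple[int, str]:
    """Write an upload into a unique temp directory, process it, then clean up."""
    with tempfile.TemporaryDirectory(prefix="session_") as session_dir:
        # Keep only the base name so client-supplied directories are ignored.
        target = Path(session_dir) / Path(filename).name
        target.write_bytes(data)
        processed = target.stat().st_size  # stand-in for the real analysis step
    # Leaving the `with` block removes the directory and everything in it.
    return processed, session_dir
```

Tying deletion to the end of the `with` block means the files are removed even if the processing step raises an exception.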

What is the maximum file size?

The total batch size per request is limited to 200 MB. Individual files exceeding the per-file limit are skipped. Because this is the initial release (v1.0.0), the system is not yet optimized for very large binaries.

How accurate is the detection?

The model achieves 99.57% accuracy and an F1-score of 0.9954 on the evaluation dataset. Performance may degrade on zero-day threats or novel malware families absent from the training set, and on variants protected by aggressive multi-layer packing or strong encryption. Larger files are also more likely to be misclassified, mainly because resizing the PE-derived image discards fine-grained spatial features.