AntiArgfuscator is a very lightweight XGBoost model trained to detect obfuscated CLI commands.
To run it, follow these steps:
git clone https://github.com/0x-Apollyon/AntiArgfuscator.git
cd AntiArgfuscator
python -m venv venv          # optional
source venv/bin/activate     # optional
pip install -r requirements.txt
python prediction.py
And voilà.
Training Metrics:
accuracy: 0.9554
precision: 1.0000
recall: 0.9429
f1: 0.9706
average_precision: 1.0000
Testing Metrics:
accuracy: 0.8929
precision: 0.9750
recall: 0.8864
f1: 0.9286
average_precision: 0.9726
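For readers who want to see how accuracy, precision, recall and f1 relate to each other, here is a minimal sketch that derives them from confusion-matrix counts. The function name `classification_metrics` and the example counts are assumptions made for illustration. The counts are simply one set of numbers that happens to reproduce the testing metrics above, not figures taken from the repository, and the sketch assumes "obfuscated" is the positive class. average_precision is left out because it needs the ranked prediction scores, not just the counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (an assumption, not data from the repo), chosen only
# because they are consistent with the reported testing metrics.
example = classification_metrics(tp=39, fp=1, fn=5, tn=11)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the gap between precision and recall comes entirely from the five false negatives against a single false positive.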
This approach is by no means the best one, just the fastest.
I believe that creating vector embeddings of each command in the training set and then training a transformer on them would give the best accuracy.
However, a transformer is far more computationally expensive and much slower.
Another issue I noticed is that the model is somewhat biased towards labelling samples as "unobfuscated".
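One possible way to counter this bias, sketched below, is to lower the decision threshold so that borderline samples are labelled "obfuscated" more often. This is only a sketch under assumptions: it presumes the model can output a probability for the "obfuscated" class, and the helper names, the toy probabilities and the toy labels are all made up for illustration. It does not describe how prediction.py works.

```python
def label_with_threshold(probabilities, threshold=0.5):
    """Label a sample 1 (obfuscated) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall_at(probabilities, labels, threshold):
    """Recall for the positive (obfuscated) class at a given threshold."""
    predictions = label_with_threshold(probabilities, threshold)
    tp = sum(1 for pred, y in zip(predictions, labels) if pred == 1 and y == 1)
    fn = sum(1 for pred, y in zip(predictions, labels) if pred == 0 and y == 1)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (hypothetical): predicted probability of "obfuscated" and true label.
toy_probabilities = [0.92, 0.45, 0.38, 0.10, 0.61, 0.05]
toy_labels = [1, 1, 1, 0, 1, 0]

for threshold in (0.5, 0.35):
    print(threshold, recall_at(toy_probabilities, toy_labels, threshold))
```

Lowering the threshold trades precision for recall, so it should be tuned on held-out data. XGBoost also has a `scale_pos_weight` training parameter for imbalanced classes, which may be worth trying as an alternative.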
