HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing

The paper is available on arXiv: https://arxiv.org/abs/2412.01778

Introduction

We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.
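
The iterative Planner/Summarizer loop described above can be sketched roughly as follows. This is a minimal illustration and not the code in pentest_agent.py: the function name `run_agent`, its parameters, and the stubbed callables are all assumptions made for the example, and the LLM calls and command execution are replaced by plain Python functions.

```python
from typing import Callable


def run_agent(
    plan: Callable[[str], str],
    execute: Callable[[str], str],
    summarize: Callable[[str, str, str], str],
    is_solved: Callable[[str], bool],
    max_steps: int = 10,
) -> tuple[str, int]:
    """Iterate plan -> execute -> summarize until solved or out of steps.

    All four callables are hypothetical stand-ins: in a real agent, `plan`
    and `summarize` would each wrap an LLM call, and `execute` would run
    the command in a sandboxed environment.
    """
    summary = ""
    for step in range(1, max_steps + 1):
        command = plan(summary)  # Planner: choose the next command from the summary so far
        output = execute(command)  # run the command and capture its feedback
        summary = summarize(summary, command, output)  # Summarizer: fold feedback into the history
        if is_solved(summary):
            return summary, step
    return summary, max_steps


# Usage with scripted stubs instead of an LLM and a real shell.
scripted = iter(["ls", "cat flag.txt"])
fake_outputs = {"ls": "flag.txt", "cat flag.txt": "picoCTF{example}"}

summary, steps = run_agent(
    plan=lambda s: next(scripted),
    execute=lambda c: fake_outputs[c],
    summarize=lambda s, c, o: s + f"$ {c}\n{o}\n",
    is_solved=lambda s: "picoCTF{" in s,
)
print(steps)
print(summary)
```

Passing the modules in as callables keeps the loop itself independent of any particular model or execution backend, which makes it easy to test with stubs like the ones above.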

Using the repository

1. Create a Hugging Face account and a Neptune.ai account.
2. Copy your API keys into the .env file and set the desired CUDA devices, following the template in .env_example.
3. Set up the PicoCTF benchmark (see the picoctf_bench folder).
4. Set up the OverTheWire benchmark (see the overthewire_bench folder).
5. Start the HackSynth agent:
- Install the environment:
```
python -m venv cyber_venv
source cyber_venv/bin/activate
pip install -r requirements.txt
```
- Start the benchmark with the following:
```
python run_bench.py -b benchmark.json -c config.json
```
  The benchmark.json file should be one of the generated benchmark_solved.json files or an equivalently structured file. The configuration files we used for the measurements in the paper are available in the configs folder.
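
Before launching a run, it can help to check the setup steps above. The sketch below is a hypothetical helper, not part of the repository: it compares the variable names in .env against those in .env_example, and builds one `run_bench.py` command per configuration file using only the `-b` and `-c` flags shown above. It assumes that .env uses plain `KEY=VALUE` lines and that the files in configs end in `.json`; the function names are made up for this example.

```python
from pathlib import Path


def env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_env_keys(example_text: str, env_text: str) -> list[str]:
    """Return the names defined in the example file but absent from the real one."""
    return sorted(env_keys(example_text) - env_keys(env_text))


def bench_commands(benchmark: str, config_paths: list[str]) -> list[list[str]]:
    """Build one run_bench.py invocation per config file, in sorted order."""
    return [
        ["python", "run_bench.py", "-b", benchmark, "-c", cfg]
        for cfg in sorted(config_paths)
    ]


if __name__ == "__main__":
    missing = missing_env_keys(
        Path(".env_example").read_text(), Path(".env").read_text()
    )
    if missing:
        raise SystemExit(f"Missing entries in .env: {', '.join(missing)}")

    # Assumption: config files live in configs/ and use the .json extension.
    configs = [str(p) for p in Path("configs").glob("*.json")]
    for cmd in bench_commands("benchmark.json", configs):
        print(" ".join(cmd))
```

The script only prints the commands; each printed line can be run as is, or the lists returned by `bench_commands` can be passed to `subprocess.run` to execute the configurations one after another.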

How to Cite

If you use this code in your work or research, please cite the corresponding paper:

@misc{muzsai2024hacksynthllmagentevaluation,
      title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing}, 
      author={Lajos Muzsai and David Imolai and András Lukács},
      year={2024},
      eprint={2412.01778},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2412.01778}, 
}

Contributors

Lajos Muzsai
David Imolai
András Lukács

License

The project uses the GNU AGPLv3 license.
