Mastering HumanEval with Reflexion

This is a spin-off project inspired by the paper: Reflexion: an autonomous agent with dynamic memory and self-reflection. Noah Shinn, Beck Labash, Ashwin Gopinath. Preprint, 2023

Read more about this project in this post

Check out an interesting type-inference implementation here: OpenTau

Check out the code for the original paper here

Check out a new superhuman programming agent gym here

Note

This repo contains scratch code that was used for testing. The real version of Reflexion for benchmark-agnostic, language-agnostic code generation will be released after the first version of the upcoming paper, out of respect for the privacy of the work (and collaboration) in progress.

If you have any questions, please contact [email protected]

Another Note

Because access to GPT-4 is limited and the API charges are significant, it may not be feasible for individual developers to rerun these experiments. In response to recent requests, both trials have been rerun once more and the logs are dumped in ./root, with a script here that validates the solutions against the unit tests provided by HumanEval.

To run the validation on your log files or the provided log files:

python ./validate_py_results.py <path to jsonlines file>
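As a rough illustration of what validating a jsonlines log against HumanEval unit tests involves, here is a minimal sketch. It is not the code in validate_py_results.py. It assumes each line is a JSON object with a hypothetical `solution` field holding the full generated function, plus the `test` and `entry_point` fields that HumanEval provides; the actual log files may use different field names.

```python
import json
from pathlib import Path


def passes(record: dict) -> bool:
    """Return True if the record's solution passes its HumanEval-style test."""
    # HumanEval tests define `check(candidate)`; call it on the entry point.
    program = "\n".join([
        record["solution"],  # hypothetical field name: the generated function
        record["test"],      # HumanEval field: defines check(candidate)
        f"check({record['entry_point']})",
    ])
    namespace: dict = {}
    try:
        # This executes untrusted code in-process; only do so in a sandbox.
        exec(program, namespace)
        return True
    except Exception:
        # A failed assertion, syntax error, or runtime error counts as a fail.
        return False


def validate(path: str) -> tuple[int, int]:
    """Return (number passed, total) for a jsonlines file of records."""
    lines = Path(path).read_text().splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    passed = sum(passes(record) for record in records)
    return passed, len(records)
```

Calling `validate("some_log.jsonl")` on a file in this assumed format returns the pass count and the total, from which a pass rate can be computed.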

Warning

Please do not run the Reflexion agent in an insecure environment: the generated code is executed without being validated first.
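One small precaution, sketched below, is to run each generated program in a separate Python process with a timeout. This is an assumption about how one might reduce risk, not what this repo does, and the helper name `run_isolated` is hypothetical. A child process with a timeout only guards against hangs and crashes in the caller; it does not restrict file or network access, so a container or virtual machine is still needed for real isolation.

```python
import subprocess
import sys
import tempfile


def run_isolated(code: str, timeout: float = 5.0) -> bool:
    """Run `code` in a child Python process; True if it exits with status 0."""
    # Use a throwaway working directory so relative writes land there.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The child is killed when the timeout expires.
            return False
    return result.returncode == 0
```

A failing assertion or an uncaught exception in the child gives a non-zero exit status, so the helper returns False in those cases as well as on timeout.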

Cite

Note: This is a spin-off implementation that relaxes the internal success criteria proposed in the original paper.

@article{shinn2023reflexion,
  title={Reflexion: an autonomous agent with dynamic memory and self-reflection},
  author={Shinn, Noah and Labash, Beck and Gopinath, Ashwin},
  journal={arXiv preprint arXiv:2303.11366},
  year={2023}
}

