This README documents the main inference script, `run.sh`, along with some miscellaneous scripts that may be helpful.

> [!WARNING]
> These scripts are written to be invoked from the root of this codebase (i.e. `./scripts/run.sh`).
The `./run.sh` script is provided as an example of how to invoke `run.py`.

A single `run.py` call generates a `trajectory/<username>/<experiment name>` folder containing the trajectories and predictions generated by a `<model_name>` model run on every instance in the `<data_path>` dataset.
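As a rough illustration of what such a call looks like when assembled programmatically, the sketch below builds an argument list for `run.py` from flags documented in this README. The helper name `build_run_command` is hypothetical and not part of the repository; the default values are the ones this README lists.

```python
import shlex


def build_run_command(model_name, data_path,
                      config_file="config/default.yaml",
                      per_instance_cost_limit=3.0):
    """Assemble an argv list for run.py (hypothetical helper, not part of the repo)."""
    return [
        "python", "run.py",
        "--model_name", model_name,
        "--data_path", data_path,
        "--config_file", config_file,
        "--per_instance_cost_limit", str(per_instance_cost_limit),
    ]


cmd = build_run_command("gpt4", "/path/to/data.json")
# Print a copy-pasteable shell line. To actually start a run, pass `cmd` to
# subprocess.run(cmd, check=True) from the root of the codebase.
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.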
The following is a guide to the provided `run.py` script, detailing the available command-line arguments, their purposes, and their default values. Flags that you might find helpful are marked with a 💡.

The code for configuration-based workflows, along with an explanation of its implementation, is in `agent/`.

> [!TIP]
> Run `python run.py --help` to view the most up-to-date documentation of the arguments.
- `-h, --help`: Show the help message and exit.

These arguments configure the script's behavior:

- `--instance_filter <str>` 💡: Run instances that match this regex pattern. Default is `.*`.
- `--noskip_existing, --skip_existing`: [Do not] skip instances that have been completed before.
- `--suffix <str>`: Append a suffix to the name of the folder containing the trajectories for an experiment run.

These arguments are related to the environment configuration:
- `--data_path <str>` 💡: Path to the data file, a Hugging Face dataset, or a GitHub issue URL.
- `--base_commit <str>`: Base commit SHA to check out. This is determined automatically for instances in SWE-bench.
- `--image_name <str>`: Name of the Docker image to use. Default is `swe-agent`.
- `--noinstall_environment, --install_environment`: [Do not] install the environment. Default is `True`.
- `--noverbose, --verbose`: Enable verbose output. Default is `False`.
- `--timeout <int>`: Timeout in seconds. Default is `35`.
- `--container_name <str>` 💡: Name of the Docker container, if you would like to create a persistent container. Optional.

> [!WARNING]
> If you specify a container name, do not run multiple instances of `run.py` with the same container name!
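One way to respect the warning above when launching several runs side by side is to derive a distinct, stable container name per worker. The sketch below only builds the argument lists; the helper names and the `swe-agent-worker` prefix are assumptions made for illustration, not names used by the repository.

```python
def container_name_for(worker_index, prefix="swe-agent-worker"):
    """Return a stable, per-worker container name (the prefix is an assumption)."""
    return f"{prefix}-{worker_index}"


def commands_for_parallel_runs(data_paths):
    """Build one run.py argv per data file, each with its own --container_name."""
    return [
        ["python", "run.py",
         "--data_path", path,
         "--container_name", container_name_for(index)]
        for index, path in enumerate(data_paths)
    ]


commands = commands_for_parallel_runs(["/path/to/a.json", "/path/to/b.json"])
names = [cmd[cmd.index("--container_name") + 1] for cmd in commands]

# Guard against accidentally sharing a container between concurrent runs.
if len(set(names)) != len(names):
    raise ValueError("each parallel run needs its own container name")
```

Using the worker index rather than a random suffix keeps each name reusable across launches, which fits the purpose of a persistent container.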
Configure agent behavior:
- `--config_file <Path>` 💡: Path to the configuration YAML file. Default is `config/default.yaml`.

Configure model parameters:
- `--model_name <str>` 💡: Name of the model. Default is `gpt4`.
- `--per_instance_cost_limit <float>` 💡: Per-instance cost limit (the interactive loop terminates automatically when the cost limit is hit). Default is `3.0`.
- `--temperature <float>` 💡: Model temperature. Default is `0.0`.
- `--top_p <float>` 💡: Top-p filtering. Default is `0.95`.
- `--total_cost_limit <float>`: Total cost limit. Default is `0.0` (unlimited).

Run with a custom data path and verbose mode:

`python run.py --data_path /path/to/data.json --verbose`

Specify a model and adjust the temperature and top_p parameters:

`python run.py --model_name gpt4 --temperature 0.2 --top_p 0.9`
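Before passing a pattern to `--instance_filter`, it can help to preview which instances it would select. The sketch below assumes the pattern is matched from the start of each instance ID (Python's `re.match`); how `run.py` applies the pattern internally is not stated in this README. The helper `select_instances` is hypothetical, and the IDs are illustrative strings in the SWE-bench naming style.

```python
import re


def select_instances(instance_ids, instance_filter=".*"):
    """Keep the IDs matched by the pattern (anchored at the start; an assumption)."""
    pattern = re.compile(instance_filter)
    return [iid for iid in instance_ids if pattern.match(iid)]


# Illustrative IDs only, written in the SWE-bench naming style.
ids = ["django__django-11099", "sympy__sympy-20590", "django__django-13230"]

django_only = select_instances(ids, r"django__.*")
everything = select_instances(ids)  # the default pattern keeps every instance
print(django_only)
print(len(everything))
```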
The miscellaneous scripts are:

- `remove_all_containers.sh`: Forcibly removes all Docker containers currently present on the system.
- `run_and_eval.sh`: Runs SWE-agent inference and evaluation on a specified dataset N times. You can specify the `dataset_path`, `num_runs`, `template`, and `suffix` arguments.
- `run_jsonl.sh`: Runs SWE-agent inference from a `.jsonl` file that contains a SWE-bench style task instance.
- `run_replay.sh`: Runs SWE-agent inference from a `.traj` file. This is useful for automatically creating a new demonstration for a new config from an existing sequence of actions.
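For `run_jsonl.sh`, the input is a `.jsonl` file, meaning one JSON object per line. The sketch below writes and re-reads such a file with a single task instance. The field names follow the public SWE-bench dataset schema; which fields `run_jsonl.sh` actually requires is not stated in this README, and every value shown is a placeholder.

```python
import json
import tempfile
from pathlib import Path

# Placeholder task instance; the field names follow the SWE-bench schema.
instance = {
    "instance_id": "example__repo-1",
    "repo": "example/repo",
    "base_commit": "0" * 40,
    "problem_statement": "Describe the issue to be solved here.",
}


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "instance.jsonl"
    write_jsonl(path, [instance])
    loaded = read_jsonl(path)
```

Reading the file back before starting a run is a cheap check that each line parses as standalone JSON.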