TransitGPT is a specialized chatbot that helps transit enthusiasts retrieve transit information and analyze GTFS feeds via code. Try the chatbot here.
This diagram illustrates the high-level architecture of the TransitGPT system, showing how the different components interact. The workflow consists of four key steps:

- **Moderation**
  - All queries are moderated
  - Irrelevant queries are blocked
- **Main LLM**
  - Generates a code response for the query of interest
- **Code Execution**
  - Code generated by the main LLM is executed in a safe environment
  - Includes a retry mechanism for failed executions
- **Summary**
  - Results are summarized in a chat-like response format
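Under stated assumptions, the four steps above can be sketched as follows. Every name here is an illustrative placeholder, not the project's actual implementation: the real system moderates with an LLM and executes code in a sandbox (see `gtfs_agent/` and `evaluator/`).

```python
# Hypothetical sketch of the four-step TransitGPT workflow.

def moderate(query: str) -> bool:
    """Step 1: keep only transit-related queries (stubbed with keywords)."""
    transit_keywords = {"gtfs", "route", "stop", "trip", "transit", "bus"}
    return any(word in query.lower() for word in transit_keywords)

def generate_code(query: str) -> str:
    """Step 2: the main LLM would return Python code here (stubbed)."""
    return "result = len(feed['routes'])"

def execute_code(code: str, feed: dict, max_retries: int = 2):
    """Step 3: run generated code in a restricted namespace, retrying on failure."""
    for attempt in range(max_retries + 1):
        try:
            namespace = {"feed": feed}
            exec(code, {"__builtins__": {"len": len}}, namespace)
            return namespace.get("result")
        except Exception:
            if attempt == max_retries:
                raise

def summarize(result) -> str:
    """Step 4: wrap the raw result in a chat-friendly sentence."""
    return f"Your feed contains {result} routes."

def answer(query: str, feed: dict) -> str:
    if not moderate(query):
        return "Sorry, I can only answer transit and GTFS questions."
    return summarize(execute_code(generate_code(query), feed))
```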
- Interactive chat interface for querying GTFS data
- Code generation and execution for GTFS analysis
- Support for multiple LLM models. Default models are: Claude 3.5 Sonnet, Claude 3.5 Haiku, GPT-4o, GPT-4o-mini
- Visualization of results using Matplotlib, Plotly, and Folium
- Feedback system for user interactions
- Support for multiple GTFS feeds
- Support for multiple visualization types, including:
  - Static/Interactive maps
  - Static/Interactive plots
  - Tables (DataFrames)
- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Ensure you have the necessary GTFS data files and update `gtfs_data/file_mapping.json` accordingly:
  - Place the GTFS File: Add the GTFS zip file to the appropriate directory within `gtfs_data/`.
  - Update `file_mapping.json`: Add a new entry for the transit agency in the following format:

    ```json
    "New Transit": {
        "file_loc": "gtfs_data/New Transit Agency/gtfs.zip",
        "distance_unit": "m",
        "pickle_loc": "gtfs_data/feed_pickles/New_Transit_gtfs_loader.pkl"
    }
    ```
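A small helper along the following lines (illustrative, not part of the repository) can catch typos in new `file_mapping.json` entries before you run the app. The key names match the entry format shown above:

```python
# Hypothetical sanity check for gtfs_data/file_mapping.json entries.
import json
from pathlib import Path

def check_file_mapping(mapping_path: str = "gtfs_data/file_mapping.json") -> list[str]:
    """Return a list of problems found in the mapping file (empty if OK)."""
    problems = []
    mapping = json.loads(Path(mapping_path).read_text())
    for agency, entry in mapping.items():
        # Every entry needs these three keys, per the format above.
        for key in ("file_loc", "distance_unit", "pickle_loc"):
            if key not in entry:
                problems.append(f"{agency}: missing key '{key}'")
        # The GTFS zip must actually exist on disk.
        if "file_loc" in entry and not Path(entry["file_loc"]).is_file():
            problems.append(f"{agency}: GTFS zip not found at {entry['file_loc']}")
    return problems
```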
- Generate pickled GTFS feeds for faster loading:

  ```bash
  python utils/generate_feed_pickles.py
  ```
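The idea behind the pickling step is that deserializing a parsed feed object is much faster than re-parsing the GTFS zip on every app start. A minimal sketch of the save/load round trip (the real loader lives in `gtfs_agent/gtfs_loader.py`; the path convention follows `file_mapping.json` above):

```python
# Illustrative save/load helpers for pickled feeds (not the project's code).
import pickle
from pathlib import Path

def save_feed(feed, pickle_loc: str) -> None:
    """Serialize a parsed feed object, creating parent directories as needed."""
    Path(pickle_loc).parent.mkdir(parents=True, exist_ok=True)
    with open(pickle_loc, "wb") as f:
        pickle.dump(feed, f)

def load_feed(pickle_loc: str):
    """Deserialize a previously pickled feed object."""
    with open(pickle_loc, "rb") as f:
        return pickle.load(f)
```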
- Set up your environment variables for API keys and other sensitive information:
  - Create a `.streamlit/secrets.toml` file in your project directory.
  - Add your API keys in the following format:

    ```toml
    [general]
    OPENAI_API_KEY = "your_openai_api_key"
    GROQ_API_KEY = "your_groq_api_key"
    ANTHROPIC_API_KEY = "your_anthropic_api_key"
    GMAP_API = "your_google_maps_api_key"
    ```

  - Ensure that this file is not included in version control by adding it to your `.gitignore`.
- Run the Streamlit app:

  ```bash
  streamlit run chat_app.py
  ```
- Select an LLM model and GTFS feed from the sidebar
- Type your query in the chat input or select a sample question
- View the generated code, execution results, and visualizations
- Provide feedback on the responses
- LLM models available: Claude 3.5 Sonnet, GPT-4o, GPT-4o-mini, Llama 3.1 8B Instant
- Maximum chat history: `16` messages
- Timeout for code execution: `5` minutes
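One common way to enforce an execution timeout like the one above is to run the generated code in a worker and bound the wait. This is a minimal sketch under stated assumptions, not the project's actual mechanism (which lives in `evaluator/eval_code.py`):

```python
# Illustrative timeout wrapper for generated code.
import concurrent.futures

TIMEOUT_SECONDS = 5 * 60  # the "5 minutes" from the configuration above

def run_with_timeout(code: str, timeout: float = TIMEOUT_SECONDS):
    """Execute code in a worker thread; raise TimeoutError if it stalls."""
    def _run():
        namespace: dict = {}
        exec(code, namespace)
        return namespace.get("result")

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(_run)
        return future.result(timeout=timeout)
```

Note that a timed-out thread is abandoned rather than killed, so production sandboxes typically use a separate process instead.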
- `chat_app.py`: Main Streamlit application
- `components/`: UI components and interface setup
- `utils/`: Utility functions and helper methods
- `prompts/`: LLM prompts and examples
- `data/`: Sample questions and few-shot examples
- `gtfs_data/`: GTFS feed files and mappings
- `gtfs_agent/`: GTFS data loading, processing, and LLM agent
- `evaluator/`: Code execution and evaluation
- `tests/`: Unit tests for various components
- `gtfs_agent/gtfs_loader.py`: GTFS data loading and processing
- `gtfs_agent/agent.py`: LLM agent implementation
- `evaluator/eval_code.py`: Code execution and evaluation
- `utils/feedback.py`: Feedback collection system
- `prompts/generate_prompt.py`: Dynamic prompt generation
- `utils/generate_feed_pickles.py`: Generate pickled GTFS feeds
- `utils/constants.py`: Constant values used across the project
- `utils/helper.py`: Helper functions for various tasks
- `gtfs_agent/llm_client.py`: LLM API clients for different models
This chatbot is an AI-powered tool designed to assist with GTFS data analysis and code generation. Please be aware of its limitations, verify critical information, and review generated code before use in production environments.
Contributions are welcome! Please feel free to submit a Pull Request.
Thank you for your interest in contributing to our few-shot examples! This guide will help you add new examples to our dataset, ensuring consistency and quality across all contributions.
- Understand the Structure: Each example in the `data/few_shot.yaml` and `data/few_shot_viz.yaml` files follows a specific format. If your example generates a visualization, add it to `data/few_shot_viz.yaml`. If it does not, add it to `data/few_shot.yaml`.
- Use Clear and Descriptive Questions: Ensure that the question field clearly describes the task or query. It should be concise yet informative.
- Provide Accurate Answers: The answer should be a valid Python code snippet that solves the question. Ensure the code is correct and follows best practices.
- Include Additional Information: Where applicable, provide additional information that explains the context or any assumptions made in the answer.
- Test Your Code: Before submitting, test your code to ensure it works as expected with the GTFS data.
- Select the Appropriate File:
  - Use `few_shot.yaml` for examples that do not involve visualization.
  - Use `few_shot_viz.yaml` for examples that include visualizations like plots or maps.
- Follow the Example Template:
  - Each example should have a unique identifier (e.g., `example_XX`).
  - Include the `feed` and `question` fields.
  - Provide the `answer` as a Python code block.
  - Add any `additional_info` if necessary.
- Example Template:

  ````yaml
  example_XX:
    feed: [Feed Name]
    question: [Your question here]
    answer: |
      ```python
      # Your Python code here
      ```
    additional_info: [Optional additional information]
  ````
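Before opening a pull request, you can sanity-check a new entry with a small validator along these lines (an illustrative helper, not part of the repository; field names follow the template above):

```python
# Hypothetical validator for a proposed few-shot example.
REQUIRED_FIELDS = ("feed", "question", "answer")

def validate_example(example_id: str, example: dict) -> list[str]:
    """Return a list of problems with a proposed few-shot example (empty if OK)."""
    problems = []
    # Identifiers follow the example_XX convention from the template.
    if not example_id.startswith("example_"):
        problems.append(f"id '{example_id}' should look like 'example_XX'")
    for field in REQUIRED_FIELDS:
        if field not in example:
            problems.append(f"'{example_id}' is missing required field '{field}'")
    # The answer must contain a fenced Python block.
    if "```python" not in example.get("answer", ""):
        problems.append(f"'{example_id}' answer should contain a fenced python block")
    return problems
```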
- Ensure Consistency:
  - Use consistent naming conventions and formatting.
  - Follow the existing style for comments and code structure.
- Validate Your Contribution:
  - Check for syntax errors and logical correctness.
  - Ensure the example is unique and not a duplicate of existing examples.
- Submit Your Contribution:
  - Fork the repository and create a new branch for your contribution.
  - Add your example to the appropriate file.
  - Submit a pull request with a clear description of your changes.
- Your contribution will be reviewed by the maintainers.
- Feedback may be provided for improvements or corrections.
- Once approved, your example will be merged into the main branch.
Copyright © 2024 Urban Traffic & Economics Lab (UTEL)
If you use TransitGPT in your research, please cite our paper:
```bibtex
@misc{devunuri2024transitgpt,
      title={TransitGPT: A Generative AI-based framework for interacting with GTFS data using Large Language Models},
      author={Saipraneeth Devunuri and Lewis Lehe},
      year={2024},
      eprint={2412.06831},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.06831},
}
```