
What to do on timeout? #831

Open
silgon opened this issue Nov 24, 2024 · 4 comments

Comments

@silgon

silgon commented Nov 24, 2024

The following script is from the README with two lines changed: the `prompt` and `source` arguments of `SmartScraperGraph`. I get a timeout, and I don't see anywhere in the documentation how to configure that timeout.

import json
from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Give me the url of all the modules including #",
    source="https://scrapegraph-ai.readthedocs.io/en/latest/modules/modules.html",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))

Result

--- Executing Fetch Node ---
--- (Fetching HTML from: https://scrapegraph-ai.readthedocs.io/en/latest/modules/modules.html) ---
--- Executing ParseNode Node ---
--- Executing GenerateAnswer Node ---
Timeout error: Response took longer than 30 seconds
{
    "error": "Response timeout exceeded"
}
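Note that the timeout surfaces as an `error` key in the returned dict rather than as a raised exception, so a caller can detect it explicitly. A minimal sketch (the `run_or_raise` helper is hypothetical, not part of scrapegraphai):

```python
def run_or_raise(result):
    """Raise instead of silently passing through an error payload.

    `result` is whatever smart_scraper_graph.run() returned.
    """
    if isinstance(result, dict) and "error" in result:
        raise TimeoutError(result["error"])
    return result

# Example with the payload shown above:
try:
    run_or_raise({"error": "Response timeout exceeded"})
except TimeoutError as exc:
    print(f"scrape failed: {exc}")  # scrape failed: Response timeout exceeded
```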

I tried to add a timeout in the llm config, without much success, as follows:

graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "model": "openai/gpt-4o-mini",
+       "timeout": "300",
    },
    "verbose": True,
    "headless": False,
}
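As an aside, `"300"` in the snippet above is a string; if a `timeout` key is honored anywhere, it would most likely expect an integer number of seconds. A variant of the same config with that fixed (whether the key is read at all depends on the library version, so this is only a guess):

```python
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "model": "openai/gpt-4o-mini",
        "timeout": 300,  # integer seconds, not the string "300"
    },
    "verbose": True,
    "headless": False,
}
```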
@zonay

zonay commented Dec 5, 2024

As someone who's at the print("Hello, World!") level in Python, I managed to address this issue by modifying the self.timeout parameter in the generate_answer_node.py file of the scrapegraphai library. While this fix worked for me, I'm not entirely sure if it's the most proper or robust solution, especially since I also couldn't find a parameterized way to configure it.

File Path

venv\Lib\site-packages\scrapegraphai\nodes\generate_answer_node.py

Code Changes

Look for the section where the node_config values are initialized. Update the self.timeout value as shown:

        self.verbose = node_config.get("verbose", False)
        self.force = node_config.get("force", False)
        self.script_creator = node_config.get("script_creator", False)
        self.is_md_scraper = node_config.get("is_md_scraper", False)
        self.additional_info = node_config.get("additional_info")
        self.timeout = 500  # Adjust timeout value here

@silgon

silgon commented Dec 5, 2024

Thanks @zonay, you are awesome! Given your answer, I just added:

smart_scraper_graph = SmartScraperGraph(
    prompt="Give me the url of all the modules including #",
    source="https://scrapegraph-ai.readthedocs.io/en/latest/modules/modules.html",
    config=graph_config
)
smart_scraper_graph.graph.nodes[2].timeout = 300

And it works nicely.
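For anyone copying this: `graph.nodes[2]` relies on GenerateAnswer being the third node, which could change between versions. A sketch of the same workaround that patches by attribute instead of index (`FakeNode` is a stand-in here, since the real node objects come from scrapegraphai):

```python
class FakeNode:
    """Stand-in for a scrapegraphai node; only GenerateAnswer has a timeout."""
    def __init__(self, name, timeout=None):
        self.node_name = name
        if timeout is not None:
            self.timeout = timeout

def patch_timeouts(nodes, seconds):
    """Set `timeout` on every node that already exposes one."""
    patched = []
    for node in nodes:
        if hasattr(node, "timeout"):
            node.timeout = seconds
            patched.append(node.node_name)
    return patched

nodes = [FakeNode("Fetch"), FakeNode("Parse"), FakeNode("GenerateAnswer", timeout=30)]
print(patch_timeouts(nodes, 300))  # ['GenerateAnswer']
```

With the real library, `patch_timeouts(smart_scraper_graph.graph.nodes, 300)` would apply the same idea without hard-coding the position.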

@RakeshK01

@silgon It worked

@silgon

silgon commented Dec 6, 2024

For information: it seems this was solved by @VinciGit00 in 1.33.1 (see 32ef554). However, in my case I'm not able to install that version; I get

ImportError: cannot import name 'SyncClient' from 'scrapegraph_py' (/home/user/anaconda/envs/generic/lib/python3.12/site-packages/scrapegraph_py/__init__.py)                                                                                  

In any case, I included the information because it seems useful =).
In the meantime I'll continue using the workaround.
