
Handling interaction like scrolling to load the full page #793

Open
RichieHakim opened this issue Nov 8, 2024 · 13 comments

Comments

@RichieHakim

Is your feature request related to a problem? Please describe.
I am frustrated when I query a website for information, but it is not completely loaded. This is often the case for Shopify websites with a large number of elements, where you must scroll down to load the rest of the elements on the page.

Describe the solution you'd like
I'm uncertain what the best solution is. I think some basic agentic commands would be extremely useful, perhaps not as exhaustive as LaVague, but some basic ability to send commands to Selenium or similar. Something as simple as being able to pass a Selenium driver into a graph class would be great.

Describe alternatives you've considered
LaVague and custom LangChain. I've just started to play with using the node pieces separately, but it is challenging.

Additional context
For a simple demo, try to load all of the different coffees from this webpage: https://georgehowellcoffee.com/collections/all-coffee
You'll find that it only loads the first dozen or so.
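For pages like this, one common approach is a height-convergence scroll loop that keeps scrolling until the page stops growing. Below is a minimal sketch assuming a Playwright-style async `page` object; the helper name `scroll_until_stable` and the round/pause limits are illustrative, not part of ScrapeGraphAI:

```python
import asyncio

async def scroll_until_stable(page, max_rounds=20, pause_ms=750):
    """Scroll to the bottom repeatedly until the page height stops growing,
    bounding the number of rounds so the loop always terminates."""
    last_height = await page.evaluate("document.body.scrollHeight")
    for _ in range(max_rounds):
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(pause_ms)  # give lazy-loaded content time to render
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:  # no new content appeared; assume fully loaded
            break
        last_height = height
    return last_height
```

With Playwright, this would run after `await page.goto(...)` on the collections URL and before handing the (now fully expanded) HTML to the scraper.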

@VinciGit00
Collaborator

Ok, would you like to implement it? We will accept the PR.

@RichieHakim
Author

I have near-zero experience with this field or library. Do you have a recommendation on how best to implement this, to point me in the right direction?

Do you agree that allowing for the source input to graphs to be a selenium or playwright driver is an appropriate approach? I think this would allow for actions to be taken on a page prior to scraping, which would allow for iteration between LLM calls and interaction steps.

@aflansburg
Contributor

Stumbled upon this issue and was interested in working on a solution. It would be nice to be able to pass an existing Playwright browser context into a scraper. This could be useful for leveraging storage_state for things like authenticated sessions and other cookies, and would also expose the page object for manipulation like scrolling.

@aleenprd

Ok guys, so for this you need to use something like Selenium to interact with the JavaScript, click buttons, or do infinite scrolls. I can also volunteer to help if the developers would guide me a bit on where to start.

@VinciGit00
Collaborator

@aflansburg and @aleenprd would you like to set up a meeting to plan the design?

@aleenprd

@VinciGit00 I'll reach out to you on LinkedIn for my contact

@aflansburg
Contributor

@aflansburg and @aleenprd would you like to set up a meeting to plan the design?

Yes, interested!

@RichieHakim
Author

@aflansburg I wouldn't mind sitting in as well, as an observer and end user, if you don't think it'll slow you down.

@aflansburg
Contributor

I did a small dive into the project ahead of the call, and it was a useful exercise to at least learn about the project. I'm not sure how 'clean' it is, but I think a relatively simple way (minimal refactoring) to expose the page object would be to set it as an attribute on the ChromiumLoader class and then, if I understand correctly, update the state object on FetchNode or FetchNodeLevelK to store the page object, i.e.:

state["original_html"] = document
state.update({self.output[0]: compressed_document})
if page:
    state["page"] = page  # expose the live Playwright page to callers
return state

I'm unsure whether this bit will work, but I assume something like this could accomplish additional interaction with the page (such as scrolling):

    graph = OmniScraperGraph(
        app_config.prompt,
        app_config.url,
        graph_config,
    )
    results_a = graph.run()
    page = graph.final_state.get("page")
    # call relevant `page` methods -> https://playwright.dev/docs/api/class-page
    results_b = graph.run() # subsequent run

As a consequence, this would require the user of the library to close the page themselves (doable essentially via page.context.browser.close(), since context and browser are properties on the Playwright Python page object).

For my issue (authentication & cookies in a separate Chromium instance), I was able to determine that threading storage_state through the call chain to ChromiumLoader works (excuse my dirty hack in this branch):

context = await browser.new_context(
    java_script_enabled=True,
    storage_state=self.storage_state,
    user_agent=self.user_agent,
)

Then I was able to leverage session state from a separate invocation of playwright, i.e.:

import os
import time

from playwright.async_api import Page, Browser
...
async def async_run_login(browser: Browser):
    browser_state_file = app_config.browser_state_file

    user_agent = app_config.user_agent_str

    # check if the state file exists and if it is less than 24 hours old
    if (
        os.path.exists(browser_state_file)
        and os.path.getmtime(browser_state_file) > time.time() - 24 * 60 * 60
    ):
        print("Using existing state file.")
    else:
        browser_state_file = None
        print(
            "No existing state file found or it is older than 24 hours. I will create a new state file."
        )

    context = await browser.new_context(
        user_agent=user_agent,
        storage_state=browser_state_file,
    )

    page = await context.new_page()

    if await _is_logged_in(page):
        print("Already logged in.")
        await page.close()
        await browser.close()
        return

    logged_in = await _login(page)

    if not logged_in:
        await page.close()
        await browser.close()
        raise Exception("Unable to login. Time to debug!")

    await page.close()
    await browser.close()

@RichieHakim
Author

RichieHakim commented Nov 20, 2024

This is quite close to the use case I was initially describing. To make it more concrete, here is a snippet of something I am feeding into my prompt that should give a good idea of the functionality I am hoping for. I prompt the LLM agent to output a field called `navigation`, specified below.

- `navigation` [str]: The code to be run to interact with the browser / website to prepare for the next step. 
        Be certain that the approach is distinct from previous *steps* and will yield new results. 
        Code should use the Playwright library; the code string will be executed directly via exec(navigation). 
        No modification to the code will occur, so all steps must be fully implemented, no steps can be written in pseudocode, and nothing should be commented.
        This will also be passed on to the next agent call.
        The code block will be placed within the following pseudocode template:

            ```python
            import asyncio
            from playwright.async_api import async_playwright
            from agent_library import model
            async def navigate(page, navigation):
                if navigation is not None:
                    exec(navigation)  ## The `navigation` string will be called directly here
            async def main():
                async with async_playwright() as p:
                    browser = await p.chromium.launch()
                    page = await browser.new_page()
                    await page.goto(current_url)
                    converged = False
                    information = []
                    while not converged:
                        converged = model(page, information)  ## This is you
                        await navigate(page, navigation)
                    await browser.close()
                return information
            asyncio.run(main())
            ```
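On the exec(navigation) step in the template above: plain exec runs the string against the caller's namespace, and an await inside the string is a syntax error outside async code. A sketch of a slightly safer pattern (my own, not part of any library discussed here) passes an explicit namespace so the generated code only sees what is deliberately exposed:

```python
def run_navigation(navigation, page):
    """Execute an LLM-generated snippet against an explicit namespace.
    The snippet sees only the names placed here (just `page`), keeping the
    caller's locals out of reach. Note this does NOT sandbox hostile code,
    and `await` cannot appear in the snippet; synchronous Playwright calls
    (or compiling the string into an async function) would be needed for
    awaitable operations."""
    namespace = {"page": page}
    exec(navigation, namespace)
    return namespace
```

Whether to trust exec of model output at all is a separate question; restricting the namespace is a minimum precaution, not a security boundary.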

@RichieHakim
Author

This feature is still desired

@AliUofT

AliUofT commented Nov 27, 2024

Yeah, if we are able to enable auto-scrolling that would be very handy; it would make the scraper truly able to scrape everything.

@Kilowhisky

Yeah, if we are able to enable auto-scrolling that would be very handy; it would make the scraper truly able to scrape everything.

So how would you handle running into infinite scrolling pages?
