
Handling interaction like scrolling to load the full page #793

Open
RichieHakim opened this issue Nov 8, 2024 · 13 comments

Comments

@RichieHakim

Is your feature request related to a problem? Please describe.
I am frustrated when I query a website for information, but it is not completely loaded. This is often the case for Shopify websites with a large number of elements, where you must scroll down to load the rest of the elements on the page.

Describe the solution you'd like
I'm uncertain what the best solution is. I think some basic agentic commands would be extremely useful, perhaps not as exhaustive as LaVague, but some basic ability to send commands to Selenium or similar. Something as simple as being able to pass a Selenium driver into a graph class would be great.

Describe alternatives you've considered
LaVague and custom LangChain. I've just started to play with using the node pieces separately, but it is challenging.

Additional context
For a simple demo, try to load all of the different coffees from this webpage: https://georgehowellcoffee.com/collections/all-coffee
You'll find that it only loads the first dozen or so.
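For pages like this, one common approach is a height-convergence scroll loop that keeps scrolling until the page stops growing. Below is a minimal sketch assuming a Playwright-style async `page` object; the helper name `scroll_until_stable` and the round/pause limits are illustrative, not part of ScrapeGraphAI:

```python
import asyncio

async def scroll_until_stable(page, max_rounds=20, pause_ms=750):
    """Scroll to the bottom repeatedly until the page height stops growing,
    bounding the number of rounds so the loop always terminates."""
    last_height = await page.evaluate("document.body.scrollHeight")
    for _ in range(max_rounds):
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(pause_ms)  # give lazy-loaded content time to render
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:  # no new content appeared; assume fully loaded
            break
        last_height = height
    return last_height
```

With Playwright, this would run after `await page.goto(...)` on the collections URL and before handing the (now fully expanded) HTML to the scraper.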

@VinciGit00
Collaborator

Ok, would you like to implement it? We will accept the PR.

@RichieHakim
Author

I have near-zero experience with this field or library. Do you have a recommendation on how best to implement this, to point me in the right direction?

Do you agree that allowing for the source input to graphs to be a selenium or playwright driver is an appropriate approach? I think this would allow for actions to be taken on a page prior to scraping, which would allow for iteration between LLM calls and interaction steps.

@aflansburg
Contributor

Stumbled upon this issue and was interested in working on a solution. It would be nice to be able to pass an existing Playwright browser context into a scraper. This could be useful for leveraging storage_state for things like authenticated sessions and other cookies, and would also expose the page object for manipulation like scrolling.

@aleenprd

Ok guys, so for this you need to use something like Selenium to interact with the JavaScript, click buttons, or do infinite scrolls. I can also volunteer to help if the developers would guide me a bit on where to start.

@VinciGit00
Collaborator

@aflansburg and @aleenprd would you like to set up a meeting to plan the design?

@aleenprd

@VinciGit00 I'll reach out to you on LinkedIn for my contact

@aflansburg
Contributor

@aflansburg and @aleenprd would you like to set up a meeting to plan the design?

Yes, interested!

@RichieHakim
Author

@aflansburg I wouldn't mind sitting in as well, as an observer and end user, if you don't think it'll slow you down.

@aflansburg
Contributor

I did a small dive into the project ahead of the call, and it was a useful exercise to at least learn about the project. I'm not sure how 'clean' it is, but I think a relatively simple way (minimal refactoring) to expose the page object would be to set it as an attribute on the ChromiumLoader class and then, if I understand correctly, update the state object on FetchNode or FetchNodeLevelK to store the page object, i.e.:

state["original_html"] = document
state.update({self.output[0]: compressed_document})
if page:
    state["page"] = page  # expose the live Playwright page to callers
return state

I'm unsure whether this bit will work, but I assume something like this could accomplish additional interaction with the page (such as scrolling):

    graph = OmniScraperGraph(
        app_config.prompt,
        app_config.url,
        graph_config,
    )
    results_a = graph.run()
    page = graph.final_state.get("page")
    # call relevant `page` methods -> https://playwright.dev/docs/api/class-page
    results_b = graph.run() # subsequent run

As a consequence, this would require the user of the library to close the page themselves (doable essentially via page.context.browser.close(), since context and browser are properties on the Playwright Python page object).

For my issue (authentication & cookies in a separate Chromium instance), I was able to determine that threading storage_state through the call chain to ChromiumLoader works (excuse my dirty hack in this branch):

context = await browser.new_context(
    java_script_enabled=True,
    storage_state=self.storage_state,
    user_agent=self.user_agent,
)

Then I was able to leverage session state from a separate invocation of playwright, i.e.:

import os
import time

from playwright.async_api import Page, Browser
...
async def async_run_login(browser: Browser):
    browser_state_file = app_config.browser_state_file

    user_agent = app_config.user_agent_str

    # check if the state file exists and if it is less than 24 hours old
    if (
        os.path.exists(browser_state_file)
        and os.path.getmtime(browser_state_file) > time.time() - 24 * 60 * 60
    ):
        print("Using existing state file.")
    else:
        browser_state_file = None
        print(
            "No existing state file found or it is older than 24 hours. I will create a new state file."
        )

    context = await browser.new_context(
        user_agent=user_agent,
        storage_state=browser_state_file,
    )

    page = await context.new_page()

    if await _is_logged_in(page):
        print("Already logged in.")
        await page.close()
        await browser.close()
        return

    logged_in = await _login(page)

    if not logged_in:
        await page.close()
        await browser.close()
        raise Exception("Unable to login. Time to debug!")

    await page.close()
    await browser.close()

@RichieHakim
Author

RichieHakim commented Nov 20, 2024

This is quite close to the use case I was initially describing. To make it more concrete, here is a snippet of something I am feeding into my prompt that should give a good idea of the functionality I am hoping for. I prompt the LLM agent to output a field called `navigation`, specified below.

- `navigation` [str]: The code to be run to interact with the browser / website to prepare for the next step. 
        Be certain that the approach is distinct from previous *steps* and will yield new results. 
        Code should use the Playwright library; the code string will be executed directly via exec(navigation). 
        No modification to the code will occur, so all steps must be fully implemented, no steps can be written in pseudocode, and nothing should be commented.
        This will also be passed on to the next agent call.
        The code block will be placed within the following pseudocode template:

            ```python
            import asyncio
            from playwright.async_api import async_playwright
            from agent_library import model
            async def navigate(page, navigation):
                if navigation is not None:
                    exec(navigation)  ## The `navigation` string will be called directly here
            async def main():
                async with async_playwright() as p:
                    browser = await p.chromium.launch()
                    page = await browser.new_page()
                    await page.goto(current_url)
                    converged = False
                    information = []
                    while not converged:
                        converged = model(page, information)  ## This is you
                        await navigate(page, navigation)
                    await browser.close()
                return information
            asyncio.run(main())
            ```
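On the exec(navigation) step in the template above: plain exec runs the string against the caller's namespace, and an await inside the string is a syntax error outside async code. A sketch of a slightly safer pattern (my own, not part of any library discussed here) passes an explicit namespace so the generated code only sees what is deliberately exposed:

```python
def run_navigation(navigation, page):
    """Execute an LLM-generated snippet against an explicit namespace.
    The snippet sees only the names placed here (just `page`), keeping the
    caller's locals out of reach. Note this does NOT sandbox hostile code,
    and `await` cannot appear in the snippet; synchronous Playwright calls
    (or compiling the string into an async function) would be needed for
    awaitable operations."""
    namespace = {"page": page}
    exec(navigation, namespace)
    return namespace
```

Whether to trust exec of model output at all is a separate question; restricting the namespace is a minimum precaution, not a security boundary.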

@RichieHakim
Author

This feature is still desired

@AliUofT

AliUofT commented Nov 27, 2024

Yeah, if we are able to enable auto-scrolling that would be very handy; it would make the scraper truly able to scrape everything.

@Kilowhisky

Yeah, if we are able to enable auto-scrolling that would be very handy; it would make the scraper truly able to scrape everything.

So how would you handle running into infinite scrolling pages?
