PDFIUM is not working in Webkit(playwright) #158

Open
KameshRajendran opened this issue Feb 23, 2024 · 9 comments

Comments

@KameshRajendran

We are encountering a problem with the pdfium module in WebKit during Playwright testing. The sample code below notifies the client side once both the document and pdfium.wasm have loaded, so that nothing proceeds until the pdfium module and the DOM are both ready.

<!DOCTYPE html>
<html lang="en">
<head>
    <script src="pdfium.js"></script>

    <script>
        window.createDivElement = function (value) {
            var divElement = document.createElement("div");
            divElement.textContent = value;
            divElement.style = "Left:20px;Width: 300px; Height:50px"
            document.body.appendChild(divElement);
        }
        document.addEventListener("DOMContentLoaded", function () {  
            let pageLoaded = false;
            let moduleLoaded = false;
            window.createDivElement("Start DOM Loading...");
            // Module.onRuntimeInitialized is called by pdfium.js once the runtime is ready, so that further processing can start
            Module.onRuntimeInitialized = async _ => {
                moduleLoaded = true;
                window.createDivElement("PDFIUM Module Loaded...");
                checkIfEverythingWasLoaded();
            };

            function checkIfEverythingWasLoaded() {
                if (pageLoaded && moduleLoaded) {
                    window.createDivElement("Both Page and Module loaded...");
                }
            }
            window.onload = function (e) {
                window.createDivElement("Page Loaded...");
                pageLoaded = true;
                checkIfEverythingWasLoaded();
            }
        });
    </script>
</head>

<body>
    @RenderBody()
    <script src="_framework/blazor.server.js"></script>
</body>
</html>
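
For readers reproducing this, the same "wait for both" logic can be sketched with a promise instead of two flags. This is only a sketch: `createReadyGate` is a hypothetical helper, not part of pdfium.js, and the browser wiring assumes a non-modularized Emscripten build that reuses an existing global `Module` object. The `document.readyState` check covers the case where the `load` event has already fired before the handler is attached.

```javascript
// Hypothetical helper: resolves `allReady` once every named condition is marked ready.
function createReadyGate(names) {
  const pending = new Set(names);
  let resolveAll;
  const allReady = new Promise((resolve) => {
    resolveAll = resolve;
  });

  function markReady(name) {
    pending.delete(name);
    if (pending.size === 0) {
      resolveAll();
    }
  }

  return { markReady, allReady };
}

// Browser-only wiring; skipped when the file is run outside a page.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  const gate = createReadyGate(["page", "module"]);

  // Assumption: pdfium.js picks up the hook from a global Module object.
  window.Module = window.Module || {};
  window.Module.onRuntimeInitialized = () => gate.markReady("module");

  if (document.readyState === "complete") {
    gate.markReady("page"); // load already fired
  } else {
    window.addEventListener("load", () => gate.markReady("page"), { once: true });
  }

  gate.allReady.then(() => console.log("Both page and module loaded"));
}
```

If the module callback never fires, as reported here for WebKit, `allReady` simply stays pending, which makes the missing step easy to spot.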

Please find the blazor sample: WEBKIT~1739968950.zip

The comparison below shows the results of running this code in Chrome and in WebKit. In WebKit (Playwright), Module.onRuntimeInitialized never fires, so we cannot proceed to the next steps and read the document.

[Image: Chrome vs WebKit]

To read images, we use pdfium.wasm and load only pdfium.js into the application. pdfium.js then loads pdfium.wasm on its own and calls the success handler so that processing can continue. This works correctly in major browsers such as Chrome, Edge, Firefox, and Safari, but fails in the WebKit build that Playwright runs.

On closer examination, we found that pdfium.js uses WebAssembly.instantiateStreaming to load the .wasm file. In WebKit under Playwright, the promise returned by this method never settles: neither the success handler nor the failure handler is called.
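
One way to make a silent hang like this visible is to race the instantiation against a timer and fall back to the non-streaming API. The following is a diagnostic sketch for a standalone .wasm file, not a patch for pdfium.js: `withTimeout` and `instantiateWithFallback` are hypothetical names, the 10-second limit is arbitrary, and pdfium.js would need its own Emscripten-generated import object in place of the empty default used here.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} did not settle within ${ms} ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Tries streaming instantiation first, then falls back to fetching the bytes.
async function instantiateWithFallback(url, imports = {}, ms = 10000) {
  if (typeof WebAssembly.instantiateStreaming === "function") {
    try {
      return await withTimeout(
        WebAssembly.instantiateStreaming(fetch(url), imports),
        ms,
        "instantiateStreaming"
      );
    } catch (err) {
      console.warn("Streaming instantiation failed, falling back:", err);
    }
  }
  const response = await fetch(url);
  const bytes = await response.arrayBuffer();
  return withTimeout(WebAssembly.instantiate(bytes, imports), ms, "instantiate");
}
```

If the fallback path also times out, the problem is more likely in WebAssembly support itself than in the streaming API.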

[Image: WebKit pdfium issue]

Can anyone point us in the right direction if you have any idea about this?

Note:

We used the commands below to run the application in WebKit, with Node v16.20.1:

npx playwright install
npx playwright install webkit (if needed)
npx playwright wk http://localhost:7185/

@KameshRajendran
Author

@bblanchon
Owner

Ping, @jerbob92.

@KameshRajendran
Author

@jerbob92 - Could you please help with this?

@jerbob92
Contributor

@bblanchon @KameshRajendran I don't use the browser build of pdfium, so I can't help out here.

@jerbob92
Contributor

jerbob92 commented Feb 27, 2024

I couldn't help myself and got curious about what Playwright is, but two minutes of searching told me that Playwright does not support WebAssembly:
microsoft/playwright#2876
microsoft/playwright#14536

Are you sure this is supposed to work?

Edit: it looks like Playwright uses WebKit 17.4, which does not have WebAssembly support on Windows.

@KameshRajendran
Author

KameshRajendran commented Mar 1, 2024

@jerbob92, thanks for your update and for your interest in our issue with the WebAssembly version of pdfium.

We believed that WebKit lacked support only for Blazor WebAssembly, and that it fully supports native WebAssembly, as announced in the official WebKit blog post from 2017 (https://webkit.org/blog/7691/webassembly/).

On Mac, Safari uses WebKit, and there we can load the native WebAssembly components (pdfium.wasm and pdfium.js) successfully. I suspect the issue is specific to loading our pdfium.wasm in WebKit on Playwright.

Can you review the provided example and the steps I've outlined to replicate the problem?

@GokulprasathVenkatachalam

GokulprasathVenkatachalam commented Mar 1, 2024

@jerbob92, I can load other native wasm files and retrieve information from them. The problem arises only when loading or retrieving information from pdfium.wasm and pdfium.js in WebKit on Playwright.

As a check, we took the instantiate-streaming sample from the GitHub folder of the same name and opened it in WebKit. It loads and returns its information correctly:

https://mdn.github.io/webassembly-examples/js-api-examples/instantiate-streaming.html

Please find the screenshot of that sample running in WebKit below.
[Image: screenshot of the sample running in WebKit]

@jerbob92
Contributor

jerbob92 commented Mar 1, 2024

@KameshRajendran I think the issues on both Playwright and WebKit itself are pretty clear that WebAssembly was not enabled on Windows, so it makes sense that it would work in Safari on Mac. I don't have a Windows machine myself and also don't really have time to test this out for you.

It is quite unclear to me which WebKit version Playwright actually uses, so they might be using a version that already has that change merged in.

It could also be that something inside the WebAssembly module is not supported. The WebAssembly example that you linked is very simple compared to pdfium.

I'd suggest trying to get more information yourself on why the loading fails.
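
As a starting point for that kind of investigation, a small probe can report what the engine exposes before pdfium.js is involved at all. This is a minimal sketch: `probeWebAssembly` is a hypothetical helper, and it only checks that the `WebAssembly` global exists, that the smallest possible module validates, and that `instantiateStreaming` is present. It says nothing about whether every feature pdfium.wasm relies on is supported.

```javascript
// Reports basic WebAssembly capabilities of the current JavaScript engine.
function probeWebAssembly() {
  const available = typeof WebAssembly === "object" && WebAssembly !== null;

  // Smallest valid module: the "\0asm" magic number followed by version 1.
  const minimalModule = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  ]);

  return {
    available,
    validatesMinimalModule: available && WebAssembly.validate(minimalModule),
    hasInstantiateStreaming:
      available && typeof WebAssembly.instantiateStreaming === "function",
  };
}

console.log(JSON.stringify(probeWebAssembly()));
```

Running this in the page console of both Chrome and the Playwright WebKit window would show whether the two environments differ at this basic level.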

@mxschmitt

Update: This should be fixed in Playwright v1.49, which will be released in a few weeks. In the meantime, you could try our Canary versions. Credit goes to @iangrunert 🚀

5 participants