Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle escaped URI-Rs #146

Merged
merged 5 commits into from
Apr 17, 2024
Merged

Handle escaped URI-Rs #146

merged 5 commits into from
Apr 17, 2024

Conversation

machawk1
Copy link
Member

Closes #110

I also tested URIs that would get decoded as spaces, e.g.,

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org%2F%20index.html"

The %20 remains in the URIR after being decoded.

@machawk1 machawk1 changed the title Handles escaped URI-Rs Handle escaped URI-Rs Apr 15, 2024
main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@machawk1 machawk1 requested a review from ibnesayeed April 16, 2024 17:47
Copy link
Member

@ibnesayeed ibnesayeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you make sure that the URLs with spaces in them are working as expected, even after escaping?

Other than that, it looks okay, but I would prefer if the blank lines were removed, because it looks odd to have one function in the whole code base that has black space while others don't.

@machawk1
Copy link
Member Author

@ibnesayeed I removed the extraneous spacing.

Regarding testing URLs with spaces in them, the browser converts the space to %20 before submission and curl rejects the URL with a space, i.e.,

$ curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/space test.html"
curl: (3) URL rejected: Malformed input to a URL function
$

@machawk1 machawk1 requested a review from ibnesayeed April 17, 2024 14:42
@ibnesayeed
Copy link
Member

Test with %20 in the path and query parameter using curl and see if nothing is dropped from the URL in the TimmeMap. The purpose of this test would be to ensure that when %20 is converted to plain white-space in the function in question, eventually it is escaped back to %20 for further flow dow the the process. If that is not the case, then we will have to explicitly replace white-spaces with %20 soon after unescaping the URL.

Moreover, I would also test + in query parameter to make sure that is not broken either.

@machawk1
Copy link
Member Author

machawk1 commented Apr 17, 2024

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/space%20test.html"

...results in https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/space%20test.html from the MemGator log.

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/test.html?sawood%20alam=ibnesayeed"

...produces https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/test.html?sawood alam=ibnesayeed" from the MemGator log.

curl -i "http://localhost:1208/timemap/link/http%3A%2F%2Fexample.org/test.html?sawood+alam=ibnesayeed"

...produces https://www.webarchive.org.uk/wayback/archive/timemap/link/http://example.org/test.html?sawood+alam=ibnesayeed from the MemGator log.

Are the results of any of these variations incorrect to you, @ibnesayeed?

@ibnesayeed
Copy link
Member

Please try:

$ curl -i "http://localhost:1208/timemap/link/https://google.com/?q=united%20states"

To see if you get https://web.archive.org/web/20131212220548/https://google.com/?q=united%20states in the produced TimMap. Looking in the logs might be inadequate. Also, logs contain spaces (as reported above) when %20 is used, so it is an undesired behavior.

@machawk1
Copy link
Member Author

curl -i "http://localhost:1208/timemap/link/https://google.com/?q=united%20states" returns a 404 from my local MemGator instance.

Copy link
Member

@ibnesayeed ibnesayeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try adding this explicit replacement, which might take care of the issue.

main.go Show resolved Hide resolved
Co-authored-by: Mat Kelly <[email protected]>
Co-authored-by: Sawood Alam <[email protected]>
@machawk1 machawk1 requested a review from ibnesayeed April 17, 2024 18:58
Copy link
Member

@ibnesayeed ibnesayeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ibnesayeed ibnesayeed merged commit ae71612 into master Apr 17, 2024
2 checks passed
@machawk1 machawk1 deleted the issue-110 branch May 7, 2024 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle URL-encoded URI-R
2 participants