"noindex" URLs are listed in sitemap #82
Hi, here is a sample to explain my issue: This is inconsistent for Google because we ask Googlebot to index this URL (via the sitemap), but when Googlebot tries to analyse the URL, it sees a noindex, so it can't index it. I hope it's clearer now.
Thanks, this is very helpful. I'm really short on time lately and I'm unlikely to be able to address this until late April. Hopefully you'll manage until then.
Thanks. I'll try to manage while waiting for your update. Regards.
Hi, on my side, I created a script that cleans out the bad URLs, but it is very slow, so I never update my sitemap. Best regards.
I'll get on it in a few days.
Aaaand I'm done with final exams.
Greaaaat 👍
I'm sorry, but I can't find your patch.
It's a work in progress. I thought this would be easier. A major problem is that with links I only need to match a single attribute (href), whereas with meta tags I need to match both the name and the content. It's tricky to get right.
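To illustrate the "match both the name and the content" problem, here is a minimal sketch in Python using the standard library's `html.parser`. It is not the project's code (the project is PHP); the names `RobotsMetaParser` and `has_noindex` are made up for this example, and treating `googlebot` the same as `robots` is an assumption about what you would want to honour.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {key: (value or "") for key, value in attrs}
        # Assumption: honour both the generic and the Googlebot-specific tag.
        if attr.get("name", "").strip().lower() not in ("robots", "googlebot"):
            return
        for part in attr.get("content", "").split(","):
            self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & parser.directives)
```

Because the parser hands over all attributes of a tag at once, the order of `name` and `content` inside the tag no longer matters, which is the part that is awkward to express in a single regular expression.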
A cheap fix that you can apply yourself is to simply check whether the meta tag string is present in the HTML by hard-coding the check here: https://github.com/knyzorg/Sitemap-Generator-Crawler/blob/0b89cd5f53b02472d33131a2ebb62396003bf8df/sitemap.functions.php#L367 But my regular expression skills are somewhat rusty, and regular expressions were never meant to parse HTML. The entire project was written back when PHP installations had finicky support for parsing HTML natively, and it should have become unnecessary with the release of PHP 7... yet here we are. I will eventually rewrite it as a binary with a proper HTML parser and deprecate the project.
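For reference, the cheap string-level check could look roughly like the sketch below. It is written in Python rather than the project's PHP, it is not the code at the linked line, and `looks_noindex` is a hypothetical name. It first isolates each `<meta ...>` tag and then tests the two attributes separately, so attribute order does not matter.

```python
import re

# Heuristic patterns; they do not understand HTML comments, scripts, etc.
_META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
_NAME_ROBOTS = re.compile(r"""\bname\s*=\s*["']?robots\b""", re.IGNORECASE)
_CONTENT_NOINDEX = re.compile(
    r"""\bcontent\s*=\s*["'][^"']*\bnoindex\b""", re.IGNORECASE
)


def looks_noindex(html):
    """Return True if any <meta> tag appears to be a robots noindex tag."""
    for tag in _META_TAG.findall(html):
        # Both conditions must hold on the same tag.
        if _NAME_ROBOTS.search(tag) and _CONTENT_NOINDEX.search(tag):
            return True
    return False
```

This stays a heuristic: a commented-out meta tag, or one that only appears inside a script string, would still match, which is one reason a real HTML parser is the safer long-term option.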
@knyzorg example: Result: Page A should be omitted but Page B and C should be added to the sitemap.xml file.
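The expected result can be sketched as a tiny in-memory crawl. Everything here is hypothetical: the example pages themselves are not shown in this thread, so the sketch assumes Page A carries a robots "noindex" meta tag and links to Pages B and C, which carry none. The names `PageScanner`, `build_sitemap_urls` and `PAGES` are invented for the illustration, and Python is used instead of the project's PHP.

```python
from collections import deque
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects link targets and robots meta directives from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
        elif tag == "meta" and attr.get("name", "").strip().lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.robots.add(part.strip().lower())


def build_sitemap_urls(pages, start):
    """Breadth-first crawl over `pages` (url -> html), skipping noindex pages."""
    seen, queue, sitemap = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        scanner = PageScanner()
        scanner.feed(pages[url])
        # A noindex page is still crawled for links, but not listed.
        if "noindex" not in scanner.robots:
            sitemap.append(url)
        for link in scanner.links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return sitemap


# Hypothetical stand-ins for Pages A, B and C.
PAGES = {
    "/a": '<meta name="robots" content="noindex, follow">'
          '<a href="/b">B</a><a href="/c">C</a>',
    "/b": '<a href="/a">A</a>',
    "/c": "<p>no links</p>",
}

print(build_sitemap_urls(PAGES, "/a"))  # prints ['/b', '/c']
```

The design choice worth noting is that the noindex check only affects whether a URL is written to the sitemap, not whether its links are followed; otherwise B and C would never be discovered from A.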
My sitemap contains a lot of URLs that have a meta "noindex" tag.
So Webmaster Tools sends me an alert.