Data collection for different languages #2
Comments
I already have a script for the RKI FAQ. Will share it later!
If someone needs a starting point, I already wrote scrapers for WHO and some pages of CDC:
I will do some scraping for Romanian.
I'll add Italian.
I will look into some more German pages.
@tkh42 Let me know which ones so we don't duplicate work. This one would probably make sense: https://www.infektionsschutz.de/coronavirus/faqs-coronaviruscovid-19.html
@HenrykBorzymowski Ok. Yes, I have thought about doing that one too. I think I will start with https://www.bmas.de/DE/Presse/Meldungen/2020/corona-virus-arbeitsrechtliche-auswirkungen.html
Perfect, people, this is taking off rather quickly :D I would also suggest that you create small issues stating which website you want to work on, so we do not duplicate work or write the same crawler twice. State the website in the title so GitHub can find related issues very easily! Thanks
Here is a Google Sheet in which we can track which pages we already have a scraper for, etc. Please fill it in and change it if necessary: https://docs.google.com/spreadsheets/d/1er-7sDvgMZ484FRhPL7X6rl1fgRIRtA7fJfj-gLp3jg/edit?usp=sharing
@tkh42 Can I somehow help or motivate you to create scrapers for German sites? :D We already started the labeling process and need more questions!
@Timoeller I am finished with the BMAS one. I will create the pull request and continue with the next. :)
One way to "easily" get multilingual data is to machine-translate. A workflow like this could then work for the user: This would be easier than real-time translation and/or getting sufficient data in many languages.
Multilingual resources can also easily be found using Linguee and checking the sources of the found sentences in the language pairs, e.g. for DE:
Find official data sources for FAQ about COVID-19 in different languages and scrape them.
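For anyone picking up a new source, here is a minimal sketch of what such a scraper could look like. It is not the project's actual scraper: the names `FAQParser` and `parse_faq`, the output fields (`question`, `answer`, `lang`, `source`), and the assumption that each question sits in an `<h3>` followed by its answer in one or more `<p>` elements are all hypothetical. Real pages will need their own selectors, and fetching the page is left out here.

```python
from html.parser import HTMLParser


class FAQParser(HTMLParser):
    """Collect question/answer pairs, ASSUMING <h3> = question, <p> = answer."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished {"question": ..., "answer": ...} dicts
        self._tag = None       # element currently being read: "h3", "p" or None
        self._question = None  # question text still waiting for its answer
        self._answer = []      # answer paragraphs collected for that question
        self._buf = []         # raw text pieces of the current element

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._flush()      # a new question closes the previous pair
            self._tag, self._buf = "h3", []
        elif tag == "p" and self._question is not None:
            self._tag, self._buf = "p", []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return             # ignore nested tags such as <a> or <b>
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if tag == "h3":
            self._question = text
        elif text:
            self._answer.append(text)
        self._tag = None

    def _flush(self):
        # Keep only questions that actually received an answer.
        if self._question and self._answer:
            self.pairs.append(
                {"question": self._question, "answer": " ".join(self._answer)}
            )
        self._question, self._answer = None, []

    def close(self):
        super().close()
        self._flush()          # emit the last pair on the page


def parse_faq(html, lang, source):
    """Return one dict per FAQ entry, tagged with language and source URL."""
    parser = FAQParser()
    parser.feed(html)
    parser.close()
    return [dict(pair, lang=lang, source=source) for pair in parser.pairs]


# Hypothetical sample markup standing in for a downloaded FAQ page.
SAMPLE = """
<h3>What is COVID-19?</h3>
<p>A disease caused by a new <a href="#">coronavirus</a>.</p>
<h3>How can I protect myself?</h3>
<p>Wash your hands.</p>
<p>Keep your distance.</p>
"""

rows = parse_faq(SAMPLE, lang="en", source="https://example.org/faq")
for row in rows:
    print(row["lang"], "|", row["question"], "|", row["answer"])
```

Tagging every row with `lang` and `source` keeps entries from different sites and languages distinguishable once they are merged into one dataset.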