Organizations

@deutschestextarchiv @zentrum-lexikographie

adbar/README.md

Hi there! 👋

Links

⚡  Web   |   ✍  Blog   |   🐦  Twitter   |   🎞  YouTube   |   ☕  Coffee

Activity

🔭  Currently working on gathering texts on the Web and detecting word trends

Programming experience

🖩  First programs written on a TI-83 Plus in TI-BASIC

Pinned

  1. trafilatura (Public)

    Python package and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML

    Python · 3.8k stars · 266 forks

  2. htmldate (Public)

    Fast and robust date extraction from web pages, with Python or on the command line

    Python · 122 stars · 26 forks

  3. simplemma (Public)

    Simple multilingual lemmatizer for Python, designed for speed and efficiency

    Python · 146 stars · 12 forks

  4. py3langid (Public)

    Forked from saffsd/langid.py

    Faster, modernized fork of the language identification tool langid.py

    Python · 49 stars · 8 forks

  5. courlan (Public)

    Clean, filter, and sample URLs to optimize data collection. Python package and command-line tool with deduplication and spam, content, and language filters

    Python · 127 stars · 9 forks

  6. German-NLP (Public)

    Curated list of open-access, open-source, and off-the-shelf resources and tools developed with a particular focus on German

    454 stars · 66 forks