Bookmarks tagged [text-processing]

https://github.com/TankerHQ/ruplacer

Find and replace text in source files


https://github.com/lavifb/todo_r

Find all your TODO notes with one command!


https://github.com/whitfin/runiq

An efficient way to filter duplicate lines from unsorted input.


https://github.com/whitfin/bytelines

Read input lines as byte slices for high efficiency.


https://github.com/vishaltelangre/ff

Find files (ff) by name!


https://github.com/BurntSushi/suffix

Linear time suffix array construction (with Unicode support)


https://github.com/BurntSushi/tabwriter

Elastic tab stops (i.e., text column alignment)


https://github.com/pwoolcoc/ngrams

Construct n-grams from arbitrary iterators


https://github.com/ps1dr3x/easy_reader

A reader that allows forwards, backwards and random navigations through the lines of huge files without consuming iterators [...


https://github.com/rust-lang/regex

Regular expressions (RE2 style)


https://github.com/greyblake/whatlang-rs

Natural language detection library based on trigrams


https://github.com/yaa110/rake-rs

Multilingual implementation of RAKE algorithm for Rust


https://github.com/Guitarbum722/align

A general purpose application that aligns text.


https://github.com/sbstjn/allot

Placeholder and wildcard text parsing for CLI tools and bots.


https://github.com/CalebQ42/bbConvert

Converts bbCode to HTML and allows you to add support for custom bbCode tags.


https://github.com/russross/blackfriday

Markdown processor in Go.


https://github.com/microcosm-cc/bluemonday

HTML Sanitizer.


https://github.com/aerogo/codetree

Parses indented code (Python, pixy, scarlet, etc.) and returns a tree structure.


https://github.com/asciimoo/colly

Fast and Elegant Scraping Framework for Gophers.


https://github.com/mingrammer/commonregex

A collection of common regular expressions for Go.


https://github.com/slotix/dataflowkit

Web scraping Framework to turn websites into structured data.


https://github.com/ockam-network/did

DID (Decentralized Identifiers) Parser and Stringer in Go.


https://github.com/hscells/doi

Digital object identifier (DOI) parser in Go.


https://github.com/editorconfig/editorconfig-core-go

Editorconfig file parser and manipulator for Go.


https://github.com/endeveit/enca

Minimal cgo bindings for libenca.


https://github.com/mickep76/encdec

Package provides a generic interface to encoders and decoders.


https://github.com/alixaxel/genex

Count and expand Regular Expressions into all matching Strings.


https://godoc.org/github.com/shurcooL/github_flavored_markdown

GitHub Flavored Markdown renderer (using blackfriday) with fenced code block highlighting and clickable header anchor links.


https://github.com/ianlopshire/go-fixedwidth

Fixed-width text formatting (encoder/decoder with reflection).


https://github.com/dustin/go-humanize

Formatters for time, numbers, and memory size to a human-readable format.


https://github.com/adrianmo/go-nmea

NMEA parser library for the Go language.


https://github.com/mattn/go-runewidth

Functions to get the fixed width of a character or string.


https://github.com/mozillazg/go-slugify

Make pretty slugs with multi-language support.


https://github.com/pelletier/go-toml

Go library for the TOML format with query support and handy CLI tools.


https://github.com/emersion/go-vcard

Parse and format vCard.


https://github.com/trubitsyn/go-zero-width

Zero-width character detection and removal for Go.


https://github.com/mmcdole/gofeed

Parse RSS and Atom feeds in Go.


https://github.com/awalterschulze/gographviz

Parses the Graphviz DOT language.


https://github.com/labstack/gommon/tree/master/bytes

Format bytes to string.


https://github.com/polera/gonameparts

Parses human names into individual name parts.


https://github.com/andrewstuart/goq

Declarative unmarshaling of HTML using struct tags with jQuery syntax (uses GoQuery).


https://github.com/PuerkitoBio/goquery

GoQuery brings a syntax and a set of features similar to jQuery to the Go language.


https://github.com/zach-klippenstein/goregen

Library for generating random strings from regular expressions.


https://github.com/leonelquinteros/gotext

GNU gettext utilities for Go.


https://github.com/endeveit/guesslanguage

Functions to determine the natural language of a Unicode text.


https://github.com/antchfx/htmlquery

An XPath query package for HTML that lets you extract data from HTML documents or evaluate them with an XPath expression.


https://github.com/facebookgo/inject

Package inject provides a reflect based injector.


https://github.com/Wing924/ltsv

High performance LTSV (Labeled Tab Separated Value) reader for Go.


https://github.com/clbanning/mxj

Encode / decode XML as JSON or map[string]interface{}; extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.


https://github.com/gortc/sdp

SDP: Session Description Protocol [RFC 4566].


https://github.com/mvdan/sh

Shell parser and formatter.


https://github.com/gosimple/slug

URL-friendly slugify with multi-language support.


https://github.com/avelino/slugify

Go slugify application that handles strings.


https://github.com/zhengchun/syndfeed

A syndication feed for Atom 1.0 and RSS 2.0.


https://github.com/BurntSushi/toml

TOML configuration format (encoder/decoder with reflection).


https://github.com/JoshuaDoes/gofuckyourself

A sanitization-based swear filter for Go.


https://github.com/bndr/gotabulate

Easily pretty-print your tabular data with Go.


https://github.com/codemodus/kace

Common case conversions covering common initialisms.


https://github.com/nproc/parseargs-go

String argument parser that understands quotes and backslashes.


https://github.com/codemodus/parth

URL path segmentation parsing.


https://github.com/yourbasic/radix

Fast string sorting algorithm.


https://github.com/Dynom/TySug

Alternative suggestions with respect to keyboard layouts.


https://github.com/stackerzzq/xj2go

Convert XML or JSON to a Go struct.


https://github.com/mvdan/xurls

Extract URLs from text.


https://github.com/chardet/chardet

Python 2/3 compatible character encoding detector.


https://docs.python.org/3/library/difflib.html

(Python standard library) Helpers for computing deltas.


https://github.com/LuminosoInsight/python-ftfy

Makes Unicode text less broken and more consistent automagically.


https://github.com/seatgeek/fuzzywuzzy

Fuzzy String Matching.


https://github.com/ztane/python-Levenshtein/

Fast computation of Levenshtein distance and string similarity.


https://github.com/vinta/pangu.py

Paranoid text spacing.


https://github.com/pwaller/pyfiglet

An implementation of figlet written in Python.


https://github.com/mozillazg/python-pinyin

Convert Chinese hanzi (漢字) to pinyin (拼音).


https://github.com/orsinium/textdistance

Compute distance between sequences. 30+ algorithms, pure Python implementation, common interface, optional external libs usage.


https://pypi.python.org/pypi/Unidecode

ASCII transliterations of Unicode text.


https://github.com/dimka665/awesome-slugify

A Python slugify library that can preserve Unicode.


https://github.com/un33k/python-slugify

A Python slugify library that translates Unicode to ASCII.


https://github.com/mozilla/unicode-slugify

A slugifier that generates Unicode slugs with Django as a dependency.


https://github.com/davidaurelio/hashids-python

Implementation of hashids in Python.


https://github.com/skorokithakis/shortuuid

A generator library for concise, unambiguous and URL-safe UUIDs.


https://github.com/dabeaz/ply

Implementation of lex and yacc parsing tools for Python.


https://github.com/pyparsing/pyparsing

A general purpose framework for generating parsers.


https://github.com/derek73/python-nameparser

Parsing human names into their individual components.


https://github.com/daviddrysdale/python-phonenumbers

Parsing, formatting, storing and validating international phone numbers.


https://github.com/selwin/python-user-agents

Browser user agent parser.


https://github.com/andialbrecht/sqlparse

A non-validating SQL parser.