Releases: datacoon/undatum
Release 1.0.14
- Added JSON to JSON lines conversion
- Fixed #19: missing xmltodict dependency
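To illustrate what a JSON to JSON lines conversion involves, here is a minimal standard-library sketch. It is not undatum's code: the function names `records_to_jsonl` and `json_to_jsonl` are hypothetical, and it assumes the input file holds a top-level JSON array small enough to load into memory.

```python
import json


def records_to_jsonl(records):
    """Serialize an iterable of records to JSON lines text, one record per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def json_to_jsonl(src_path, dst_path, encoding="utf-8"):
    """Convert a JSON file with a top-level array into a JSON lines file.

    Hypothetical helper: loads the whole array into memory, so it only
    suits files that fit in RAM. Returns the number of records written.
    """
    with open(src_path, "r", encoding=encoding) as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(records_to_jsonl(records))
    return len(records)
```

Each record ends up on its own line, so the output can be read back one line at a time with `json.loads`.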
Release 1.0.12
Changes:
- Added the "analyze" command, which provides human-readable information about data files: CSV, BSON, JSON lines, JSON, and XML. It detects encoding, delimiters, file type, and fields containing objects in JSON and XML files. Gzipped, zipped, and other compressed files are not supported yet.
- Updated setup.py and requirements.txt to require specific versions of libraries and Python 3.8
The analyze command is especially helpful when working with JSON and XML files. The next step is to update the convert command so that it reuses the analyze code. The convert command should process small XML files with xmltodict instead of a SAX parser, using automatically detected list tags, and it should support JSON files by detecting the JSON file type.
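The JSON file type detection mentioned above can be sketched as follows. This is an illustrative standard-library approach, not the analyze command's actual logic: `detect_json_kind` and `object_fields` are hypothetical names, and treating both dicts and lists as "fields with objects" is an assumption.

```python
import json


def detect_json_kind(text):
    """Classify text as 'array', 'object', 'jsonl', or 'unknown'.

    Note: a JSON lines file containing a single record is
    indistinguishable from a plain JSON object and is reported as 'object'.
    """
    stripped = text.strip()
    if not stripped:
        return "unknown"
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        pass  # not a single JSON document; try line-by-line below
    else:
        if isinstance(parsed, list):
            return "array"
        if isinstance(parsed, dict):
            return "object"
        return "unknown"
    # JSON lines: every non-empty line must parse on its own.
    try:
        for line in stripped.splitlines():
            if line.strip():
                json.loads(line)
    except json.JSONDecodeError:
        return "unknown"
    return "jsonl"


def object_fields(record):
    """Return the sorted keys of a record whose values are nested dicts or lists."""
    return sorted(k for k, v in record.items() if isinstance(v, (dict, list)))
```

A converter could use the detected kind to decide whether to iterate over an array, wrap a single object, or stream the file line by line.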
Release 1.0.10
- Added encoding and delimiter detection for the uniq, select, frequency, and headers commands, and completely rewrote these functions. If the encoding and delimiter options are set, they override the detected values; otherwise the detected encoding and delimiter are used.
- Added support for converting to .parquet files. This is done in the simplest way, using the pandas "to_parquet" function.
- Added support for CSV and BSON files to the "stats" command
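The "explicit options override detected values" rule from the first item above can be sketched like this. It is a standard-library illustration only: undatum's real detection may rely on other libraries, the function names are hypothetical, and the candidate encoding list is an assumption chosen for the example.

```python
import csv


def detect_encoding(raw, candidates=("utf-8", "cp1251", "latin-1")):
    """Return the first candidate encoding that decodes raw bytes without error.

    Hypothetical helper: a trial-decoding heuristic, not statistical detection.
    Returns None if no candidate fits (latin-1 accepts any byte sequence).
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


def detect_delimiter(sample, candidates=",;\t|", default=","):
    """Guess the CSV delimiter with csv.Sniffer, falling back to a default."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return default


def resolve_options(raw, encoding=None, delimiter=None):
    """Use explicitly set options when given, otherwise the detected values."""
    enc = encoding if encoding is not None else detect_encoding(raw)
    if enc is None:
        raise ValueError("could not detect encoding")
    text = raw.decode(enc)
    delim = delimiter if delimiter is not None else detect_delimiter(text)
    return enc, delim
```

Calling `resolve_options(raw)` with no options relies entirely on detection, while passing `encoding` or `delimiter` skips detection for that setting.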