Skip to content

Commit

Permalink
Improve and update the documentation in the README
Browse files Browse the repository at this point in the history
  • Loading branch information
avaucher committed Oct 30, 2023
1 parent 8d12364 commit fa2f00e
Showing 1 changed file with 66 additions and 0 deletions.
66 changes: 66 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,3 +34,69 @@ For local development, the package can be installed with:
```bash
pip install -e ".[dev]"
```

## Package highlights

### File-related utilities

* [`load_list_from_file`](./src/rxn/utilities/files.py): read a files into a list of strings.
* [`iterate_lines_from_file`](./src/rxn/utilities/files.py): same as `load_list_from_file`, but produces an iterator instead of a list. This can be much more memory-efficient.
* [`dump_list_to_file`](./src/rxn/utilities/files.py) and [`append_to_file`](./src/rxn/utilities/files.py): Write an iterable of strings to a file (one per line).
* [`named_temporary_path`](./src/rxn/utilities/files.py) and [`named_temporary_directory`](./src/rxn/utilities/files.py): provide a context with a file or directory that will be deleted when the context closes. Useful for unit tests.
```pycon
>>> with named_temporary_path() as temporary_path:
... # do something on the temporary path.
... # The file or directory at that path will be deleted at the
... # end of the context, except if delete=False.
```
* ... and others.

### CSV-related functionality

* The function [`iterate_csv_column`](./src/rxn/utilities/csv/column_iterator.py) and the related executable `rxn-extract-csv-column` provide an easy way to extract one single column from a CSV file.
* The [`StreamingCsvEditor`](./src/rxn/utilities/csv/streaming_csv_editor.py) allows for doing a series of operations onto a CSV file without loading it fully in the memory.
This is for instance used in [`rxn-reaction-preprocessing`](https://github.com/rxn4chemistry/rxn-reaction-preprocessing).
See a few examples in the [unit tests](./tests/csv/test_streaming_csv_editor.py).

### Stable shuffling

For reproducible shuffling, or for shuffling two files of identical length so that the same permutation is obtained, one can use the [`stable_shuffle`](./src/rxn/utilities/files.py) function.
The executable `rxn-stable-shuffle` is also provided for this purpose.

Both also work with CSV files if the appropriate flag is provided.

### `chunker` and `remove_duplicates`

For batching an iterable into lists of a specified size, `chunker` comes in handy.
It also does so in a memory-efficient way.
```pycon
>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
... print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]
```

[`remove_duplicates`](./src/rxn/utilities/containers.py) (or [`iterate_unique_values`](./src/rxn/utilities/containers.py), its memory-efficient variant) removes duplicates from a container, possibly based on a callable instead of the values:
```pycon
>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=lambda x: len(x))
['ab', 'efg', 'hijk', '']
```

### Regex utilities

[`regex.py`](./src/rxn/utilities/regex.py) provides a few functions that make it easier to build regex strings (considering whether segments should be optional, capturing, etc.).

### Others

* A custom, more general enum class, [`RxnEnum`](./src/rxn/utilities/types.py).
* [`remove_prefix`](./src/rxn/utilities/strings.py), [`remove_postfix`](./src/rxn/utilities/strings.py).
* Initialization of loggers, in a `logging`-compatible way: [`logging.py`](./src/rxn/utilities/logging.py).
* Initialization of loggers, in a `logging`-compatible way: [`logging.py`](./src/rxn/utilities/logging.py).
* [`sandboxed_random_context`](./src/rxn/utilities/basic.py) and [`temporary_random_seed`](./src/rxn/utilities/basic.py), to create a context with a specific random state that will not have side effects.
Especially useful for testing purposes (unit tests).
* ... and others.

0 comments on commit fa2f00e

Please sign in to comment.