various typo fixes and clearer language
alistairewj committed Feb 20, 2024
1 parent ad08bae commit f826074
Showing 5 changed files with 6 additions and 73 deletions.
72 changes: 3 additions & 69 deletions mimic-iii/buildmimic/duckdb/README.md
@@ -4,11 +4,6 @@ The scripts in this folder create the schema for MIMIC-III and
loads the data into the appropriate tables for
[DuckDB](https://duckdb.org/).

The Python script (`import_duckdb.py`) also includes the option to
add the [concepts views](../../concepts/README.md) to the database.
This makes it much easier to use the concepts views as you do not
have to install and setup PostgreSQL or use BigQuery.

DuckDB, like SQLite, is serverless and
stores all information in a single file.
Unlike SQLite, an OLTP database,
@@ -34,8 +29,6 @@ wget -r -N -c -np -nH --cut-dirs=1 --user YOURUSERNAME --ask-password https://ph

Replace `YOURUSERNAME` with your physionet username.

This will make you `mimic_data_dir` be `mimiciii/1.4`.

The rest of these instructions assume the CSV files are in the folder structure as follows:

```
@@ -45,7 +38,7 @@ mimic_data_dir/
...
```

The CSV files can be uncompressed (end in `.csv`) or compressed (end in `.csv.gz`).
By default, the above `wget` downloads the data into `mimiciii/1.4` (as we used `--cut-dirs=1` to remove the base folder). Thus, by default, `mimic_data_dir` is `mimiciii/1.4` (relative to the current folder). The CSV files can be uncompressed (end in `.csv`) or compressed (end in `.csv.gz`).
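
As a quick sanity check before loading, the folder can be scanned for the two accepted file endings. The following is a minimal sketch, not part of the repository: the helper name `find_csv_files` is hypothetical, and it assumes the data files sit directly inside `mimic_data_dir` rather than in subfolders.

```python
from pathlib import Path


def find_csv_files(mimic_data_dir):
    """Return the sorted names of files ending in .csv or .csv.gz.

    Hypothetical helper for illustration only. It looks at the top level
    of mimic_data_dir and does not descend into subfolders.
    """
    root = Path(mimic_data_dir)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.endswith((".csv", ".csv.gz"))
    )
```

With the default download location, `find_csv_files("mimiciii/1.4")` would list the files the import step is expected to pick up. An empty list suggests that `mimic_data_dir` points at the wrong folder.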


## Shell script method (`import_duckdb.sh`)
@@ -74,7 +67,7 @@ e.g. `/usr/local/bin`.

### Create DuckDB database and load data

You can do all of this will one shell script, `import_duckdb.sh`,
You can do all of this with one shell script, `import_duckdb.sh`,
located in this repository.

See the help for it below:
@@ -105,66 +98,7 @@ The script will print out progress as it goes.
Be patient, this can take minutes to hours to load
depending on your computer's configuration.

## Python script method (`import_duckdb.py`)

This method does not require the DuckDB executable, it only requires the DuckDB Python
module and the [SQLGlot](https://github.com/tobymao/sqlglot) Python module, both of which can be
easily installed with `pip`.

### Install dependencies

Install the dependencies by using the included `requirements.txt` file:

```sh
python3 -m pip install -r ./requirements.txt
```

### Create DuckDB database and load data

Create the MIMIC-III database with `import_duckdb.py` like so:

```sh
python ./import_duckdb.py /path/to/mimic_data_dir ./mimic3.db
```

...where `/path/to/mimic_data_dir` is the path containing the .csv or .csv.gz
data files downloaded above.

This command will create the `mimic3.db` file in the current directory. Be aware that
for the full MIMIC-III v1.4 dataset the resulting file will be about 34GB in size.
This process will take some time, as with the shell script version.

The default options will create only the tables and load the data, and assume
that you are running the script from the same directory where this README.md
is located. See the full options below if the defaults are insufficient.

### Create the concepts views

In most cases you will want to create the concepts views at the same time as
the database. To do this, add the `--make-concepts` option:

```sh
python ./import_duckdb.py /path/to/mimic_data_dir ./mimic3.db --make-concepts
```

If you want to add the concepts to a database already created without this
option (or created with the shell script version), you can add the
`--skip-tables` option as well:

```sh
python ./import_duckdb.py /path/to/mimic_data_dir ./mimic3.db --make-concepts --skip-tables
```

### Additional options

There are a few additional options for special situations:

| Option | Description
| - | -
| `--skip-indexes` | Don't create additional indexes when creating tables and loading data. This may be useful in memory-constrained systems or to save a little time.
| `--mimic-code-root [path]` | This argument specifies the location of the mimic-code repository files. This is needed to find the concepts SQL files. This is useful if you are running the script from a different directory than the one where this README.md file is located (the default is `../../../`)
| `--schema-name [name]` | This puts the tables and concepts views into a named schema in the database. This is mainly useful to mirror the behavior of the PostgreSQL version of the database, which places objects in a schema named `mimiciii` by default--if you have existing code designed for the PostgreSQL version, this may make migration easier. Note that--like the PostgreSQL version--the `ccs_dx` view is *not* placed in the specified schema, but in the default schema (which is `main` in DuckDB, not `public` as in PostgreSQL).

# Help

Please see the [issues page](https://github.com/MIT-LCP/mimic-iii/issues) to discuss other issues you may be having.
Please see the [issues page](https://github.com/MIT-LCP/mimic-code/issues) to discuss other issues you may be having.
1 change: 0 additions & 1 deletion mimic-iii/buildmimic/duckdb/duckdb_add_indexes.sql
@@ -551,4 +551,3 @@ CREATE INDEX TRANSFERS_idx03

-- FIXME: Remove this index when the PK can be re-added...
CREATE UNIQUE INDEX chartevents_rowid_pk ON CHARTEVENTS (ROW_ID);

2 changes: 1 addition & 1 deletion mimic-iii/buildmimic/duckdb/duckdb_add_tables.sql
@@ -487,4 +487,4 @@ CREATE TABLE TRANSFERS
OUTTIME TIMESTAMP,
LOS DOUBLE PRECISION,
CONSTRAINT transfers_rowid_pk PRIMARY KEY (ROW_ID)
) ;
) ;
2 changes: 1 addition & 1 deletion mimic-iv/buildmimic/duckdb/README.md
@@ -45,7 +45,7 @@ e.g. `/usr/local/bin`.

Download the CSV files for [MIMIC-IV](https://physionet.org/content/mimiciv/)
by any method you wish.
These instructionds were tested with MIMIC-IV v2.2.
These instructions were tested with MIMIC-IV v2.2.

The CSV files should be in a folder structure as follows:

2 changes: 1 addition & 1 deletion mimic-iv/concepts_duckdb/README.md
@@ -2,7 +2,7 @@

This folder has SQL compatible with [DuckDB](https://duckdb.org/).
These concepts were generated automatically from the BigQuery SQL dialect using the [sqlglot](https://sqlglot.com/) package.
If you would like to contribute a correction, it should be for the corresponding file in the concepts folder.
If you would like to contribute a correction, do not make it here. Instead, make your correction in the [concepts folder](/mimic-iv/concepts/) using the BigQuery SQL syntax.

See the [README](/mimic-iv/README.md) in the parent folder for more information.
