Commit

Merge pull request #1689 from MIT-LCP/duckdb_concepts
Add duckdb build/concepts and use SQLGlot to convert BigQuery SQL into other dialects
alistairewj authored Feb 20, 2024
2 parents 8cb6028 + 1dfa41c commit b9ed7a3
Showing 163 changed files with 13,606 additions and 7,860 deletions.
17 changes: 16 additions & 1 deletion .github/workflows/psql.yml
@@ -28,6 +28,11 @@ jobs:
- name: Check out repository code
uses: actions/checkout@v3

- name: Install Python
uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Download demo data
uses: ./.github/actions/download-demo

@@ -60,7 +65,7 @@ jobs:
PGPASSWORD: postgres
BUILDCODE_PATH: mimic-iv/buildmimic/postgres

- name: Build mimic-iv concepts
- name: mimic-iv/concepts psql build
run: |
psql -h $POSTGRES_HOST -U postgres -f postgres-functions.sql
psql -h $POSTGRES_HOST -U postgres -f postgres-make-concepts.sql
@@ -69,6 +74,16 @@ jobs:
POSTGRES_HOST: postgres
PGPASSWORD: postgres

- name: mimic_utils - convert mimic-iv concepts to PostgreSQL and rebuild
run: |
pip install .
mimic_utils convert_folder mimic-iv/concepts mimic-iv/concepts_postgres --source_dialect bigquery --destination_dialect postgres
psql -h $POSTGRES_HOST -U postgres -f mimic-iv/concepts_postgres/postgres-make-concepts.sql
working-directory: ./
env:
POSTGRES_HOST: postgres
PGPASSWORD: postgres

- name: Load ed data into PostgreSQL
run: |
echo "Loading data into psql."
3 changes: 3 additions & 0 deletions README_mimic_utils.md
@@ -0,0 +1,3 @@
# mimic_utils package

This package contains utilities for working with the MIMIC datasets.
85 changes: 41 additions & 44 deletions mimic-iii/buildmimic/duckdb/README.md
@@ -1,20 +1,51 @@
# DuckDB
# MIMIC-III in DuckDB

The script in this folder creates the schema for MIMIC-IV and
The scripts in this folder create the schema for MIMIC-III and
loads the data into the appropriate tables for
[DuckDB](https://duckdb.org/).

DuckDB, like SQLite, is serverless and
stores all information in a single file.
Unlike SQLite, an OLTP database,
DuckDB is an OLAP database, and therefore optimized for analytical queries.
This will result in faster queries for researchers using MIMIC-IV
This will result in faster queries for researchers using MIMIC-III
with DuckDB compared to SQLite.
To learn more, please read their ["why duckdb"](https://duckdb.org/docs/why_duckdb)
page.

The instructions to load MIMIC-III into a DuckDB
only require:
1. DuckDB to be installed and
## Download MIMIC-III files

[Download](https://physionet.org/content/mimiciii/1.4/)
the CSV files for MIMIC-III by any method you wish.
(These scripts should also work with the much smaller
[demo version](https://physionet.org/content/mimiciii-demo/1.4/#files-panel)
of the dataset.)

The easiest way to download them is to open a terminal then run:

```
wget -r -N -c -np -nH --cut-dirs=1 --user YOURUSERNAME --ask-password https://physionet.org/files/mimiciii/1.4/
```

Replace `YOURUSERNAME` with your physionet username.

The rest of these instructions assume the CSV files are in the folder structure as follows:

```
mimic_data_dir/
ADMISSIONS.csv.gz
CALLOUT.csv.gz
...
```

By default, the above `wget` downloads the data into `mimiciii/1.4` (as we used `--cut-dirs=1` to remove the base folder). Thus, by default, `mimic_data_dir` is `mimiciii/1.4` (relative to the current folder). The CSV files can be uncompressed (end in `.csv`) or compressed (end in `.csv.gz`).
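
Before loading, it can help to confirm that the data directory has the layout described above. The following is a minimal sketch, not part of this repository: `find_mimic_tables` is a hypothetical helper that maps each table name to its file and accepts both `.csv` and `.csv.gz`. It assumes all files sit directly in `mimic_data_dir`, as in the layout shown.

```python
from pathlib import Path


def find_mimic_tables(data_dir):
    """Map table name -> path for every .csv or .csv.gz file in data_dir.

    Hypothetical helper for illustration; the scripts in this folder
    do not define or require it.
    """
    tables = {}
    for path in sorted(Path(data_dir).iterdir()):
        name = path.name
        if name.endswith(".csv.gz"):
            table = name[: -len(".csv.gz")]
        elif name.endswith(".csv"):
            table = name[: -len(".csv")]
        else:
            continue  # ignore anything that is not a CSV file
        # If a table exists in both forms, keep the first one seen.
        # Sorted order puts "X.csv" before "X.csv.gz".
        tables.setdefault(table, path)
    return tables
```

For example, `find_mimic_tables("mimiciii/1.4")` should return one entry per table (such as `ADMISSIONS` and `CALLOUT`) if the `wget` command above completed successfully.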


## Shell script method (`import_duckdb.sh`)

Using this script to load MIMIC-III into a DuckDB
only requires:
1. DuckDB to be installed (the `duckdb` executable must be in your PATH)
2. Your computer to have a POSIX-compliant terminal shell,
which is already found by default on any Mac OSX, Linux, or BSD installation.

@@ -24,14 +55,6 @@ which you can obtain by either installing
[Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
or [Cygwin](https://www.cygwin.com/).

## Set-up

### Quick overview

1. [Install](https://duckdb.org/docs/installation/) the CLI version of DuckDB
2. [Download](https://physionet.org/content/mimiciii/1.4/) the MIMIC-III files
3. Create DuckDB database and load data

### Install DuckDB

Follow instructions on their website to
@@ -41,37 +64,10 @@ the CLI version of DuckDB.
You will need to place the `duckdb` binary in a folder on your environment path,
e.g. `/usr/local/bin`.

### Download MIMIC-III files

[Download](https://physionet.org/content/mimiciii/1.4/)
the CSV files for MIMIC-III by any method you wish.

The intructions assume the CSV files are in the folder structure as follows:

```
mimic_data_dir
ADMISSIONS.csv.gz
...
```

The CSV files can be uncompressed (end in `.csv`) or compressed (end in `.csv.gz`).
### Create DuckDB database and load data

The easiest way to download them is to open a terminal then run:

```
wget -r -N -c -np -nH --cut-dirs=1 --user YOURUSERNAME --ask-password https://physionet.org/files/mimiciii/1.4/
```

Replace `YOURUSERNAME` with your physionet username.

This will make you `mimic_data_dir` be `mimiciii/1.4`.

# Create DuckDB database and load data

The last step requires creating a DuckDB database and
loading the data into it.

You can do all of this will one shell script, `import_duckdb.sh`,
You can do all of this with one shell script, `import_duckdb.sh`,
located in this repository.

See the help for it below:
@@ -102,6 +98,7 @@ The script will print out progress as it goes.
Be patient, this can take minutes to hours to load
depending on your computer's configuration.


# Help

Please see the [issues page](https://github.com/MIT-LCP/mimic-iii/issues) to discuss other issues you may be having.
Please see the [issues page](https://github.com/MIT-LCP/mimic-code/issues) to discuss other issues you may be having.