Commit ef29347
Merge pull request #138 from pacospace/pre-commit-happy
Make pre-commit happy
Francesco Murdaca authored Mar 16, 2022
2 parents c7c9208 + 0f602e6 commit ef29347
Showing 46 changed files with 794 additions and 402 deletions.
24 changes: 6 additions & 18 deletions .pre-commit-config.yaml
@@ -1,11 +1,11 @@
---
repos:
-  - repo: git://github.com/Lucas-C/pre-commit-hooks
+  - repo: https://github.com/Lucas-C/pre-commit-hooks
    rev: v1.1.10
    hooks:
      - id: remove-tabs

-  - repo: git://github.com/pre-commit/pre-commit-hooks
+  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
@@ -21,36 +21,24 @@ repos:
      - id: check-ast
      - id: debug-statements

-  - repo: git://github.com/pycqa/pydocstyle.git
+  - repo: https://github.com/pycqa/pydocstyle.git
    rev: 6.1.1
    hooks:
      - id: pydocstyle

  - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.0.1
+    rev: v4.1.0
    hooks:
      - id: check-toml
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace

-  - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v0.902
-    hooks:
-      - id: mypy
-        exclude: '^(docs|tasks|tests)|setup\.py'
-        args: [--ignore-missing-imports]
-
  - repo: https://github.com/psf/black
-    rev: 21.6b0
+    rev: 22.1.0
    hooks:
      - id: black

-  - repo: https://github.com/tomcatling/black-nb
-    rev: '0.5.0'
-    hooks:
-      - id: black-nb
-
  # Enable this in repositories with python packages.
  # - repo: https://github.com/mgedmin/check-manifest
  #   rev: '0.39'
@@ -63,4 +51,4 @@ repos:
      - id: flake8-nb
        additional_dependencies: ['pep8-naming']
        # Ignore all format-related checks as Black takes care of those.
-        args: ['--ignore', 'E2,W5', '--select', 'E,W,F,N', '--max-line-length=120']
+        args: ['--ignore', 'E2,W5', '--select', 'E,W,F,N', '--max-line-length=130']
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ The following demos provide examples of how to use the tools available with [Ope
* [Run SQL queries from a Jupyter Notebook environment](notebooks/demo1/demo1-join-tables.ipynb)
* [Demo 1 Elyra Pipeline](https://github.com/os-climate/aicoe-osc-demo/blob/master/notebooks/demo1/demo1.pipeline)
* [Results visualized on a Superset Dashboard](https://superset-secure-odh-superset.apps.odh-cl1.apps.os-climate.org/superset/dashboard/3/)
-* [Video on creating Elyra Pipelines and Superset Dashboard](https://youtu.be/TFgsR7UlcHA)
+* [Video on creating Elyra Pipelines and Superset Dashboard](https://youtu.be/TFgsR7UlcHA)


## [Demo 2 - Automated Inference Pipeline and Dashboarding](notebooks/demo2/README.md)
4 changes: 2 additions & 2 deletions _config.yml
@@ -1,4 +1,4 @@
-title: Open Data Hub and Operate First for OS-Climate
+title: Open Data Hub and Operate First for OS-Climate
author: AIOps
# logo: logo.png
execute:
@@ -33,4 +33,4 @@ launch_buttons:
sphinx:
  config:
    html_js_files:
-      - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js
+      - https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js
2 changes: 1 addition & 1 deletion _toc.yml
@@ -1,7 +1,7 @@
format: jb-book
root: README
parts:
-  - caption: Demo 1 - ETL & Dashboarding
+  - caption: Demo 1 - ETL & Dashboarding
    chapters:
      - file: notebooks/demo1/README
      - file: notebooks/demo1/demo1-create-tables
2 changes: 1 addition & 1 deletion data/superset/demo1.json
@@ -10051,4 +10051,4 @@
}
}
]
-}
+}
2 changes: 1 addition & 1 deletion data/superset/demo2.json
@@ -358,4 +358,4 @@
}
}
]
-}
+}
14 changes: 7 additions & 7 deletions docs/conf.py
@@ -40,7 +40,7 @@
master_doc = "index"

# General information about the project.
-project = u"project-template"
+project = "project-template"

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@@ -183,8 +183,8 @@
    (
        "index",
        "project-template.tex",
-        u"project-template Documentation",
-        u"aicoe-aiops",
+        "project-template Documentation",
+        "aicoe-aiops",
        "manual",
    ),
]
@@ -218,8 +218,8 @@
    (
        "index",
        "project-template",
-        u"project-template Documentation",
-        [u"aicoe-aiops"],
+        "project-template Documentation",
+        ["aicoe-aiops"],
        1,
    )
]
@@ -237,8 +237,8 @@
    (
        "index",
        "project-template",
-        u"project-template Documentation",
-        u"aicoe-aiops",
+        "project-template Documentation",
+        "aicoe-aiops",
        "project-template",
        "template for the team to use",
        "Miscellaneous",
1 change: 1 addition & 0 deletions notebooks/demo1/demo1-create-tables.ipynb
@@ -216,6 +216,7 @@
" pz = list(zip(df.columns.to_list(), stypes))\n",
" return \",\\n\".join([\" {n} {t}\".format(n=e[0], t=e[1]) for e in pz])\n",
"\n",
"\n",
"# Convert GHG values with string representation of numbers to float\n",
"def str_w_spaces_to_numeric(df, col):\n",
" df[col] = df[col].str.replace(' ','').str.replace(',','')\n",
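For context, the helpers this hunk touches, reconstructed as runnable Python: the hunk's added `"\n",` is simply the second blank line Black requires before a top-level definition. The first function's name and the final numeric cast are assumptions, since the hunk shows only fragments:

import pandas as pd


def ddl_columns(df, stypes):
    # Pair each dataframe column with its SQL type and render the
    # column list for a CREATE TABLE statement.
    pz = list(zip(df.columns.to_list(), stypes))
    return ",\n".join(["    {n} {t}".format(n=e[0], t=e[1]) for e in pz])


# Convert GHG values with string representation of numbers to float
def str_w_spaces_to_numeric(df, col):
    df[col] = df[col].str.replace(" ", "").str.replace(",", "")
    df[col] = pd.to_numeric(df[col])  # assumed continuation; the hunk cuts off here
    return df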
13 changes: 6 additions & 7 deletions notebooks/demo2/README.md
@@ -28,9 +28,9 @@ You can also use AICoE-CI to enable other Thoth services such as pre-commit chec
![Spawn JupyterHub](../../docs/assets/demo1-spawn-jupyter.png)


-## Data Preprocessing
+## Data Preprocessing

-Now let’s look at how we process raw data and prepare it for model training.
+Now let’s look at how we process raw data and prepare it for model training.
The source code for preprocessing is available in the `src` directory preinstalled in the JupyterHub image. This directory follows the project structure laid out in the [aicoe-aiops project template](https://github.com/aicoe-aiops/project-template).

* Extraction
@@ -41,20 +41,20 @@

* Curation

-* In the text and table curation notebook, we will load the json files (one per pdf) and the corresponding csv files from the s3 bucket, and then add labels to it. For each text extract or table, we will assign label "1" to the correct corresponding text, and label "0" to a randomly selected text that does not correspond to the table.
+* In the text and table curation notebook, we will load the json files (one per pdf) and the corresponding csv files from the s3 bucket, and then add labels to it. For each text extract or table, we will assign label "1" to the correct corresponding text, and label "0" to a randomly selected text that does not correspond to the table.
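For context, a minimal sketch of the labeling rule described in this bullet; the function name and data layout are hypothetical, not the project's actual code:

import random

import pandas as pd


def curate_pairs(tables):
    # `tables` maps a table id to its correctly corresponding text.
    rows = []
    ids = list(tables)
    for tid in ids:
        rows.append({"table": tid, "text": tables[tid], "label": 1})  # matching text
        other = random.choice([i for i in ids if i != tid])  # random non-match
        rows.append({"table": tid, "text": tables[other], "label": 0})
    return pd.DataFrame(rows)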

-## Inference
+## Inference

* Infer relevance
* The infer relevance notebook takes in extracted text from the preprocessing stage and for a predefined set of KPI questions, finds relevant paragraphs from the text. These paragraphs are then used to find the exact answers to the questions. The notebook uses a fine-tuned language model stored on s3. The output prediction csv files are saved back on s3.

* Infer KPI
* The infer kpi notebook takes in the results from the infer relevance stage and for the predefined set of KPI questions, it finds the exact answers from the relevant paragraphs. The notebook uses a fine-tuned language model stored on s3. The output prediction csv files are saved back on s3.
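For context, a minimal sketch of these two stages chained together; the `transformers` pipeline and the public model name are stand-ins for the fine-tuned models the demo loads from s3:

from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")


def answer_kpi(question, paragraphs):
    # Infer relevance: score every extracted paragraph against the KPI question.
    scored = [(p, qa(question=question, context=p)) for p in paragraphs]
    paragraph, best = max(scored, key=lambda pair: pair[1]["score"])
    # Infer KPI: the extracted span from the most relevant paragraph is the answer.
    return {"question": question, "paragraph": paragraph, "answer": best["answer"]}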

-## Trino
+## Trino

* Results table
-* The create results table notebook takes the prediction output csv from infer KPI step and creates a Trino SQL table that can be used for querying and visualization in Superset.
+* The create results table notebook takes the prediction output csv from infer KPI step and creates a Trino SQL table that can be used for querying and visualization in Superset.
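For context, a minimal sketch of registering such a table with the trino-python-client; the host, catalog, schema, and column names are placeholders, not the demo's actual values:

import trino

conn = trino.dbapi.connect(
    host="trino.example.com",  # placeholder endpoint
    port=443,
    user="demo",
    http_scheme="https",
    catalog="hive",
    schema="demo2",
)
cur = conn.cursor()
# Create the results table that Superset can then query.
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS kpi_results (
        pdf_name VARCHAR,
        kpi VARCHAR,
        answer VARCHAR,
        score DOUBLE
    )
    """
)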

## Elyra pipeline

@@ -68,4 +68,3 @@
## Superset Visualization

* The Superset dashboard is the final step of demo 2. The automated Elyra inference pipeline answers KPI questions from raw pdfs and stores the results in the Trino table. The dashboard queries the table according to user selected filters and shows the answers. To interact with the results, find the [dashboard here](https://superset-secure-odh-superset.apps.odh-cl1.apps.os-climate.org/superset/dashboard/15).
-
4 changes: 1 addition & 3 deletions notebooks/demo2/config.py
@@ -1,15 +1,13 @@
"""Default runtime config."""
-import src
import pathlib
-import torch
import os

# General config
STAGE = "extract" # "extract" | "curate "
SEED = 42

if os.getenv("AUTOMATION"):
-    ROOT = pathlib.Path("/opt/app-root")
+    ROOT = pathlib.Path("/opt/app-root")
else:
    ROOT = pathlib.Path(__file__).resolve().parent.parent.parent

