Merge pull request #83 from Hynn01/main
Small fix for unnecessary-iteration-pandas checker and update documentation
Hynn01 authored May 23, 2022
2 parents 39f9dce + 5bfc8ee commit 803575f
Showing 5 changed files with 45 additions and 11 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -139,7 +139,7 @@ poetry run pytest .

- **W5516 | forward-pytorch | Net Forward Checker(PyTorch)**: It is recommended to use self.net() rather than self.net.forward() in PyTorch code. If self.net.forward() is used in the code, the rule is violated.

-- **W5517 | gradient-clear-pytorch | Gradient Clear Checker(PyTorch)**: The loss_fn.backward() and optimizer.step() should be used together with optimizer.zero_grad(). If the ".backward()" is missing in the code, the rule is violated.
+- **W5517 | gradient-clear-pytorch | Gradient Clear Checker(PyTorch)**: The loss_fn.backward() and optimizer.step() should be used together with optimizer.zero_grad(). If the `.zero_grad()` is missing in the code, the rule is violated.

- **W5518 | data-leakage-scikitlearn | Data Leakage Checker(ScikitLearn)**: All scikit-learn estimators should be used inside Pipelines, to prevent data leakage between training and test data.

38 changes: 36 additions & 2 deletions STEPS_TO_FOLLOW.md
@@ -1,6 +1,6 @@
### Here are the steps to follow for the evaluation :)

-We recommend you wrap your project in a parent folder and run the following command on that folder. The output **txt** file, by default, will be generated at the folder where you run your command on.
+We recommend you wrap your project (or jupyter notebook) in a parent folder and run the following command on that folder. The output **txt** file, by default, will be generated at the folder where you run your command on.

## For Python Project:

@@ -10,6 +10,8 @@ Install `dslinter` from the Python Package Index:
pip install dslinter
```
### STEP 2
+A `__init__.py` file (can be empty) is expected at the <path_to_the_project> folder.

Copy the following command in your terminal, type in the path to your project, and press `enter` to run:

[For Linux/Mac OS Users]:
@@ -37,9 +39,41 @@ pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iterati
```

## For Notebook:

+### STEP 1
For notebook, we need to convert it to Python file first and run `dslinter` on the Python file.
To convert the notebook to Python file, run:
```
jupyter nbconvert --to script <path_to_the_notebook>
```
-Then following the two steps mentioned above for Python project.
+### STEP 2
+Install `dslinter` from the Python Package Index:
+```
+pip install dslinter
+```
+### STEP 3
+Copy the following command in your terminal, type in the path to your project, and press `enter` to run:
+
+[For Linux/Mac OS Users]:
+```
+pylint \
+--load-plugins=dslinter \
+--disable=all \
+--enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,\
+nan-numpy,chain-indexing-pandas,datatype-pandas,\
+column-selection-pandas,merge-parameter-pandas,inplace-pandas,\
+dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,\
+hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,\
+deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,\
+randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,\
+missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,\
+forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,\
+dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch \
+--output-format=json:report.json,text:report.txt,colorized \
+--reports=y \
+<path_to_the_python_file>
+```
+[For Windows Users]:
+```
+pylint --load-plugins=dslinter --disable=all --enable=import,unnecessary-iteration-pandas,unnecessary-iteration-tensorflow,nan-numpy,chain-indexing-pandas,datatype-pandas,column-selection-pandas,merge-parameter-pandas,inplace-pandas,dataframe-conversion-pandas,scaler-missing-scikitlearn,hyperparameters-scikitlearn,hyperparameters-tensorflow,hyperparameters-pytorch,memory-release-tensorflow,deterministic-pytorch,randomness-control-numpy,randomness-control-scikitlearn,randomness-control-tensorflow,randomness-control-pytorch,randomness-control-dataloader-pytorch,missing-mask-tensorflow,missing-mask-pytorch,tensor-array-tensorflow,forward-pytorch,gradient-clear-pytorch,data-leakage-scikitlearn,dependent-threshold-scikitlearn,dependent-threshold-tensorflow,dependent-threshold-pytorch --output-format=json:report.json,text:report.txt,colorized --reports=y <path_to_the_python_file>
+```
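
Editorial note: the pylint commands in STEPS_TO_FOLLOW.md write a machine-readable report to `report.json`. The sketch below shows one possible way to tally that report per checker. It assumes the file holds a JSON list of message objects that each carry a `symbol` key, which is how pylint 2.x's JSON reporter behaves to the best of my knowledge. The function name `summarize_report` is hypothetical and not part of `dslinter`.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_report(report_path):
    """Count reported messages per checker symbol in a pylint JSON report."""
    messages = json.loads(Path(report_path).read_text(encoding="utf-8"))
    # Assumption: each entry is a dict with at least a "symbol" key,
    # e.g. "unnecessary-iteration-pandas".
    return Counter(message["symbol"] for message in messages)


if __name__ == "__main__":
    report = Path("report.json")
    if report.exists():
        # Print the most frequently reported checkers first.
        for symbol, count in summarize_report(report).most_common():
            print(f"{count:4d}  {symbol}")
```

Counting by `symbol` rather than by message text keeps the summary stable if the wording of a message changes between releases.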
2 changes: 1 addition & 1 deletion dslinter/checkers/unnecessary_iteration_pandas.py
@@ -54,7 +54,7 @@ def visit_call(self, node: astroid.Call):
"""
         try:
             if self._iterating_through_dataframe(node):
-                self.add_message("dataframe-iteration-modification-pandas", node=node)
+                self.add_message("unnecessary-iteration-pandas", node=node)
         except:  # pylint: disable=bare-except
             ExceptionHandler.handle(self, node)

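Editorial note: the hunk above only changes the message name that is emitted once the checker decides a call iterates through a DataFrame. The body of `_iterating_through_dataframe` is not shown in this commit, so the following is only a rough illustration of that kind of check, not dslinter's logic. It uses the standard-library `ast` module instead of astroid, and the helper name `find_row_iteration` and the set of method names are assumptions made for the example.

```python
import ast

# Assumption: these pandas methods are treated as row-by-row iteration.
ROW_ITERATION_METHODS = {"iterrows", "itertuples"}


def find_row_iteration(source):
    """Return line numbers of for-loops whose iterable is .iterrows()/.itertuples()."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.For):
            continue
        iterable = node.iter  # the expression after "in"
        if (
            isinstance(iterable, ast.Call)
            and isinstance(iterable.func, ast.Attribute)
            and iterable.func.attr in ROW_ITERATION_METHODS
        ):
            flagged.append(node.lineno)
    return sorted(flagged)


example = """
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})
for index, row in df.iterrows():
    print(row["a"])
total = df["a"].sum()
"""
print(find_row_iteration(example))
```

The source is only parsed, never executed, so pandas does not need to be installed to run the sketch. The vectorized `df["a"].sum()` line is not flagged, which is the style the checker is meant to encourage.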
@@ -19,7 +19,7 @@ def test_iterating_through_dataframe(self):
"""
         module_tree = astroid.parse(script)
         call = module_tree.body[-1].iter
-        with self.assertAddsMessages(pylint.testutils.MessageTest(msg_id="dataframe-iteration-modification-pandas", node=call),):
+        with self.assertAddsMessages(pylint.testutils.MessageTest(msg_id="unnecessary-iteration-pandas", node=call),):
             self.checker.visit_module(module_tree)
             self.checker.visit_call(call)

12 changes: 6 additions & 6 deletions pyproject.toml
@@ -22,7 +22,7 @@ skip = 'scripts'

[tool.poetry]
name = "dslinter"
-version = "2.0.6"
+version = "2.0.7"
description = "`dslinter` is a pylint plugin for linting data science and machine learning code. We plan to support the following Python libraries: TensorFlow, PyTorch, Scikit-Learn, Pandas, NumPy and SciPy."

license = "GPL-3.0 License"
@@ -52,11 +52,11 @@ toml = "^0.10"
# cleo = { git = "https://github.com/sdispater/cleo.git", branch = "master" }
# Optional dependencies (extras)
# pendulum = { version = "^1.4", optional = true }
-pylint = { version = "2.12.2" }
-astroid = { version = "2.9.3" }
-mypy = { version = "0.931" }
-data-science-types = { version = "0.2.23" }
-pyspark-stubs = {version = "3.0.0.post3" }
+pylint = { version = "~2.12.2" }
+astroid = { version = "~2.9.3" }
+mypy = { version = "~0.931" }
+data-science-types = { version = "~0.2.23" }
+pyspark-stubs = {version = "~3.0.0.post3" }

[tool.poetry.dev-dependencies]
pytest = "^3.0"
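
Editorial note: the pyproject.toml hunk relaxes exact pins such as `2.12.2` to tilde requirements such as `~2.12.2`. As I understand Poetry's dependency specification, a tilde requirement with a major and minor version allows patch-level updates only, so `~2.12.2` means `>=2.12.2,<2.13.0`. The sketch below illustrates that rule for purely numeric versions. It is not Poetry's implementation, the helper names `tilde_range` and `satisfies` are hypothetical, and it does not handle suffixes such as the `.post3` in `~3.0.0.post3`.

```python
def tilde_range(spec):
    """Return (lower, upper) version tuples for a numeric tilde constraint like '~2.12.2'."""
    parts = [int(piece) for piece in spec.lstrip("~").split(".")]
    # Pad the lower bound to three components: "~2.12" starts at 2.12.0.
    lower = tuple(parts + [0] * (3 - len(parts)))
    if len(parts) == 1:
        # Only a major version given: the next major version is excluded.
        upper = (parts[0] + 1, 0, 0)
    else:
        # Major and minor given: the next minor version is excluded.
        upper = (parts[0], parts[1] + 1, 0)
    return lower, upper


def satisfies(version, spec):
    """Check whether a numeric version string falls inside the tilde range."""
    candidate = tuple(int(piece) for piece in version.split("."))
    candidate = candidate + (0,) * (3 - len(candidate))
    lower, upper = tilde_range(spec)
    return lower <= candidate < upper


if __name__ == "__main__":
    for version in ("2.12.1", "2.12.2", "2.12.9", "2.13.0"):
        print(version, satisfies(version, "~2.12.2"))
```

Under this reading, the change lets patch releases of pylint 2.12, astroid 2.9, and the other dependencies be installed while still excluding the next minor release.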