docs: fix simple typo, puncutation -> punctuation #435

Open: wants to merge 1 commit into base: master
2 changes: 1 addition & 1 deletion docs/source/formatting_data.rst
@@ -24,7 +24,7 @@ NOTE: We assume each column is a numerical column, unless you specify otherwise

#. ``attribute_name: 'output'`` The ``column_descriptions`` dictionary must specify one of your attributes as the output column. This is what the ``auto_ml`` predictor will try to predict. Importantly, the data you pass into ``.train()`` should have the correct values for this column, so we can teach the algorithms what is right and what is wrong.
#. ``attribute_name: 'categorical'`` All attribute names that hold a string in any of the rows after the header row will be encoded as categorical data. If, however, you have any numerical columns that you want encoded as categorical data, you can specify that here.
-#. ``attribute_name: 'nlp'`` If any of your data is a text field that you'd like to run some Natural Language Processing on, specify that in the header row. Data stored in this attribute will be encoded using TF-IDF, along with some other feature engineering (count of some aggregations like total capital letters, puncutation characters, smiley faces, etc., as well as a sentiment prediction of that text).
+#. ``attribute_name: 'nlp'`` If any of your data is a text field that you'd like to run some Natural Language Processing on, specify that in the header row. Data stored in this attribute will be encoded using TF-IDF, along with some other feature engineering (count of some aggregations like total capital letters, punctuation characters, smiley faces, etc., as well as a sentiment prediction of that text).
#. ``attribute_name: 'ignore'`` This column of data will be ignored.
#. ``attribute_name: 'date'`` Since ML algorithms don't know how to handle a Python datetime object, we will perform feature engineering on this object, creating new features like day_of_week, or minutes_into_day, etc. Then the original date field will be removed from the training data so the algorithsm don't throw a TypeError.
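
For readers of this hunk, the list above can be illustrated with a minimal sketch. It is not ``auto_ml``'s actual implementation: the column names, the sample row, and the helper ``expand_date_features`` are hypothetical, and only the two date features named in the docs text (``day_of_week`` and ``minutes_into_day``) are derived.

```python
from datetime import datetime


def expand_date_features(row, date_column):
    """Return a copy of row with the datetime in date_column replaced by numeric features.

    Hypothetical helper for illustration only; auto_ml's own feature
    engineering may differ.
    """
    expanded = dict(row)
    # Remove the raw datetime so an estimator never receives the object itself.
    value = expanded.pop(date_column)
    expanded[date_column + "_day_of_week"] = value.weekday()  # Monday == 0
    expanded[date_column + "_minutes_into_day"] = value.hour * 60 + value.minute
    return expanded


# One entry per non-numerical column; column names here are made up.
column_descriptions = {
    "price": "output",
    "zip_code": "categorical",
    "listing_text": "nlp",
    "internal_id": "ignore",
    "listed_at": "date",
}

row = {"price": 250000, "zip_code": 94110, "listed_at": datetime(2018, 3, 5, 14, 30)}

# Expand every column that was declared as a date.
date_columns = [name for name, kind in column_descriptions.items() if kind == "date"]
for name in date_columns:
    row = expand_date_features(row, name)
```

After the loop, ``row`` no longer contains ``listed_at`` and instead carries ``listed_at_day_of_week`` and ``listed_at_minutes_into_day`` as plain integers.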
