This repository has been archived by the owner on Jan 3, 2023. It is now read-only.
Hi,
I am working on the dataset for my capstone project. When I try to clean the dataset by removing the outliers, I get the error shown at the end of this post.
The code I am running is below.
# Removing Outliers
# Tukey Method
# import required libraries
import numpy as np
from collections import Counter

# Outlier detection
def detect_outliers(df, n, features):
    outlier_indices = []
    # iterate over features (columns)
    for col in features:
        # 1st quartile (25%)
        Q1 = np.percentile(df[col], 25)
        # 3rd quartile (75%)
        Q3 = np.percentile(df[col], 75)
        # Interquartile range (IQR)
        IQR = Q3 - Q1
        # outlier step
        outlier_step = 1.5 * IQR
        # Determine the index labels of outliers for feature col
        outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step)].index
        # append the found outlier labels for col to the list of outlier indices
        outlier_indices.extend(outlier_list_col)
    # select observations that are outliers in more than n features
    outlier_indices = Counter(outlier_indices)
    multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)
    return multiple_outliers
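For reference, here is a tiny worked example of the Tukey rule that the function applies to each column. It is only a sketch: the sample values are made up for illustration and are not from my dataset.

```python
import numpy as np

# Made-up sample, not taken from the real dataset.
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

q1 = np.percentile(values, 25)   # 1st quartile
q3 = np.percentile(values, 75)   # 3rd quartile
iqr = q3 - q1                    # interquartile range
step = 1.5 * iqr                 # Tukey outlier step

# Anything outside [q1 - step, q3 + step] is flagged as an outlier.
lower, upper = q1 - step, q3 + step
outlier_mask = (values < lower) | (values > upper)
outliers = values[outlier_mask]
```

With these values only 100.0 falls outside the fences.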
# List of Outliers
Outliers_to_drop = detect_outliers(data1.drop('Class',axis=1),0,list(data1.drop('Class',axis=1)))
data1.drop('Class',axis=1).loc[Outliers_to_drop]
#Create New Dataset without Outliers
good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
good_data.info()
IndexError Traceback (most recent call last)
in
1 #Create New Dataset without Outliers
----> 2 good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
3 good_data.info()
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
4289
4290 key = com.values_from_object(key)
-> 4291 result = getitem(key)
4292 if not is_scalar(result):
4293 return promote(result)
IndexError: index 5000 is out of bounds for axis 0 with size 5000
Can anyone help me fix this and write the code properly?
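One possible cause, sketched under assumptions: `detect_outliers` collects index labels (it reads `.index` of the filtered frame), while `data1.index[Outliers_to_drop]` treats those values as integer positions. If the index of `data1` does not start at 0 (for example, labels 1 to 5000), the label 5000 is not a valid position in an index of size 5000, which would match the `IndexError` above. The toy frame below is hypothetical and only mimics that situation; passing the labels straight to `DataFrame.drop` avoids the positional lookup.

```python
import pandas as pd

# Hypothetical stand-in for data1: 5 rows labelled 1..5 instead of 0..4.
data1 = pd.DataFrame(
    {"V1": [1.0, 2.0, 3.0, 4.0, 100.0], "Class": [0, 0, 0, 0, 1]},
    index=[1, 2, 3, 4, 5],
)

# Labels as detect_outliers would return them (the row labelled 5 holds 100.0).
outliers_to_drop = [5]

# Positional lookup: position 5 does not exist in an index of size 5.
try:
    data1.index[outliers_to_drop]
    positional_failed = False
except IndexError:
    positional_failed = True

# Label-based drop: hand the labels directly to DataFrame.drop.
good_data = data1.drop(index=outliers_to_drop).reset_index(drop=True)
```

If this is the cause, the equivalent change in the original code would be to replace `data1.drop(data1.index[Outliers_to_drop])` with `data1.drop(Outliers_to_drop)`.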