[BUG] [Chapter2] ValueError when calculating the correlations in corr_matrix = housing.corr() #614

Open
tobihh12 opened this issue May 19, 2023 · 4 comments

Comments

@tobihh12

tobihh12 commented May 19, 2023

The following ValueError occurs when calling

corr_matrix = housing.corr()

ValueError: could not convert string to float: 'INLAND'

The failure comes from the non-numeric "ocean_proximity" column, whose string values (such as 'INLAND') cannot be converted to float.

For me, the solution was to change the two lines where the correlations are calculated to:

corr_matrix = housing.corr(numeric_only=True)

According to:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html

The default value of numeric_only for DataFrame.corr() was changed to False effective as of pandas 2.0.0
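For anyone who wants to try the fix in isolation, here is a minimal sketch. The small DataFrame is a made-up stand-in for the book's housing data (only the column names are borrowed from it), and the `select_dtypes` variant is an alternative I am assuming is acceptable for the notebook, not something the book itself uses.

```python
import pandas as pd

# Hypothetical toy data standing in for the chapter's housing DataFrame.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8],
    "median_house_value": [452600.0, 358500.0, 341300.0, 342200.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

# Option 1: tell corr() to ignore non-numeric columns.
corr_matrix = housing.corr(numeric_only=True)

# Option 2: drop non-numeric columns explicitly before calling corr().
corr_matrix_alt = housing.select_dtypes(include="number").corr()

print(corr_matrix["median_house_value"].sort_values(ascending=False))
```

Both options should yield the same matrix. The second does not depend on the `numeric_only` argument of `corr()`, so it may be the safer choice if the notebook has to run on several pandas versions.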

@suho-han

I ran into the same problem.
As you said, numeric_only=True needs to be passed.

Changed in version 2.0.0: The default value of numeric_only is now False.

Here is the traceback I got:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[36], line 1
----> 1 corr_matrix = housing.corr()

File c:\Users\hansu\anaconda3\envs\handson-ml2\Lib\site-packages\pandas\core\frame.py:10054, in DataFrame.corr(self, method, min_periods, numeric_only)
  10052 cols = data.columns
  10053 idx = cols.copy()
> 10054 mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)
  10056 if method == "pearson":
  10057     correl = libalgos.nancorr(mat, minp=min_periods)

File c:\Users\hansu\anaconda3\envs\handson-ml2\Lib\site-packages\pandas\core\frame.py:1838, in DataFrame.to_numpy(self, dtype, copy, na_value)
   1836 if dtype is not None:
   1837     dtype = np.dtype(dtype)
-> 1838 result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)
   1839 if result.dtype is not dtype:
   1840     result = np.array(result, dtype=dtype, copy=False)

File c:\Users\hansu\anaconda3\envs\handson-ml2\Lib\site-packages\pandas\core\internals\managers.py:1732, in BlockManager.as_array(self, dtype, copy, na_value)
   1730         arr.flags.writeable = False
   1731 else:
-> 1732     arr = self._interleave(dtype=dtype, na_value=na_value)
   1733     # The underlying data was copied within _interleave, so no need
   1734     # to further copy if copy=True or setting na_value
...
-> 1794     result[rl.indexer] = arr
   1795     itemmask[rl.indexer] = 1
   1797 if not itemmask.all():

ValueError: could not convert string to float: 'INLAND'
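The traceback shows the failure happening while the whole frame is converted to a float array. A quick way to see which columns would trip that conversion is to list the non-numeric ones. This is only a sketch on made-up data (the toy DataFrame is hypothetical, with column names borrowed from the housing dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the chapter's housing DataFrame.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8],
    "median_house_value": [452600.0, 358500.0, 341300.0, 342200.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

# Columns that cannot be cast to float and therefore break corr()
# when numeric_only is False.
non_numeric = housing.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

On the real dataset this should point at "ocean_proximity", matching the 'INLAND' value named in the error message.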

@rocky-d

rocky-d commented Feb 28, 2024

Same here.

@ateebkhan96

In newer versions of pandas (2.0.0 and later), numeric_only defaults to False.

@rahullo

rahullo commented Aug 22, 2024

I got the same issue.
