Data-Driven Spark 1.8.0
Analysis and Visualization
- Implement strategy for missing values in Pearson correlation matrix function
- Add three-color scale to heat maps
- Allow manual adjustment of color scale range in heat maps
- Add mutual information matrix function (not yet normalized; numerical data is not yet binned)
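
For orientation, the non-normalized mutual information of two discrete columns can be sketched as follows. This is a minimal illustration on plain Scala collections, not the library's implementation: the object and method names are hypothetical, and the natural logarithm (result in nats) is an assumption.

```scala
// Sketch only: hypothetical names, plain collections instead of RDDs.
object MutualInformationSketch {

  /** Non-normalized mutual information (in nats) of two paired discrete columns.
    * Values are compared as-is; numerical data is not binned.
    */
  def mutualInformation[A, B](pairs: Seq[(A, B)]): Double = {
    val n = pairs.size.toDouble
    // Empirical joint and marginal probabilities from relative frequencies.
    val joint = pairs.groupBy(identity).map { case (k, v) => k -> v.size / n }
    val px = pairs.groupBy(_._1).map { case (k, v) => k -> v.size / n }
    val py = pairs.groupBy(_._2).map { case (k, v) => k -> v.size / n }
    // I(X;Y) = sum over observed (x, y) of p(x,y) * ln(p(x,y) / (p(x) * p(y)))
    joint.iterator.map { case ((x, y), pxy) =>
      pxy * math.log(pxy / (px(x) * py(y)))
    }.sum
  }
}
```

Under these assumptions, two identical columns with two equally frequent values yield ln 2, and two independent columns yield 0. A matrix function would apply this to every pair of columns.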
Example Data Sets
- Flights data set with many numerical and nullable columns
Bugfixes
- Median function requires a numerical RDD but threw a NullPointerException for a non-numeric one instead of reporting that an implicit Numeric is required
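
One way to surface such a requirement at compile time is to demand an implicit `Numeric` in the signature. The sketch below uses a plain `Seq` rather than an RDD and hypothetical names; it illustrates the idea and is not the library's actual fix.

```scala
// Sketch only: hypothetical names, a Seq stands in for the RDD.
object MedianSketch {

  /** Median of a non-empty sequence.
    * Compiles only when an implicit Numeric[T] is available, so a call with
    * non-numeric elements is rejected by the compiler instead of failing at runtime.
    */
  def median[T](values: Seq[T])(implicit num: Numeric[T]): Double = {
    require(values.nonEmpty, "median of an empty collection is undefined")
    val sorted = values.sorted(num) // Numeric[T] is also an Ordering[T]
    val mid = sorted.size / 2
    if (sorted.size % 2 == 1) num.toDouble(sorted(mid))
    else (num.toDouble(sorted(mid - 1)) + num.toDouble(sorted(mid))) / 2.0
  }
}
```

With this signature, `MedianSketch.median(Seq(3, 1, 2))` compiles, while `MedianSketch.median(Seq("a", "b"))` fails to compile because no implicit `Numeric[String]` exists.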