Data-Driven Spark 2.1.0
General
- Visualizations can now have a title
- Mutual information is now rescaled by the maximum entropy of the two variables, so that multiple MI values can be compared with each other
- Sturges' formula is used to compute the optimal number of histogram bins when the user does not provide one
- Fixed the description of the RDD summarize function
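The two formulas mentioned above can be illustrated with a small standalone sketch. Sturges' formula picks k = ceil(log2(n)) + 1 bins for n observations. For mutual information, the sketch assumes that "rescaled by the maximum entropy of both variables" means dividing I(X;Y) by max(H(X), H(Y)). Because I(X;Y) never exceeds either entropy, the result then falls between 0 and 1. The function names are hypothetical and this is not the library's own implementation.

```python
import math
from collections import Counter


def sturges_bins(n: int) -> int:
    """Number of histogram bins by Sturges' formula: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


def entropy(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(xs, ys) -> float:
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mutual_information(xs, ys) -> float:
    """MI divided by max(H(X), H(Y)); this normalization is an assumption."""
    max_entropy = max(entropy(xs), entropy(ys))
    if max_entropy == 0.0:
        return 0.0  # both variables are constant, so there is nothing to share
    return mutual_information(xs, ys) / max_entropy
```

For example, sturges_bins(100) gives 8 bins. Two identical binary columns score 1.0 and two independent ones score 0.0. The base of the logarithm cancels in the ratio, so the rescaled value does not depend on whether entropy is measured in bits or nats.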
Spark SQL
- Summary statistics function (summarize) for data frames
- Bar chart for single data frame columns
- Pie chart for single data frame columns
- Histogram for single data frame columns
- Median for single data frame columns
- Dashboard now uses the data frame summarize function for column statistics
- Dashboard provides descriptive titles for individual visualizations
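As a reference for what the single-column median denotes, here is a minimal single-machine sketch of the exact definition. It returns the middle value of the sorted column, or the mean of the two middle values when the count is even. It skips missing cells, and that handling is an assumption. A distributed Spark implementation may compute the value differently, for example approximately, and the function name is hypothetical.

```python
def median(values):
    """Exact median: middle value, or mean of the two middle values for even counts."""
    # Assumption: missing cells (None) are ignored rather than treated as values.
    ordered = sorted(v for v in values if v is not None)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty column is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```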