Data-Driven Spark 2.1.0

@FRosner released this 12 Jul 20:28
· 169 commits to master since this release

General

  • Visualizations can now have a title
  • Mutual information is now rescaled by the maximum entropy of both variables, so that multiple MI values can be compared
  • Sturges' formula is used to compute the number of histogram bins when the user does not provide one
  • Fixed the description of the RDD summarize function
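
To illustrate the two numeric changes above, here is a minimal Python sketch. It is not the library's implementation. The function names (sturges_bins, entropy, normalized_mutual_information) are hypothetical. The sketch also assumes that "rescaled by maximum entropy" means dividing the mutual information by the larger of the two marginal entropies, and that Sturges' formula is applied in its common form, ceil(log2(n)) + 1.

```python
import math
from collections import Counter


def sturges_bins(n):
    """Number of histogram bins for n values, using ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(math.log2(n)) + 1


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def normalized_mutual_information(xs, ys):
    """Mutual information of two discrete sequences divided by
    max(H(X), H(Y)). This normalization is an assumption about what
    the release note means, not a confirmed detail."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
    max_h = max(entropy(xs), entropy(ys))
    # If both variables are constant, both entropies are zero; report 0.
    return mi / max_h if max_h > 0 else 0.0
```

Under this reading, the rescaled value lies between 0 (independent variables) and 1 (each variable fully determines the other), which is what makes values from different column pairs comparable.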

Spark SQL

  • Summary statistics function (summarize) for data frames
  • Bar chart for single data frame columns
  • Pie chart for single data frame columns
  • Histogram for single data frame columns
  • Median for single data frame columns
  • Dashboard now uses data frame summarize for column statistics
  • Dashboard provides useful titles for individual visualizations