Releases: FRosner/spawncamping-dds

Data-Driven Spark 4.0.0-beta

03 Feb 10:04
Pre-release

Core

  • Scan for all @Help annotated methods when DDS.help() is called
  • Review and fix short and long descriptions in @Help annotations

Build

  • Fixed a problem where pull requests would fail due to a Travis CI misconfiguration

Data-Driven Spark 4.0.0-alpha

02 Jan 14:44
Pre-release

This release introduces a completely new project structure. DDS now has submodules (core, datasets, and web-ui).

Core

  • new servables API
  • rework Z scale implementation of heatmap servable
  • support for Spark 1.5.x

Datasets

  • rework datasets creation to use SQLContext implicit conversions (toDF)
  • remove unnecessary Spark context argument from the DataFrame versions of the datasets
  • use java.sql.* instead of java.util.* for dates and timestamps

Web-UI

  • make servable titles and history browser more informative

Data-Driven Spark 3.0.1

22 Sep 16:05

Bugfixes

  • Fix NPE when showing case classes that contain null values
  • Fix Travis CI build accidentally including the coveralls runtime dependency (S3 bucket)

Data-Driven Spark 3.0.0

10 Sep 12:39

Spark

  • Upgrade to Spark 1.4.0

Usability

  • Add visualization REST interface and visualization history browser (drop-down menu). This allows multiple users to access the same UI and to refresh the page. However, refreshing the page loses any settings you made (this will be fixed in one of the next releases).

Additionally, a lot of refactoring has been done (build script, DDS core, Spark SQL functions).

Data-Driven Spark 2.3.1

02 Sep 12:21

Bugfixes

  • Fix a problem where require.js would sometimes not load d3 correctly. This caused the parallel coordinates to break.

Data-Driven Spark 2.3.0

18 Aug 12:23

Example Data Sets

  • Added a small data set description and source to the user guide for each data set
  • Added a GraphX example data set (Enron email network)

Bugfixes

  • Mutual information function crashed on double columns containing NaN values; NaNs are now binned separately
  • Fixed a problem where changing the heatmap scale changed black cells (null values) to white
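
To illustrate the NaN fix above, here is a minimal sketch of binning that gives NaN its own bin. It is not the DDS implementation: the object name `NanAwareBinning`, the equal-width scheme, and the choice of index `numBins` for the NaN bin are all assumptions made for the example.

```scala
object NanAwareBinning {
  // Hypothetical helper: maps a value to one of `numBins` equal-width bins
  // over [min, max]. NaN is assigned the extra index `numBins` (assumption),
  // so it never reaches the arithmetic below.
  def binIndex(value: Double, min: Double, max: Double, numBins: Int): Int = {
    require(numBins > 0, "numBins must be positive")
    require(max > min, "max must be greater than min")
    if (value.isNaN) numBins
    else {
      val raw = ((value - min) / (max - min) * numBins).toInt
      // Clamp so that value == max falls into the last regular bin.
      math.min(math.max(raw, 0), numBins - 1)
    }
  }
}
```

With five bins over [0, 10], regular values map to indices 0 to 4 and NaN maps to index 5.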

Build and Architecture

  • Remove git call in build file that caused the build to crash on systems with older versions of git (<= 1.7.x)
  • Use require.js as dependency management system for front-end code

Data-Driven Spark 2.2.0

01 Aug 10:40

Analysis and Visualization

  • Heatmap draws black cells when values are NaN / null. This is especially useful when the normalized mutual information is not defined.
  • New key-value visualization for summary statistics
  • Nodes in force layout are movable
  • Add charge to force layout to visually separate connected components
  • Bin numerical columns before computing mutual information
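
Once numerical columns are binned, mutual information can be computed from bin counts. The sketch below works on a local sequence of bin-index pairs rather than on Spark data, and it normalizes by the larger of the two marginal entropies. That normalization is one reading of the rescaling described in the 2.1.0 notes, not a confirmed detail of DDS. The object name `BinnedMutualInformation` is hypothetical.

```scala
object BinnedMutualInformation {
  // Shannon entropy (natural log) of a set of counts that sum to `total`.
  private def entropy(counts: Iterable[Long], total: Double): Double =
    counts.filter(_ > 0).map { c =>
      val p = c / total
      -p * math.log(p)
    }.sum

  // pairs: (bin index of X, bin index of Y) for each row.
  // Returns MI / max(H(X), H(Y)), or NaN when both entropies are zero
  // (assumption: the normalized value is treated as undefined in that case).
  def normalizedMi(pairs: Seq[(Int, Int)]): Double = {
    val total = pairs.size.toDouble
    val joint = pairs.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
    val xs = pairs.groupBy(_._1).map { case (k, v) => k -> v.size.toLong }
    val ys = pairs.groupBy(_._2).map { case (k, v) => k -> v.size.toLong }
    val hx = entropy(xs.values, total)
    val hy = entropy(ys.values, total)
    val hxy = entropy(joint.values, total)
    val mi = hx + hy - hxy // I(X;Y) = H(X) + H(Y) - H(X,Y)
    val maxH = math.max(hx, hy)
    if (maxH == 0.0) Double.NaN else mi / maxH
  }
}
```

Identical columns give a value of 1, independent columns give a value near 0, and constant columns give NaN.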

Misc

  • Completely recreate main content div after each visualization
  • Compute running covariance, mean and variance for correlation aggregation for better numerical stability
  • Log build information (version, revision, time) at DDS object initialization
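
The running covariance, mean and variance mentioned above can be sketched with a Welford-style update plus a pairwise merge step, which is the shape a distributed aggregation needs. This is an illustration under assumptions, not the DDS code: `CorrState` and its field names are hypothetical.

```scala
// Hypothetical aggregation state for the Pearson correlation of two columns.
// m2x, m2y and cxy hold sums of (co-)deviations from the running means.
final case class CorrState(n: Long, meanX: Double, meanY: Double,
                           m2x: Double, m2y: Double, cxy: Double) {

  // Add one observation (Welford-style update).
  def add(x: Double, y: Double): CorrState = {
    val n1 = n + 1
    val dx = x - meanX
    val dy = y - meanY
    val newMeanX = meanX + dx / n1
    val newMeanY = meanY + dy / n1
    CorrState(n1, newMeanX, newMeanY,
      m2x + dx * (x - newMeanX),
      m2y + dy * (y - newMeanY),
      cxy + dx * (y - newMeanY))
  }

  // Combine two partial states, e.g. from different partitions.
  def merge(other: CorrState): CorrState =
    if (n == 0) other
    else if (other.n == 0) this
    else {
      val total = n + other.n
      val dx = other.meanX - meanX
      val dy = other.meanY - meanY
      val w = n.toDouble * other.n / total
      CorrState(total,
        meanX + dx * other.n / total,
        meanY + dy * other.n / total,
        m2x + other.m2x + dx * dx * w,
        m2y + other.m2y + dy * dy * w,
        cxy + other.cxy + dx * dy * w)
    }

  // Pearson correlation; NaN when either variance is zero.
  def correlation: Double = cxy / math.sqrt(m2x * m2y)
}

object CorrState {
  val empty: CorrState = CorrState(0L, 0.0, 0.0, 0.0, 0.0, 0.0)
}
```

Compared with accumulating raw sums of x, y, x², y² and xy, this form avoids subtracting large, nearly equal numbers at the end, which is the usual source of instability.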

Data-Driven Spark 2.1.0

12 Jul 20:28

General

  • Visualizations can now have a title
  • Mutual information is rescaled by maximum entropy of both variables to allow comparison of multiple MI values
  • Use Sturges' formula to compute the number of histogram bins when the user does not provide one
  • Fixed description of RDD summarize function
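
For reference, Sturges' formula picks k = ceil(log2(n)) + 1 bins for n observations. The sketch below shows only the formula and does not claim to match how DDS computes it. The object name `SturgesBins` is hypothetical, and the integer bit trick is used here only to avoid floating-point rounding at exact powers of two.

```scala
object SturgesBins {
  // k = ceil(log2(n)) + 1 for n >= 1 observations.
  // For n >= 1, ceil(log2(n)) equals 64 minus the number of leading zero
  // bits of (n - 1), which keeps the computation exact.
  def numBins(n: Long): Int = {
    require(n > 0, "need at least one observation")
    val ceilLog2 = 64 - java.lang.Long.numberOfLeadingZeros(n - 1)
    ceilLog2 + 1
  }
}
```

For example, 100 observations give 8 bins and 1024 observations give 11 bins.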

Spark SQL

  • Summary statistics function (summarize) for data frames
  • Bar chart for single data frame columns
  • Pie chart for single data frame columns
  • Histogram for single data frame columns
  • Median for single data frame columns
  • Dashboard now uses data frame summarize for column statistics
  • Dashboard provides useful titles for individual visualizations

Data-Driven Spark 2.0.0

19 Jun 08:43

Build

  • Upgrade from Spark 1.2 to Spark 1.3
    • SchemaRDD to DataFrame
    • Resolve SLF4J class path conflicts
    • Avoid serialization bug in flights example data set in Spark shell
  • Change default Scala version for sbt build to 2.10 (was 2.11)

Analysis and Visualization

  • First version of dashboard function
    • Visualizations are now drawn independently from each other using a document-wide cache to store configuration under their content id as a key
    • Bootstrap CSS layout for columnar layout
    • Dashboard shows a sample, column dependencies and summary statistics for each column

Bugfixes

  • Changing the upper bound of heatmap scales caused the heatmap to ignore the selected colors and redraw with the defaults

Data-Driven Spark 1.8.0

03 Jun 09:35

Analysis and Visualization

  • Implement strategy for missing values in Pearson correlation matrix function
  • Add three color scale to heat maps
  • Allow manual adjustment of color scale range in heat maps
  • Add mutual information matrix function (non-normalized and no binning of numerical data, yet)

Example Data Sets

  • Flights data set having many numerical and nullable columns

Bugfixes

  • Median function requires a numerical RDD but threw an NPE for a non-numeric one instead of reporting that it requires an implicit Numeric
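
The idea of requiring an implicit numeric can be sketched on a plain Scala collection. This is not the DDS signature, which operates on RDDs. `MedianSketch` is a hypothetical name, and averaging the two middle values for even-sized input is an assumption.

```scala
object MedianSketch {
  // The implicit Numeric[N] parameter means a call such as
  // median(Seq("a", "b")) is rejected by the compiler instead of
  // failing at runtime.
  def median[N](values: Seq[N])(implicit num: Numeric[N]): Double = {
    require(values.nonEmpty, "median of an empty collection is undefined")
    val sorted = values.map(num.toDouble).sorted
    val mid = sorted.size / 2
    if (sorted.size % 2 == 1) sorted(mid)
    else (sorted(mid - 1) + sorted(mid)) / 2.0
  }
}
```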