Releases: FRosner/spawncamping-dds
Data-Driven Spark 4.0.0-beta
Core
- Scan for all @Help annotated methods when DDS.help() is called
- Review and fix short and long descriptions in @Help annotations
Build
- Fixed a problem where pull requests would fail due to a Travis CI misconfiguration
Data-Driven Spark 4.0.0-alpha
This release introduces a completely new project structure. DDS now has sub modules (core, datasets and web-ui).
Core
- new servables API
- rework Z scale implementation of heatmap servable
- support for Spark 1.5.x
Datasets
- rework datasets creation to use SQLContext implicit conversions (toDF)
- remove unnecessary Spark context argument from the DataFrame versions of the datasets
- use java.sql.* instead of java.util.* for dates and timestamps
Web-UI
- make servable titles and history browser more informative
Data-Driven Spark 3.0.1
Bugfixes
- Fix NPE when showing case classes that contain null values
- Fix Travis CI build including coveralls runtime dependency by accident (S3 bucket)
Data-Driven Spark 3.0.0
Spark
- Upgrade to Spark 1.4.0
Usability
- Add visualization REST interface and visualization history browser (drop-down menu). This allows multiple users to access the same UI and to refresh the page. However, refreshing the page discards the settings you made (will be fixed in one of the next releases)
Additionally, large parts of the code were refactored (build script, DDS core, Spark SQL functions).
Data-Driven Spark 2.3.1
Bugfixes
- Fix a problem where require.js would sometimes not load d3 correctly. This caused the parallel coordinates to break.
Data-Driven Spark 2.3.0
Example Data Sets
- Added a small data set description and source to the user guide for each data set
- Added a GraphX example data set (Enron email network)
Bugfixes
- Mutual information function crashed on double columns containing NaN values; NaNs are now binned separately
- Fixed a problem where changing the heatmap scale changed black cells (null values) to white
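
The NaN fix listed above can be pictured with a small sketch. This is not DDS code (DDS itself is written in Scala): `bin_values` and its parameters are hypothetical names, and equal-width binning is an assumption, since the release note only states that NaNs are binned separately.

```python
import math

def bin_values(values, num_bins):
    """Assign each value to an equal-width bin; NaNs get their own bin label."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return ["NaN"] * len(values)
    lo, hi = min(present), max(present)
    # Fall back to width 1.0 when all values are equal, to avoid dividing by zero.
    width = (hi - lo) / num_bins or 1.0
    labels = []
    for v in values:
        if math.isnan(v):
            labels.append("NaN")  # separate bin instead of a crash
        else:
            # Clamp so that the maximum value lands in the last bin.
            labels.append(min(int((v - lo) / width), num_bins - 1))
    return labels
```

Treating NaN as one more category keeps every row in the computation, so the resulting labels can be fed to any discrete mutual information estimate.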
Build and Architecture
- Remove git call in build file that caused the build to crash on systems with older versions of git (<= 1.7.x)
- Use require.js as dependency management system for front-end code
Data-Driven Spark 2.2.0
Analysis and Visualization
- Heatmap draws black cells when values are NaN / null. This is especially useful when the normalized mutual information is not defined.
- New key-value visualization for summary statistics
- Nodes in force layout are movable
- Add charge to force layout to visually separate connected components
- Bin numerical columns before computing mutual information
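
To illustrate why a normalized mutual information value can be undefined, here is a minimal sketch working on already binned (discrete) values. It assumes that "rescaled by maximum entropy of both variables" means dividing by max(H(X), H(Y)); the function names are hypothetical and this is not the DDS implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(xs, ys):
    """MI(X;Y) divided by max(H(X), H(Y)); NaN when that maximum is zero."""
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X,Y)
    mi = hx + hy - hxy                # MI(X;Y) = H(X) + H(Y) - H(X,Y)
    denom = max(hx, hy)
    return mi / denom if denom > 0 else float("nan")
```

When both columns are constant, both entropies are zero and the ratio has no meaningful value, which is the kind of cell a heatmap would have to render specially.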
Misc
- Completely recreate main content div after each visualization
- Compute running covariance, mean and variance for correlation aggregation for better numerical stability
- Log build information (version, revision, time) at DDS object initialization
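
The running covariance, mean and variance mentioned above can be sketched as a single-pass, Welford-style update. The class and method names are hypothetical, and the sketch omits the merge step a distributed aggregation would need; it only shows why no sum of raw squares (the numerically fragile part) is required.

```python
import math

class RunningCorrelation:
    """Single-pass accumulator for means, squared deviations and co-deviations."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # sum of squared deviations of x from its mean
        self.m2_y = 0.0  # sum of squared deviations of y from its mean
        self.c_xy = 0.0  # sum of products of deviations (co-moment)

    def add(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Old deviation times new deviation keeps the updates numerically stable.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)
        return self

    def correlation(self):
        """Pearson correlation; NaN if either variable has zero variance."""
        denom = math.sqrt(self.m2_x * self.m2_y)
        return self.c_xy / denom if denom > 0 else float("nan")
```

Usage: create one accumulator per column pair, call `add(x, y)` for each row, then read `correlation()`.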
Data-Driven Spark 2.1.0
General
- Visualizations can now have a title
- Mutual information is rescaled by maximum entropy of both variables to allow comparison of multiple MI values
- Use Sturges' formula to compute the number of histogram bins when the user does not provide one
- Fixed description of RDD summarize function
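
For reference, Sturges' formula picks k = ceil(log2(n)) + 1 bins for n observations. A minimal sketch follows; the function name is hypothetical and the handling of n < 1 is an assumption, not DDS behaviour.

```python
import math

def sturges_bins(n):
    """Number of histogram bins for n observations: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1
```

For example, 100 observations yield 8 bins and 1024 observations yield 11.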
Spark SQL
- Summary statistics function (summarize) for data frames
- Bar chart for single data frame columns
- Pie chart for single data frame columns
- Histogram for single data frame columns
- Median for single data frame columns
- Dashboard now uses data frame summarize for column statistics
- Dashboard provides useful titles for individual visualizations
Data-Driven Spark 2.0.0
Build
- Upgrade from Spark 1.2 to Spark 1.3
- SchemaRDD to DataFrame
- Resolve SLF4J class path conflicts
- Avoid serialization bug in flights example data set in Spark shell
- Change default Scala version for sbt build to 2.10 (was 2.11)
Analysis and Visualization
- First version of dashboard function
- Visualizations are now drawn independently from each other using a document-wide cache to store configuration under their content id as a key
- Bootstrap CSS layout for columnar layout
- Dashboard shows a sample, column dependencies and summary statistics for each column
Bugfixes
- Changing the upper bound of heatmap scales caused heatmap to ignore the selected colors and redraw with default
Data-Driven Spark 1.8.0
Analysis and Visualization
- Implement strategy for missing values in Pearson correlation matrix function
- Add three color scale to heat maps
- Allow manual adjustment of color scale range in heat maps
- Add mutual information matrix function (non-normalized and no binning of numerical data, yet)
Example Data Sets
- Flights data set having many numerical and nullable columns
Bugfixes
- Median function requires a numerical RDD but threw an NPE for a non-numeric one instead of reporting that it requires an implicit Numeric