This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

Exploring various dimension reduction techniques

Kamil A. Kaczmarek edited this page Jul 10, 2018 · 2 revisions

water buffalo 🐃

Feature Extraction

  • factor analysis
  factor_analysis__n_components: 50
  • sparse random projection
  sparse_random_projection__n_components: 50
  • more row-wise aggregations
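import pandas as pd

def aggregate_row(row):
    # Keep only the non-zero entries of the row.
    # (Series.nonzero() was removed in pandas 1.0; boolean indexing is equivalent.)
    non_zero_values = row[row != 0]
    # Summary statistics over the non-zero entries only.
    aggs = {'non_zero_mean': non_zero_values.mean(),
            'non_zero_max': non_zero_values.max(),
            'non_zero_min': non_zero_values.min(),
            'non_zero_std': non_zero_values.std(),
            'non_zero_sum': non_zero_values.sum(),
            'non_zero_count': non_zero_values.count(),
            # Share of the row's non-missing entries that are non-zero.
            'non_zero_fraction': non_zero_values.count() / row.count()
            }
    return pd.Series(aggs)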
  • not using raw features
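
The feature extraction steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's actual pipeline code: it assumes the two projections correspond to scikit-learn's FactorAnalysis and SparseRandomProjection, and the helper name build_features, the raw_* column names, and the synthetic data are all hypothetical. The projections are fitted on the whole matrix here for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.random_projection import SparseRandomProjection


def aggregate_row(row):
    # Row-wise statistics over the non-zero entries (same aggregations as above).
    non_zero_values = row[row != 0]
    return pd.Series({
        'non_zero_mean': non_zero_values.mean(),
        'non_zero_max': non_zero_values.max(),
        'non_zero_min': non_zero_values.min(),
        'non_zero_std': non_zero_values.std(),
        'non_zero_sum': non_zero_values.sum(),
        'non_zero_count': non_zero_values.count(),
        'non_zero_fraction': non_zero_values.count() / row.count(),
    })


def build_features(X, n_components=50, random_state=0):
    # Hypothetical helper: projections + row aggregations, raw columns dropped.
    raw = X.to_numpy()

    # Assumption: factor_analysis__n_components maps to FactorAnalysis(n_components=...).
    fa = FactorAnalysis(n_components=n_components, random_state=random_state)
    fa_feats = pd.DataFrame(
        fa.fit_transform(raw), index=X.index,
        columns=[f'fa_{i}' for i in range(n_components)])

    # Assumption: sparse_random_projection__n_components maps to SparseRandomProjection.
    srp = SparseRandomProjection(n_components=n_components, random_state=random_state)
    srp_feats = pd.DataFrame(
        srp.fit_transform(raw), index=X.index,
        columns=[f'srp_{i}' for i in range(n_components)])

    # Row-wise aggregations computed on the raw values.
    aggs = X.apply(aggregate_row, axis=1)

    # "Not using raw features": only derived columns are returned.
    return pd.concat([fa_feats, srp_feats, aggs], axis=1)


# Synthetic, mostly-zero data purely for demonstration.
rng = np.random.default_rng(0)
values = rng.random((200, 100))
mask = rng.random((200, 100)) < 0.1
X = pd.DataFrame(values * mask, columns=[f'raw_{i}' for i in range(100)])

features = build_features(X)
print(features.shape)
```

The returned frame holds the 50 factor analysis components, the 50 random projection components, and the 7 row aggregations, which would then be passed to the model in place of the raw columns.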

Model and result

LightGBM on new aggregations + projections (second best): 1.336 CV, 1.39 LB

Pipeline diagram

[Image: pipeline-solution-5]