Surface-Water-Quality-Data-Anomaly-Detection

Surface water quality data analysis and prediction for the Potomac River, West Virginia, USA, using time series forecasting and anomaly detection: ARIMA, SARIMA, Isolation Forest, OCSVM and Gaussian Distribution.

There is a pressing need for schemes that analyse continuously monitored environmental data, that is, surface water quality parameters such as dissolved oxygen, turbidity and specific conductance, and check them for unnatural increases above predetermined standard levels in order to detect the environmental anomalies that cause such increases. These parameters reflect the state of the ecosystem in a particular geographical area, and thus help us assess any present or future discrepancies that could cause environmental degradation through direct or indirect human activity in that area.

This is done using the time series forecasting techniques ARIMA and Seasonal ARIMA, and the anomaly detection techniques Isolation Forest, Gaussian Distribution and OneClassSVM.

Working of Project:

Stationarity of Dataset

  1. Augmented Dickey Fuller Test

  2. Rolling Mean Plot
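The rolling mean check can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name `rolling_stats`, the window size and the sample readings are assumptions made for the example, not values from the project.

```python
from statistics import fmean, pstdev


def rolling_stats(values, window):
    """Return (means, stds) computed over each full trailing window."""
    means, stds = [], []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        means.append(fmean(chunk))
        stds.append(pstdev(chunk))
    return means, stds


# Hypothetical readings: one flat series and one with an upward trend.
flat = [8.0, 8.2, 7.9, 8.1] * 6
trend = [8.0 + 0.1 * i for i in range(24)]

flat_means, _ = rolling_stats(flat, window=4)
trend_means, _ = rolling_stats(trend, window=4)

# A stationary series keeps a roughly constant rolling mean,
# while a trending series shows a rolling mean that drifts.
flat_spread = max(flat_means) - min(flat_means)
trend_spread = max(trend_means) - min(trend_means)
print(flat_spread, trend_spread)
```

For the Augmented Dickey Fuller test itself, one option is the `adfuller` function in `statsmodels.tsa.stattools`, which returns the test statistic and p-value among other fields; whether the project uses it is not stated here.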

Time Series Forecasting with ARIMA

Result : ARIMA

Result : Seasonal ARIMA with window = 192 (daily number of observations)

Time Series Forecast Result Analysis


Isolation Forest Anomaly Detection

iTree Generation and Anomaly Score Calculation
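The idea behind iTree generation and scoring can be sketched with a small pure-Python version for one-dimensional data. It follows the standard Isolation Forest formulation, where the anomaly score is 2^(-E[h(x)] / c(n)), h(x) is the path length of a point and c(n) is the average path length of an unsuccessful binary search tree lookup. The function names, tree count, sample size and sample readings are assumptions for this example, and the project itself may rely on a library implementation instead.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(sample, rng, depth, limit):
    """Recursively split the sample at random points until isolated."""
    if depth >= limit or len(sample) <= 1 or min(sample) == max(sample):
        return {"size": len(sample)}  # external node
    split = rng.uniform(min(sample), max(sample))
    left = [v for v in sample if v < split]
    right = [v for v in sample if v >= split]
    return {
        "split": split,
        "left": build_tree(left, rng, depth + 1, limit),
        "right": build_tree(right, rng, depth + 1, limit),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches an external node, adjusted by c(size)."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=64, seed=0):
    """Score each value in (0, 1); scores near 1 indicate anomalies."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)
    size = min(sample_size, len(values))
    limit = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(values, size), rng, 0, limit)
             for _ in range(n_trees)]
    norm = c_factor(size)
    return [
        2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
        for x in values
    ]


# Hypothetical readings: 49 values near 8.0 and one spike at 20.0.
readings = [8.0 + 0.01 * (i % 10) for i in range(49)] + [20.0]
scores = anomaly_scores(readings)
print(scores.index(max(scores)))
```

Points that are easy to separate from the rest are isolated after very few splits, so their average path length is short and their score is close to 1.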

Isolation Forest

Result : iForest Anomaly Detection


OneClassSVM
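A minimal sketch of One-Class SVM anomaly detection with scikit-learn follows. The synthetic two-feature data (standing in for, say, dissolved oxygen and specific conductance), the RBF kernel and `nu=0.05` are assumptions for illustration only, not the project's configuration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic "normal" readings plus two hypothetical extreme readings.
normal = rng.normal(loc=[8.0, 300.0], scale=[0.3, 10.0], size=(200, 2))
spikes = np.array([[3.0, 450.0], [2.5, 480.0]])
X = np.vstack([normal, spikes])

# Scale features so neither dominates the RBF kernel distance.
X_scaled = StandardScaler().fit_transform(X)

# nu bounds the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
labels = model.fit_predict(X_scaled)          # -1 = anomaly, 1 = normal
scores = model.decision_function(X_scaled)    # lower = more anomalous

anomaly_pct = 100.0 * np.mean(labels == -1)
print(anomaly_pct)
```

Because `nu` directly influences how many points are flagged, the reported anomaly percentage depends strongly on this setting.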

Result : OneClassSVM


Gaussian Distribution
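Gaussian anomaly detection can be sketched as fitting a normal distribution to the readings and flagging any point whose probability density falls below a threshold. The function name `gaussian_anomalies`, the threshold `epsilon` and the sample turbidity values are assumptions for this example.

```python
from statistics import NormalDist, fmean, pstdev


def gaussian_anomalies(values, epsilon):
    """Return indices of values whose fitted Gaussian density is below epsilon."""
    dist = NormalDist(mu=fmean(values), sigma=pstdev(values))
    return [i for i, v in enumerate(values) if dist.pdf(v) < epsilon]


# Hypothetical turbidity readings with one spike at index 10.
turbidity = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9, 25.0, 5.0]

flagged = gaussian_anomalies(turbidity, epsilon=0.01)
anomaly_pct = 100.0 * len(flagged) / len(turbidity)
print(flagged, anomaly_pct)
```

Note that extreme values inflate the estimated mean and standard deviation, so the choice of `epsilon` needs to be tuned against the data rather than fixed in advance.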

Result


Anomaly Detection Result Analysis

The graph above suggests that Isolation Forest may be flagging many more false positives than the other approaches, or otherwise overestimating the anomaly rate. The other methods give similar results, with anomaly percentages ranging from 9% to 20%. The anomaly plots shown earlier indicate that most anomalies occur on 29 January 2017 and 22 March 2017. These anomalies are corroborated by the fact that intense rainfall occurred at the monitoring site on those dates.