## Built with: Python, Azure, plotly, pandas, flask, dash, numpy, sklearn, prophet, GitHub Actions, Git
## Inspiration
According to the problem statement, Machine Learning can be used to address the problem
of temperature fluctuations in microscopes. These fluctuations shift the focus of the microscope,
which results in poor image quality. This problem complicates the use of confocal 4D imaging,
which is applied in:
- histopathology;
- modelling of organs and bones;
- modelling of nerve cells and brain slices;
- observation of live cell cultures;
- and many more important research fields...
From the data we know that the temperature fluctuations can be very strong, for example:
- ("source_id": MICDEV03, "sensor_name": "LKM980_Main_Temperature_Outside", "datetime": "Jan 20, 2021, 15:05:17", "sensor_value": 22.1)
- ("source_id": MICDEV03, "sensor_name": "LKM980_Main_Temperature_Outside", "datetime": "Jan 20, 2021, 16:05:47", "sensor_value": 23.3)
According to the "LSM 980 with Airyscan 2" specifications, the temperature fluctuations should not exceed 0.5 degrees per hour, yet here the temperature rises by 1.2 degrees within roughly one hour.
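A small sketch (not part of the tool) of how these two readings can be checked against the specification with pandas:

```python
# Compare the two sample readings above against the 0.5 °C/hour limit.
import pandas as pd

readings = pd.DataFrame({
    "source_id": ["MICDEV03", "MICDEV03"],
    "sensor_name": ["LKM980_Main_Temperature_Outside"] * 2,
    "datetime": pd.to_datetime(["2021-01-20 15:05:17", "2021-01-20 16:05:47"]),
    "sensor_value": [22.1, 23.3],
})

elapsed_hours = (readings["datetime"].iloc[-1] - readings["datetime"].iloc[0]).total_seconds() / 3600
rate = (readings["sensor_value"].iloc[-1] - readings["sensor_value"].iloc[0]) / elapsed_hours

print(f"{rate:.2f} °C/hour")             # ~1.19 °C/hour
print("within spec:", abs(rate) <= 0.5)  # False
```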
We decided to apply our knowledge in Machine Learning and Software Engineering to offer
domain experts a tool to visualize the temperature data as well as to detect and predict temperature fluctuations. "ELEGANCE" allows us to minimize the impact
of temperature fluctuations on image quality.
## What it does
"ELEGANCE" uses the data from the sensors and builds a User Interface around it, where
a user can select which type of analysis or visualization they want to perform. Our tool provides the
following analytics and visualization methods:
- "Interactive Visualization": visualizes the sensor data and can filter by "Region", "Source ID",
"Sensor Name" and "Date".
- "Group Visualization": visualizes the sensor data of a selected sensor type;
a single click on a "Source ID, Region" legend entry excludes that data from the graph,
a double click keeps only the selected "Source ID, Region" in the graph.
- "Anomaly Detection": helps to find anomalies in the data. The method used for this task is the Isolation Forest. Isolation Forests,
like Random Forests, one of the most widely used families of tree-based machine learning algorithms, generate several decision trees
based on the data sample fed to the algorithm. Once the decision trees are generated, the next task for the isolation forest
is to determine the length of the path needed to 'isolate' an observation. Observations are isolated by randomly selecting a
feature and, subsequently, randomly selecting a split value between the selected feature's minimum and maximum values [1].
Assuming there are n samples, the maximum depth of each tree in the forest is ceil(log_2(n)).
In our tool, it is possible to run anomaly detection on an arbitrary portion of the data. One can select:
- Source ID, i.e. the microscope;
- Sensor name;
- Contamination percentage, i.e. the expected ratio of outliers in the whole data set;
- Data range delta;
- Customized data period;
and the corresponding anomaly analysis for the desired query is conducted on the fly (see the Isolation Forest sketch after this list).
- "Time Series Analysis": performs a data prediction for selected devices. The tool used for this task is Fbprophet.
At its core, Fbprophet uses an additive regression model that encompasses four main components of the data:
• A piecewise linear or logistic growth curve trend. Changes in the trend are detected automatically by selecting changepoints from the data.
• A yearly seasonal component modeled using Fourier series.
• A weekly seasonal component using dummy variables.
• A user-provided list of important holidays.
These components are combined as follows: y = a(t) + b(t) + c(t) + epsilon. Here a(t) represents the periodic changes,
such as weekly and yearly seasonality, b(t) represents the trend function, i.e. the non-periodic changes, and c(t) represents the holiday effects.
Lastly, epsilon is the error term and represents the idiosyncratic variations that are not captured by the model [2].
In our tool, it is possible to run time-series analysis on an arbitrary portion of the data. We show how the data behaves over a certain
period of time and provide predictions as well as trends. One can select:
- Source ID, i.e. the microscope;
- Sensor name;
- Number of days to predict;
- Customized data period;
and the corresponding time-series analysis for the desired query is conducted on the fly (see the Prophet sketch after this list).
In Fbprophet, the model is fit using Stan. Stan performs the maximum a posteriori optimization of the parameters extremely quickly (under a second)
and gives us the option to estimate parameter uncertainty using the Hamiltonian Monte Carlo algorithm.
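As a rough illustration of the "Anomaly Detection" step above, the sketch below shows how an Isolation Forest query might be answered with sklearn. The helper name, column names and the pre-filtered DataFrame are assumptions for illustration, not the tool's actual code.

```python
# Hypothetical sketch of the Isolation Forest step, assuming a pandas DataFrame
# with a "sensor_value" column already filtered by source_id / sensor_name / period.
import pandas as pd
from sklearn.ensemble import IsolationForest

def detect_anomalies(readings: pd.DataFrame, contamination: float = 0.01) -> pd.DataFrame:
    """Return the rows flagged as outliers for the selected query."""
    model = IsolationForest(contamination=contamination, random_state=42)
    flagged = readings.copy()
    # fit_predict returns -1 for outliers and 1 for normal observations.
    flagged["anomaly"] = model.fit_predict(flagged[["sensor_value"]])
    return flagged[flagged["anomaly"] == -1]
```

Similarly, a minimal sketch of the "Time Series Analysis" step with Fbprophet; the hourly resampling frequency and the function name are assumptions made for this example.

```python
# Hypothetical sketch of the Fbprophet forecast for one selected sensor.
import pandas as pd
from fbprophet import Prophet  # newer releases: from prophet import Prophet

def forecast_sensor(readings: pd.DataFrame, days_to_predict: int = 7) -> pd.DataFrame:
    """Fit Prophet on the sensor history and predict `days_to_predict` days ahead."""
    history = (
        readings.set_index("datetime")["sensor_value"]
        .resample("1H").mean()                       # assumed hourly aggregation
        .reset_index()
        .rename(columns={"datetime": "ds", "sensor_value": "y"})
    )
    model = Prophet()                                # trend + seasonal components
    model.fit(history)
    future = model.make_future_dataframe(periods=24 * days_to_predict, freq="H")
    forecast = model.predict(future)                 # yhat, yhat_lower, yhat_upper, trend, ...
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```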
The tool is hosted on an Azure Web App inside a Docker container. The data is stored in
Azure Data Lake Storage Gen2 (Blob storage).
## How we built it
We used a wide range of Python Machine Learning and data visualization libraries, such as "plotly",
"dash", "pandas", "matplotlib", "dash_bootstrap_components" and "sklearn". The tool runs on a Flask
server and is containerized with Docker.
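A minimal sketch of how a Dash UI can be mounted on a Flask server; the layout and component IDs below are illustrative, not the project's actual components.

```python
# Hypothetical sketch: a Dash app served by a Flask server, as in our setup.
import dash
from dash import dcc, html
from flask import Flask

server = Flask(__name__)                  # the Flask server that runs inside Docker
app = dash.Dash(__name__, server=server)  # Dash registers its routes on that server

app.layout = html.Div([
    html.H1("ELEGANCE"),
    dcc.Dropdown(id="sensor-name", options=[], placeholder="Sensor Name"),
    dcc.Graph(id="sensor-graph"),
])

if __name__ == "__main__":
    app.run_server(host="0.0.0.0", port=8000, debug=False)
```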
We follow Clean Code conventions to keep the code clean and understandable. The code was developed in a
GitHub repository. We created two branches, "main" and "develop", so we always had
a working tool.
On Azure we created a Resource Group, a Container Registry and a Web App to store and deploy our Docker
containers. To automate CI/CD we created a GitHub Action that runs on every push to "main", builds
a Docker container and pushes it to the Azure Container Registry. On the Web App, "Continuous Deployment"
is activated, so the new container is deployed automatically.
## Challenges we ran into
- Firstly, we had problems understanding how the SM and SLM components are interconnected.
A clear construction scheme would have been helpful.
- Secondly, none of us had worked with the Azure platform or GitHub Actions before, so
it took several hours until the complete CI/CD flow worked properly.
## Accomplishments that we're proud of
- We created an app with a great User Interface and a wide range of analytics methods.
- We created a substantial list of hypotheses that represent possible solutions to the temperature
fluctuation problem. These hypotheses can be converted into solution methods and may be
helpful for the ZEISS team.
- We dug deep into the literature to better understand the challenge domain and
the interconnections between the system components (SM and SLM).
- We established an automatic CI/CD Pipeline.
- Despite the rushed development, we followed versioning and Clean Code conventions.
- We never ended up with a broken product.
## What we learned
- The Azure platform was completely new to all of us; during the HackTum we learned the basics of
deployment and of the Machine Learning tools.
- We learned how to write GitHub Actions and how to connect GitHub with Azure.
- We learned a lot of theoretical basics in the field of microscope measurements.
- We learned a number of modern scientific approaches for time-series data.
## What's next for ELEGANCE
The next steps could be:
- developing an Azure ML workflow that collects and analyzes data
in real time;
- testing the unproven hypotheses and developing solution methods based on them.
## References
[1] F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation Forest," 2008 Eighth IEEE International Conference on Data Mining, 2008.
[2] S. J. Taylor and B. Letham, "Forecasting at scale," PeerJ Preprints, 27-Sep-2017. [Online]. Available: https://doi.org/10.7287/peerj.preprints.3190v2.