This project focuses on predicting and differentiating between normal and malicious network traffic within healthcare environments. The tool leverages machine learning models to classify network activities and provide actionable insights for healthcare security and privacy.
The project is organized as follows:
├── app.py <!-- Main Streamlit file for the user interface -->
├── BaseModel_model_Input.py <!-- Defines Pydantic model for type-safe input -->
├── BaseModel_predList.py <!-- Intermediary Pydantic model for type-safe predictions -->
├── binaryClassifierModel.py <!-- Contains the binary classification model used in the project -->
├── main.py <!-- API file for FastAPI -->
├── models
│   └── model_2.pth <!-- Trained PyTorch model file -->
├── requirements.txt <!-- Lists project dependencies -->
├── scalers
│   ├── frequency_map.pkl <!-- Frequency map for categorical features (used for encoding) -->
│   ├── pca.pkl <!-- Trained PCA model for dimensionality reduction -->
│   └── scaler.pkl <!-- Trained scaler for feature scaling -->
└── datasets
    ├── labelled_22000.csv
    └── unlabelled_22000.csv
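The three artifacts in `scalers/` suggest a fixed preprocessing chain applied to each row before it reaches the model. The sketch below is a minimal, dependency-free illustration under several assumptions: the order is frequency encoding, then scaling, then PCA; `scaler.pkl` and `pca.pkl` behave like a scikit-learn `StandardScaler` and a non-whitened `PCA`; the frequency map stores relative frequencies (raw counts would work the same way). All parameter values and the feature names `protocol` and `packet_len` are hypothetical, so this is not the project's actual code.

```python
# Dependency-free sketch of an ASSUMED preprocessing chain:
# frequency encoding -> standard scaling -> PCA projection.
# Every value below is a hypothetical placeholder for what
# scalers/frequency_map.pkl, scaler.pkl and pca.pkl might contain.

FREQUENCY_MAP = {"protocol": {"TCP": 0.7, "MQTT": 0.3}}  # column -> {category: relative frequency}
SCALER_MEAN = [0.5, 100.0]
SCALER_SCALE = [0.2, 50.0]
PCA_MEAN = [0.0, 0.0]
PCA_COMPONENTS = [[0.6, 0.8]]  # one principal component: 2 features -> 1


def frequency_encode(row, freq_map):
    """Replace each categorical value with its relative frequency (0.0 if unseen)."""
    encoded = dict(row)
    for column, mapping in freq_map.items():
        encoded[column] = mapping.get(row[column], 0.0)
    return encoded


def standardize(values, mean, scale):
    """Same arithmetic as a fitted StandardScaler: (x - mean) / scale."""
    return [(v - m) / s for v, m, s in zip(values, mean, scale)]


def pca_project(values, mean, components):
    """Same arithmetic as a fitted, non-whitened PCA: (x - mean) @ components.T."""
    centered = [v - m for v, m in zip(values, mean)]
    return [sum(c * x for c, x in zip(component, centered)) for component in components]


def preprocess(row, feature_order):
    """Encode, scale, and project one raw row into the model's input space."""
    encoded = frequency_encode(row, FREQUENCY_MAP)
    vector = [float(encoded[name]) for name in feature_order]
    scaled = standardize(vector, SCALER_MEAN, SCALER_SCALE)
    return pca_project(scaled, PCA_MEAN, PCA_COMPONENTS)


print(preprocess({"protocol": "TCP", "packet_len": 150}, ["protocol", "packet_len"]))
```

In the project itself the fitted parameters would presumably be loaded from the `.pkl` files (for example with `pickle.load`) rather than hard-coded; the exact order of the steps should be checked against `main.py`.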
We use the IoT Healthcare Security Dataset, which contains network traffic data from IoT medical devices. It includes:
- Normal Traffic (0): Routine operations, like accessing patient records or updating inventory.
- Malicious Traffic (1): Suspicious activities, such as unauthorized access or malware injection.
The dataset contains about 50 features, including TCP and MQTT protocol fields, timestamps, and frequency-based metrics, which support detailed analysis and model training.
Our goal is to analyze, preprocess, and visualize this dataset to identify patterns and anomalies, and then use an artificial neural network (ANN) to classify network traffic as either normal or malicious, contributing to stronger cybersecurity for IoT medical devices.
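To make the classification step concrete, here is a minimal sketch of how a feed-forward ANN turns a preprocessed feature vector into one of the dataset's labels (0 = normal, 1 = malicious). The layer sizes, weights, ReLU activation, and the 0.5 decision threshold are all hypothetical; the real architecture is defined in `binaryClassifierModel.py` with weights in `models/model_2.pth`, and it may, for example, output two logits with a softmax instead of one sigmoid output.

```python
import math

# HYPOTHETICAL weights for a tiny 2-input, 2-hidden-unit network.
W1 = [[1.0, -1.0], [0.5, 0.5]]  # hidden-layer weights, one row per hidden unit
B1 = [0.0, 0.0]                 # hidden-layer biases
W2 = [1.0, -1.0]                # output-layer weights
B2 = 0.0                        # output-layer bias


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_proba(features):
    """Forward pass: linear -> ReLU -> linear -> sigmoid, giving P(malicious)."""
    hidden = [relu(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, B1)]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return sigmoid(logit)


def predict_label(features, threshold=0.5):
    """Map the probability to a label: 1 (malicious) at or above the threshold, else 0."""
    return 1 if predict_proba(features) >= threshold else 0


print(predict_label([2.0, 0.0]))  # 1
print(predict_label([0.0, 2.0]))  # 0
```

For `[2.0, 0.0]` the hidden activations are `[2.0, 1.0]`, the logit is 1.0, and the probability is about 0.73, so the sample is labelled malicious.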
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Non-Attack (0) | 0.9768 | 0.7935 | 0.8757 | 108,568 |
| Attack (1) | 0.7769 | 0.9745 | 0.8646 | 80,126 |

| Metric | Value |
|---|---|
| Specificity | 0.7935 |
| False Positive Rate | 0.2065 |
| False Negative Rate | 0.0255 |
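The derived rates follow directly from the per-class figures: specificity is the Non-Attack recall, the false positive rate is 1 − specificity, and the false negative rate is 1 − Attack recall. A minimal sketch that computes all of these from confusion-matrix counts, treating Attack as the positive class, and checks the F1 values against the reported precision and recall:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fn, tn, fp):
    """Per-class and derived metrics from confusion-matrix counts.

    Attack (label 1) is the positive class; Non-Attack (label 0) is the negative class.
    """
    recall_attack = tp / (tp + fn)  # sensitivity
    recall_normal = tn / (tn + fp)  # specificity
    precision_attack = tp / (tp + fp)
    precision_normal = tn / (tn + fn)
    return {
        "precision_attack": precision_attack,
        "recall_attack": recall_attack,
        "f1_attack": f1_score(precision_attack, recall_attack),
        "precision_normal": precision_normal,
        "recall_normal": recall_normal,
        "f1_normal": f1_score(precision_normal, recall_normal),
        "specificity": recall_normal,
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # equals 1 - sensitivity
    }


# Consistency check against the reported table (rounded to 4 decimals).
print(round(f1_score(0.7769, 0.9745), 4))  # 0.8646 (Attack F1)
print(round(f1_score(0.9768, 0.7935), 4))  # 0.8757 (Non-Attack F1)
print(round(1 - 0.7935, 4))                # 0.2065 (False Positive Rate)
```

The actual confusion-matrix counts are in the project PDF; `binary_metrics` only shows how the table's figures relate to them.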
For more detail, including the confusion matrix and AUC-ROC analysis, see the PDF provided in the project directory.
Follow these steps to set up and run the project:
- Set Up the Environment
  Make sure Conda is installed for easy environment management (a standard Python virtual environment also works), then create and activate the environment:
  `conda create --name health_sec_env python=3.10 -y`
  `conda activate health_sec_env`
- Install Dependencies
  Install the required packages listed in requirements.txt:
  `pip install -r requirements.txt`
- Run the FastAPI Server
  `uvicorn main:app --reload`
- Run the Streamlit UI
  In a separate terminal (the first one stays occupied by Uvicorn), start the interface:
  `streamlit run app.py`
- Datasets Provided
  The datasets folder contains:
  - labelled_22000.csv: a labelled dataset with 22,000 data points.
  - unlabelled_22000.csv: the same 22,000 data points without labels.
- Perform Inference
  In the running Streamlit app:
  - Upload a CSV file in the designated upload section.
  - Run inference to get predictions.
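The Streamlit app is the documented way to run inference, but because `main.py` exposes a FastAPI server, a CSV can presumably also be sent to it directly. The following standard-library sketch builds such a request without sending it. The `/predict` route, the `{"rows": [...]}` payload shape, and the `label` column name are hypothetical and must be matched to `main.py` and the Pydantic model in `BaseModel_model_Input.py`. Uvicorn listens on `http://127.0.0.1:8000` by default, and FastAPI serves interactive docs at `/docs`, which is the quickest way to check the real route.

```python
import csv
import io
import json
import urllib.request

# HYPOTHETICAL route; check main.py or http://127.0.0.1:8000/docs for the real one.
API_URL = "http://127.0.0.1:8000/predict"


def build_prediction_request(csv_text, url=API_URL, label_column="label"):
    """Turn CSV rows into a JSON POST request, dropping the (hypothetical) label column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row.pop(label_column, None)  # no-op for the unlabelled file
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


example = "protocol,packet_len,label\nTCP,150,0\n"
req = build_prediction_request(example)
print(req.get_method(), req.full_url)  # POST http://127.0.0.1:8000/predict
```

With the server running, `urllib.request.urlopen(req)` would send the request. Dropping the label column lets the same helper accept either the labelled or the unlabelled file; note that CSV values arrive as strings and rely on the server-side model to coerce them.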
- The model effectively detects TCP-based attacks in healthcare networks, achieving an AUC of at least 0.90, which indicates strong separation between normal and malicious traffic.
- With a sensitivity of 0.97 (0.9745), it correctly identifies about 97% of actual attacks, keeping the false negative rate low (0.0255).
- Applying PCA during preprocessing reduced dimensionality and improved efficiency.
- Data visualization provided key insights into patterns, aiding feature selection and model development.
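AUC can be read as the probability that a randomly chosen attack receives a higher model score than a randomly chosen normal sample, and sensitivity as the share of actual attacks that are predicted as attacks. A minimal pure-Python sketch of both, assuming label 1 = attack and a per-sample probability score from the model:

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random attack > score of a random normal sample); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(labels, predictions):
    """Share of actual attacks (label 1) that were predicted as attacks."""
    attack_preds = [p for y, p in zip(labels, predictions) if y == 1]
    return sum(1 for p in attack_preds if p == 1) / len(attack_preds)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is O(n_pos × n_neg), which is fine for a sanity check but too slow for the roughly 189,000 samples implied by the support counts; at that scale, `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.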
This tool shows great potential for strengthening healthcare security against cyberattacks.