Remote Segmentation Interface (SegInt) Server (SegInt-R) is an on-premise auto-segmentation transaction server for research institutions with machine learning models. SegInt-R is designed to work in conjunction with Velocity, a universal bioinformatics solution that consolidates imaging information from virtually any imaging source. The server is operated locally in order for research teams to design and test their own machine learning models.
Back to Table of Contents.
A brief overview of the server architecture:
SegInt-R and Velocity communicate via endpoints specified in the Microsoft Radiomics Segmentation API as part of Project InnerEye. Messages are serialized using Google's protocol buffers.
Endpoints and associated data-processing are implemented in Django, using the default SQLite as a shared data layer. Once jobs are received, asynchronous segmentation jobs are scheduled using Celery, a real-time task scheduler. In this implementation, Celery workers are supported with a Redis backend as a message broker.
Back to Table of Contents.
SegInt-R works natively on Linux and was developed on Ubuntu 18.04 LTS. It can be installed directly after cloning from the repository or through the provided VMDK. This allows the user to build natively or run the service through virtualization, either on type-1 hypervisors such as Windows Hyper-V or type-2 hypervisors such as Virtualbox.
- Network - The server must be able to access the local network, and so must operate with a Bridged Adapter if virtualized.
- Disk Space - The server requires at least 15GB of disk space, not including models and jobs that must be stored locally.
- Memory - The server requires at least 2GB of memory to operate.
Back to Table of Contents.
The following are directions for installation within one's own Linux environment. Note: This section is written to be as beginner-friendly as possible.
First, ensure that Git is installed locally:
$ sudo apt update
$ sudo apt install git
You can confirm that Git has been installed correctly by running this command and receiving similar output:
$ git --version
git version 2.17.1
Once Git has been installed, clone the repository into the desired directory:
git clone <PATH TO REPOSITORY>
Navigate to the repository directory. Run the install script install.sh
:
bash install.sh
The install script will install the necessary python packages for the server within a virtual environment segint_venv
, as well as Redis. Required packages for the server are described in requirements.txt
, and packages for the segmentation layer are described in requirements_ml.txt
. The install script will also add aliases to your .bashrc
corresponding to run
and kill
scripts, respectively.
Back to Table of Contents.
For Windows users, SegInt-R is provided as an exported VM for usage in conjunction with Hyper-V, a native hypervisor in Windows. Once extracted from the archive, the VM can be easily installed using Windows' Hyper-V Manager application. Note: Hyper-V is a level 1 hypervisor, and thus is not recommended that other level 2 hypervisors (e.g. VirtualBox) are operating simultaneously.
Prior to importing the VM, it is necessary to create an external virtual switch. For this server, the switch can be simply named: "External Switch".
Once the switch has been created, import a virtual machine through the Action
dropdown context menu.
Locate the folder extracted from the VM archive.
Select the virtual machine ATS-Research
. You will need to decide if this VM will point to this extracted archive directory, or will create a new copy in a specified Hyper-V folder.
Once imported, running the server will require generic credentials:
username: atsr
password: atsr
It is highly recommended that the password is changed upon installation.
Back to Table of Contents.
While the virtual environment is automatically loaded upon server startup, sometimes it may be necessary to activate it manually in order to manage the server. In order to activate the virtual environment, navigate to main repository directory, and use the following command:
$ source segint_venv/bin/activate
To deactivate this virtual environment, use the following command anywhere:
$ deactivate
Back to Table of Contents.
Generally, an admin
superuser is included in the installation, with the following default credentials:
username: admin
password: segint00
If this is not the case, before server startup, it is highly recommended to create an administrative superuser. To do so, first navigate to the segint_research_django
sub-directory within the repository directory. Once there, create a superuser:
$ python3 manage.py createsuperuser
Or, if your virtual environment is active:
$ python manage.py createsuperuser
Enter your desired username and press enter.
Username: admin
You will then be prompted for your desired email address. You may leave this blank.
Email address: [email protected]
The final step is to enter a password. You will be asked for the password twice as confirmation.
Password: *********
Password (again): *********
Superuser created successfully.
Back to Table of Contents.
Prior to operation, it is necessary to determine the IPv4 address of the machine. This can be found with:
$ ip addr
You will be looking for full IPv4 address. You will not be using the localhost address.
Your Velocity installation will need to point towards this IP address in order to operate in conjunction.
Back to Table of Contents.
Once the server is up-and-running (See Server Startup), you may access the admin console via web browser from any computer on the local network. Use the credentials for the superuser (See Create Admin Superuser).
You will also need to know the IPv4 address of the server (See Determining Server IP Address). Access the console from a web browser at the following address:
http://<Server IPv4 Address>:8000/admin/
You should see the admin's login screen:
Once logged in, you will be able to access the admin panel, from where you may modify database entries:
Back to Table of Contents.
To run the server, navigate to repository directory, and input the following command.
$ segint_run
This will start the Django server, the Celery worker, and the Celery task scheduler.
Back to Table of Contents.
To stop all services related to SegInt-R, including the Django server and all associated Celery threads, input the following command in any console:
$ segint_kill
Note: This will kill all Django processes and all Celery processes on the machine!
Back to Table of Contents.
All trained machine learning models must be defined as part of a model family. For now, model family creation requires either:
- Manual upload within the database through the admin console
- Upload of fully-defined Protobuf ModelFamily message
See the roadmap for future plans regarding model creation and upload.
Back to Table of Contents.
If manually creating a model family and/or accompanying models and structures, access the Django admin console. Creating a new Model Family will display the following necessary fields:
Note here that the protobuf file field is optional. In fact, upon model family acquisition by Velocity, a protobuf message will be automatically generated and stored in the directory.
Once the model family has been specified and saved, go back to the admin console and create a new model version:
Ensure that the Model family
field points directly to the newly created (or existing) model family in the database.
The Model file
field designates a machine learning model file within the media/models/
directory. For instance, a Pytorch model file would be model.pt
, and would contain the weights, biases, etc. of the model. Uploading such a file will automatically make a copy within the correct subdirectory.
In addition, for libraries that require a separate class definition of the neural network architecture, the Model module
field should point to that .py
file. Again, this upload will automatically be placed within the media/models/
directory.
Model type
should descripted what type of library was used to trained the model, e.g. Tensorflow, Pytorch, etc. This will be enumerated in the future.
Lastly, once the model version has been saved, structures will need to be defined that correspond to what will be segmented by the model. Go back to the admin console again and create a new structure:
Similarly to the model version object, the structure here will have a Model version
field that needs to point to the newly created (or existing) model version database object.
Back to Table of Contents.
If uploading a fully-defined Protobuf model family definition, in the Django admin console, create a new model family. Instead of populating the fields, upload a serialized and saved Protobuf message, and save the model family database object.
When Velocity first acquires all model families from the server, the fields will auto-populate. However, one set of fields will not populate, as they are not contained with the Protobuf specification. After field auto-population, an admin will need to go to the Django console and manually point the model fields towards the correct files. This can be found in the model version for the newly updated family:
The Model file
field designates a machine learning model file within the media/models/
directory. For instance, a Pytorch model file would be model.pt
, and would contain the weights, biases, etc. of the model. Uploading such a file will automatically make a copy within the correct subdirectory.
In addition, for libraries that require a separate class definition of the neural network architecture, the Model module
field should point to that .py
file. Again, this upload will automatically be placed within the media/models/
directory.
Model type
should descripted what type of library was used to trained the model, e.g. Tensorflow, Pytorch, etc. This will be enumerated in the future.
Save the Model Version when done.
Back to Table of Contents.
This section will detail the architecture of the server platform for future extensibility.
The Django server application can be found within the directory segint_research_django
. The overall directory structure is listed below:
- segint_research_django
- media
- protobuf
- segint_api
- segint_research_django
The media
sub-directory contains on-disk storage for any database elements:
- segint_research_django
- media
- modelfamily
- models
- results
- segmentation
Briefly, the sub-directories within media
functions as on-disk storage for:
modelfamily
- Serialized Protobuf messages containing all information regarding a ModelFamily.models
- Any saved ML model files and associated class definitions.results
- Serialized Protobuf messages containing segmentation results.segmentation
- Serialized Protobuf messages containing a segmentation job posted by Velocity.
Back to Table of Contents.
The protobuf
app folder contains all Protobuf definitions for communication between Velocity and the server, as well as compiled Python definitions. Of particular note are:
Model.proto
- Definitions for SEGINT APIPrimitives3D.proto
- Primitives that support Model.proto
For more information regarding Protobuf and Python, see the tutorial.
Back to Table of Contents.
This Django application contains the bulk of the server code. The sub-directories within segint_api
is listed:
- segint_research_django
- segint_api
- admin.py
- apps.py
- models.py
- tasks.py
- tests.py
- urls.py
- views.py
This application follows a traditional Django application structure, with endpoint URLS defined in urls.py
, GET and POST tasks defined in views.py
, and SQL database models defined in models.py
.
Of particular note is that tasks.py
contains the asynchronous Celery segmentation code. In order to extend segmentation functionality to additional machine learning libraries, new segmentation functions will need to be defined there. (See Machine Learning Library Extensions below).
Back to Table of Contents.
This is the main Django server directory, and contains the following:
- segint_research_django
- segint_research_django
- asgi.py
- celery.py
- settings.py
- urls.py
- views.py
- wsgi.py
Of particular noteworthiness is celery.py
, which contains the celery program specification.
Back to Table of Contents.
A Django test suite is provided for testing of the server. Django's tests use the Python standard library module unittest
. All provided testcases classes subclass from django.test.TestCase
, which is itself a subclass of unittest.TestCase
.
Currently, integration tests are provided in segint_research_django/tests.py
. Unit tests may be written separately in any other Django application as tests.py
. The TestRunner will run any such file upon testing.
To run the tests, ensure that the virtual enviroment is activated (See Activating the Virtual Environment). In addition, the celery beat
scheduler won't be able to point to the temporary testing database, so asynchronous tasks will need to be run synchronously. To make this temporary change, you will need to change the CELERY_ALWAYS_EAGER
flag to True
in settings.py
. Ensure that you change this back after testing.
Then, within the main project directory, use the following command:
$ python manage.py test
If you wish to measure code coverage, first ensure that coverage.py
is installed in the virtual environment. Note: coverage isn't provided as part of the requirements for the server.
$ pip install coverage
Then, in the main project directory, you may run the tests:
$ coverage run --source='.' manage.py test
This runs your tests and collects coverage data of the executed files. You may see a report of the coverage data by using the following command:
$ coverage report -m
Back to Table of Contents.
SegInt-R receives scan data from Velocity in the form of a Protobuf message containing a list of input channels (See ModelInputChannel in Model.proto
). Each channel contains a calibrated three-dimensional volume Protobuf message (See CalibratedVolume3D in Primitives3D.proto
). Within the data processing layer, the volume message is recasted as a three-dimensional numpy.ndarray
of 32-bit signed integers.
channel_data - ndarray of shape (depth, height, width)
The values of the ndarray represent calibrated values (e.g. Hounsfield units). Generally speaking, depth
corresponds to axial length, while height
and width
are standardized at 512 voxel-units.
Segmentation models should be trained under these assumptions for input. Output should correspond to a similarly-shaped array of boolean value type where 0
represents background and 1
represents foreground.
Back to Table of Contents.
As of the current version, PyTorch is supported. In order to extend functionality to other machine learning libraries, the following files will need to be modified:
- segint_research_django
- segint_api
- models.py
- tasks.py
- views.py
tasks.py
contains asynchronous Celery tasks for segmentation jobs. Below is a prototypical segmentation task:
@task(name="start_pytorch_segmentation_single_structure")
def start_pytorch_segmentation_single_structure(model_id, job_id):
'''
Single structure pytorch segmentation.
Parameters:
model_id - str - Model ID to use for segmentation job
job_id - str - Segmentation job ID
Returns: None
'''
logger.info("\nStarting Pytorch with job_id {} and model_id {}".format(job_id, model_id))
# Find the job
try:
seg_job = SegmentationJob.objects.get(model_id=model_id, segmentation_id=job_id)
m_v = ModelVersion.objects.filter(model_version_id=model_id)[0]
structure = Structure.objects.filter(model_version=m_v)[0]
except:
logger.info("\nCan't find job!")
return
# Segmentation schema.
model_in = acquire_model_input(seg_job)
channels_data = parse_model_in(model_in)
segment_result = volumetric_pytorch_segment(m_v, channels_data)
model_out = construct_model_out(m_v, structure, segment_result)
save_to_disk(seg_job, model_out)
Let's go over the anatomy of such a function.
- The decorator
@task(name="start_pytorch_segmentation_single_structure")
marks this function as a Celery asynchronous task. The decorator name should match the function name. This will be called in yourviews.py
to start an asynchronous task. - Find the database objects that are needed for the segmentation job.
- Segment according to a specific schema. Helper functions are provided in
tasks.py
. All extensible segmentation tasks should follow this schema:- Acquire job model input
- Parse job model_input as channel data
- Model evaluation/segmentation upon channel data
- Construct model output using segmentation results
- Save to disk.
Of particular note is the "Model evaluation/segmentation" step. This will require custom code that evaluates the model input numpy array with the specific model. Note that this is different for each ML library.
Once the asynchronous segmentation task has been defined, views.py
will also require modification. Within the post_segmentation()
function, a state machine is defined for every possible model type. A new state will need to be added to the machine to call the new asynchronous function. To do so, invoke delay()
. An example using the above task is shown below.
start_pytorch_segmentation_single_structure.delay(model_id, seg_job.segmentation_id)
Lastly, in order to add your new ML library segmentation as an option within database, you will need to modify the ModelType enumeration within the ModelVersion
class in models.py
.
Back to Table of Contents.
- Model Creation and Upload Form
- Scikit-Learn extension
- Caffe2 extension
- XGBoost extension
- Multi-structure Segmentation
- Auth Tokens
Back to Table of Contents.
Python
Django
Celery
Redis
SQLite
Back to Table of Contents.
Michael Liang
[email protected]
Software Engineer II
Back to Table of Contents.