PatientFlow: Code and explanatory notebooks for predicting short-term hospital bed capacity using real-time data

Welcome to the PatientFlow repo, which is designed to support hospital bed management through predictive modelling. The repository shows methods for forecasting short-term bed capacity, a crucial aspect of hospital operations that impacts patient care and resource allocation.

Please note that you are looking at this repo prior to its first release. It is incomplete.

Objectives

  1. Develop code that was originally written for University College London Hospital (UCLH) into a reusable resource following the principles of Reproducible Analytical Pipelines
  2. Share the resource with analysts, bed managers and other interested parties in the NHS and other hospital systems
  3. Provide training materials to inform and educate anyone who wishes to adopt a similar approach

Main Features of our modelling approach

  • User led: This work is the result of close collaboration with operations directors and bed managers in the Coordination Centre, University College London Hospital (UCLH), over four years. What is modelled directly reflects how they work and what is most useful to them.
  • Focused on short-term predictions: We demonstrate the creation and evaluation of predictive models. The output from these models is a prediction of how many beds will be needed by patients within a short time horizon of (say) 8 hours. (Later we plan to add modules that also predict supply and net bed position over the same period.)
  • Assumes real-time data is available: Our focus is on how hospitals can make use of real-time data to make informed decisions on the ground. All the modelling here assumes that a hospital has some capacity to run models using real-time (or near to real-time) data in its electronic health record, even if this data is minimal.
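
To make the idea of a short-horizon demand prediction concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from the patientflow package: it assumes that each patient currently waiting has an estimated probability of needing a bed within the horizon, and that patients can be treated as independent. The function name bed_demand_distribution and the example probabilities are hypothetical.

```python
def bed_demand_distribution(admission_probs):
    """Return P(exactly k beds needed) for k = 0..n.

    admission_probs: one probability per patient (assumed independent)
    that the patient will need a bed within the prediction horizon.
    """
    dist = [1.0]  # with no patients considered, zero beds are needed for certain
    for p in admission_probs:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1 - p)   # this patient does not need a bed
            updated[k + 1] += mass * p     # this patient needs a bed
        dist = updated
    return dist


if __name__ == "__main__":
    # Two hypothetical patients, each with a 50% chance of needing a bed
    print(bed_demand_distribution([0.5, 0.5]))  # [0.25, 0.5, 0.25]
```

A distribution of this kind lets a bed manager read off statements such as "there is a 75% chance that at least one bed will be needed", rather than relying on a single point estimate.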

Main Features of this repository

  • Reproducible: We follow the principles of Reproducible Analytical Pipelines, with the aim that the code can be easily adopted in other settings.
  • Accessible: All the elements are based on simple techniques and methods in Health Data Science and Operational Research. The narrative in the notebooks is intended to be accessible to someone without any knowledge of programming, who should still be able to follow the approach. We intend that anyone with some knowledge of Python can understand and adapt the code for their own use.
  • Practical: A synthetic dataset, derived from real patient data, is included within this repo in the data-synthetic folder. This can be used to step through the modelling process if you want to run the notebooks yourself. So even if your hospital is not set up to do real-time prediction yet, you can still follow the same steps we took. (Note that, if you use the synthetic dataset, the integrity of relationships between variables is not maintained and you will obtain artificially inflated model performance.) UCLH have agreed we can release an anonymised version of real patient data, but not within the repo. To gain access to this, please contact Dr Zella King (contact details below).
  • Interactive: The repository includes an accompanying set of notebooks with code written in Python, with commentary. If you clone the repo into your own workspace and have an environment within which to run Jupyter notebooks, you will be able to interact with the code and see it running.

Getting started

  • Exploration: Start with the notebooks README to get an outline of the notebooks, and read the patientflow README to understand our intentions for the Python package
  • Installation: Follow the instructions below to set up your environment and install the necessary dependencies
  • Configuration: Repurpose config.yaml to configure the package to your own data and user requirements

About

This project was inspired by the py-pi template developed by Tom Monks, and is developed in collaboration with the Centre for Advanced Research Computing, University College London.

Project Team

  • Dr Zella King, Clinical Operational Research Unit (CORU), UCL ([email protected])
  • Jon Gillham, Institute of Health Informatics, UCL
  • Professor Sonya Crowe, CORU
  • Professor Martin Utley, CORU

Research Software Engineering Contact

Centre for Advanced Research Computing, University College London ([email protected])

Prerequisites

patientflow requires Python 3.10.

Installation

patientflow is not yet available on PyPI. To install the latest development version, clone the repository first (so that you have access to the synthetic data and the notebooks) and then install the package from the clone.

git clone https://github.com/zmek/patientflow.git
cd patientflow
pip install -e ".[test]"  # installs the package in editable mode, together with the extra test dependencies

From the root of the repository (the patientflow folder), run the tests to confirm that the installation worked correctly. The command will only work from the repository root. (To date, this has only been tested on Linux and macOS machines. If you are running Windows, there may be errors we don't know about.)

pytest

If you get errors running the pytest command, there may be other installations needed on your local machine. (We have found copying the error messages into ChatGPT or Claude very helpful for diagnosing and troubleshooting these errors.)

Training models with data provided

The data provided (which is synthetic) can be used to demonstrate training the models. To run training you have two options:

  • step through the notebooks (for this to work you'll need either to copy the two csv files from data-synthetic into your data-public folder, or to contact us for real patient data)
  • run a Python script using the following commands (by default this will run with the synthetic data in its current location; you can change the data_folder_name parameter if you have the real data in data-public)
cd src
python -m patientflow.train --data_folder_name=data-synthetic --uclh=False

There are two arguments:

  • data_folder_name - specifies where to find the data. This should be in a folder named data-xxx directly below the root of the repository
  • uclh - tells the package whether the data is the original UCLH data (in which case certain additional features are available, including the patient's age in years) or not
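
If you prefer the notebook route, the csv files need to be in data-public. The following is a minimal sketch of one way to copy them using only the Python standard library. The helper name copy_csv_files is hypothetical and not part of the patientflow package; it assumes both folders sit directly below the root of the repository, and it copies every csv file it finds in the source folder.

```python
import shutil
from pathlib import Path


def copy_csv_files(repo_root, source_name="data-synthetic", target_name="data-public"):
    """Copy all csv files from one data folder to another below repo_root.

    Returns the sorted list of file names that were copied.
    """
    source = Path(repo_root) / source_name
    target = Path(repo_root) / target_name
    target.mkdir(parents=True, exist_ok=True)  # create data-public if it is missing

    copied = []
    for csv_path in sorted(source.glob("*.csv")):
        shutil.copy2(csv_path, target / csv_path.name)  # overwrites a file of the same name
        copied.append(csv_path.name)
    return copied


if __name__ == "__main__":
    # Run from the root of the cloned repository
    print(copy_csv_files("."))
```

Because existing files with the same name are overwritten, take care not to run this after you have placed real patient data in data-public.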

Roadmap

  • Initial Research
  • Minimum viable product <-- You are Here
  • Alpha Release
  • Feature-Complete Release

Acknowledgements

This work was funded by a grant from UCL Impact Funding. We are grateful to the Information Governance team and the Caldicott Guardian at UCLH for agreeing that we can release real patient data.