Move your Tracker / Event / Agg data in time by X days
CREATE FUNCTION UpdateDataTRKandEVTDate (number_days integer) RETURNS boolean
LANGUAGE plpgsql
AS
$$
DECLARE
TABLE_RECORD RECORD;
BEGIN
-- Only required for Tracker: Update enrollment date
UPDATE programinstance
SET enrollmentdate = enrollmentdate::TIMESTAMP + (number_days::TEXT || ' DAY')::INTERVAL,
lastupdated = NOW();
-- Only required for Tracker: Update TEA values for enrollments
UPDATE trackedentityattributevalue teav
SET value = DATE(value::TIMESTAMP + (number_days::TEXT || ' DAY')::INTERVAL)
FROM trackedentityattribute tea
WHERE tea.trackedentityattributeid = teav.trackedentityattributeid AND
tea.valuetype = 'DATE';
-- Update event dates
UPDATE programstageinstance
SET executiondate = executiondate::TIMESTAMP + (number_days::TEXT || ' DAY')::INTERVAL,
duedate = duedate::TIMESTAMP + (number_days::TEXT || ' DAY')::INTERVAL,
completeddate = completeddate::TIMESTAMP + (number_days::TEXT || ' DAY')::INTERVAL,
lastupdated = NOW();
-- Update DE values for events
FOR TABLE_RECORD IN
select psi.programstageinstanceid,
('{'||js.key||',value}')::TEXT[] as path,
('"'||TEXT(DATE(to_timestamp(replace(js_value.value::TEXT, '"', ''), 'YYYY-MM-DD') + (number_days::TEXT || ' DAY')::INTERVAL))||'"')::jsonb as value
from programstageinstance psi, jsonb_each(eventdatavalues::jsonb) as js, jsonb_each(js.value) as js_value
where js.key IN (SELECT uid FROM dataelement WHERE valuetype = 'DATE') AND js_value.key = 'value'
LOOP
UPDATE programstageinstance psi
SET eventdatavalues = jsonb_set(eventdatavalues, TABLE_RECORD.path, TABLE_RECORD.value)
WHERE psi.programstageinstanceid = TABLE_RECORD.programstageinstanceid;
END LOOP;
RETURN TRUE;
END
$$;
This function shiftes all dates of enrollment and event data by number_days in the future
Execution example:
SELECT UpdateDataTRKandEVTDate(1);
Some key things here:
- This function changes the dates for enrollment and, for each event, the event date, due date and completed date. lastUpdated is updated accordingly to reflect this update.
- For trackedEntityAttributes and dataElement values, it updates all the data stored for a metadata of valueType = DATE. Nothing is done for the moment for DATETIME, but changing valuetype = 'DATE' to valuetype IN ('DATE', 'DATETIME') as well as the casting to DATE should be a goos starting point.
- eventdatavalues as stored as a jsonb object. For jsonb, there is a function called jsonb_set which receives teh following parameters: target jsonb, path text[], new_value jsonb The purpose of this query
select psi.programstageinstanceid,
('{'||js.key||',value}')::TEXT[] as path,
('"'||TEXT(DATE(to_timestamp(replace(js_value.value::TEXT, '"', ''), 'YYYY-MM-DD') + (number_days::TEXT || ' DAY')::INTERVAL))||'"')::jsonb as value
from programstageinstance psi, jsonb_each(eventdatavalues::jsonb) as js, jsonb_each(js.value) as js_value
where js.key IN (SELECT uid FROM dataelement WHERE valuetype = 'DATE') AND js_value.key = 'value'
is to produce effectively the parameters required to update each dataelementvalue of type DATE for every programstageinstance. The target will simply be eventdatavalues jsonb object. The path will look something like this: '{DE_UID,value}', i.e. "I want to update the value for key = DE_UID" The new_value is the current DATE value for this DE + (number_days::TEXT || ' DAY')::INTERVAL
After the table with all updates is built, an update for each row runs performing the required updates.
Please be aware that for a size of 100K events in the DB, this query may take 5mins.
Also, remember to run analytics to see the changes in your dashboards :)
This function receives the period_type to consider and it shiftes the data in datavalues table (aggregate data) to the next period available.
Execution example:
SELECT UpdateDataAGGDate('Daily');
CREATE FUNCTION UpdateDataAGGDate (period_type text) RETURNS boolean
LANGUAGE plpgsql
AS
$$
DECLARE
BEGIN
WITH tbl_shift_date as
(
SELECT
dataelementid,
sourceid,
categoryoptioncomboid,
attributeoptioncomboid,
value,
dv.periodid as old_periodid,
pt.name as frequency,
pp.periodtypeid, pp.startdate,
CASE WHEN pt.name = 'Daily' THEN DATE(pp.startdate + INTERVAL '1 DAY')
WHEN pt.name = 'Weekly' THEN DATE(pp.startdate + INTERVAL '7 DAY')
WHEN pt.name = 'Monthly' THEN DATE(date_trunc('month', pp.startdate + INTERVAL '1 MONTH'))
WHEN pt.name = 'Quarterly' THEN DATE(date_trunc('quarter', pp.startdate + INTERVAL '3 MONTH'))
WHEN pt.name = 'Yearly' THEN DATE(date_trunc('year', pp.startdate + INTERVAL '1 YEAR'))
END as new_date
FROM datavalue dv
LEFT JOIN period pp on dv.periodid = pp.periodid
LEFT JOIN periodtype pt on pp.periodtypeid = pt.periodtypeid
ORDER BY pp.startdate DESC
)
UPDATE datavalue dv
SET periodid = new_period.periodid
FROM tbl_shift_date
INNER JOIN period new_period on tbl_shift_date.periodtypeid = new_period.periodtypeid AND new_period.startdate = tbl_shift_date.new_date
WHERE
dv.dataelementid = tbl_shift_date.dataelementid AND
dv.sourceid = tbl_shift_date.sourceid AND
dv.categoryoptioncomboid = tbl_shift_date.categoryoptioncomboid AND
dv.attributeoptioncomboid = tbl_shift_date.attributeoptioncomboid AND
dv.value = tbl_shift_date.value AND -- Not part of the unique key
dv.periodid = tbl_shift_date.old_periodid
INNER JOIN period new_period on tbl_shift_date.periodtypeid = new_period.periodtypeid AND new_period.startdate = tbl_shift_date.new_date
WHERE
tbl_shift_date.frequency = period_type;
RETURN TRUE;
END
$$;
Some key things here:
- For the moment Daily, Weekly, Monthly, Quarterly and Yearly are supported. If you want to add support for an aditional period type, you simply need to add a new WHEN statement in the CASE, e.g.:
WHEN pp.name = 'WeeklyThursday' THEN ...
- In order to be able to change the current period to the next one, the table period must have that period already created. Let's say you run a select for period type Monthly on the table and this is what you get:
periodid | startdate | enddate
----------+------------+------------
78204 | 2021-01-01 | 2021-01-31
78205 | 2021-02-01 | 2021-02-28
78206 | 2021-03-01 | 2021-03-31
78207 | 2021-04-01 | 2021-04-30
78208 | 2021-05-01 | 2021-05-31
80342 | 2020-01-01 | 2020-01-31
80343 | 2020-02-01 | 2020-02-29
80344 | 2020-03-01 | 2020-03-31
80345 | 2020-04-01 | 2020-04-30
80346 | 2020-05-01 | 2020-05-31
80347 | 2020-06-01 | 2020-06-30
80348 | 2020-07-01 | 2020-07-31
80349 | 2020-08-01 | 2020-08-31
80350 | 2020-09-01 | 2020-09-30
80351 | 2020-10-01 | 2020-10-31
80352 | 2020-11-01 | 2020-11-30
80353 | 2020-12-01 | 2020-12-31
As you can see, the last period available is May 2021 (2021-05-01 to 2021-05-31). If you attempt shifting the data to the next month by calling UpdateDataAGGDate('Monthly'), the data for May will fail to be updated. In order to create those entries, you need to either enter some data in a monthly dataSet for the periods you are missing, or import that data.
To simplify this, there is a python script in this folder called: populate_period_table.py
You will need to make sure you have the package dhis2 installed:
pip install dhis2
Create auth.json file containing the credentials of the default server to use. The script relies on a username 'robot' with SuperUser role to have an account in the server.
{
"dhis": {
"baseurl": "https://who-dev.dhis2.org/tracker_dev",
"username": "robot",
"password": "TOPSECRET"
}
}
Call the script as follows:
python populate_period_table.py period_type start_date end_date instance_url
period_type can be Daily, Weekly, Monthly, Quarterly OR Yearly
start_date in the format YYYY-MM-DD
end_date in the format YYYY-MM-DD
instance_url if you want to run it in an instance different from the one specified in auth.json (user credentials are the ones given in auth.json)
For example:
python populate_period_table.py Monthly 2021-01-01 2021-12-31 https://who-demos.dhis2.org/covid-19
When we run the same SELECT on the period table, we can see that the missing monthly periods for 2021 have been created:
periodid | startdate | enddate
----------+------------+------------
109465 | 2021-06-01 | 2021-06-30
109466 | 2021-07-01 | 2021-07-31
109467 | 2021-08-01 | 2021-08-31
109468 | 2021-09-01 | 2021-09-30
109469 | 2021-10-01 | 2021-10-31
109470 | 2021-11-01 | 2021-11-30
109471 | 2021-12-01 | 2021-12-31
78204 | 2021-01-01 | 2021-01-31
78205 | 2021-02-01 | 2021-02-28
78206 | 2021-03-01 | 2021-03-31
78207 | 2021-04-01 | 2021-04-30
78208 | 2021-05-01 | 2021-05-31
80342 | 2020-01-01 | 2020-01-31
80343 | 2020-02-01 | 2020-02-29
80344 | 2020-03-01 | 2020-03-31
80345 | 2020-04-01 | 2020-04-30
80346 | 2020-05-01 | 2020-05-31
80347 | 2020-06-01 | 2020-06-30
80348 | 2020-07-01 | 2020-07-31
80349 | 2020-08-01 | 2020-08-31
80350 | 2020-09-01 | 2020-09-30
80351 | 2020-10-01 | 2020-10-31
80352 | 2020-11-01 | 2020-11-30
80353 | 2020-12-01 | 2020-12-31
An example below on how to make sure your data is updated when it corresponds by setting up a cronjob. We assume that you have data already in your system up to today's date. If that is not the case, you need to invoke the aforementioned functions to move it so it covers any date in the past until today.
First thing is to edit your crontab for user postgres, which is the one who has access to the DB. You need to run the following as superuser:
sudo crontab -e -u postgres
Add the following lines:
0 0 * * * psql -d covid-19 -a -f /var/lib/postgresql/UpdateDataAGGDate.sql 2>&1 >/dev/null | ts >> ~/shift_dates.log
0 0 * * * psql -d covid-19 -a -f /var/lib/postgresql/UpdateDataTRKandEVTDate.sql 2>&1 >/dev/null | ts >> ~/shift_dates.log
1 0 * * * psql -d covid-19 -c "SELECT UpdateDataTRKandEVTDate(1);" 2>&1 >/dev/null | ts >> ~/shift_dates.log
15 0 * * * psql -d covid-19 -c "SELECT UpdateDataAGGDate('Daily');" 2>&1 >/dev/null | ts >> ~/shift_dates.log
20 0 * * MON psql -d covid-19 -c "SELECT UpdateDataAGGDate('Weekly');" 2>&1 >/dev/null | ts >> ~/shift_dates.log
25 0 1 * * psql -d covid-19 -c "SELECT UpdateDataAGGDate('Monthly');" 2>&1 >/dev/null | ts >> ~/shift_dates.log
30 0 1 */3 * psql -d covid-19 -c "SELECT UpdateDataAGGDate('Quarterly');" 2>&1 >/dev/null | ts >> ~/shift_dates.log
The first two jobs make sure the functions which we are going to need are installed. It will run exactly at midnight everyday.
The next line, will run one minute later everyday, and it shiftes tracker/event data by 1 day
The last 4 lines take care of the aggregate data.
- First job will move daily data to the next day everyday at 0.15. Since the job to shift tracker/event data is quite demanding, it is better to give it a reasonable time to finish before shifting aggregate data.
- Next one runs every Monday, and it shiftes weekly data to the next period (next week)
- The following one runs every 1st day of each month, and it shiftes monthly data to the next period (next month)
- Then one job which runs at the first day of every quarter to move quarterly data to the next quarter, and so on...
Once your crontab is modified, please save and restart the service to make sure your changes take effect
sudo systemctl restart cron
Note: I have used ts to add a timestamp to the log. It can be installed by running "sudo apt install moreutils", but using it is optional. 2>&1 >/dev/null makes sure we only get the errors in the log, shift_dates.log which is saved in the home folder of user postgres