DISCUSSION: Add column(s) to vocabularies table to identify vocabularies that came from a specific database #112

marc-outins · 2018-06-06T22:13:22Z

Since we are going to be adding database-specific vocabularies, I think we should store which database they came from in the vocabularies table.

aguynamedryan · 2018-06-06T22:15:43Z

So, say we have two SEER databases, ALL and CLL, which share some of the same vocabularies, what do we put in this column?

marc-outins · 2018-06-06T22:19:18Z

SEER, since ALL, CLL, etc. are just cuts of the whole SEER Medicare database. One question is whether we distinguish between SEER and SEER Medicare. Currently I'm giving vocabs that come from the SEER Medicare data ids of SEER_ .

aguynamedryan · 2018-06-06T22:20:51Z

So is this column storing the data vendor's name? Or the dataset name?

marc-outins · 2018-06-06T22:25:36Z

Whatever we want to call it. I definitely didn't mean we would have different vocabularies for each version of a database if said vocabulary is the same across databases. But it would be nice to know the source of the vocabulary.

markdanese · 2018-06-06T22:42:16Z

I think we need to simply name the vocabulary properly and have a description or source field. SEER is confusing because some things come from SEER, some are NAACCR, some are adapted from NAACCR, and some are adapted from other sources (e.g., AJCC). In other words, just because a vocabulary is in SEER doesn't mean it comes from SEER. I think this would be better called "source", which may, or may not, relate to the database precisely. I think source could be more of a description to say something like "NAACCR grade adapted by SEER version 2".

markdanese · 2018-06-06T22:44:55Z

A related example would be something like CMS place of service codes which may be in Medicare or other databases. But the source description is "CMS place of service". In other words, I think we should say what it is, and where it comes from in a way to identify it clearly.

marc-outins · 2018-06-06T22:50:04Z

In the CMS place of service situation, the vocabularies table would contain a vocabulary for CMS place of service and the "Source" column would be CMS. We are doing that with any vocab used in SEER that is defined somewhere else. I'm talking about vocabularies that are defined by the organization that cut the data. So for SEER Medicare there are variables marst1-10, which contain marital status as defined by SEER. So I create a vocab called SEER_MARST.

markdanese · 2018-06-06T23:14:23Z

In that case, I would call the source SEER (or NCI, but I prefer SEER since NCI might have more than 1 version). Having the vocab name include the source, when relevant, is a good reminder. Although I wonder if we should be more precise about this.

Consider a variable for Sex, which will be in every database and defined differently in many, but not all. Do we name them differently for each data source? In other words, do we give them more generic names like "Sex_M_F" and "Sex_0_1" and "Sex_Male_Female"? Or do we call them all "Sex" and then use a source field to distinguish them? Or do we need a "type" field to categorize them?

I don't think there is a perfect answer here. I think we need something that can be clearly implemented and searched. So, I guess I am leaning toward something like "Vocabulary Name" (SEER_MARST or CMS_SEX), Source (SEER or CMS), Type (Marital Status or Sex), and "Description" (SEER marital status variable or CMS sex variable).

Obviously in these examples, there is redundancy, but in other situations it will be helpful. I am thinking of Vocabulary = "SEER_grade", Source = "SEER", Type = "Cancer grade", and Description = "NAACCR grade adapted by SEER".
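
To make the four-field idea concrete, here is a minimal sketch of what such a table might look like, using Python's built-in sqlite3 module and an in-memory database. The column names (`id`, `source`, `type`, `description`) and the helper `vocabularies_from` are assumptions for illustration only, not the project's actual schema. The rows are the examples given above.

```python
import sqlite3

# Hypothetical schema: the column names mirror the four fields proposed
# above and are assumptions, not the project's real table definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vocabularies (
        id          TEXT PRIMARY KEY,  -- vocabulary name, e.g. SEER_MARST
        source      TEXT NOT NULL,     -- who defines it, not which data cut it was found in
        type        TEXT NOT NULL,     -- what the vocabulary categorizes
        description TEXT               -- free-text provenance
    )
    """
)

# Example rows taken from the comments above.
rows = [
    ("SEER_MARST", "SEER", "Marital Status", "SEER marital status variable"),
    ("CMS_SEX", "CMS", "Sex", "CMS sex variable"),
    ("SEER_grade", "SEER", "Cancer grade", "NAACCR grade adapted by SEER"),
]
conn.executemany("INSERT INTO vocabularies VALUES (?, ?, ?, ?)", rows)


def vocabularies_from(source):
    """Return the ids of all vocabularies with the given source, sorted by id."""
    cur = conn.execute(
        "SELECT id FROM vocabularies WHERE source = ? ORDER BY id", (source,)
    )
    return [row[0] for row in cur]


seer_vocabs = vocabularies_from("SEER")
print(seer_vocabs)
```

With this layout, a vocabulary shared by several cuts of the same database (ALL, CLL, etc.) is stored once, and searching by `source` or `type` does not depend on how the vocabulary happens to be named.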
