Units for experimental Young's modulus data #153
Comments
Then there are two independent steps to be done
|
As far as I see it we have four options:
Could you just choose one of them, or mention another if I forgot an option? @eriktamsen @joergfunger Edit: And another question: the validation of the units (whether they make sense and so on) is then part of the SHACL validation and not the extraction/mapping, right? |
Not sure I understand 1/2/3; 4 is not optimal. From my point of view, the unit should always be in the metadata.json. If it exists in the raw data, use that value; if it is not there, we fix this within the metadata extraction script, which is specific to a defined raw file and carries implicit assumptions about which units have been used. Then we will map the units like all other metadata fields in the KG templates. |
What I meant by raw units / converted units is that the original script by Ilias contained a function that checks, e.g., for lengths above 100, then assumes that the unit is cm and converts it to m, and so on. |
I think this automatic conversion creates more problems than it solves. I would rather have an error message (e.g. in the SHACL validation) when certain assumptions (see above) are not fulfilled. |
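A minimal sketch of the "fail loudly instead of converting" idea from the comment above. It is written in plain Python rather than SHACL, purely for illustration; the function name, its signature, and the reuse of 100 as the bound are assumptions, not code from the repository.

```python
def require_plausible_length_m(name, value_m, upper_bound_m=100.0):
    """Return value_m unchanged, or raise if it is implausible for metres.

    The bound of 100 mirrors the threshold mentioned in the thread; the
    function itself is hypothetical.
    """
    if value_m > upper_bound_m:
        raise ValueError(
            f"{name} = {value_m} exceeds {upper_bound_m} m; "
            "check the unit in the raw file instead of converting automatically"
        )
    return value_m
```

The extraction step then stops with a readable message and leaves the decision about the correct unit to a person.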
So Aida and I just had a meeting with Mattheo and talked about the implementation of units in the template.
|
A predefinition is not optimal, because e.g. in the US, everyone is still using ksi (kilopound per square inch). If we force them to use a different unit, it just would not work. |
Whichever way the units will be integrated in the KG, we need to specify them in the extracted metadata. |
I guess ns3:Q174789 is mm in PMD units (are they using qudt, or reinventing PMD units?). Why is the placeholder http://... and not just Diameter_Unit? And as you suggested, there could be a Python function (with a table, e.g. a dict) that maps the string in the json file (key, e.g. N/mm2 or MPa) to the ns3:Q17478 (value in the dict). To know that this function has to be used to transform the string, I suggested either using different placeholder markers, e.g. #, or having a "labeled" placeholder (see above). |
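The dict-based mapping suggested above could look roughly like the following sketch. The table contents are assumptions: the mm entry only echoes the guess made in the comment, and the `ex:` identifiers are hypothetical placeholders, not real PMD or QUDT terms.

```python
# Hypothetical lookup table from unit strings in the metadata JSON to
# ontology identifiers. Only the "mm" value is taken from the thread (and
# it is a guess there); the "ex:" values are made-up placeholders.
UNIT_TABLE = {
    "mm": "ns3:Q174789",
    "MPa": "ex:MegaPascal",
    "N/mm2": "ex:MegaPascal",  # 1 N/mm2 equals 1 MPa, so same target
}


def map_unit(unit_string):
    """Translate a unit string into its identifier, failing on unknown units."""
    key = unit_string.strip()
    if key not in UNIT_TABLE:
        raise KeyError(f"No mapping defined for unit {unit_string!r}")
    return UNIT_TABLE[key]
```

Raising on unknown strings keeps unmapped units from slipping silently into the graph.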
I think they are reinventing PMD units. However, having looked into QUDT, maybe we should use that ontology for the units, since it has far more units than the current PMD? |
In addition the QUDT already has a "dictionary" for the units: |
@mattheokru Why is that different now for the placeholders for the units compared to the other strings. The others also do not have the https ...? |
I started working now on the conversion script, so it can be integrated into the mapping script.
So I think this would be helpful. |
Look for GM: |
Yes, it makes sense to give that a unit, since the unit of the raw data file (columns) has to be stored somewhere. Theoretically, they could also be added as metadata to the file, but I think adding it to the graph here makes more sense. |
DateTimeStamp is probably the starting point, and ExperimentTime is the duration of the experiment (the offset). In the optimal setting, this information would be stored in the description of the classes in the ontology, so that questions like this answer themselves. @StephanPirskawetz is that included? Both should have units (the first one including a complete date, the second one only a duration). As for the unit of duration, @StephanPirskawetz will have to answer that, but I think s is correct. |
ExperimentDate and -Time look like this:
@StephanPirskawetz what is the unit for CompressionForce and for Transducer? |
@alFrie: I hope I understand your question to Stephan correctly. |
@alFrie you can also check the units in ExampleDataCPTOMapping.xlsx from Mattheo, the one i sent you. |
@alFrie The Experiment Date is the date and time the experiment took place (that's why this is a date time stamp); the Experiment Time is supposed to be the duration of the test (hence the unit seconds). |
The description of the Classes/Individuals unfortunately gets removed if you export or work with them in Ontopanel/drawio. You would have to import the base ontologies into the Ontopanel entity manager to see the descriptions. |
@mattheokru Okay, this is not what the metadata extraction script is doing. (This part of the code was not written by me!) It selects the following values: Yellow Box = Measurement Duration. That is why I am confused about the unit of ExperimentDate/Time: both are datetime stamps. @joergfunger / @eriktamsen Which one is now the real "MeasurementDuration"? The yellow box in my image ("Zeit") or the column in Mattheo's image ("Laufzeit")? This column would be part of the data, not the metadata, if I'm correct. |
@joergfunger And what is a unit of a date? You want something like "dd.MM.yyyy hh:mm:ss"? (The "date" has no unit placeholder by the way, only the "time" has one). Also shouldn't we just keep date and time as one thing, as datetime, instead of saving it into two different instances? |
@eriktamsen But then we need extra placeholders for that, because these units have nothing to do with the other metadata but with the measurement data, don't they? What kind of placeholders should those be? |
These MTS formats and times are a bit peculiar. Outlined in red are the date and time at which the program part "Bediener Information" (operator information) was started. The time before that is the time elapsed since the whole program was started. After the yellow outline: start of the program part "Belastungsfunktion" (loading function). That is the important time, and it should be carried over into the metadata. All other times are unimportant, including the ones outlined in red and yellow (again the time elapsed since the program was started). |
Thanks @StephanPirskawetz!
That is how I understood it as well. And as I said, there are no placeholders for that anyway. So I would not extract the data in the blue box / line 15.
So the box outlined in green in the new image is now the important one, right? Do you want it stored as one variable (date and time together) or separately as "ExperimentTime" and "ExperimentDate", as before? |
Yes, green is important! Whether together or separate, I don't know. As a data type, however, there is only "dateTime". If one can do arithmetic with dateTime, which I assume, then that is better. If such a dateTime also appears in the Mix KG, one can presumably just subtract the two and obtain the specimen age at the time of testing. That would be practical. It is probably harder with separate variables. |
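The subtraction described above can be sketched with Python's standard `datetime` module. The function name and the two timestamps are made up for illustration and are not values from the data set.

```python
from datetime import datetime


def specimen_age_days(mixing_time, testing_time):
    """Return the specimen age at testing as a fractional number of days."""
    age = testing_time - mixing_time  # a datetime.timedelta
    return age.total_seconds() / 86400.0


# Illustrative timestamps only
mixing = datetime(2023, 1, 2, 12, 0)
testing = datetime(2023, 1, 30, 9, 30)
print(specimen_age_days(mixing, testing))
```

With date and time stored as one value on both sides, the age follows from a single subtraction; separate date and time variables would first have to be recombined.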
P.S. |
Okay, danke @StephanPirskawetz ! @mattheokru that would mean we need to
|
@StephanPirskawetz The DateTime stamp from the mix design contains only a date. That should still help, right? |
Yes, we can simply set the time to 12:00. If it matters (tests earlier than after 28 days), the time of water addition must be given as well. |
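A small sketch of the 12:00 fallback proposed above, assuming the mix design delivers a plain date and the water-addition time is optional. The function and parameter names are hypothetical.

```python
from datetime import date, datetime, time


def mix_datetime(mix_date, water_addition_time=None):
    """Combine the mix date with the water-addition time, or 12:00 if unknown."""
    if water_addition_time is None:
        water_addition_time = time(12, 0)  # fallback suggested in the thread
    return datetime.combine(mix_date, water_addition_time)


print(mix_datetime(date(2023, 1, 2)))
print(mix_datetime(date(2023, 1, 2), time(8, 15)))
```

With the noon default, the computed specimen age is off by at most half a day, which is why the actual time matters mainly for early-age tests.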
Quoting a message of myself #118 (comment):
See you end of May! |
So the green box is important for us too.
@StephanPirskawetz: In my opinion, that is not right. For our simulation we need to know which units we are working with. Even if you say that the BAM data all have the same units, this should still be stored somewhere in machine-readable form. Why shouldn't that simply be part of the metadata? |
@mattheokru Have you seen this and implemented it? If so, I unfortunately cannot find the updated KGs anywhere.
|
@alFrie Yes, it should be there. Since all of this was a bit chaotic, I created a branch for the KGs. That way everything is in one place and not spread across different branches/pull requests, which makes it easier to keep an overview. The branch is here: https://github.com/BAMresearch/LebeDigital/tree/Knowledge_Graphs_Update |
While working on the calibration PR, I had a look at the extracted experimental data for the Young's modulus test.
The units are part of the column names in the CSV file. This is not ideal, and not really usable.
As I have lost the overview of what is happening in which branch and issue, I am not sure if this has already been addressed.
As far as I can remember, all relevant fields in the ontology have units linked.
I therefore assume the data extraction script needs to be improved so that the units are available in a machine-readable format. I would suggest adding them to the metadata yaml (or json, if that has been changed).
It is possible that the units don't change, as our data extraction is only for one specific machine. Even in that case, the information still needs to be available, e.g. by just hard-coding it in the metadata file.
@AidaZt, @ThiloMuth, @StephanPirskawetz: Please check if any change is required and if so, make a plan of how to best implement this.
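A minimal sketch of the hard-coding option, assuming the extraction script builds a metadata dict before writing it out. The column names are the two mentioned in this thread; the units "kN" and "mm", the function name, and the `<column>_Unit` key pattern are assumptions that would have to be confirmed for the actual machine.

```python
import json

# Hypothetical units for two columns named in this thread; the real units
# still have to be confirmed for the testing machine.
COLUMN_UNITS = {
    "CompressionForce": "kN",
    "Transducer": "mm",
}


def add_units(metadata, column_units=COLUMN_UNITS):
    """Return a copy of metadata with one <column>_Unit entry per data column."""
    enriched = dict(metadata)
    for column, unit in column_units.items():
        enriched[f"{column}_Unit"] = unit
    return enriched


print(json.dumps(add_units({"ExperimentName": "example"}), indent=2))
```

Keeping the units in one table inside the machine-specific extraction script makes the implicit assumption explicit and gives downstream code a single place to read them from.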