A Conditional Random Fields (CRF) model for named entity recognition in Turkish news texts, implemented in Python.
The sample input format (tab-separated) is shown below:
Word | POS | Annotation |
---|---|---|
Tek | Adj | O |
çatı | Noun | O |
altında | Noun | O |
dokuz | Num | O |
ayrı | Adj | O |
salonda | Noun | O |
gerçekleştirilecek | Verb | O |
Şenlik | Noun | O |
kapsamında | Noun | O |
doksanın | Noun | O |
üzerinde | Noun | O |
etkinlik | Noun | O |
yer | Noun | O |
alacak | Verb | O |
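
As a rough illustration of the input format, the sketch below reads tab-separated "word, POS, annotation" lines into sentences. The function name `read_tagged_tokens` is hypothetical, and the use of blank lines as sentence boundaries is an assumption; adjust it to match your own data.

```python
def read_tagged_tokens(lines):
    """Yield sentences as lists of (word, pos, annotation) tuples.

    Assumption: one token per line, three tab-separated columns,
    and a blank line between sentences.
    """
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        word, pos, annotation = line.split("\t")[:3]
        sentence.append((word, pos, annotation))
    if sentence:
        yield sentence


# Small in-memory sample; with a real file, pass an open file handle instead.
sample = "Tek\tAdj\tO\nçatı\tNoun\tO\n\nŞenlik\tNoun\tO\n"
sentences = list(read_tagged_tokens(sample.splitlines()))
```
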
You can also use the trained model ("crf_v2.joblib") to label your own test dataset. The model outputs one "word - predicted annotation - POS" triple per token, with the items separated by tabs.
Sample output of the model is given below:
Word | Predicted_Annotation | POS |
---|---|---|
Istanbul | LOCATION | Noun |
yüzde | PERCENT | Noun |
2013 | DATE | Num |
Meclis | ORGANIZATION | Noun |
lira | MONEY | Noun |
şimdi | TIME | Adv |
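
The sketch below shows one way the tab-separated output lines could be produced from predictions. It does not load the real model: `label_sentence` and `dummy_predict` are hypothetical names, and `dummy_predict` is only a stand-in for the trained CRF. With the actual model you would presumably call `joblib.load("crf_v2.joblib")` and apply the same feature extraction that was used in training, which is not described here.

```python
def label_sentence(sentence, predict):
    """Return output lines of the form word<TAB>predicted annotation<TAB>pos.

    sentence: list of (word, pos) pairs.
    predict:  any callable returning one label per token (hypothetical
              stand-in for feature extraction plus the trained model).
    """
    labels = predict(sentence)
    return ["\t".join((word, label, pos))
            for (word, pos), label in zip(sentence, labels)]


def dummy_predict(sentence):
    # Toy lookup used only to make the example runnable; not the real model.
    known = {"Istanbul": "LOCATION", "2013": "DATE"}
    return [known.get(word, "O") for word, _pos in sentence]


lines = label_sentence([("Istanbul", "Noun"), ("2013", "Num")], dummy_predict)
print("\n".join(lines))
```
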
To evaluate the performance of the model, run "CRF_Eval.java". It calculates the CoNLL F1-score, precision, and recall for each annotation type using a sequence alignment algorithm.
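
For readers who want a quick check in Python, the following is a minimal sketch of entity-level precision, recall, and F1 per annotation type. It is not the sequence alignment algorithm used in "CRF_Eval.java". It assumes that labels carry no B-/I- prefixes, that consecutive tokens with the same non-"O" label form one entity, and that gold and predicted sequences are already aligned token by token. The function names are hypothetical.

```python
from collections import Counter


def extract_entities(labels):
    """Group consecutive tokens with the same non-O label into (start, end, type) spans."""
    labels = list(labels)
    spans = set()
    start = None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, label in enumerate(labels + ["O"]):
        if start is not None and label != labels[start]:
            spans.add((start, i, labels[start]))
            start = None
        if start is None and label != "O":
            start = i
    return spans


def per_type_scores(gold, predicted):
    """Return {type: (precision, recall, f1)} using exact span matches."""
    gold_spans = extract_entities(gold)
    pred_spans = extract_entities(predicted)
    tp = Counter(t for *_, t in gold_spans & pred_spans)
    n_gold = Counter(t for *_, t in gold_spans)
    n_pred = Counter(t for *_, t in pred_spans)
    scores = {}
    for t in sorted(set(n_gold) | set(n_pred)):
        p = tp[t] / n_pred[t] if n_pred[t] else 0.0
        r = tp[t] / n_gold[t] if n_gold[t] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[t] = (p, r, f1)
    return scores


gold = ["LOCATION", "O", "DATE", "DATE", "O"]
pred = ["LOCATION", "O", "DATE", "O", "O"]
scores = per_type_scores(gold, pred)
```

In this toy example the predicted DATE span covers only part of the gold span, so under exact matching it counts as both a false positive and a false negative.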
If you use this model in an academic publication, please cite: https://ieeexplore.ieee.org/document/8806523