-
Notifications
You must be signed in to change notification settings - Fork 211
Morphology Notes
- is "balkabakçık" correct?
0.14.0: Correct. Analysis = [balkabağı:Noun] balkabak:Noun+A3sg|çık:Dim→Noun+A3sg
- "balkabaklılar" -> current mechanism will fail. After derivation, some root attributes should be removed. Check this after implementing Noun-->lI--->Adj work.
0.14.0: Correct. Two analyses. [balkabağı:Noun] balkabak:Noun+A3sg|lı:With→Adj|Zero→Verb+Pres+lar:A3pl [balkabağı:Noun] balkabak:Noun+A3sg|lı:With→Adj|Zero→Noun+lar:A3pl
- imlicitPlural and implicitDative etc. attributes should be added to master dictionary items.
0.14.0: This is mostly done. However, allowing surface plural suffixes for imlicitPlural words may be a good idea.
-
Consider adding Conditions.containsAny() and Conditions.containsNone() for a group of conditions.
-
Instead of using "su" in Morphotactics, A "RootGroup" identifier can be used. Same exceptions may apply to "akarsu" as well. Or, groups can be Sets of Dictionary objects. So that Conditions.containsAny etc can be used.
-
Consider making "-cağız" a separate morpheme. Not Dim.
-
Should we remove words that exist in TDK with
-li
suffix? "zehirli, sulu" etc. They create ambiguity. -
We may not need
head
in SearchPath -
Noun->Zero Verb derivation correct and false positive tests -> elmalarımdı elmamdım elmandım elmamdır. how about implicitplural derivation etc?
-
"JustLike" does not sound right. "odun-su". Resemblance?
-
"elmamımdır", "elmalarlar" seems incorrect but Oflazer accepts.
-
"odun-lar-dır" vs "odun-dur-lar" Oflazer gives same analysis for both. (elma+Noun+A3sg+Pnon+Nom^DB+Verb+Zero+Pres+A3pl+Cop) Should we allow "odun-dur-lar"?
-
"elmalar" now also has a Verb result. Such as
elma:Noun + A3sg + Pnon + Nom + Zero + Verb + Pres + lar:A3pl
Example: "Bunlar o elmalar.". However this actually generates a lot of ambiguity. Should we disallow?
-
should we accept bot "yeşilimsi" and "yeşilsi" (currently we do)
-
oflazer does not have
onlar
as a pronoun. it useso
and A3pl -
Oflzer accepts
bunla = [(bu:bu) (Pron,Demons;A3sg+Pnon+Inst:nla)] bununla = [(bu:bu) (Pron,Demons;A3sg+Pnon+Inst:nunla)]
should there be a bunla
?
-
Oflazer parses
bunlu
andbunsuz
as separate adjectives. -
herkes is implicitly plural. Perhaps we should also accept
herkesler
as same. -
Should RootAttrubute and Root contitions also check if there is no derivation?
-
Oflazer does not offer A3pl+Pnon analysis for herkesler and öbürküler. We allow it. is "öbürkülerim" ok?
-
Oflazer has pron+Quant and Noun forms for "kimse". kimseler only accepted for Noun or Pron->Verb+A3pl. Here I decided to connect "kimse" pronoun to Name-A3sg and Name-A3pl states. This way all name suffixes will apply but it will be a pronoun.
-
should we parse
nemiz, neniz (neyimiz, neyiniz)
-
kimlerimde
etc. sounds bad. Should we restrict? -
Adjective "öbür" should not get any suffixes.
-
Added "falan, falanca" as personal pronoun. Oflazer does not.
-
Added "berili" as qunatitive pronoun. Oflazer does not.
-
Sak, uses "-leri" surface as A3pl+P3pl in "kendileri". Perhaps we should use that. Now we have "ler-i" A3pl+P3pl
I make changes for Reflexive and some Quantitive pronouns but not all are covered.
-
We do not allow zero->Verb+..+lar for pronouns. Oflazer does.
-
Oflazer does not anayze elmaymışımdır. we do.
-
Oflazer does not analyze elmaymışsa. we do
-
akarsu also behaves like su but maybe both
akarsum
andakarsuyum
are accepted. -
Oflazer uses Prog1 (Progressive 1) for -iyor and Prog2 fo
-makta
should they be combined? Combining may create ambiguity for generation. Perhaps generation should offer multiple solutions anyway. -
Oflzer uses
Pos
for Positive in verbs. I decided not to use it for now to reduce clutter. Perhaps we should get rid of Pnon as well. -
Some verbs can have "-ir" suffix to make it Causative. Piş-ir, kaç-ır, şiş-ir, uç-ur. There are two ways of handling those.
- Add it as a separate verb without tagging implicit Causative.
- like Oflazer, make it causative. [pişirdi piş+Verb^DB+Verb+Caus+Pos+Past+A3sg]
-
Both "okuyun" and "okuyunuz" have Imp+P2pl analysis. Bu semantically they are a little different Second one is "kind". Should we introduce another morpheme for it?
-
okuyamayabilir oflazer offers two Able derivation here. Able->Neg->Able. Perhaps second Able can be considered as "probability" but surely this will create ambiguity for regular "Able", like "okuyabilir"
-
"yazıyormuşsak" should we allow? Oflazer does not.
-
Should we use different morphemes for Past and Narr after tenses (Hikaye, Rivayet)?
-
A3pl causes 4 analysis in Oflazer system:
"evleri"
ev+Noun+A3pl+Pnon+Acc ev+Noun+A3pl+P3sg+Nom ev+Noun+A3sg+P3pl+Nom ev+Noun+A3pl+P3pl+Nom
Should we keep all? We currently do.
-
A3pl generates 4 analysis for compounds as well. But they are a little different
"zeytinyağları" zeytinyağı+Noun+A3sg+P3pl+Nom zeytinyağı+Noun+A3pl+P3sg+Nom zeytinyağı+Noun+A3pl+P3pl+Nom zeytinyağı+Noun+A3pl+Pnon+Nom // This one seems to be the different one. Perhaps here because
-
Inf1 -> Noun -> Verb may contain false positives.
aramaktayım -> has ara:Verb + mak:Inf1 + Noun + A3sg + Pnon + ta:Loc + Zero + Verb + Pres + yım:A1sg yazmalıyım -> yaz:Verb + ma:Inf2 + Noun + A3sg + Pnon + Nom + lı:With + Adj + Zero + Verb + Pres + yım:A1sg yazmalıyız -> yaz:Verb + ma:Inf2 + Noun + A3sg + Pnon + Nom + lı:With + Adj + Zero + Verb + Pres + yız:A1pl
false positive? Oflazer accepts.
yazmalılar:
yaz:Verb + malı:Neces + lar:A3pl
yaz:Verb + ma:Inf2 + Noun + A3sg + Pnon + Nom + lı:With + Adj + Zero + Verb + Pres + lar:A3pl
yaz:Verb + ma:Inf2 + Noun + A3sg + Pnon + Nom + lı:With + Adj + Zero + Noun + lar:A3pl + Pnon + Nom
three solution is too much. Ofalzer offers:
yaz+Verb+Pos+Neces+A3pl
yaz+Verb+Pos^DB+Noun+Inf2+A3sg+Pnon+Nom^DB+Adj+With^DB+Noun+Zero+A3pl+Pnon+Nom
for "okumayacak" oflazer offers: okumayacak oku+Verb+Neg+Fut+A3sg okumayacak oku+Verb+Neg^DB+Adj+FutPart+Pnon <-- This is odd. Oflzer normally does not use possesive suffixes with Adjectives. First a noun derivation occurs. Same goes for "okumayacağım" okumayacağım oku+Verb+Neg+Fut+A1sg okumayacağım oku+Verb+Neg^DB+Adj+FutPart+P1sg okumayacağım oku+Verb+Neg^DB+Noun+FutPart+A3sg+P1sg+Nom
-
Do not allow Verb+PastPart->Noun->Verb+Pres+A1pl for "okumadığım"
-
After adding PhoneticAttribute CannotTerminate we may remove some ExpectWovel conditions from Morphotactics
-
Oflazer does not solve Adj+Pnon for FuturePart.
-
Ness seems not a good name.
-
withoutHavingDoneSo, withoutBeingAbleToHaveDoneSo terrible names.
-
domatesinki
domates+Noun+A3sg+Pnon+Gen^DB+Pron+Rel+A3sg+Pnon+Nom <-- this is interesting. Not yet added.
-
for okuyacaklardır,okumazlardır Oflazer gives single soluiton
-
Oflzer offers same output for gelecektirler and geleceklerdir. => +Verb+Pos+Fut+A3pl+Cop
Zemberek changes orders of A3pl and Cop
-
Equ suffix is dodgy. evimce.. evlerince etc.
-
Oflazer rejects "elmayalar" but accepts "elmayayım"
-
"yazacaklardırlar" we now accept it with yaz:Verb + acak:FutPart + Noun + lar:A3pl + Pnon + Nom + Zero + Verb + Pres + dır:Cop + lar:A3pl should we?
-
Oflazer does not have Related (-sAl) suffix. It contains words with that suffix such as "bilimsel" If we will use Oflazer conversion, this will require some attention.
-
Should we parse okuyasıma, okuyasımda? Oflazer does. +Verb+Pos^DB+Noun+FeelLike+A3sg+P1sg+Dat
-
suikasti or suikastı? TDK says second. Second is very common
-
okumuşken should probably has one analysis only.
-
yazasınız should have 2 solutions but has 3
-
should we accept elmaymışlar or elmasındalarmış? Oflazer doesnt
-
some typos appear AorPart+...+Loc such as "darılmazda" should be "darılmaz da". "çıkılırda"
-
remove some proper nouns that causes ambiguity and false positives.
Li, Hayir
etc. -
we need to make "bizim" same as oflazer. Or, solve bizimki etc.
-
OFlazer does not use suffix for
gel-im, gid-im
. Includes them in the dictionary. Should we? -
Should we accept "akşamdaki", "andaki", "yıldaki"
-
Some adverbs can derive to verbs such as: şöyleydi şöyleyim
-
Some can derive but only for some suffixes: tesadüfendi, tesadüfenmiş, tesadüfendir
-
False analysis
olan
o:Adj + lan:Acquire + Verb + Imp + A2sg -
her->noun root is incorrect.
-
diğer:Noun incorrect
-
bu: Noun incorrect, add Det
-
her: Noun incorrect? add Det
-
size: Noun incorrect
-
hemde
withhe-m-de
analysis is almost always incorrect, perhaps possessive suffixes should be prohibited for some words like letter pronunciations. -
Omuzum
-> terrible analysiso-muz-um
-
olabilirimde -> almost always incorrect
-
should we accept:
birbirlerimize birbirlerimizin birbirlerinize birbirlerinizin
-
Consider adding "'" to analysis.
-
We handle Reciprocal a little different. if a verb has
Reciprocal
attribute, we assume it has an enpty Reciprocal suffix. However, this may cause some extra ambiguity because we also allow actual reciprocal suffix to root forms. For example:dövüşmek [A:Reciprocal] dövüştük -> dövüş + Recip + ... dövüştük -> döv + üş:Recip + ...
Therefore probably we need to ass "NonReciprocal" attribute to "dövmek" Also some verbs should not get Reciprocal suffix and needs to be marked with "NonReciprocal"
-
consider adding "giller" as a derivative suffix.
-
Second is not correct. alçaklıktır [alçak:Adj] alçak:Adj|lık:Ness→Noun+A3sg+Pnon+Nom|Zero→Verb+Pres+A3sg+tır:Cop [alçak:Adj] alçak:Adj|Zero→Noun+A3sg+Pnon+Nom|lık:Ness→Noun+A3sg+Pnon+Nom|Zero→Verb+Pres+A3sg+tır:Cop
-
bununla -> not working. Oflazer: bunla bunla bu+Pron+Demons+A3sg+Pnon+Ins bununla bununla bu+Pron+Demons+A3sg+Pnon+Ins
-
iseniz [imek:Verb] i:Verb+se:Desr+niz:A2pl <-- wrong. +Verb+Pos+Cond+A2pl
needs to connect to noun->verb root.
-
Camii'ne <--- P3sg+Dat missing.
[Camii:Noun,Prop] camii:Noun+A3sg+n:P2sg+e:Dat- [camii:Noun] camii:Noun+A3sg+n:P2sg+e:Dat-
-
Equ should not be allowed for proper nouns.
-
-lan should not be allowed for proper nouns.
-
Terim'in should have a Proper analysis.
-
all words that starts with capital letter should be a proper noun candidate.
-
üzerinden seems not to have P3sg analysis.
-
Rethink Reciprocal morphotactics as it generates painful ambiguity.
-
"vazgeçilmezi" seems ok but not sure if it can be generalized.
beklemedeyiz bilmezden
-
odalar
[o:Adj] o:Adj|Zero→Noun+A3sg+da:Loc|Zero→Verb+Pres+lar:A3pl --> WRONG
-
Consider adding abbreviations with dot directly. Or add a rootattribute
-
for "üzeri" Oflazer only offers "üzer+Noun+A3sg+P3sg+Nom" but zemebrek has 2 parses
[üzeri:Noun] üzeri:Noun+A3sg [üzeri:Noun] üzeri:Noun+A3sg+P3sg
-
also similarly zemberek offers 3 parses of zeytinyağına
zeytinyağına [zeytinyağı:Noun] zeytinyağı:Noun+A3sg+na:Dat [zeytinyağı:Noun] zeytinyağı:Noun+A3sg+n:P2sg+a:Dat [zeytinyağı:Noun] zeytinyağı:Noun+A3sg+P3sg+na:Dat
but Oflazer offers 2
zeytinyağına
zeytinyağına zeytinyağı+Noun+A3sg+P2sg+Dat
zeytinyağına zeytinyağı+Noun+A3sg+P3sg+Dat
- Provide a common sanitation method for corpora sentences. replace \s, u00a0 and u200b with single space, remove u00ad if contains combining diacritics chars [u+0300-u+036f], fix or ignore sentence.
- Turkish location names dictionary may be generating false positives as it contains many weird names. Consider not using it.
- annemler, annenler etc can be resoved with implicitPlural, ImplicitP1sg and ImplicitP2sg attributes. However, for making morphotactics work easier, A third property (FamilyMember?) can be added.