Double hyphens in compound words in Czech, Portuguese, etc. #1963
Comments
According to your profile, @jodros, you are from Brazil; perhaps you can comment on these rules for Portuguese? If this is not widespread, we may need settings... Typography is hard ;) |
Right, but I still don't know how the hyphenation algorithm works. Where could I start? |
As far as you know, if a line is broken at the hyphen in the Portuguese word "anti-inflamatório", should it yield: case 1 (nothing fancy)
Or case 2 (repeated hyphens):
In the second case, we'd need to generalize the solution adopted for Polish. It seems we have to do it for other languages too, but I am still trying to understand which languages are concerned and whether it's a widespread rule in those languages, so as to propose the correct generalization. |
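For readers following along, here is a minimal sketch of the two cases being asked about. It is not SILE's hyphenation code: the function name `break_compound` and its `repeat_hyphen` parameter are made up for illustration, and the sketch only handles hyphens that are already present in the word.

```python
def break_compound(word: str, repeat_hyphen: bool) -> list[tuple[str, str]]:
    """Return every (line_end, next_line_start) pair obtained by breaking
    `word` at one of its explicit hyphens (hypothetical helper)."""
    breaks = []
    for i, ch in enumerate(word):
        # Only break at a hyphen that has text on both sides.
        if ch == "-" and 0 < i < len(word) - 1:
            head = word[: i + 1]   # the hyphen stays at the end of the line
            tail = word[i + 1 :]
            if repeat_hyphen:
                tail = "-" + tail  # case 2: hyphen repeated on the next line
            breaks.append((head, tail))
    return breaks


# Case 1 (nothing fancy) and case 2 (repeated hyphen):
print(break_compound("anti-inflamatório", repeat_hyphen=False))
# [('anti-', 'inflamatório')]
print(break_compound("anti-inflamatório", repeat_hyphen=True))
# [('anti-', '-inflamatório')]
```

Whether case 2 applies is a per-language typographic rule, which is why the flag is a parameter here rather than a constant.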
As for the general logic (simplified, but it took me a while to get a grasp of it -- a bit off-topic here, but worth trying to explain anyway):
To recap, then:
|
Any thoughts on whether we'll know enough to add fixes for other languages soon or should I go ahead with v0.14.15 with the Catalan/Polish/Turkish/French features we have queued up already? |
It depends on when you want to ship 0.14.15. I'm willing to work on the topic, but I don't think it needs to be rushed: after all, SILE has been on GitHub for more than 10 years, and no one came asking for these... So I bet we can take some time to think about how to do it properly. In the same vein, #1242 (deriving from a fix where I needed to deactivate the French Unicode segmenter) could possibly be addressed in a nicer way too. In the meantime, we have a "quick workaround" if anyone urgently needs the repeating dashes in any language. Just insert the ugly hack after your first target language change, and voilà!
(Checked with Portuguese, Czech and Spanish) |
I have an upcoming publication project that wants to use the alternate Turkish apostrophe handling and it is always much nicer to do production work in a shipped stable version of SILE. At this point the release machinery is working pretty well and it isn't too much hassle to make small patch releases with incremental improvements. |
Interesting feedback here typst/typst#3235 (comment) adding (lower) Sorbian and Croatian to the list, and confirming Czech and Slovak. Sorbian is a minority language (< 50000 people); it doesn't have a 2-letter language code. Unless mistaken the 3-letter codes are |
Food for thought: I am not sure we should use settings to enable/disable such features, at the cost of checking them many times, when they wouldn't change much normally. A possible alternative would be to encode this in the BCP47 language name, as an extension. So far, unless mistaken, BCP47 only has two official extensions, For instance:
|
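To make the language-tag idea concrete, here is a rough sketch that uses a BCP47 private-use sequence (the part introduced by the singleton `x`). The subtag name `rephyph` and both function names are purely hypothetical; this is neither the example from the comment above nor anything SILE implements.

```python
def split_private_use(tag: str) -> tuple[str, list[str]]:
    """Split a BCP47-style tag into its base and its private-use subtags.

    Tags are case-insensitive, so everything is lower-cased first.
    """
    subtags = tag.lower().split("-")
    if "x" in subtags:
        idx = subtags.index("x")
        return "-".join(subtags[:idx]), subtags[idx + 1 :]
    return "-".join(subtags), []


def repeats_hyphen(tag: str) -> bool:
    # "rephyph" is a made-up private-use subtag, chosen only for this sketch.
    _, private = split_private_use(tag)
    return "rephyph" in private


print(split_private_use("cs-x-rephyph"))  # ('cs', ['rephyph'])
print(repeats_hyphen("cs-x-rephyph"))     # True
print(repeats_hyphen("pt-BR"))            # False
```

With this approach the decision is made once, when the language is selected, instead of being looked up in a setting on every potential break.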
A minor point: I believe that string of characters after the |
This is not a comment on using BCP-47 private extensions because I haven't looked into that... But yes @Omikhleia sometimes where we want to use a setting is too hot a loop to actually be checking it given that they can change almost any time. But something we haven't really utilized yet but could if we need to is callbacks: there is no reason we can't rig up |
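The callback idea could look roughly like the sketch below: a change hook copies the setting into a plain attribute, so the hot loop reads a cached flag instead of querying the settings store each time. The `Settings` and `Hyphenator` classes and the setting name `languages.repeathyphen` are assumptions made for illustration, not SILE's actual API.

```python
from typing import Any, Callable


class Settings:
    """Tiny settings store with change hooks (illustrative only)."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._hooks: dict[str, list[Callable[[Any], None]]] = {}

    def on_change(self, name: str, hook: Callable[[Any], None]) -> None:
        self._hooks.setdefault(name, []).append(hook)

    def set(self, name: str, value: Any) -> None:
        self._values[name] = value
        # Hooks run in registration order, right after the value changes.
        for hook in self._hooks.get(name, []):
            hook(value)


class Hyphenator:
    def __init__(self, settings: Settings) -> None:
        self.repeat_hyphen = False  # cached copy read by the hot loop
        # Hypothetical setting name, chosen only for this sketch.
        settings.on_change("languages.repeathyphen", self._update)

    def _update(self, value: Any) -> None:
        self.repeat_hyphen = bool(value)


settings = Settings()
hyphenator = Hyphenator(settings)
settings.set("languages.repeathyphen", True)
print(hyphenator.repeat_hyphen)  # True
```

Running hooks in registration order is one simple answer to the ordering question; side effects inside hooks remain something to keep an eye on.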
@alerque Yes, active hooks on settings are a possibility I had in mind too. I'm always reluctant about such hooks / callbacks (because ordering is unclear and side-effects are not always intended), but it may have to be considered. As additional food for thought: I suspect those languages would not repeat the hyphens when breaking URLs (and thus would have to bypass it, as does the current |
@Omikhleia I've just checked for examples in a reference grammar [1], in the part about hyphens, and indeed all the examples testify in favor of case 2.
Footnotes
|
For Basque (which we support, code = Both seem to contradict the repetition of hyphens (marratxoa) mentioned in "Lerro-bukaerako marratxoa hitz-elkarketarena izanez gero, ez dago marratxo hori errepikatu beharrik hurrengo lerroaren hasieran." --> Google translated: "If the hyphen at the end of the line belongs to the combination of words, there is no need to repeat that hyphen at the beginning of the next line." And the second document even illustrates the wrong usage (marked with an asterisk) and the correct one. --> So no for Basque, in the general case. (I did see various posts on the web from people asking how to do it, but official recommendations seem to disfavor it) |
As noted in #1960, it seems Czech also repeats hyphens when breaking a compound word. Some other languages might do the same, see below.
The same solution would likely apply.
But I only found isolated references to this feature in TeX StackExchange discussions. We may need some more normative documents and references before generalizing such a feature...