Make MarkdownFields translatable #102

jeriox · 2022-07-05T21:00:01Z

Currently, when using wagtail-localize, a MarkdownField cannot be translated in an easy way, as the whole content of the field is put into one translation segment. For a long page with a markdown body, this is not feasible. I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.
I wrote a hacky solution for that some time ago, but it breaks with the current version. I'd be happy if we could find a way to support that properly.

My old code for reference:

import html2text
from django.db.models import TextField
from wagtail_localize.segments import (
    OverridableSegmentValue,
    StringSegmentValue,
    TemplateSegmentValue,
)
from wagtail_localize.segments.extract import quote_path_component
from wagtail_localize.segments.ingest import organise_template_segments
from wagtail_localize.strings import extract_strings, restore_strings

from wagtailmarkdown.utils import render_markdown
from wagtailmarkdown.widgets import MarkdownTextarea


class MarkdownField(TextField):
    def formfield(self, **kwargs):
        defaults = {"widget": MarkdownTextarea}
        defaults.update(kwargs)
        return super().formfield(**defaults)

    def get_translatable_segments(self, value):
        template, strings = extract_strings(render_markdown(value))

        # Find all unique href values
        hrefs = set()
        for string, attrs in strings:
            for tag_attrs in attrs.values():
                if "href" in tag_attrs:
                    hrefs.add(tag_attrs["href"])

        return (
            [TemplateSegmentValue("", "html", template, len(strings))]
            + [StringSegmentValue("", string, attrs=attrs) for string, attrs in strings]
            + [OverridableSegmentValue(quote_path_component(href), href) for href in sorted(hrefs)]
        )

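    def restore_translated_segments(self, value, field_segments):
        # The segment format ("html") is not needed to rebuild the field,
        # and naming it `format` would shadow the builtin.
        _format, template, strings = organise_template_segments(field_segments)
        # Convert the translated HTML back to Markdown.
        return html2text.html2text(restore_strings(template, strings))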

zerolab · 2022-07-06T08:27:52Z

Hey @jeriox,

thank you for sharing this. Had a few requests for making this localize-compatible, so the code snippet is very handy!

jeriox · 2022-08-29T12:58:51Z

I got it working again with the code above; we will use that for now. It still feels a bit hacky to me, so we'd be happy if there was a better alternative built in :)

zerolab · 2022-08-29T13:29:36Z

This would need a bit of thinking. e.g.

> I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.

Where do you draw the line and split things? Is it at every link? Every paragraph? Every heading? Given we can allow raw HTML in there too, how should we handle that?

jeriox · 2022-08-29T13:43:54Z

> This would need a bit of thinking. e.g.
>
> > I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.
>
> Where do you draw the line and split things? Is it at every link? Every paragraph? Every heading? Given we can allow raw HTML in there too, how should we handle that?

Currently, my approach works as follows: as there is already a lot of thought going into how to split up StreamFields, I tried to reuse that as much as possible. Therefore, I render the markdown to HTML and use the existing extract_strings() function. This also ensures that links are treated appropriately. For the other direction, using html2text works quite well. I didn't test with raw HTML though. I think that every paragraph and every heading is a good split, as it ensures that one doesn't need to re-translate a segment if that part of the page didn't change.

jeriox · 2024-10-15T17:59:45Z

Hey @zerolab, did you have a chance to look at this any further?

zerolab · 2024-10-15T18:12:11Z

@jeriox to be honest this completely flew under my radar 🙈

I think your version is better than what we currently have (i.e. nothing). Do you have the capacity to submit a PR? We'd want the logic in get_translatable_segments and restore_translated_segments to live in its own module (say wagtail_localize.py) and be conditionally loaded if localize is installed.
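A minimal sketch of the "conditionally loaded" part, using only the standard library. The helper name `is_installed` and the commented-out import path are assumptions for illustration, not existing wagtail-markdown API:

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


# Only wire up the translation hooks when wagtail-localize is available.
LOCALIZE_AVAILABLE = is_installed("wagtail_localize")

if LOCALIZE_AVAILABLE:
    # Hypothetical import path; the real module name is still to be decided.
    # from wagtailmarkdown.localize import MarkdownLocalizeMixin
    pass
```

Checking with `find_spec` avoids a try/except around the import and keeps wagtail-localize an optional dependency.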

jeriox · 2024-10-16T16:28:02Z

> I think your version is better than what we currently have (i.e. nothing).

While this is true, I'm not sure if it is good enough to include it in the library. We have been using this solution in our project for two years now, and there are several problems:

- references to other headings on the same page (e.g. #about) get lost during translation
- as the content gets broken down into very small parts (e.g. single entries in a list), we struggle a lot with wagtail/wagtail-localize#624 (Repeated text fragments misleadingly show "missing" translations)
- images suffer from wagtail/wagtail-localize#378 (Localize the Wagtail Image library)
- inline formatting sometimes produces additional spaces during translation

If those are okay for you, I can open a PR. We'd like to do the splitting on our own instead of relying on converting to HTML, but we haven't had the capacity to do so yet.
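For reference, splitting the Markdown source directly could look roughly like the sketch below. It is not the code we run. The function name `split_markdown_blocks` is made up, and it assumes blocks are separated by blank lines, headings use the `#` (ATX) style, and code fences are balanced:

```python
import re

FENCE = re.compile(r"^(```|~~~)")
HEADING = re.compile(r"^#{1,6}(\s|$)")


def split_markdown_blocks(text: str) -> list[str]:
    """Split Markdown into blocks at blank lines, keeping fenced code intact.

    ATX headings always become a block of their own.
    """
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False

    def flush() -> None:
        # Close the block collected so far, if any.
        if current:
            blocks.append("\n".join(current))
            current.clear()

    for line in text.splitlines():
        if FENCE.match(line):
            # Toggle fence state; blank lines inside a fence must not split.
            in_fence = not in_fence
            current.append(line)
        elif in_fence:
            current.append(line)
        elif not line.strip():
            flush()
        elif HEADING.match(line):
            flush()
            blocks.append(line)
        else:
            current.append(line)
    flush()
    return blocks
```

A whole list stays in one block here, which would avoid the very small segments mentioned above, at the cost of re-translating the full list when one entry changes.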

zerolab · 2024-10-16T16:34:33Z

Thank you for the additional context on real-life usage. Absolutely fantastic to know.

What if we make the get_translatable_segments bit pluggable (i.e. you can change it to your own project's method that does what you want it to do)?
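One way this could work is a setting that holds a dotted path to a callable. The sketch below uses only the standard library; `load_callable`, `default_segmenter` and `get_segmenter` are hypothetical names, and the default is a stand-in rather than real segmenting logic. In a Django project, `django.utils.module_loading.import_string` does the same path resolution.

```python
from importlib import import_module
from typing import Callable, Optional


def load_callable(dotted_path: str) -> Callable:
    """Resolve 'package.module.attr' to the object it names."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    return getattr(import_module(module_path), attr)


def default_segmenter(value: str) -> list[str]:
    """Stand-in default: treat the whole field as a single segment."""
    return [value]


def get_segmenter(setting: Optional[str]) -> Callable:
    """Use the project's callable if one is configured, else the default."""
    return load_callable(setting) if setting else default_segmenter
```

The field would then call `get_segmenter(...)` with the configured value (the setting name itself is still to be decided) instead of hard-coding the logic.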

The images question is outside of wagtail-markdown's purview, I'm afraid. We definitely need to solve this more centrally.

jeriox · 2024-10-16T16:42:23Z

I'm not sure that it needs to be specifically pluggable, as you could just subclass the provided MarkdownField and override get_translatable_segments if you are not happy with it. This is the same approach that we are currently using to implement it in the first place.

So I guess we could just include my current solution as the default, especially if images are out of scope anyway and the issue with duplicate segments maybe gets fixed centrally as well. The other things are just small issues IMO and could just be mentioned in the docs.

asennoussi · 2024-11-24T06:00:16Z

I'm interested in any updates around this topic.
I also have articles with markdown and the translation input is a text area that doesn't support new lines so it makes it really not handy to translate. (I have to add the
myself but haven't tried it myself either)

jeriox mentioned this issue Jul 5, 2022

Use wagtail-markdown instead of custom code fsr-de/myHPI#116

Merged

3 tasks
