-
-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Magiclink support for custom autolinked references #2550
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,8 @@ implement (if desired) via the provided [classes](#css). User's are free to refe | |
|
||
MagicLink is an extension that provides a number of useful link related features. MagicLink can auto-link HTML, FTP, and | ||
email links. It can auto-convert repository links (GitHub, GitLab, and Bitbucket) and display them in a more concise, | ||
shorthand format. MagicLink can also be configured to directly auto-link the aforementioned shorthand format. | ||
shorthand format. MagicLink can be configured to directly auto-link the aforementioned shorthand format. | ||
MagicLink can also auto-link simple user-defined patterns for use cases like ticketing systems. | ||
|
||
If you happen to have some conflicts with syntax for a specific case, you can always revert to the old auto-link format | ||
as well: `#!md <https://www.link.com>`. If enabled, repository link shortening will be applied to the angle bracketed | ||
|
@@ -479,6 +480,51 @@ and will copy them for your custom repository. If this is not sufficient, you ca | |
`shortener_user_exclude` for your custom repository provider using your specified `name`. If you manually set excludes | ||
in this manner, no excludes from the same `type` will be copied over. | ||
|
||
## Simple User-Defined References | ||
|
||
You can integrate with additional services like ticketing systems or shortlink providers (sometimes called `go/` links) by defining simple patterns that will be autolinked. | ||
|
||
The `simplerefs` configuration supports matching a static prefix followed by an alphanumeric identifier. We look for patterns of form `{ref_prefix}{identifier}` with the following restrictions: | ||
|
||
- `ref_prefix` is a configured literal value consisting of a letter followed by alphanumeric characters or punctuation from the set `_/-`; it will usually end with punctuation like `TICKET-` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I should also note, is there some restriction of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see why not as long as it doesn't overlap with other prefixes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this something that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For a I just checked Github autolink references, which allowed me to configure a prefix There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's fine, but I don't think we have to completely mirror what GitHub does, but knowing these details is useful. |
||
- `identifier` is an alphanumeric string containing at least one character; it will be captured and injected into a configured target URL | ||
- The pattern must be preceeded by whitespace or be at the beginning of the line | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think restriction by whitespace is not good. What if I want to do something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Certainly. I think this should match behavior for other link types in magiclink. Perhaps it's best to leave this implied, since the docs don't discuss word boundary details for other link types. |
||
- The pattern must be followed by whitespace or punctuation from the set `.,!?:` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is too specific and too restrictive. |
||
|
||
Suppose we have a Jira project called `TICKET` where the issues are referred to like `TICKET-123` and a shortlink service where labels look like `go/myproject`. Our configuration will be: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not 100% sold on Worst case, it can be considered in the future if I cannot come to a definitive answer now. Granted, I do understand that these would only capture refs that specifically match the prefix, not all paths with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I should further note, that I still don't even know exactly what a While it does appear that the URL uses There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's a pattern very much like a link shortening service like tinyurl, but internal to a company and generally people give meaningful names for the links rather than having them auto-generated. There are some vendors like https://www.golinks.io/ that provide this as a service. And companies will sometimes provide DNS support or a browser plugin so that you can literally type |
||
|
||
``` | ||
'simplerefs': [ | ||
{ | ||
# Anchor tags will have class name "magiclink-simplerefs-ticket" | ||
'ref_prefix': 'TICKET-', | ||
# The target must contain "<id>" to indicate where to inject the captured identifier | ||
'target_url': 'https://ticket.test.com/TICKET-<id>' | ||
}, | ||
{ | ||
# Anchor tags will have class name "magiclink-simplerefs-go" | ||
'ref_prefix': 'go/', | ||
'target_url': 'https://go.test.com/<id>' | ||
} | ||
] | ||
``` | ||
|
||
Each `ref_prefix` is paired with a `target_url` that must contain an `<id>` placeholder where the captured identifier will be injected. Although matching is case-insensitive, `simplerefs` will preserve casing of the input text. Note that the `ref_prefix` is also used to derive a class name attached to the links it produces; the class name will be the value of `ref_prefix` downcased and with punctuation removed. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think all punctuation should be removed. I imagine if we allowed |
||
|
||
To make this concrete, consider the following cases using the above config: | ||
|
||
Markdown Input | HTML Output | ||
-------------- | ----------- | ||
`TICKET-123` | `<a class="magiclink magiclink-simpleref magiclink-simpleref-ticket" href="https://ticket.test.com/TICKET-123">TICKET-123</a>` | ||
`ticket-123` | `<a class="magiclink magiclink-simpleref magiclink-simpleref-ticket" href="https://ticket.test.com/TICKET-123">ticket-123</a>` | ||
`go/myproject` | `<a class="magiclink magiclink-simpleref magiclink-simpleref-go" href="https://go.test.com/myproject">go/myproject</a>` | ||
`go/my-project`| `go/my-project` (not matched due to `-` in identifier position) | ||
|
||
Users should be aware of the following restrictions: | ||
|
||
- Do not configure overlapping prefixes; for example, `TICKET` and `TICKETNUM` prefixes would both match `TICKETNUM123` so only one rule can be applied | ||
- Do not configure prefixes that differ only in punctuation; `TICKET-` and `TICKET:` would both have the same derived class name, which will throw a configuration error | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
## CSS | ||
|
||
For normal links, no classes are added to the anchor tags. For repository links, `magiclink` will be added as a class. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -86,6 +86,15 @@ | |
# Internal mention patterns | ||
RE_INT_MENTIONS = r'(?P<mention>(?<![a-zA-Z])@{})\b' | ||
|
||
# Custom references | ||
RE_SIMPLEREF_TEMPLATE = r'''(?xi) | ||
(?:(?<=\b)|(?<=_))(?: | ||
(?P<prefix>%s) # prefix; to be injected | ||
(?P<id>\w+) # identifier | ||
) | ||
''' | ||
|
||
|
||
def create_ext_mentions(name, provider_type): | ||
"""Create external mentions by provider type.""" | ||
|
||
|
@@ -1036,6 +1045,33 @@ def handleMatch(self, m, data): | |
return el, m.start(0), m.end(0) | ||
|
||
|
||
class MagiclinkSimpleRefPattern(InlineProcessor): | ||
"""Return a link Element given a custom prefix.""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TODO: better docstring here |
||
|
||
ANCESTOR_EXCLUDES = ('a',) | ||
|
||
def __init__(self, pattern, md, shortname, target_url): | ||
"""Initialize.""" | ||
|
||
self.shortname = shortname | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TODO: better term than "shortname". This is the lower-cased version of the prefix that we inject into a class name and inject into the name we use when registering the inlineProcessor. |
||
self.target_url = target_url | ||
InlineProcessor.__init__(self, pattern, md) | ||
|
||
def handleMatch(self, m, data): | ||
"""Return link.""" | ||
|
||
el = etree.Element("a") | ||
prefix = m.group(1) | ||
identifier = m.group(2) | ||
el.set('href', self.target_url.replace('<id>', identifier)) | ||
el.text = prefix + identifier | ||
|
||
el.set('class', f'magiclink magiclink-simpleref magiclink-simpleref-{self.shortname}') | ||
|
||
return el, m.start(0), m.end(0) | ||
|
||
|
||
|
||
class MagiclinkExtension(Extension): | ||
"""Add auto link and link transformation extensions to Markdown class.""" | ||
|
||
|
@@ -1097,6 +1133,10 @@ def __init__(self, *args, **kwargs): | |
'custom': [ | ||
{}, | ||
"Custom repositories hosts - Default {}" | ||
], | ||
'simplerefs': [ | ||
[], | ||
"User-defined reference patterns - Default []" | ||
] | ||
} | ||
super().__init__(*args, **kwargs) | ||
|
@@ -1115,6 +1155,17 @@ def setup_autolinks(self, md, config): | |
|
||
md.inlinePatterns.register(MagiclinkMailPattern(RE_MAIL, md), "magic-mail", 84.9) | ||
|
||
def setup_simplerefs(self, md, config): | ||
"""Setup user-defined simple reference replacements.""" | ||
|
||
for simpleref_config in config['simplerefs']: | ||
ref_prefix = simpleref_config['ref_prefix'] | ||
target_url = simpleref_config['target_url'] | ||
pattern_re = RE_SIMPLEREF_TEMPLATE % ref_prefix | ||
shortname = re.sub(r'\W+', '', ref_prefix).lower() | ||
pattern = MagiclinkSimpleRefPattern(pattern_re, md, shortname, target_url) | ||
md.inlinePatterns.register(pattern, "simpleref-" + shortname, 120) | ||
|
||
def setup_shorthand(self, md): | ||
"""Setup shorthand.""" | ||
|
||
|
@@ -1308,6 +1359,7 @@ def extendMarkdown(self, md): | |
self.provider = 'github' | ||
|
||
self.setup_autolinks(md, config) | ||
self.setup_simplerefs(md, config) | ||
|
||
if self.git_short or self.social_short: | ||
self.ext_mentions = [] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -399,3 +399,65 @@ def test_deprecated_twitter_shortener(self): | |
self.assertTrue(len(w) == 1) | ||
self.assertTrue(issubclass(w[-1].category, DeprecationWarning)) | ||
self.assertTrue(found) | ||
|
||
class TestMagicLinkSimpleRefs(util.MdCase): | ||
"""Test cases for autolinking of user-defined simple references.""" | ||
|
||
extension = [ | ||
'pymdownx.magiclink' | ||
] | ||
|
||
extension_configs = { | ||
'pymdownx.magiclink': { | ||
'simplerefs': [ | ||
{ | ||
'ref_prefix': 'TICKET-', | ||
'target_url': 'https://ticket.test.com/TICKET-<id>' | ||
}, | ||
{ | ||
'ref_prefix': 'go/', | ||
'target_url': 'https://go.test.com/<id>' | ||
}, | ||
] | ||
} | ||
} | ||
|
||
def test_numeric(self): | ||
"""Test numeric identifiers.""" | ||
|
||
self.check_markdown( | ||
'TICKET-123', | ||
'<p><a class="magiclink magiclink-simpleref magiclink-simpleref-ticket" href="https://ticket.test.com/TICKET-123">TICKET-123</a></p>' # noqa: E501 | ||
) | ||
|
||
def test_word_boundary(self): | ||
"""Test numeric identifiers.""" | ||
|
||
self.check_markdown( | ||
'Hello, TICKET-123!', | ||
'<p>Hello, <a class="magiclink magiclink-simpleref magiclink-simpleref-ticket" href="https://ticket.test.com/TICKET-123">TICKET-123</a>!</p>' # noqa: E501 | ||
) | ||
|
||
def test_alphanumeric(self): | ||
"""Test alphanumeric identifiers.""" | ||
|
||
self.check_markdown( | ||
'go/abc123', | ||
'<p><a class="magiclink magiclink-simpleref magiclink-simpleref-go" href="https://go.test.com/abc123">go/abc123</a></p>' # noqa: E501 | ||
) | ||
|
||
def test_underscores(self): | ||
"""Test underscore counts as a word character.""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We originally discussed supporting "alphanumeric" identifiers, but These two test cases are validating "go/ link"-style links which would benefit from supporting So, we could consider expanding the set to "alphanumeric characters, plus Or, we could make this feature significantly more generic and powerful by letting users specify regex patterns themselves. And we'd rely on documentation to give users solid examples for the simpler cases. I'm interested in your perspective on the user base here and whether they'd generally prefer to have the power+responsibility of a more generic interface. GitHub's autolink reference feature needs to run whenever rendering the UI, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be more specific, when I said alphanumeric, I was thinking Now that I've had some more time to think, I do think in general ( I think each proposed ID should be validated with some regex such as this.
I want to be careful here as we are now running into scope increase. The original feature discussed was simply a method of providing a short name to translate into a link, now we are talking about mirroring various different issue tracking systems, and I'm not sure that is something I want or am willing to do. I'm not saying there isn't value in such suggestions, but at some point, I wonder if a user has more complicated auto-linking requirements if it would be better that they just write a specialized plugin at that point. Regardless, for now, I'd like to keep the scope reasonable or we need to go back and decide what it is we really want to do so that we can approach it the right way. Depending on what the end goal is, it may greatly affect how I'd like to go about the implementation. If there is a bigger vision here we need to take a step back and define what our goals are, and maybe this PR is not the answer. Maybe we need a different approach. If we want a more robust system for various repos that is similar to our current support, then we may need to consider refactoring and generalizing the existing system to make GitHub, Bitbucket, and GitLab handling even more generic so that Jira or any other system can be plugged in (within reason). Different systems have different username, project, and issue requirements. Some may be more difficult to identify; as patterns become more complex, so do false-positive matching and conflicts with other matching. So there are going to be some restrictions of what we can reasonably do in a generic sense. Maybe we can't always support every kind of link like we do for GitHub in Jira (or some other service). For instance, some systems may use the user email as the identifier in the URL, but may also display the user name in other aspects. Without the server making such translations and us tapping into their API, that is impossible for us. Also, we already translate emails into email links. When I was focused on the 3 big free services, the intent was that those would be all I was supporting to aid the majority of the open-source user base. They were generally kind of similar and monkied each other in various ways. Corporate installs of these often are allowed more freedom to do things a bit differently. I know our corporate username requirements for Bitbucket is different than that of the open-source version of the site. When I developed this extension, I wasn't targeting the corporate world specifically.
My thoughts are pulled between two things. I do like to give the user power but I also like to keep my maintenance cost (in regards to time) at a reasonable level as well. So my decisions are often influenced by these two things. In short, if there is a bigger vision beyond a simple short name to URL mapping, let's iron out what it is we want and are willing to do so that we can decide on a sustainable approach. If what we are doing now is too limited, we need to take a step back and reconsider now before we implement something that we want to rewrite later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is great feedback and I appreciate the depth.
I will think about this. I agree with you that we either proceed with a simple feature that allows only There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do think it's worth moving forward with a fairly restrictive model of user-defined autolink references. I've updated the feature name here to "simplerefs" to better capture the spirit of this model.
I'm not entirely clear whether you're referring to the For For captured identifiers, we need to allow pure numeric values to support the Jira case, so can't require it starts with a letter. So this could be straight alphanumeric There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Then let's take a step back and let's try and create a proposal spec of what it is we want. My intention was never to support every repo in existence, and the plugin is fairly complicated as is. So when accepting new features, I have to consider that I will be maintaining the solution long after the PR is developed and accepted. Or maybe I am misunderstanding what you mean by this statement?
Numbers are captured separately from prefixes. Numbers are numbers and I'm not talking about numbers. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I imagine something like this: import re
RE_VALIDATE = re.compile(r'^[a-z][-a-z0-9_]+$')
patterns = [
'test1',
'test-2',
'-test3',
'test4-'
]
validated = '|'.join(re.escape(p) for p in patterns if RE_VALIDATE.match(p))
pattern = re.compile(fr"\b({validated})([0-9]+)\b")
BUFFER = """
This is a test to find things like test-21234, test11234, and test4-1234, but not -test31234.
"""
for found in pattern.findall(BUFFER):
print(found)
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are a number of questions that come to mind looking at this. Do we want normalize the final results to all uppercase? I know we talked about matching them case insensitive, but should we normalize case on display in the final document? Should we have an implied There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, I think we should preserve the casing of the text as written in the document. Assuming uppercase feels pretty Jira-specific.
No, I think ending punctuation like that should be part of the configured prefix, as it is with GitHub's autolink references interface.
Lol. I think we should call the captured "number" an "identifier" (so So we end up capturing text in the form It might make sense for me to go ahead and add proposed documentation in this PR and have that serve as the proposal spec. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm okay with that as I want to understand what the actual proposal is opposed to seeing evolve through iterations of commits. |
||
|
||
self.check_markdown( | ||
'go/abc_123', | ||
'<p><a class="magiclink magiclink-simpleref magiclink-simpleref-go" href="https://go.test.com/abc_123">go/abc_123</a></p>' # noqa: E501 | ||
) | ||
|
||
def test_hyphen(self): | ||
"""Test hyphen breaks matching.""" | ||
|
||
self.check_markdown( | ||
'go/abc-123', | ||
'<p><a class="magiclink magiclink-simpleref magiclink-simpleref-go" href="https://go.test.com/abc">go/abc</a>-123</p>' # noqa: E501 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@facelessuser - When you get a chance to take a look, here is proposed documentation for
simplerefs
which we can consider a spec for the feature.