Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema Regex issues #47

Open
utx0 opened this issue Apr 27, 2019 · 7 comments
Open

JSON Schema Regex issues #47

utx0 opened this issue Apr 27, 2019 · 7 comments
Labels
bug Something isn't working

Comments

@utx0
Copy link

utx0 commented Apr 27, 2019

I am currently working with a JSON Schema that is shared with a frontend application for building and validating a data structure that then get's passed to my backend go service which also uses the schema for data validation. Now the issue I'm having is the schema in question is using the regex pattern keyword for data validation and this works fine when its in the js world, however when using this schema with this jsonschema package I get the following error:

panic: error unmarshaling properties from json: error unmarshaling items from json: json: cannot unmarshal object into Go value of type []*jsonschema.Schema

goroutine 1 [running]:
github.com/qri-io/jsonschema.Must(0xc000166000, 0x776, 0x776)
        /Users/metta/golang/pkg/mod/github.com/qri-io/[email protected]/schema.go:24 +0xbf

Now from my research go uses a different regex engine under the hood. And I am wondering how other deal with this when sharing a schema between different languages such as go and js?

@b5
Copy link
Member

b5 commented Apr 27, 2019

Thankfully the spec has guidance on regex interop. Specifically:

given the high disparity in regular expression constructs support, schema authors SHOULD limit themselves to the following regular expression tokens:

individual Unicode characters, as defined by the JSON specification [RFC7159];
simple character classes ([abc]), range character classes ([a-z]);
complemented character classes ([^abc], [^a-z]);
simple quantifiers: "+" (one or more), "*" (zero or more), "?" (zero or one), and their lazy versions ("+?", "*?", "??");
range quantifiers: "{x}" (exactly x occurrences), "{x,y}" (at least x, at most y, occurrences), {x,} (x occurrences or more), and their lazy versions;
the beginning-of-input ("^") and end-of-input ("$") anchors;
simple grouping ("(...)") and alternation ("|").

So, basically, when writing a schema, try to keep the regex simple. But we can get a little more specific b/c you're dealing in go-to-js regex conversion. Javascript is supposed to use ECMA-262 regex, and go's regexp library is built to satisfy RE2. Wikipedia has a great comparison between the two:

Looks like the two biggest difference on the "basic features" are backreferences and look-ahead. So for your purposes it would be worth keeping those in mind. JS doesn't support any of the fancy (scary?) features on the second table, so we don't need to worry about those.

I consider it a bug that you didn't get a good, human-readable error when regex parsing fails due to these limitations. If you have any interest in filing a PR that adds a test and a better error message, I'd be deeply appreciative. A great place to start would be returning a better error message here:

return fmt.Errorf("invalid regex expression")

@b5 b5 added the bug Something isn't working label Apr 27, 2019
@dolmen
Copy link

dolmen commented Apr 28, 2020

The implementation uses package regexp to validate pattern properties. But the regexp package does not implement the full regexp syntax of ECMA-262 used in the JSON Schema spec.
This means that jsonschema is not compliant.

This problem is not even documented in the README or in package documentation.

@b5
Copy link
Member

b5 commented Apr 28, 2020

I agree that this package is not 100% compliant with the spec, but not for this reason. The section from the spec you're quoting:

This string SHOULD be a valid regular expression, according to the ECMA 262 regular expression dialect.

I interpret the use of the word SHOULD in the above sentence to mean that both json schema implementers and spec authors SHOULD aim for ECMA-262. If fully implementing ECMA-262 were required to be considered part of the spec, I would expect the above sentence to use MUST instead of SHOULD.

Doing a full ECMA-262 spec implementation would require an extra import that isn't in the go standard library. We haven't had anyone ask for it yet, and I prefer to stick to standard library imports whenever possible, but if this is a major sticking point for people using this package, I'd be happy to talk through it.

This problem is not even documented in the README or in package documentation.

We'd gladly accept a pull request that documents the difference in regex parsing.

@handrews
Copy link

Yeah, regexes are a bit of an interoperability nightmare. As an example, you'll note that . (dot) is not included in that list of tokens that people should limit themselves to, yet it's one of the most commonly used regex tokens.

There's no great solution, the language in the spec is intended to indicate that JSON Schema inherits the same interoperability problems, and we pick one recommended standard with the understanding that none of the standards are universally supported.

@joshuacc
Copy link

joshuacc commented Sep 1, 2022

I also ran into this issue when using negative lookbehind in the regex pattern, and it took me a while to understand what the problem was.

Any chance we could get clearer error message when this occurs?

@dolmen
Copy link

dolmen commented Apr 26, 2023

The Otto project (a JavaScript interpreter) has a function that fixes some incompatibilities between Go regexp and JS regexp.

But that will not add lookbehind.

@dolmen
Copy link

dolmen commented Apr 26, 2023

Module github.com/dlclark/regexp2 has an ECMAScript mode. But it is important for safety (protect against DoS) to set timeouts, so the jsonschema API must allow to set regexp time limits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants