Provide partial support for string "format" constraint #16
base: master
"email" format validation could be added with one extra line:
WHEN 'email' THEN IF target !~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$' THEN RAISE; END IF;
It would not be as strict as the other formats, though, and the regular expression could be improved.
Nice work. I've thought about adding this myself and can see it might be useful (especially for uuid). Have you performance tested this though? Something in the back of my head thinks that adding exception handling adds a bit of overhead.
Fair enough, exception handling does add some overhead. Yet given the relative complexity of regular expressions and ISO8601 validation (620+ timezone names and abbreviations) I think that this is a price worth paying. The performance loss is not that significant. Benchmark on a very modest laptop - 100K
Thanks for checking. What are the benchmark times with and without exception handling? It's worth mentioning because people might validate large amounts of data (for example, if triggers run on updates), which might have unintended consequences with the current version.
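One way such a comparison could be run is sketched below. Both helper functions are hypothetical and not part of this PR: the first validates by casting inside a block with an EXCEPTION clause (the pattern the PR relies on), the second uses a plain regex with no handler. The two are not equivalent validators, since the uuid cast accepts more input forms than this regex; the sketch only illustrates how the handler overhead might be measured on valid input.

```sql
-- Hypothetical helper: cast inside a block that has an EXCEPTION clause.
CREATE OR REPLACE FUNCTION is_uuid_with_handler(target text)
RETURNS boolean LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
  PERFORM target::uuid;          -- raises an error on malformed input
  RETURN true;
EXCEPTION WHEN OTHERS THEN
  RETURN false;
END;
$$;

-- Hypothetical helper: same question answered by a regex, no handler.
CREATE OR REPLACE FUNCTION is_uuid_without_handler(target text)
RETURNS boolean LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
  RETURN target ~ '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$';
END;
$$;

-- Run each variant over 100K valid values and compare the reported times.
EXPLAIN ANALYZE
SELECT count(*) FILTER (WHERE is_uuid_with_handler(md5(i::text)::uuid::text))
FROM generate_series(1, 100000) AS i;

EXPLAIN ANALYZE
SELECT count(*) FILTER (WHERE is_uuid_without_handler(md5(i::text)::uuid::text))
FROM generate_series(1, 100000) AS i;
```

The input here is always valid, so the difference between the two timings would reflect mainly the cost of entering the exception block rather than of catching an error.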
Using the regexp in https://dba.stackexchange.com/a/165923, which follows the HTML5 email spec, would surely be better:
WHEN 'email' THEN IF target !~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' THEN RAISE; END IF;
It does not follow RFC 5321, section 4.1.2, which json-schema specifies for this format, but as written in the HTML5 email spec:
Validating 100K
Uses the HTML5-style regex suggested by [Pierre Baumard](https://github.com/pbaumard)
Nice & clean. I have added it to the PR, quoting your comment/suggestion.
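To see what the HTML5-style pattern accepts, it can be tried against a few sample addresses. This is only an illustrative query with made-up inputs, assuming the default standard_conforming_strings = on so that the backslash in the literal reaches the regex engine unchanged.

```sql
SELECT addr,
       addr ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' AS matches
FROM (VALUES
        ('user@example.com'),           -- ordinary address
        ('User.Name+tag@Example.ORG'),  -- mixed case is covered by a-zA-Z
        ('user@localhost'),             -- no dot in the domain part
        ('user@@example.com'),          -- second @ is not allowed anywhere
        ('user@-example.com')           -- label may not start with a hyphen
     ) AS t(addr);
```

The first three rows should match and the last two should not. Note that a domain without a dot is accepted, which is one of the ways this pattern is more permissive than a strict mail-routing check.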
postgres-json-schema--0.1.1.sql
Outdated
IF schema ? 'format' AND jsonb_typeof(data) = 'string' THEN
  DECLARE
    target text := (data #>> '{}');
    EMAIL_RX constant text := '^[\w.!#$%&''*+/=?^`{|}~-]+@[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?(?:\.[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?)*$';
A case-sensitive check is faster, so why not use the full case-sensitive regex here and target !~ EMAIL_RX below?
- EMAIL_RX constant text := '^[\w.!#$%&''*+/=?^`{|}~-]+@[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?(?:\.[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?)*$';
+ EMAIL_RX constant text := '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$';
Fine. Longer and maybe more difficult to read but 10% are always 10%.
postgres-json-schema--0.1.1.sql
Outdated
WHEN 'ipv6' THEN PERFORM target::inet; IF target NOT LIKE '%:%' THEN RAISE; END IF;
WHEN 'ipv4' THEN PERFORM target::inet; IF target LIKE '%:%' THEN RAISE; END IF;
WHEN 'regex' THEN PERFORM '' ~ target;
WHEN 'email' THEN IF target !~* EMAIL_RX THEN RAISE; END IF;
See regex comment above, a case sensitive check would be faster:
- WHEN 'email' THEN IF target !~* EMAIL_RX THEN RAISE; END IF;
+ WHEN 'email' THEN IF target !~ EMAIL_RX THEN RAISE; END IF;
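The cast-and-catch pattern in the lines under review can be tried in isolation with a small standalone function. The sketch below is not the PR's code: format_is_valid is a hypothetical name, it returns a boolean instead of living inside the validator, and it uses an explicit RAISE EXCEPTION where the PR uses a bare RAISE. It is only meant to show how a failed cast or an explicit raise both end up in the same handler.

```sql
-- Hypothetical standalone helper, loosely modelled on the reviewed lines.
CREATE OR REPLACE FUNCTION format_is_valid(fmt text, target text)
RETURNS boolean LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
  CASE fmt
    WHEN 'ipv4' THEN
      PERFORM target::inet;                 -- fails on anything inet rejects
      IF target LIKE '%:%' THEN
        RAISE EXCEPTION 'not an IPv4 address';
      END IF;
    WHEN 'ipv6' THEN
      PERFORM target::inet;
      IF target NOT LIKE '%:%' THEN
        RAISE EXCEPTION 'not an IPv6 address';
      END IF;
    WHEN 'regex' THEN
      PERFORM '' ~ target;                  -- fails if target is not a valid regex
    ELSE
      NULL;                                 -- unsupported formats validate positive
  END CASE;
  RETURN true;
EXCEPTION WHEN OTHERS THEN
  RETURN false;                             -- any error above means "invalid"
END;
$$;

SELECT format_is_valid('ipv4', '192.168.0.1') AS ipv4_ok,
       format_is_valid('ipv4', '::1')         AS ipv6_given_as_ipv4,
       format_is_valid('regex', '(')          AS unbalanced_regex;
```

The query should return true, false, false. One caveat of relying on the inet cast is that inet also accepts a netmask suffix such as '192.168.0.1/24', so this check is more permissive than a bare-address test.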
Provide partial support for string "format" constraint
Formats that correspond to native PostgreSQL data types are implemented. These are
Email format validation by a regex suggested by pbaumard.
Consistent with current behaviour, unsupported options validate positive.
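Assuming the extension is installed with this change applied and exposes validate_json_schema(schema, data) as in the project, the behaviour described above could be checked along these lines. The sample values and the format name "no-such-format" are made up for illustration.

```sql
-- A supported format with a well-formed value: expected to validate.
SELECT validate_json_schema('{"type": "string", "format": "email"}',
                            '"user@example.com"');

-- A supported format with a malformed value: expected to fail validation.
SELECT validate_json_schema('{"type": "string", "format": "uuid"}',
                            '"not-a-uuid"');

-- An unsupported format: expected to validate, per the description above.
SELECT validate_json_schema('{"type": "string", "format": "no-such-format"}',
                            '"anything"');
```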