[json schema] Do not throw errors when parsing non annotated parameters #1157

h-mayorquin · 2024-12-13T15:41:12Z

This comes from the IBL project.

Summary:
If someone creates their own interface (inherits from BaseDataInterface) but does not annotate the __init__ function of the json validation will throw an obscure error:

self = <pydantic.json_schema.GenerateJsonSchema object at 0x77dfd6ff52d0>, schema = {'cls': <class 'inspect._empty'>, 'metadata': {}, 'type': 'is-instance'}
error_info = "core_schema.IsInstanceSchema (<class 'inspect._empty'>)"

    def handle_invalid_for_json_schema(self, schema: CoreSchemaOrField, error_info: str) -> JsonSchemaValue:
>       raise PydanticInvalidForJsonSchema(f'Cannot generate a JsonSchema for {error_info}')
E       pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'inspect._empty'>)
E       
E       For further information visit https://errors.pydantic.dev/2.7/u/invalid-for-json-schema

../../miniconda3/envs/work/lib/python3.11/site-packages/pydantic/json_schema.py:2152: PydanticInvalidForJsonSchema
================================================================================================ short test summary info =================================================================================================
FAILED tests/test_minimal/test_utils/test_get_json_schema_from_method_signature.py::test_get_json_schema_from_method_signature_unannotated - pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'inspect._empty'>)
==============================================================

The fix, in my view, is to omit validation when the user does not annotate their signature parameters as there is nothing to validate. I added a fix and a test here.

bendichter · 2024-12-16T19:10:51Z

I don't really see the point of leaving an arg un-annotated, especially if one is going to use the get_json_schema_from_method_signature method. I agree that the error message is obscure, but if that is the problem I think the best solution would be to improve the error message. If they really have a compelling use-case for leaving an arg fully unconstrained then I guess I would be OK with supporting that by throwing a warning instead of an error.

h-mayorquin · 2024-12-16T19:27:14Z

I don't really see the point of leaving an arg un-annotated, especially if one is going to use the get_json_schema_from_method_signature method. I agree that the error message is obscure, but if that is the problem I think the best solution would be to improve the error message. If they really have a compelling use-case for leaving an arg fully unconstrained then I guess I would be OK with supporting that by throwing a warning instead of an error.

Important context: the method get_json_schema_from_method_signature is always used when initializing interfaces. Currently, there is no option to avoid source validation, either at the interface or converter level, so they need to use it.

The argument to leave arguments unannotated is the one that you have made before: lowering the barriers to contribution for users by not adding hurdles. Forcing them to use type annotations to inherit from BaseDataInterface is a contentious hurdle. My impression is that the Python community is still quite divided about using annotations or not.

If you still prefer to force the user to use annotations, then the error can be something like:

"The arguments x, y, and z lack type annotations. Add them to initialize the interface."

What do you think?

bendichter · 2024-12-17T06:43:39Z

the method get_json_schema_from_method_signature is always used when initializing interfaces

Why are we always calling this function? Is this necessary?

bendichter · 2024-12-17T06:44:28Z

If we want to enforce typing, we could use beartype: https://github.com/beartype/beartype

h-mayorquin · 2024-12-17T15:12:05Z

the method get_json_schema_from_method_signature is always used when initializing interfaces

Why are we always calling this function? Is this necessary?

To validate source data. To me it seems like a nice feature to have and it has been like that since the nwb-conversion-tools times. At least for the NWBConverter.

If we were to disable source validation under which conditions we would?

I personally would prefer the outcome of this PR. Once this PR is merged it will work like this for interfaces and converters:

If someone does annotate their interfaces __init__ then we do the source schema validation.
If someone does not or only annotates some of them then we respect their intent and we don't validate the ones without type annotations (that is, we don't force them to annotate or add typing.Any)

beartype looks nice. We have something similar with Pydantic validate_call:
https://docs.pydantic.dev/latest/api/validate_call/

bendichter · 2024-12-18T08:55:15Z

I think these issues are separable:

We want to generate a json schema for input args. This is useful for interfacing with other applications like NWB GUIDE
We want to validate inputs to the init function.

We don't necessarily need to do validation with the json schema, when working within Python. There are some nuances in the conversion from args to json schema that can cause complications here.

It currently requires annotation for every arg, which is cumbersome
There are some inconsistencies in requirements between Python and JSON. For example there is this pydantic issue where JSON schema dates require timezones and Pydantic-validated dates do not.

The pydantic library has come a long way since we originally wrote this code, and now facilitates a much simpler approach. Now we can change the code so that instead of attempting to create a json schema and then validate every call with it, we use the method you mentioned, @validated_call. In so doing, we are removing the get_json_schema_from_method_signature method from this workflow, and we can have stricter requirements for this without worrying about creating a burden for users.

h-mayorquin · 2024-12-18T16:27:32Z

Oh, I see how you are thinking. I fully agree with this reasoning that they are two distinct responsibilities! I was under the wrong impression that you wanted to have a JSON validation mechanism for source data of some sort. My bad. I think that Pydantic (or Beartype) does a better job and offers many advantages: easier to opt out, automatic handling of non-annotated parameters, etc.

For your information, we already have a @validate_call mechanism for most interfaces. When this was introduced, I argued for removing the JSON source checks here, but I failed to convince Cody. Perhaps some of the context from that discussion is important here.

So, what do you think of the following plan?

The current PR should throw a more explicit error instead of omitting validation of untyped parameters as it currently does.
In another PR we should remove source validation with JSON on the interfaces and converters.

The first point makes sense because the get_json_schema_from_method_signature will then only be called explicitly by users like the NWBGuide, and then an error is better aligned with the intent than an omission in that case.

For the second point, we are already performing validate_call at the interface level, so removing what is arguably redundant functionality makes sense.

bendichter · 2024-12-18T18:18:05Z

Sounds great!

h-mayorquin · 2024-12-19T14:54:08Z

I implemented your feedback, @bendichter, and changed from omission to error and modified the test. Thanks for the discussion.

codecov · 2024-12-20T21:14:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.75%. Comparing base (96dfdff) to head (5bd670b).
Report is 12 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1157      +/-   ##
==========================================
+ Coverage   90.69%   90.75%   +0.06%     
==========================================
  Files         129      129              
  Lines        8189     8287      +98     
==========================================
+ Hits         7427     7521      +94     
- Misses        762      766       +4

Flag	Coverage Δ
unittests	`90.75% <100.00%> (+0.06%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...nterfaces/ecephys/basesortingextractorinterface.py	`80.19% <100.00%> (ø)`
src/neuroconv/utils/json_schema.py	`95.06% <100.00%> (+0.51%)`	⬆️

... and 3 files with indirect coverage changes

h-mayorquin added 2 commits December 13, 2024 08:28

small addition

a90f295

add test and fix

508c8ea

h-mayorquin self-assigned this Dec 13, 2024

h-mayorquin added bug schema labels Dec 13, 2024

h-mayorquin mentioned this pull request Dec 13, 2024

pydantic validation failure catalystneuro/IBL-to-nwb#101

Closed

changelog

540dadb

h-mayorquin marked this pull request as ready for review December 13, 2024 16:42

h-mayorquin added 2 commits December 13, 2024 11:33

remove extra line for debugging

c1cb960

Merge branch 'main' into add_annotation_to_vebose_on_sorting

8fad6a9

h-mayorquin requested a review from bendichter December 16, 2024 18:53

h-mayorquin added 2 commits December 19, 2024 08:33

Merge branch 'main' into add_annotation_to_vebose_on_sorting

76a1f5c

change from omission to error

e3c107d

h-mayorquin mentioned this pull request Dec 19, 2024

Remove json schema validation from interfaces and converters #1166

Open

fix changelog

5bd670b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[json schema] Do not throw errors when parsing non annotated parameters #1157

[json schema] Do not throw errors when parsing non annotated parameters #1157

h-mayorquin commented Dec 13, 2024 •

edited

Loading

bendichter commented Dec 16, 2024 •

edited

Loading

h-mayorquin commented Dec 16, 2024 •

edited

Loading

bendichter commented Dec 17, 2024

bendichter commented Dec 17, 2024

h-mayorquin commented Dec 17, 2024 •

edited

Loading

bendichter commented Dec 18, 2024

h-mayorquin commented Dec 18, 2024

bendichter commented Dec 18, 2024

h-mayorquin commented Dec 19, 2024

codecov bot commented Dec 20, 2024

[json schema] Do not throw errors when parsing non annotated parameters #1157

Are you sure you want to change the base?

[json schema] Do not throw errors when parsing non annotated parameters #1157

Conversation

h-mayorquin commented Dec 13, 2024 • edited Loading

bendichter commented Dec 16, 2024 • edited Loading

h-mayorquin commented Dec 16, 2024 • edited Loading

bendichter commented Dec 17, 2024

bendichter commented Dec 17, 2024

h-mayorquin commented Dec 17, 2024 • edited Loading

bendichter commented Dec 18, 2024

h-mayorquin commented Dec 18, 2024

bendichter commented Dec 18, 2024

h-mayorquin commented Dec 19, 2024

codecov bot commented Dec 20, 2024

Codecov Report

h-mayorquin commented Dec 13, 2024 •

edited

Loading

bendichter commented Dec 16, 2024 •

edited

Loading

h-mayorquin commented Dec 16, 2024 •

edited

Loading

h-mayorquin commented Dec 17, 2024 •

edited

Loading