Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema parsing - Permit opaque key-values in Property schema #44

Open
robfig opened this issue Apr 4, 2019 · 10 comments
Open

Schema parsing - Permit opaque key-values in Property schema #44

robfig opened this issue Apr 4, 2019 · 10 comments

Comments

@robfig
Copy link

robfig commented Apr 4, 2019

Currently, this schema is valid according to https://www.jsonschemavalidator.net/

{
    "$id": "https://www.github.com/schemas/robfig",
    "properties": {
        "name": {
            "type": "string",
            "opaque": "something"
        }
    }
}

But it can not be unmarshaled, receiving this error:

error unmarshaling properties from json: error unmarshaling opaque from json: json: cannot unmarshal string into Go value of type jsonschema._schema

That is due to this code, which assumes any extra key must be a Definition:

	for prop, rawmsg := range valprops {
		var val Validator
		if mk, ok := DefaultValidators[prop]; ok {
			val = mk()
		} else {
			switch prop {
			// skip any already-parsed props
			case "$schema", "$id", "title", "description", "default", "examples", "readOnly", "writeOnly", "$comment", "$ref", "definitions", "format":
				continue
			default:
				// assume non-specified props are "extra definitions"
				if sch.extraDefinitions == nil {
					sch.extraDefinitions = Definitions{}
				}
				s := new(Schema)
				if err := json.Unmarshal(rawmsg, s); err != nil {
					return fmt.Errorf("error unmarshaling %s from json: %s", prop, err.Error())
				}
				sch.extraDefinitions[prop] = s
				continue
			}
		}
		if err := json.Unmarshal(rawmsg, val); err != nil {
			return fmt.Errorf("error unmarshaling %s from json: %s", prop, err.Error())
		}
		sch.Validators[prop] = val
	}

The relevant portion of the spec says that definitions must be collected under a "definitions" key
https://json-schema.org/latest/json-schema-validation.html#rfc.section.9

So, it seems like the the code should be updated to pull out just the "definitions" key, and then apply the "parse the values of each key under that as a Schema".

If you agree, I will add unit tests for this case and make a pull request.

@handrews
Copy link

handrews commented Apr 4, 2019

@robfig That's not quite what that part of the spec means. The definitions key is the standard place to put definitions, but you are not forced to do so.

The problem here is that the spec is clear that implementations SHOULD ignore keywords they do not support.

I had thought that there was a MUST NOT error, but I guess not. I might add that in the next draft. I think it is SHOULD be ignored rather than MUST because it would be perfectly valid for an implementation to treat such keywords as extra data like title or description. Possibly unexpected or unwanted, but it wouldn't break things.

@robfig
Copy link
Author

robfig commented Apr 5, 2019

So you're saying that additional definitions are not required to be collected under a "definitions" keyword, but the example from the link:

{
    "type": "array",
    "items": { "$ref": "#/definitions/positiveInteger" },
    "definitions": {
        "positiveInteger": {
            "type": "integer",
            "exclusiveMinimum": 0
        }
    }
}

could equivalently be

{
    "type": "array",
    "items": { "$ref": "#/definitions/positiveInteger" },
    "positiveInteger": {
        "type": "integer",
        "exclusiveMinimum": 0
    }
}

?

I'm having trouble reconciling that meaning with the linked section on json-schema.org, but it seems like you know what you're talking about.

In that case, I suppose the right way to allow opaque key/values at this level in the document would be to continue to attempt to parse the values as a definition, but ignore any errors that result instead of treating them as fatal.

How does that sound?

robfig pushed a commit to robfig/jsonschema that referenced this issue Apr 5, 2019
add a unit test to verify the behavior

fixes qri-io#44
@handrews
Copy link

handrews commented Apr 5, 2019

it seems like you know what you're talking about.

Well, I'm the "handrews" in draft-handrews-json-schema-01 😝

The example you gave saying that those things would be equivalent is not correct, though.

$ref takes a URI, and URIs like #/definitions/positiveInteger are JSON Pointers. So that form of $ref must match the JSON object structure. $ref also MUST point to a JSON Schema (and not, say, a string or number or array, or an object with some other semantics).

I'm not sure why you are focused on definitions here, to be honest. Adding a custom extension keyword such as opaque really has nothing to do with where definitions live or how you reference them.

@b5
Copy link
Member

b5 commented Apr 5, 2019

Thanks for filing this @robfig. Totally agreed the error you've encountered is a bug that needs fixing.

I think @handrews has articulated the we should try to tackle here:

The problem here is that the spec is clear that implementations SHOULD ignore keywords they do not support.

I think you're totally right to point to the poor assumption this lib currently makes in assuming everything is a definition. We should be doing what the spec says and ignore unsupported keywords.

However, one behavior I'd like to keep is not just ignoring unrecognized keywords, but also giving them back if a user were to ever json.Marshal a jsonschema.RootSchema. Put another way, the de-serialization process shouldn't be semantically lossy. We would obviously clobber whitesapce from input data, but values encoded should also be decoded.

Thanks so much for your PR, the unit test you've provided is great, but I agree the solution needs to be a little more robust. I think we should refactor the lib to carry deserialize unknown values to opaque values instead of "extra definitions", and re-add any opaque values on json.Marshal. Sound good?

@robfig
Copy link
Author

robfig commented Apr 5, 2019

The mismatch in the second example between $ref and the location of the definition was unintentional. I meant it to be this:

{
    "type": "array",
    "items": { "$ref": "#/positiveInteger" },
    "positiveInteger": {
        "type": "integer",
        "exclusiveMinimum": 0
    }
}

I'm focused on definitions here because that's what this implementation is treating my opaque keyword as, instead of ignoring it as I would prefer. In your last comment, you said that it was allowed to do so.. that might be true, but it's also allowed to not fail to parse a schema with an opaque keyword, and that's arguably the more helpful behavior.

@b5
Copy link
Member

b5 commented Apr 5, 2019

@robfig I'd be happy to work with you on the change, I can write up a rough outline of a PR we'd accept if you're ok with implementing the change

@b5
Copy link
Member

b5 commented Apr 5, 2019

hahah looks like we came to the same conclusion @robfig 😄

@eqinox76
Copy link

Hey all,
i was provided with a json schema from another customer which is relatively big and has severe typos in several locations, e.g."$comments". Those are running into the here mentioned definitions:

error unmarshaling properties from json: error unmarshaling $comments from json: json: cannot unmarshal string into Go value of type jsonschema._schema

The json schema should be fixed but to get something fast of the ground it would be handy to have some kind of relaxed syntax check option.

Thanks for your work and the jsonschema lib in any case!

@peterbourgon
Copy link

peterbourgon commented Dec 16, 2019

I'm also blocked on this issue for a project. Can I help move things forward somehow?

However, one behavior I'd like to keep is not just ignoring unrecognized keywords, but also giving them back if a user were to ever json.Marshal a jsonschema.RootSchema. Put another way, the de-serialization process shouldn't be semantically lossy.

Is it reasonable for a user to expect that json.Marshaling a parsed jsonschema.RootSchema would be lossless? Is there a well-defined use case that would require this behavior? The standard library, for example, offers no such guarantees, nor provides easy ways for users to implement it themselves.

@handrews
Copy link

@b5 for some reason I didn't notice this part before:

However, one behavior I'd like to keep is not just ignoring unrecognized keywords, but also giving them back if a user were to ever json.Marshal a jsonschema.RootSchema. Put another way, the de-serialization process shouldn't be semantically lossy. We would obviously clobber whitesapce from input data, but values encoded should also be decoded.

Now that draft 2019-09 (formerly known as draft-08 but we switched to dates for meta-schema versioning) is out, you should take a look at the recommended output formats, specifically for collecting annotations (the output format is also for error reporting, and I think most of the examples focus on that- we should add more annotation examples).

We considered saying that unknown keywords SHOULD be collected as annotations, but annotation-collecting is an optional feature (we didn't want to suddenly make all existing validators non-conforming by requiring a whole new concept to be implemented). But that's basically what it sounds like you want to do, so I'd recommend taking a look at how 2019-09 specifies that should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants