ckanext-datajson - Error decoding JSON object - spatial? #3549
Comments
This is one of the errors captured in ticket #3532 that is not properly handled and crashes the gather process.
See additional logs in #3597.
Here's a full example of backtrace related to this error:
2023-01-27 22:50:02,087 ERROR [ckanext.datajson.datajson] Failed to create package south-carolina-coastal-erosion-study-data-report-for-observations-october-2003-april-2004 from https://data.doi.gov/data.json
None - {'spatial': ["Error decoding JSON object: Expecting ',' delimiter or ']': line 1 column 41 (char 40)"]}
Traceback (most recent call last):
  File "/home/vcap/deps/1/src/ckanext-spatial/ckanext/spatial/plugin/__init__.py", line 127, in check_spatial_extra
    geometry = json.loads(extra.value)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/simplejson/decoder.py", line 372, in decode
    obj, end = self.raw_decode(s)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/simplejson/decoder.py", line 402, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/vcap/deps/1/bin/ckan", line 8, in <module>
    sys.exit(ckan())
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/cli.py", line 249, in fetch_consumer
    utils.fetch_consumer()
  File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/utils.py", line 355, in fetch_consumer
    fetch_callback(consumer, method, header, body)
  File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/queue.py", line 497, in fetch_callback
  File "/home/vcap/deps/1/src/ckanext-harvest/ckanext/harvest/queue.py", line 515, in fetch_and_import_stages
    success_import = harvester.import_stage(obj)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckanext/datajson/datajson.py", line 779, in import_stage
    pkg = get_action('package_create')(self.context(), pkg)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckan/logic/__init__.py", line 504, in wrapped
    result = _action(context, data_dict, **kw)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckanext/geodatagov/logic.py", line 524, in package_create
    return up_func(context, data_dict)
  File "/home/vcap/deps/1/python/lib/python3.8/site-packages/ckan/logic/action/create.py", line 207, in package_create
    item.create(pkg)
    self.check_spatial_extra(package)
  File "/home/vcap/deps/1/src/ckanext-spatial/ckanext/spatial/plugin/__init__.py", line 130, in check_spatial_extra
    raise tk.ValidationError(error_dict, error_summary=package_error_summary(error_dict))
ckan.logic.ValidationError: None - {'spatial': ["Error decoding JSON object: Expecting ',' delimiter or ']': line 1 column 41 (char 40)"]}
Exit status 1
I would say that this is a pretty annoying and important bug to fix as agencies are unaware of spatial data not being harvested properly since it is not reported to them from the harvester.
It looks like we have fixed and provided long term tests for this logic (though I think it would be better served by splitting these 6 assertions into separate tests): https://github.com/GSA/ckanext-geodatagov/blob/main/ckanext/geodatagov/tests/test_update_geo.py#L20-L38
Recent occurrences of this issue in NR.
Unfortunately, with the fix merged above on 3/17, this is still an issue... I even verified that the deploy was successful...
P.S. If it's any consolation, there might be fewer errors? 🤷‍♀️
Might be worth printing the object in the error case?
That sounds like a good idea. I can do that since you're on O&M.
With debug statements added, we'll wait for the next occurrence and then handle the errors that we find.
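For reference, a minimal sketch of what "printing the object in the error case" could look like. This is not the code that was deployed; `parse_spatial` is a hypothetical helper, and it uses the standard-library `json` module rather than `simplejson`.

```python
import json
import logging

log = logging.getLogger(__name__)


def parse_spatial(value):
    """Decode a spatial extra; log the raw value and return None on failure.

    Hypothetical helper for illustration only.
    """
    try:
        return json.loads(value)
    except (TypeError, ValueError) as e:
        # json.JSONDecodeError subclasses ValueError; TypeError covers None.
        # %r keeps quotes and whitespace visible in the log line.
        log.error("Could not decode spatial value %r: %s", value, e)
        return None
```

Logging with `%r` shows exactly what string reached the decoder, which is the piece missing from the original `ValidationError` message.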
So... I don't know how to process these logs, but it seems like some coordinates are just not in an array? Like, they're just comma-separated numbers, and since they're not a formal array, that's why json can't load them? @FuhuXia @jbrown-xentity @btylerburton @Jin-Sun-tts This is my best guess...
That is actually a valid use case. See the documentation on DCAT-US here, and a sample we test in geodatagov here. We don't have an e2e test where we validate all of this together. We could add that to catalog, but I feel like this test already does that.
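To make the comma-separated case concrete, here is a rough sketch of turning a four-number bounding box string into a GeoJSON polygon before it reaches the JSON decoder. It is not the geodatagov implementation; `bbox_to_geojson` is a hypothetical name, and the coordinate order (west, south, east, north) is an assumption that should be checked against the DCAT-US documentation.

```python
import json


def bbox_to_geojson(value):
    """Convert 'west,south,east,north' into a GeoJSON Polygon string.

    Returns None when the value is not four comma-separated numbers.
    Hypothetical helper; the coordinate order is an assumption.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 4:
        return None
    try:
        west, south, east, north = (float(p) for p in parts)
    except ValueError:
        return None
    # A GeoJSON ring must end on the same position it starts on.
    ring = [
        [west, south],
        [west, north],
        [east, north],
        [east, south],
        [west, south],
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})
```

Anything that is not four numbers (a place name, WKT, already-valid GeoJSON) comes back as `None`, so the caller can fall through to other handling.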
@jbrown-xentity I think I'm missing something. Given that (1) the input data is valid, (2) the transform logic is working properly, (3) the
Summary: this class of error still occurs. See logs here (and search
Just to clarify, the intent with this ticket is to fix the issues in
Correct.
More errors have been captured and handled. Will revisit on Monday, and again the following week if it resurfaces because of another error. I strongly believe this error is a characteristic of the CKAN harvesting logic; it would not reappear in the same way in all harvesting systems. Comments have been added to the code explaining why it was necessary to pre-process the data, so if this needs to be handled in a newer version of harvesting, there are clues as to what to consider.
Can we close this out, @nickumia-reisys, or do you want to leave it open until we confirm we've covered all the cases?
Not sure how necessary it is to fix this, but wanted to track it somewhere...
The current deployment is able to harvest invalid spatial metadata with ckanext-datajson. See example here; the spatial value
POLYGON ((-125 49, -67 49, -67 25, -125 25))
does not conform to the DCAT-US spec.
The new CKAN 2.9 on cloud.gov throws an error; see harvest logs on 11/19 like
ckan.logic.ValidationError: None - {'spatial': ['Error decoding JSON object: Expecting value: line 1 column 39 (char 38)']}
It looks to be trying to format the string into a GeoJSON-type object (or something else?), but of course that will fail.
How to reproduce
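A minimal sketch of the underlying failure, outside of CKAN: the WKT-style string above is not JSON, so any JSON decoder rejects it. This uses the standard-library `json` module (the traceback shows `simplejson`), and `is_valid_json` is a hypothetical helper. The character position in the real error differs, so the harvested value presumably was not this exact string.

```python
import json

# The spatial value quoted above; WKT-like text, not JSON.
spatial = "POLYGON ((-125 49, -67 49, -67 25, -125 25))"


def is_valid_json(value):
    """Return True if value decodes as JSON, False otherwise."""
    try:
        json.loads(value)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(spatial))  # prints False
```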
Expected behavior
Harvest succeeds (see latest prod harvest here)
Actual behavior
Harvest fails with errors
Sketch