First: thanks for the effort on this tool! It's coming in handy for me on a current project as we embrace the spec in earnest.
It seems like the optional feature-level installation via pip is intended to pick and choose which data sources to make available for testing connections. This makes sense, but I wonder if the central contract mgmt and this connectivity functionality can be further delineated without too much trouble.
I noticed this when doing pip install datacontract-cli (no features specified) and then trying to run datacontract:
(venv) √ contracts 15:38:27 % datacontract lint customer.yaml
Traceback (most recent call last):
  File "dir/bin/datacontract", line 5, in <module>
    from datacontract.cli import app
  File "dir/lib/python3.11/site-packages/datacontract/cli.py", line 15, in <module>
    from datacontract import web
  File "dir/lib/python3.11/site-packages/datacontract/web.py", line 7, in <module>
    from datacontract.data_contract import DataContract, ExportFormat
  File "dir/lib/python3.11/site-packages/datacontract/data_contract.py", line 16, in <module>
    from datacontract.engines.soda.check_soda_execute import check_soda_execute
  File "dir/lib/python3.11/site-packages/datacontract/engines/soda/check_soda_execute.py", line 11, in <module>
    from datacontract.engines.soda.connections.duckdb import get_duckdb_connection
  File "dir/lib/python3.11/site-packages/datacontract/engines/soda/connections/duckdb.py", line 3, in <module>
    from deltalake import DeltaTable
ModuleNotFoundError: No module named 'deltalake'
Based on the release notes, it looks like the above may be fixed in the next release. In the meantime, I manually ran pip install deltalake, which got me past that error to this one:
(venv) √ contracts 15:39:10 % datacontract lint customer.yaml
Traceback (most recent call last):
  File "dir/bin/datacontract", line 5, in <module>
    from datacontract.cli import app
  File "dir/lib/python3.11/site-packages/datacontract/cli.py", line 15, in <module>
    from datacontract import web
  File "dir/lib/python3.11/site-packages/datacontract/web.py", line 7, in <module>
    from datacontract.data_contract import DataContract, ExportFormat
  File "dir/lib/python3.11/site-packages/datacontract/data_contract.py", line 16, in <module>
    from datacontract.engines.soda.check_soda_execute import check_soda_execute
  File "dir/lib/python3.11/site-packages/datacontract/engines/soda/check_soda_execute.py", line 12, in <module>
    from datacontract.engines.soda.connections.kafka import create_spark_session, read_kafka_topic
  File "dir/lib/python3.11/site-packages/datacontract/engines/soda/connections/kafka.py", line 3, in <module>
    from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark'
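'check_soda_execute' imports the connection-related modules under 'soda' at module load time, so even a command like lint, which never connects to anything, fails when those packages are missing. This seems to violate the implied segregation of contract mgmt and connectivity functionality.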
If I pass no source-centric feature flags, it seems like no sources/connections config should be needed to run the tool. This would also assist greatly in running a lighter-weight version of datacontract-cli as a centralized service, where it would only be performing contract management. We are currently planning on running it as a centralized service, anyway; it's just pretty meaty with connection-related application code it will never use.
Perhaps 'check_soda_execute' should truly be a [graceful] check or else conditionally called based on command line flag, availability of 'server' information in the contract, etc.
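To make the suggestion concrete, here is a rough sketch of what I mean by a graceful, conditional check. It is not the project's actual code: `require_optional`, `MissingOptionalDependency`, `lint`, `run_checks`, and the `connectivity` extra name are all hypothetical names I made up for illustration. The idea is just that the heavy imports are deferred until a command really needs them, and skipped entirely when the contract has no 'servers' section.

```python
import importlib
from types import ModuleType


class MissingOptionalDependency(RuntimeError):
    """Raised when a command needs an optional package that is not installed."""


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import module_name on demand, or fail with an actionable message.

    'extra' is a hypothetical pip extra name used only in the error text.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise MissingOptionalDependency(
            f"'{module_name}' is required for this command. "
            f"Try: pip install 'datacontract-cli[{extra}]'"
        ) from exc


def lint(contract: dict) -> list[str]:
    """Pure contract management: touches no connection libraries."""
    problems = []
    if "id" not in contract:
        problems.append("missing 'id'")
    return problems


def run_checks(contract: dict) -> str:
    """Connectivity checks: heavy imports are attempted only here."""
    if not contract.get("servers"):
        # Nothing to connect to, so no optional dependency is needed.
        return "skipped: no servers defined"
    require_optional("pyspark", "connectivity")  # hypothetical extra name
    return "ran checks"
```

With something along these lines, lint would work on a base install, and a missing package would surface as a readable message at the point of use rather than as a traceback at startup.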
Thanks again.
This looks right (full disclosure: I'm still ramping up on your tool).
The purist in me was thinking only datacontract test --examples should be a part of this test case and what it represents (i.e., contract mgmt functionality), but I think I see why you're saying what you're saying (plus I'm sure I'm also projecting). Today the 'local' testing seems implicit to the base install as I look at pyproject.toml (along with sql/json import/export), and that's lightweight enough and, of course, local.
I think this makes sense.