-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added database existence checks and clear_tables function #593
Conversation
I added a parameter to the RelationalDBStore init for creating the database. The creation step has been updated to include a 'drop database' step. This insures that the database used in the test is always up-to-date with the latest schema. |
LGTM @chisholm - can you take a quick look? |
I think "creating" and "instantiating" a database seems awfully similar. Having those two parameters together could be confusing. "create_db" is also missing from the data store docstring. One of the design principles I think we should keep in mind is to allow the source/sink to be usable by themselves, not just in concert with a store. After all, we're supposed to be able to mix 'n match sources and sinks. With current code, it looks like you can have your database automatically created if you use a store, but not if you use a sink. I think whatever the store can do as far as modifying/mutating database operations (creation, insertion, etc) should also be doable with the sink. You also might want to be careful with dropping an existing database. Perhaps there should be a drop-if-exists option which defaults to False, rather than unconditionally deleting it. The most common use case would probably be "make sure it's there", not "blow it away if it's there". So what would be a better constructor signature than create and instantiate true/false... hmm. We want to allow some finer control over what gets automatically created. The choices are "database + tables", "tables only", "nothing"? And another setting regarding what to do if they already exist, with choices "drop" and "nothing"? Maybe we need a three-valued (maybe enum-valued?) parameter describing what to create, and a separate drop_if_exists option which applies to everything to be created (both databases and tables)? |
@robthew - any thoughts on Andy's comments? |
It seems like we could end up with a lot of duplicated code in the store/sink/source. What's the use case for needing database creation in the source? In the sink we are inserting data, so the database needs to be created if it doesn't exist already, but when would you need to go through a database creation step when using a source by itself? It seems like you couldn't use a source and get data without first putting data in with a sink. |
I don't think mutable operations (creating, inserting, etc) should be in the source, only the sink/store. The relational data source docstring says as much: cti-python-stix2/stix2/datastore/relational_db/relational_db.py Lines 187 to 190 in dce689a
Oops... should say "store/sink", not "source/sink" :-P That's my typo! There's no need for duplication between store/sink if the store can delegate to the sink. That's how it is now, for schema (in the postgres sense)/table creation. The sink has code for that; the store does not. There is some duplication with regard to table schema creation (the SQLAlchemy table objects and metadata) so that the source/sink are usable independently as well as paired together in a store, without having to duplicate the table schema creation work. So, relational db store can create that once and reuse for both its source and sink, but source/sink also need to be able to create that on their own when necessary. But that's factored out to a separate reusable module so the duplication is just a couple lines of code, not that big a deal. |
I adjusted the database creation options. The 'instantiate_database' option is True by default, and will create the database if it doesn't exist, and create the schemas and tables. The 'force_recreate' is False by default and will drop an existing database and recreate it. This should only be needed when changes have been made to the schema. On a separate note - Postgres doesn't like dashes in database names. Should the 'stix-data-sink' name changed to use underscores? |
LGTM |
I didn't actually run into the hyphen problem with database names; pgadmin4 lets you create/drop the database without problems. But I guess if you were hand-writing SQL, you'd need quotes. Sure, underscores are a bit simpler to use. |
316edb3
into
oasis-open:relational-data-sink
This uses the python library 'sqlalchemy_utils', so you will need to perform an install:
pip install sqlalchemy_utils