Added database existence checks and clear_tables function #593

robthew · 2024-04-09T19:52:42Z

This uses the python library 'sqlalchemy_utils', so you will need to perform an install: pip install sqlalchemy_utils

robthew · 2024-04-10T18:03:46Z

I added a parameter to the RelationalDBStore init for creating the database.

The creation step has been updated to include a 'drop database' step. This insures that the database used in the test is always up-to-date with the latest schema.

rpiazza · 2024-04-10T18:15:02Z

LGTM

@chisholm - can you take a quick look?

chisholm · 2024-04-10T21:16:06Z

I think "creating" and "instantiating" a database seems awfully similar. Having those two parameters together could be confusing. "create_db" is also missing from the data store docstring.

One of the design principles I think we should keep in mind is to allow the source/sink to be usable by themselves, not just in concert with a store. After all, we're supposed to be able to mix 'n match sources and sinks. With current code, it looks like you can have your database automatically created if you use a store, but not if you use a sink. I think whatever the store can do as far as modifying/mutating database operations (creation, insertion, etc) should also be doable with the sink.

You also might want to be careful with dropping an existing database. Perhaps there should be a drop-if-exists option which defaults to False, rather than unconditionally deleting it. The most common use case would probably be "make sure it's there", not "blow it away if it's there".

So what would be a better constructor signature than create and instantiate true/false... hmm. We want to allow some finer control over what gets automatically created. The choices are "database + tables", "tables only", "nothing"? And another setting regarding what to do if they already exist, with choices "drop" and "nothing"? Maybe we need a three-valued (maybe enum-valued?) parameter describing what to create, and a separate drop_if_exists option which applies to everything to be created (both databases and tables)?

rpiazza · 2024-04-12T14:03:48Z

@robthew - any thoughts on Andy's comments?

robthew · 2024-04-12T18:30:38Z

It seems like we could end up with a lot of duplicated code in the store/sink/source. What's the use case for needing database creation in the source? In the sink we are inserting data, so the database needs to be created if it doesn't exist already, but when would you need to go through a database creation step when using a source by itself? It seems like you couldn't use a source and get data without first putting data in with a sink.

chisholm · 2024-04-12T20:33:04Z

I don't think mutable operations (creating, inserting, etc) should be in the source, only the sink/store. The relational data source docstring says as much:

cti-python-stix2/stix2/datastore/relational_db/relational_db.py

Lines 187 to 190 in dce689a

    
                   Initialize this source.  Only one of stix_object_classes and metadata 
        
                   should be given: if the latter is given, assume table schemas are 
        
                   already created.  Instances of this class do not create the actual 
        
                   database tables; see the source/sink for that.

Oops... should say "store/sink", not "source/sink" :-P That's my typo!

There's no need for duplication between store/sink if the store can delegate to the sink. That's how it is now, for schema (in the postgres sense)/table creation. The sink has code for that; the store does not.

There is some duplication with regard to table schema creation (the SQLAlchemy table objects and metadata) so that the source/sink are usable independently as well as paired together in a store, without having to duplicate the table schema creation work. So, relational db store can create that once and reuse for both its source and sink, but source/sink also need to be able to create that on their own when necessary. But that's factored out to a separate reusable module so the duplication is just a couple lines of code, not that big a deal.

robthew · 2024-04-15T18:29:13Z

I adjusted the database creation options. The 'instantiate_database' option is True by default, and will create the database if it doesn't exist, and create the schemas and tables. The 'force_recreate' is False by default and will drop an existing database and recreate it. This should only be needed when changes have been made to the schema.

On a separate note - Postgres doesn't like dashes in database names. Should the 'stix-data-sink' name changed to use underscores?

rpiazza · 2024-04-15T19:21:20Z

LGTM

chisholm · 2024-04-15T20:05:17Z

I didn't actually run into the hyphen problem with database names; pgadmin4 lets you create/drop the database without problems. But I guess if you were hand-writing SQL, you'd need quotes. Sure, underscores are a bit simpler to use.

robthew added 2 commits April 9, 2024 15:47

Added database existence checks and clear_tables function

3e91526

added parameters for database creation

f402280

Fixed db URL

9ef379c

Adjusted database creation options

d74253e

rpiazza merged commit 316edb3 into oasis-open:relational-data-sink Apr 16, 2024
1 of 6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added database existence checks and clear_tables function #593

Added database existence checks and clear_tables function #593

robthew commented Apr 9, 2024

robthew commented Apr 10, 2024

rpiazza commented Apr 10, 2024

chisholm commented Apr 10, 2024

rpiazza commented Apr 12, 2024

robthew commented Apr 12, 2024

chisholm commented Apr 12, 2024

robthew commented Apr 15, 2024

rpiazza commented Apr 15, 2024

chisholm commented Apr 15, 2024

Added database existence checks and clear_tables function #593

Added database existence checks and clear_tables function #593

Conversation

robthew commented Apr 9, 2024

robthew commented Apr 10, 2024

rpiazza commented Apr 10, 2024

chisholm commented Apr 10, 2024

rpiazza commented Apr 12, 2024

robthew commented Apr 12, 2024

chisholm commented Apr 12, 2024

robthew commented Apr 15, 2024

rpiazza commented Apr 15, 2024

chisholm commented Apr 15, 2024