
Ideal database #13

Open
finnbear opened this issue Jan 31, 2020 · 12 comments
Labels
RFC Request For Comments
Milestone

Comments

@finnbear
Member

Please discuss here.

  • Relational
    • SQL
    • SQLite
  • Document
    • MongoDB
    • Custom JSON <- current option
  • ORM
@ChadBailey

ChadBailey commented Jan 31, 2020

IMO, let's just start building it as memory-resident. Then, if we find a compelling reason to move to a database, let's drive the decision of which one to choose off the particular challenge that comes up.

@joaodforce
Contributor

joaodforce commented Jan 31, 2020

It cannot be memory-resident only; the tracked bills must be persistent. The Custom JSON option is the best for the medium term, because it makes the bot very simple to host: we basically need to store a very small object array.

We are also considering building a flag into the code to limit it to one request per server. This way we resolve the concurrency issues and also reduce the load on the bot.

@ChadBailey

Sorry, I really just meant that the comparisons are done against an object held in memory. I didn't mean that we shouldn't dump it to a .json file.

So, the way I was suggesting is: [new data] -> add to memory -> dump memory to .json file. On initial load: load .json file to memory.

@joaodforce
Contributor

> Sorry, I really just meant that the comparisons are done against an object held in memory. I didn't mean that we shouldn't dump it to a .json file.
>
> So, the way I was suggesting is: [new data] -> add to memory -> dump memory to .json file. On initial load: load .json file to memory.

Oh, sorry, I must have misread it; still, I was clearing up doubts about the other options.

We will test how it performs. We don't expect this bot to be very resource-heavy, but if you look at the code, this is basically how it is being done right now.

Loading the tracked bills from the file, checking/updating their status from the API, updating the file on disk, and then messaging the channels if there are updates.

@ChadBailey

Oh, I see... sorry, I haven't had a chance to get up to speed on the way it's working now. Are there any challenges with the current implementation? Any reason you might expect this to need to change soon?

@joaodforce
Copy link
Contributor

> Oh, I see... sorry, I haven't had a chance to get up to speed on the way it's working now. Are there any challenges with the current implementation? Any reason you might expect this to need to change soon?

The first problem we foresaw with this approach was concurrency: if, say, two requests to update the file came in at once, the resulting file would be written incorrectly.

With that problem in mind, we first thought about acquiring a lock for the file manipulation, but then we had an even better idea: to limit the requests to one per server. That way we keep the bot's load down and resolve the concurrency issues.

And given the scope of the project, there won't be a need for any database, because we are not looking to store all the historical status changes of the bills.
We plan to keep it concise and only stream the data as it changes.
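For reference, if the one-request-per-server flag ever becomes too restrictive, a lightweight alternative to a real file lock is to serialize writes through a promise chain, so at most one write is in flight at a time. A rough sketch (names are illustrative, not from the project):

```javascript
// Serialize async writes: each new write waits for the previous one to finish.
let writeChain = Promise.resolve();

function queueWrite(task) {
  // Chain the task after whatever write is currently pending.
  // Passing `task` as the rejection handler keeps the chain alive on errors.
  writeChain = writeChain.then(task, task);
  return writeChain;
}

// Usage: concurrent callers are applied strictly in submission order.
const log = [];
queueWrite(async () => { log.push('first'); });
queueWrite(async () => { log.push('second'); });
queueWrite(async () => { log.push('third'); });
```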

@ChadBailey

I see what you mean about the concurrency problem... but that's actually an issue that extends beyond merely writing to a file; it exists before the data touches the physical storage layer. Said differently, if you have a concurrency problem when writing, you probably have a concurrency bug in your code.

Probably the easiest way to address this is by implementing a simple FIFO queue, such as this one: https://stackoverflow.com/questions/1590247/how-do-you-implement-a-stack-and-a-queue-in-javascript

Sorry, I'm commenting before actually analyzing the code; I just felt that was an important thing to point out, since it may impact your current architecture.
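A minimal FIFO queue in the spirit of the linked answer (plain arrays already provide `push` for enqueue and `shift` for dequeue):

```javascript
// Simple FIFO queue: items come out in the order they went in.
class Queue {
  constructor() { this.items = []; }
  enqueue(item) { this.items.push(item); }
  dequeue() { return this.items.shift(); } // undefined when empty
  get length() { return this.items.length; }
}

const q = new Queue();
q.enqueue('update A');
q.enqueue('update B');
const first = q.dequeue(); // 'update A'
```

Note that `shift` is O(n); the linked thread discusses index-based variants if that ever matters at scale.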

@caldane

caldane commented Jan 31, 2020

Why is a DB layer not viable at this point? I am not sure I understand that part of the equation, and if I understood why it isn't viable, I would better understand the problem.

@x47188 added the RFC Request For Comments label on Jan 31, 2020
@finnbear
Member Author

finnbear commented Feb 1, 2020

There is something nice about the entire database/config being stored in a human-readable JSON file.

@ghost

ghost commented Feb 1, 2020

As much as I love/prefer working with traditional SQL, the only benefit I see to using it would be the aforementioned solution to concurrency problems. Not all concurrency issues are bugs; if they were, enterprise storage arrays wouldn't exist.

Anyway, there is very little opportunity to de-duplicate fields within the JSON file. We would get some savings from the states (breaking those into their own table; I mean MA, TX, TN, etc.), and a lot of savings from not storing the same keywords 1,000 times over. But we could get around that problem far more easily by just passing the resulting JSON through LZMA before we write it to disk, and we'd get nearly the same space savings as we would from a normal database.

If we don't already, it would be a good idea to keep a separate database file (JSON or otherwise) of timers for when to refresh API objects. The API doesn't provide a cache timer within the JSON result (or at least I didn't see it when I checked) so we would need to write down our own 'cached until' fields so we'd know when to re-check for data.
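A sketch of what that 'cached until' bookkeeping could look like (the TTL value and function names are assumptions, not from the project or the API):

```javascript
// Hypothetical shape for a separate cache-timer store: one 'cachedUntil'
// timestamp per API object, recorded by us because the API does not
// appear to provide a cache timer in its JSON result.
const CACHE_TTL_MS = 60 * 60 * 1000; // assumed 1-hour TTL

const cacheTimers = {}; // billId -> epoch-ms expiry

function markFetched(billId, now = Date.now()) {
  cacheTimers[billId] = now + CACHE_TTL_MS;
}

function needsRefresh(billId, now = Date.now()) {
  // Unknown bills, or bills whose timer has lapsed, must be re-checked.
  return !(billId in cacheTimers) || now >= cacheTimers[billId];
}

markFetched('hr1234', 0);
const fresh = needsRefresh('hr1234', 1000);         // false: still cached
const stale = needsRefresh('hr1234', CACHE_TTL_MS); // true: timer lapsed
```

This object could be serialized to its own JSON file alongside the main one, exactly like the bill data.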

@TheDevMinerTV
Member

TheDevMinerTV commented Feb 1, 2020

I would recommend LokiJS because it saves its data as regular JSON files, is synchronous, supports "collections", and is lightweight and easy to use.

@x47188 modified the milestones: MVP, V2 on Feb 2, 2020
@caldane

caldane commented Feb 3, 2020

I get that it being human-readable is nice, but I have been told a database is not viable. Document DBs are just as readable as JSON files, so that doesn't seem to be a valid reason to stay away from them. And double-clicking a file vs. double-clicking a collection doesn't seem to add complexity either. So if I understood the reason that DBs are not a viable solution, I think I could help out with coming up with one.
