combine load_data and open_list_archives ? #512
Comments
How is the LISTSERV data stored locally after it has been collected, @Christovis? Does your LISTSERV code use either of these methods, or does it use its own alternative versions of them? And does the W3C scraper end up using mbox or CSV format? Maybe we should settle on a canonical data format for email archives of all kinds. Otherwise, we could separate the storage of "raw" email data, when available, from the preprocessed, schematized CSV format that we support deeper analysis on.
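To make the raw-versus-preprocessed split concrete, here is a minimal sketch using only the Python standard library. It is not BigBang's actual `load_data` or `open_list_archives` code: the column names in `FIELDS` and the helper names `mbox_to_rows`, `rows_to_csv`, and `load_archive` are hypothetical, chosen only for illustration.

```python
import csv
import mailbox
from pathlib import Path

# Hypothetical schema for the preprocessed CSV; not BigBang's real column set.
FIELDS = ["message_id", "from", "date", "subject", "body"]


def mbox_to_rows(mbox_path):
    """Yield one dict per message found in a raw mbox file."""
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for msg in box:
            # Only single-part messages are decoded in this sketch;
            # multipart bodies are left empty.
            payload = None if msg.is_multipart() else msg.get_payload(decode=True)
            body = payload.decode("utf-8", errors="replace") if payload else ""
            yield {
                "message_id": msg.get("Message-ID", ""),
                "from": msg.get("From", ""),
                "date": msg.get("Date", ""),
                "subject": msg.get("Subject", ""),
                "body": body,
            }
    finally:
        box.close()


def rows_to_csv(rows, csv_path):
    """Write message dicts to a schematized CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_archive(path):
    """Load messages from either format, dispatching on the file suffix."""
    path = Path(path)
    if path.suffix == ".csv":
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    # Anything else is treated as a raw mbox file.
    return list(mbox_to_rows(path))
```

With a single entry point like `load_archive`, downstream analysis code would not need to know whether an archive came from Mailman, LISTSERV, or W3C, only that it yields rows with the agreed columns.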
> How is the LISTSERV data stored locally after it has been collected, @Christovis ?
> Does your LISTSERV code use either of these methods, or does it use its own alternative version of them?
> Does the W3C scraper use mbox or csv format, finally?
Thanks, @Christovis. This all makes a lot of sense. Another question: how does the LISTSERV functionality load the data from CSV or mbox? If you think the design you've used in LISTSERV is good, we could move everything over to that design.
> How does the LISTSERV functionality load the data from csv or mbox?
> If you think the design you've used in LISTSERV is good we could move everything over to that way.
How much of the ListservArchive functionality is specific to Listserv-originating data, and how much of it could be used on any email data stored in CSV or mbox?
At the moment I believe we could have an:
This idea is implemented in PR #534.
@sbenthall, given the new code structure that is emerging (at least for 3GPP, W3C, IEEE), is this issue still relevant, or does it need to be rephrased?
Well, I think it would be best if the Mailman code were refactored to fit into the new code structure from #534. I'm not sure what that means for
https://github.com/datactive/bigbang/pull/500/files#r753826294