Echodataflow Redesign #134

Open
Sohambutala opened this issue Oct 25, 2024 · 0 comments

Sohambutala commented Oct 25, 2024

  1. Handling Windows and Multiple Sources

In the Source Pydantic model, extract_source needs to glob only the files that fall within the specified window. It currently returns a flat list of files. To support multiple source configurations together with window options, it should instead return a dictionary of lists, with a separate list of files for each source and window.

The parse_raw_paths function should be updated to accept this dictionary and group the files according to the information provided in a text file, a zip file, or the window options. club_raw_files can then process the grouped files and, when window options are specified, combine each group into a single Zarr file that is stored locally. The file paths are then updated to point to this new local Zarr source.
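As a rough illustration of the dictionary-of-lists shape described above, here is a minimal sketch that groups files per source and per window. The function names (`parse_timestamp`, `group_files`), the `(source_name, window_start)` key, and the `-DYYYYMMDD-THHMMSS` filename pattern are all assumptions made for this example, not the existing echodataflow API.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from pathlib import PurePosixPath

EPOCH = datetime(1970, 1, 1)


def parse_timestamp(path):
    """Parse a name like 'Survey-D20170701-T120930.raw' (assumed naming pattern)."""
    _, date_part, time_part = PurePosixPath(path).stem.split("-")
    return datetime.strptime(date_part[1:] + time_part[1:], "%Y%m%d%H%M%S")


def group_files(sources, window=None):
    """Map (source_name, window_start) -> list of files.

    With window=None, every file of a source lands under (source_name, None).
    """
    grouped = defaultdict(list)
    for source_name, files in sources.items():
        for path in sorted(files, key=parse_timestamp):
            if window is None:
                key = (source_name, None)
            else:
                # Floor the file timestamp to the start of its window.
                steps = (parse_timestamp(path) - EPOCH) // window
                key = (source_name, EPOCH + steps * window)
            grouped[key].append(path)
    return dict(grouped)


# Hypothetical source name and file names, for illustration only.
sources = {
    "raw_18khz": [
        "Survey-D20170701-T120000.raw",
        "Survey-D20170701-T120930.raw",
        "Survey-D20170701-T121500.raw",
    ]
}
windowed = group_files(sources, window=timedelta(minutes=10))
```

With a ten-minute window, the first two files share the 12:00 window and the third falls into the 12:10 window. Without a window, all files of a source stay in one list.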

Below are some scenarios to consider:

Case 1: Shimada, single source without a window

In this scenario, the club_raw_files function will not combine any files. Instead, it will add metadata extracted from the filenames and return a dictionary of dictionaries. In real-time situations there will be a single group. Multiple groups are formed only when grouping information is provided.
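A minimal sketch of what the no-window path could look like, assuming the input is a mapping from group name to file paths. The function name `club_without_window`, the group name `"default"`, and the metadata keys are hypothetical and chosen only to show the dictionary-of-dictionaries shape.

```python
from pathlib import PurePosixPath


def club_without_window(groups):
    """Return {group_name: {file_stem: metadata}} without combining any files."""
    result = {}
    for group_name, files in groups.items():
        result[group_name] = {
            PurePosixPath(path).stem: {
                "file_path": path,
                "file_name": PurePosixPath(path).name,
            }
            for path in files
        }
    return result


# A single group, as in the real-time situation (hypothetical file name).
clubbed = club_without_window({"default": ["data/Survey-D20170701-T120000.raw"]})
```

The paths are left untouched here, since nothing is merged into a new Zarr source in this case.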

Case 2: 18 kHz and 5 Frequencies with window options

Here, club_raw_files will receive a dictionary of dictionaries representing different windows, each containing the relevant files. club_raw_files will convert these files to tensors, merge them, and update the paths accordingly.

Case 3: 18 kHz, 5 Frequencies, and Scores with window options

This case is similar to Case 2, with the additional step of integrating the scores after the files have been merged.
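For Cases 2 and 3, one possible way to arrive at the per-window dictionary of dictionaries is to pivot a mapping keyed by source and window. The sketch below only shows that regrouping step. The function name `pivot_by_window`, the source names, and the window labels are assumptions, and the actual conversion and merging of files is not shown.

```python
def pivot_by_window(grouped):
    """Turn {(source_name, window): files} into {window: {source_name: files}}."""
    by_window = {}
    for (source_name, window_start), files in grouped.items():
        by_window.setdefault(window_start, {})[source_name] = list(files)
    return by_window


# Hypothetical sources and window labels for illustration.
grouped = {
    ("raw_18khz", "T1200"): ["a_18.raw"],
    ("raw_5freq", "T1200"): ["a_5f.raw", "b_5f.raw"],
    ("scores", "T1200"): ["a_scores.zarr"],
    ("raw_18khz", "T1210"): ["c_18.raw"],
}
per_window = pivot_by_window(grouped)
```

Each inner dictionary then holds everything a merge step would need for one window, including the scores source in Case 3.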

  2. Task Library

The tasklib module can be used to define and organize various tasks. A more flexible approach would be to dynamically fetch the task from the user's library and wrap it with a Prefect Task:

task_fn = dynamic_function_call(task.module, task.name)

task_fn = Task(task_fn)

This allows tasks to be dynamically loaded and executed, making the system more modular and extensible.
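The lookup step can be sketched with the standard library alone. The helper below, `load_callable`, is a hypothetical stand-in for `dynamic_function_call`, whose implementation is not shown in this issue. It assumes the task configuration provides an importable module path and a function name.

```python
import importlib


def load_callable(module_name, function_name):
    """Import a module by its dotted path and return the named callable."""
    module = importlib.import_module(module_name)
    fn = getattr(module, function_name)
    if not callable(fn):
        raise TypeError(f"{module_name}.{function_name} is not callable")
    return fn


# A standard-library function is used here in place of a user-defined task.
task_fn = load_callable("json", "dumps")
encoded = task_fn({"a": 1})

# With Prefect installed, the loaded function could then be wrapped
# in the same way as the two lines above:
#   from prefect import Task
#   task_fn = Task(task_fn)
```

Failing early with a clear error when the name is missing or not callable keeps configuration mistakes from surfacing only at flow run time.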
