Merge pull request #22 from OSOceanAcoustics/dev
Revamped Echoflow Design
Showing 85 changed files with 10,105 additions and 1,395 deletions.
#!/bin/bash

# Step 1: Create a Python virtual environment
python3 -m venv $HOME/env/echoflow-prod
source $HOME/env/echoflow-prod/bin/activate

# Step 2: Clone the Echoflow repository
cd $HOME/
git clone https://github.com/OSOceanAcoustics/echoflow.git
cd $HOME/echoflow

# Step 3: Check out the dev branch and update (optional) - skip if using the prod/main branch
git checkout dev
git pull origin dev

# Step 4: Install the Echoflow project in editable mode
pip install -e .

# Step 5: Log in to Prefect Cloud and set your API key - use step 5b instead if running Prefect locally
echo "Enter Prefect API key: "
read prefectKey
prefect cloud login -k $prefectKey

# Step 5b: Set up Prefect locally
# prefect profile create echoflow-local

# Step 6: Set up the Prefect worker as a systemd service
echo "Enter Work Pool Name: "
read workPool
cd /etc/systemd/system

# Create the prefect-worker.service file (tee performs the write under
# sudo; a plain `sudo cat <<EOL > file` would run the redirection as the
# unprivileged user and fail in /etc/systemd/system)
sudo tee prefect-worker.service > /dev/null <<EOL
[Unit]
Description=Prefect-Worker
[Service]
User=$(whoami)
WorkingDirectory=$HOME/echoflow
ExecStart=$(which prefect) agent start --pool $workPool
Restart=always
[Install]
WantedBy=multi-user.target
EOL

# Step 7: Reload systemd to make it aware of the new service
sudo systemctl daemon-reload

# Optionally, enable the service to start at boot
sudo systemctl enable prefect-worker.service

# Step 8: Start the Prefect worker service
sudo systemctl start prefect-worker.service

echo "Setup completed. The Echoflow worker is now running. Send tasks to $workPool using the Prefect UI or CLI."
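One pitfall worth noting when generating unit files: in `sudo cat <<EOL > file`, the redirection is performed by the calling (unprivileged) shell, not by sudo, so writes into `/etc/systemd/system` can fail; `sudo tee` is the usual workaround. A minimal, root-free sketch of the pattern, writing into a temporary directory with a made-up pool name:

```shell
# Sketch of the unit-file generation step, runnable without root:
# a temporary directory stands in for /etc/systemd/system, and the
# pool name is hypothetical. Under sudo, `tee` (not a plain redirect)
# would perform the privileged write.
unit_dir=$(mktemp -d)      # stand-in for /etc/systemd/system
workPool=my-work-pool      # hypothetical work pool name

tee "$unit_dir/prefect-worker.service" > /dev/null <<EOL
[Unit]
Description=Prefect-Worker
[Service]
User=$(whoami)
WorkingDirectory=$HOME/echoflow
ExecStart=$(which prefect) agent start --pool $workPool
Restart=always
[Install]
WantedBy=multi-user.target
EOL

cat "$unit_dir/prefect-worker.service"
```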
# Echoflow Configuration and Credential Blocks

Echoflow leverages the concept of "blocks" from Prefect, which serve as containers for storing various types of data, including credentials and other sensitive information. Currently, Echoflow supports two types of blocks: an Azure Cosmos DB Credentials Block and an AWS Credentials Block. These blocks let you store sensitive data securely while benefiting from Prefect's robust integration capabilities.

For a deeper understanding of blocks, refer to the [Prefect documentation](https://docs.prefect.io/2.11.5/concepts/blocks/).

## Types of Blocks in Echoflow

In the context of Echoflow, there are two main categories of blocks:

### 1. Echoflow Configuration Blocks

These blocks store references to credential blocks, along with the various Prefect profiles that have been established using Echoflow's functions.

### 2. Credential Blocks

Credential blocks securely store sensitive information such as authentication keys and tokens. Echoflow integrates with Prefect's capabilities to ensure that this sensitive data is protected.

## Creating Credential Blocks

Credential blocks can be conveniently created from an `.ini` file. By leveraging Prefect's integration, Echoflow ensures that the credentials stored in these blocks are handled securely. To create a credential block, follow these steps:

1. Open the `credentials.ini` file, which is located under the `.echoflow` directory in your home directory.
```bash
# Terminal command
cd ~/.echoflow
```
2. Place the necessary credential information in the `credentials.ini` file.
```bash
# Terminal command
nano credentials.ini # Or use any of your favourite editors
```
3. Store the updated `.ini` file in the `.echoflow` directory, which resides in your home directory.
4. Use the [echoflow load-credentials](../../echoflow/stages/subflows/echoflow.py#load_credential_configuration) command to generate a new credential block from the content of the `.ini` file.
```bash
echoflow load-credentials
```
5. Add the name of the block to the pipeline or datastore YAML configuration files under the `storage_options` section, along with the appropriate storage type (refer to [StorageType](../../echoflow/config/models/datastore.py#StorageType)).

```yaml
# Example
storage_options:
  block_name: echoflow-aws-credentials # Name of the block containing credentials
  type: AWS # Specify the storage type using the StorageType enum
```

Providing both the block name and the storage type ensures that the correct block is used for storage operations and keeps the chosen storage type explicit.

Once a credential block is created, it can be managed through the Prefect Dashboard. If needed, you can also run `echoflow load-credentials` with the `--sync` argument to keep your blocks up to date with any changes made in the Prefect UI, so that your configurations remain accurate and aligned across the application. **It is highly recommended to create new blocks whenever possible, as modifying existing blocks can lead to data loss or conflicts.**
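As a sketch of how a `storage_options` entry could be checked once the YAML is parsed: the enum below is a hypothetical stand-in for echoflow's `StorageType` (the real members live in `echoflow/config/models/datastore.py`), and the validation function is illustrative, not part of Echoflow's API.

```python
from enum import Enum

# Hypothetical stand-in for echoflow's StorageType enum; the real
# members are defined in echoflow/config/models/datastore.py.
class StorageType(Enum):
    AWS = "AWS"
    AZCosmos = "AZCosmos"

def validate_storage_options(storage_options: dict) -> StorageType:
    """Illustrative check: a block name must be given and the type must be known."""
    if "block_name" not in storage_options:
        raise ValueError("storage_options must include block_name")
    return StorageType[storage_options["type"]]  # raises KeyError for unknown types

opts = {"block_name": "echoflow-aws-credentials", "type": "AWS"}
print(validate_storage_options(opts))  # StorageType.AWS
```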
## Considerations When Using `echoflow load-credentials`

When using the `echoflow load-credentials` command, be aware of the following:

- **Overwriting Values**: All the values from the `.ini` file are written to the credential block, potentially overwriting existing values. Exercise caution when using this command to prevent unintentional data loss.
- **Creating New Blocks**: To maintain data integrity and security, create new blocks rather than modifying existing ones. If editing an existing block becomes necessary, do so through the Prefect Dashboard.
- **Sync Argument**: The `--sync` argument is available in the `echoflow load-credentials` command. When set, this option syncs credential block updates with the Prefect UI, facilitating seamless management of blocks through the dashboard.

By adhering to these guidelines, you can manage sensitive information securely while effectively configuring and using Echoflow within your projects.
# Configuration File Explanation: credentials.ini

This Markdown file explains the structure and contents of the `credentials.ini` configuration file.

## AWS Section

The `[AWS]` section contains configuration settings related to AWS credentials.

- `aws_access_key_id`: Your AWS access key.
- `aws_secret_access_key`: Your AWS secret access key.
- `aws_session_token`: AWS session token (optional).
- `region_name`: AWS region name.
- `name`: Name of the AWS credentials configuration.
- `active`: Indicates if the AWS credentials are active (True/False).
- `options`: Additional options for AWS configuration.

## AzureCosmos Section

The `[AZCosmos]` section contains configuration settings related to Azure Cosmos DB credentials.

- `name`: Name of the Azure Cosmos DB credentials configuration.
- `connection_string`: Azure Cosmos DB connection string.
- `active`: Indicates if the Azure Cosmos DB credentials are active (True/False).
- `options`: Additional options for Azure Cosmos DB configuration.
Example (key names match those documented above; values are illustrative):

```ini
[AWS]
aws_access_key_id = my-access-key
aws_secret_access_key = my-secret-key
aws_session_token = my-session-token
region_name = us-west-1
name = my-aws-credentials
active = True
option_key = option_value

[AZCosmos]
name = my-az-cosmos-credentials
connection_string = my-connection-string
active = True
option_key = option_value
```
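Since `credentials.ini` is a standard INI file, it can be inspected with Python's built-in `configparser`. A minimal sketch, using the key names documented in the AWS and AzureCosmos sections above with illustrative values:

```python
import configparser

# Parse an INI fragment shaped like credentials.ini; key names follow
# the documented [AWS] / [AZCosmos] sections, values are illustrative.
ini_text = """
[AWS]
aws_access_key_id = my-access-key
aws_secret_access_key = my-secret-key
region_name = us-west-1
name = my-aws-credentials
active = True

[AZCosmos]
name = my-az-cosmos-credentials
connection_string = my-connection-string
active = True
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

print(config["AWS"]["name"])                    # my-aws-credentials
print(config.getboolean("AZCosmos", "active"))  # True
```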
# Echoflow Run Configuration Documentation

This document provides detailed explanations of the keys used in the YAML configuration that defines an Echoflow run.

## Run Details

- `name`: The name of the Echoflow run. It is used to identify and label the execution of the Echoflow process.
- `sonar_model`: The model of the sonar used for data collection during the run.
- `raw_regex`: The regex used while parsing the source directory to match the files to be processed.

## Input Arguments

- `urlpath`: The source data URL pattern for accessing raw data files. The pattern can contain placeholders that are dynamically replaced during execution.
- `parameters`: Parameters used in the source data URL. These dynamically replace the placeholders in the URL path.
- `storage_options`: Storage options for accessing the source data. This may include settings such as anonymous access.
- `transect`: Information about the transect data, including the URL of the transect file and its storage options.
- `json_export`: When set to true, raw JSON metadata of the files to be processed is exported.
- `raw_json_path`: The path where the raw JSON metadata is stored. It can be used to skip parsing the files in the source directory and instead fetch the files listed in this JSON.

## Output Arguments

- `urlpath`: The destination data URL where processed data will be stored.
- `overwrite`: When set to true, the data overwrites any existing data in the output directory.
- `storage_options`: Storage options for the destination data, which may include details such as the block name and type.

## Notes

- The provided configuration serves as a structured setup for executing an Echoflow run, allowing customization through the specified keys.
- Dynamic placeholders such as `ship_name`, `survey_name`, and `sonar_model` are replaced with actual values based on the context.
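The placeholder replacement described in the notes can be sketched in a few lines. This only mimics the behaviour for illustration; Echoflow's actual templating mechanism may differ:

```python
import re

# Sketch: replace {{ name }} placeholders in a urlpath with values
# from a parameters mapping, as described for the Input Arguments.
def render_urlpath(urlpath: str, parameters: dict) -> str:
    """Substitute each {{ name }} placeholder with parameters[name]."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: parameters[m.group(1)], urlpath)

urlpath = "s3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw"
params = {"ship_name": "Bell_M._Shimada", "survey_name": "SH1707", "sonar_model": "EK60"}
print(render_urlpath(urlpath, params))
# s3://ncei-wcsd-archive/data/raw/Bell_M._Shimada/SH1707/EK60/*.raw
```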
Example:

```yaml
name: Bell_M._Shimada-SH1707-EK60 # Name of the Echoflow run
sonar_model: EK60 # Sonar model
raw_regex: (.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6}) # Regex to parse the filenames
args: # Input arguments
  urlpath: s3://ncei-wcsd-archive/data/raw/{{ ship_name }}/{{ survey_name }}/{{ sonar_model }}/*.raw # Source data URL
  parameters: # Source data URL parameters
    ship_name: Bell_M._Shimada
    survey_name: SH1707
    sonar_model: EK60
  storage_options: # Source data storage options
    anon: true
  transect: # Source data transect information
    file: ./x0007_fileset.txt # Transect file URL; accepts a .zip or .txt file
    storage_options: # Transect file storage options
      block_name: echoflow-aws-credentials # Block name; for more information on blocks, refer to blocks.md
      type: AWS # Block type
    default_transect_num: 1 # Set when not using a file to pass transect information
  json_export: true # Export raw JSON metadata of the files to be processed
  raw_json_path: s3://echoflow-workground/combined_files/raw_json # Path to store the raw JSON metadata; can also be used to skip parsing the source directory and fetch the files listed in this JSON instead
output: # Output arguments
  urlpath: s3://echoflow-workground/combined_files_dask # Destination data URL
  overwrite: true # Flag to overwrite existing data in the output directory
  storage_options: # Destination data storage options
    block_name: echoflow-aws-credentials
    type: AWS
```
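To see what the `raw_regex` above captures, it can be exercised against a sample filename (the filename itself is invented for illustration):

```python
import re

# Exercise the raw_regex from the example configuration against a
# made-up raw filename; the named groups pull out the date and time.
raw_regex = r"(.*)-?D(?P<date>\w{1,8})-T(?P<time>\w{1,6})"

match = re.match(raw_regex, "Summer2017-D20170620-T011027.raw")
print(match.group("date"))  # 20170620
print(match.group("time"))  # 011027
```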