Motivation
Currently, the resources allocated to the CXG conversion step are poorly utilized. Many vCPUs are allocated to the container to satisfy AWS's memory-to-vCPU ratio requirement, but very little of the process actually runs on multiple CPUs. Reducing the memory required will let us allocate fewer vCPUs and process more CXGs in parallel, cutting compute costs by shrinking the compute size. In the future, the code can be rewritten to make better use of multiple CPUs and speed up CXG conversion.
Definition of Done
The size of the dataset does not have a significant (more than 20% difference) impact on the maximum memory used by CXG conversion.
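One way to check this criterion is to compare peak memory across dataset sizes. A minimal sketch, assuming memory_profiler is installed; convert_h5ad_to_cxg and the dataset paths are hypothetical stand-ins for the real conversion entry point:

```python
from memory_profiler import memory_usage

def convert_h5ad_to_cxg(h5ad_path: str) -> None:
    """Hypothetical stand-in for the real conversion entry point."""
    ...

def peak_mib(h5ad_path: str) -> float:
    # Sample process RSS every 0.1 s while the conversion runs and
    # return the maximum observed, in MiB.
    return memory_usage(
        (convert_h5ad_to_cxg, (h5ad_path,)),
        interval=0.1,
        max_usage=True,
        include_children=True,
    )

small = peak_mib("small_dataset.h5ad")
large = peak_mib("large_dataset.h5ad")
# Done when the peak varies by less than ~20% across dataset sizes.
print(f"small={small:.0f} MiB, large={large:.0f} MiB, diff={(large - small) / small:.0%}")
```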
Tasks
Investigate the lines of code marked with the comment # big memory usage.
Make changes to optimize the memory usage of those lines; tuning the .tile_db_ctx parameters could help (see single-cell-data-portal/backend/layers/processing/h5ad_data_file.py, line 33 at bd179bf). Sketches follow this list.
Rerun the memory profiler and verify that memory usage has decreased (e.g., with the harness sketched above).
related issue
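Which keys are worth tuning depends on the profiler output, but TileDB exposes several memory-related config parameters that could be passed through the .tile_db_ctx settings. A sketch with placeholder budgets; the parameter names come from TileDB's config (availability may vary by version), and the values are illustrative only:

```python
import tiledb

# Illustrative budgets only; tune against the profiler output.
cfg = tiledb.Config(
    {
        # Cap the storage manager's memory budget for fixed- and
        # variable-sized attribute data during reads (bytes).
        "sm.memory_budget": str(512 * 1024**2),
        "sm.memory_budget_var": str(512 * 1024**2),
        # Shrink the tile cache so decompressed tiles are not pinned.
        "sm.tile_cache_size": str(64 * 1024**2),
        # Smaller initial allocation for Python-side query buffers.
        "py.init_buffer_bytes": str(128 * 1024**2),
    }
)
ctx = tiledb.Ctx(cfg)
```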
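Independently of the context configuration, peak memory during conversion often comes from densifying the whole expression matrix at once. A sketch of writing in row slabs instead, assuming a dense TileDB array with a matching schema already exists at array_uri; write_x_in_chunks is a hypothetical helper, not code from the repo:

```python
import scipy.sparse as sparse
import tiledb

def write_x_in_chunks(
    array_uri: str,
    x: sparse.csr_matrix,
    ctx: tiledb.Ctx,
    rows_per_chunk: int = 10_000,
) -> None:
    # Densify and write one slab of rows at a time, so peak memory is
    # bounded by the slab size rather than by the full matrix.
    with tiledb.open(array_uri, mode="w", ctx=ctx) as arr:
        for start in range(0, x.shape[0], rows_per_chunk):
            stop = min(start + rows_per_chunk, x.shape[0])
            arr[start:stop] = x[start:stop].toarray()
```

With this pattern the maximum resident memory scales with rows_per_chunk rather than with the dataset size, which is exactly the Definition of Done above.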
Bento007 added the tech (Tech issues that do not require product prioritization. Tech debt, tooling, ops, etc.) and dp (Data Platform workstream) labels on Jul 25, 2024.