Optimize memory usage of CXG conversion #7310

Closed
Bento007 opened this issue Jul 25, 2024 · 1 comment
Assignees
Labels
dp Data Platform workstream tech Tech issues that do not require product prioritization. Tech debt, tooling, ops, etc.

Comments

@Bento007
Contributor

Motivation

Resources allocated to the CXG conversion step are currently underutilized. Many vCPUs are allocated to the container to meet the AWS memory/vCPU requirement, but very little of the process actually uses multiple CPUs. Reducing the memory needed will let us reduce the number of vCPUs and process more CXGs in parallel, which will save on compute costs by reducing the compute size. In the future we can rewrite the code to make better use of multiple CPUs and speed up CXG conversion.

Definition of Done

  • The size of the dataset does not have a significant (20% difference) impact on the maximum memory used by cxg_conversion.
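
One way the criterion above could be checked is sketched below. It is only an illustration under stated assumptions: "20% diff" is read here as "the largest observed peak exceeds the smallest by at most 20%", `convert_in_chunks` is a hypothetical stand-in for the real conversion code, and `tracemalloc` only sees allocations made through Python's allocator, so a real check would probably also need a process-level measurement.

```python
import tracemalloc
from typing import Callable, Sequence


def peak_memory_bytes(fn: Callable[[], object]) -> int:
    """Run fn and return the peak traced Python allocation during the call."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def within_tolerance(peaks: Sequence[int], tolerance: float = 0.20) -> bool:
    """True if the largest peak exceeds the smallest by at most `tolerance`.

    Assumption: this is one possible reading of the "20% diff" criterion.
    """
    smallest, largest = min(peaks), max(peaks)
    return (largest - smallest) <= tolerance * smallest


def convert_in_chunks(n_rows: int, chunk_size: int = 1_000) -> int:
    """Hypothetical stand-in for the conversion: holds one chunk at a time."""
    total = 0
    for start in range(0, n_rows, chunk_size):
        chunk = list(range(start, min(start + chunk_size, n_rows)))
        total += len(chunk)  # placeholder for the real per-chunk work
    return total


if __name__ == "__main__":
    # Measure peak memory for several made-up dataset sizes and compare them.
    sizes = [10_000, 100_000, 1_000_000]
    peaks = [peak_memory_bytes(lambda n=n: convert_in_chunks(n)) for n in sizes]
    print(dict(zip(sizes, peaks)), within_tolerance(peaks))
```

The point of the sketch is the shape of the check: run the same conversion over datasets of very different sizes and compare the peaks against each other, without fixing any absolute memory budget.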

Tasks

related issue

@Bento007 Bento007 added tech Tech issues that do not require product prioritization. Tech debt, tooling, ops, etc. dp Data Platform workstream labels Jul 25, 2024
@nayib-jose-gloria
Contributor

Estimate (incl testing): 1-1.5 weeks

@Bento007, please comment if you agree.


3 participants