Clickhouse mode of study view #11224

Merged: 150 commits into master on Dec 19, 2024
@alisman (Contributor) commented Nov 22, 2024

Re-implement the study view filtering endpoints using ClickHouse. The legacy implementation is kept and remains in use when ClickHouse mode is off (the default).

The goals and plan of this project are detailed in the RFC80 document.
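
As a rough illustration of the toggle described above, the sketch below picks between a legacy and a ClickHouse-backed implementation based on a flag that defaults to off. Every name in it (`StudyViewBackend`, `LegacyBackend`, `ClickhouseBackend`, `BackendSelector`, and the `clickhouse_mode` property key) is hypothetical and not taken from this PR; the actual wiring in the codebase may look quite different.

```java
import java.util.Map;

// Hypothetical abstraction over the two study view implementations.
interface StudyViewBackend {
    String name();
}

final class LegacyBackend implements StudyViewBackend {
    @Override
    public String name() { return "legacy"; }
}

final class ClickhouseBackend implements StudyViewBackend {
    @Override
    public String name() { return "clickhouse"; }
}

final class BackendSelector {
    // Assumed property key; the real configuration name may differ.
    static final String MODE_KEY = "clickhouse_mode";

    private BackendSelector() {}

    // Returns the ClickHouse backend only when the flag is explicitly "true".
    // A missing key or any other value falls back to legacy (off by default).
    static StudyViewBackend select(Map<String, String> props) {
        boolean enabled = Boolean.parseBoolean(props.get(MODE_KEY));
        return enabled ? new ClickhouseBackend() : new LegacyBackend();
    }
}
```

Keeping both implementations behind one interface means callers do not need to know which mode is active, which is one way the legacy path could stay untouched while the new one is rolled out.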

@@ -223,6 +263,197 @@ public Pair<List<CopyNumberCountByGene>, Long> getPatientCnaGeneCounts(List<Mole
);
}

@Override
Contributor Author

@haynescd what's all this stuff?

Collaborator

This is the stuff for the new clickhouse implementation

@@ -0,0 +1,32 @@
DROP TABLE IF EXISTS sample_list_columnstore;
Contributor Author

@haynescd we can kill this file, right?

Collaborator

Agreed

Collaborator

Let's delete this.

haynescd and others added 25 commits November 23, 2024 23:28
* Add Columnar SQL file to init Clickhouse DB

* Refactored Mapper xml to extract StudyViewFilterMapper
* ✅ Add Unit test for StudyViewMapper Clickhouse

* ✅ Update db props to include mysql and clickhouse datasources to fix tests

* Address comments

* Rename package to clickhouse

* Update to static final

* Use bean name instead of qualifier
* Create new wide table sql file and rename package

* Remove genomic_event view

* Add AlterationFilter to mutated_genes endpoint

* Add AlterationFilter to mutated-genes endpoint

* Fix unit test

* Fix sonar issues

* Add test for mutation types and status

* remove unused imports
* add missing poc clinical data binning function
* Add sample_mv materialized view and use it in mappers
* Add Support for TotalProfiledCase Counts for Mutated-genes endpoint.

* Create sql files to create new tables

* Add unit test for totalProfiledCount

* Add matching gene panel ids

* Add TotalProfiledCountsWithoutPanelData

* Add profileCount for genes without gene panel data

* Add Comments for SQL

* Update matching Gene Panel Ids

* Clean up code

* Fix test

* Add query to get correct Gene Panels

* Fix unit test

* Add comments
* working poc

* refactor logic into service, so clean

* refactor for parameters builder, simplify min max logic, streamline service call

* remove unused services and imports

* remove more unused imports
* Implement molecular profile count endpoint using Clickhouse

* Cleanup
* ✨ Add CNA Gene Endpoint

* 🐛 Fix StudyViewFilterMapper.xml to allow ability to filter on gene and alteration

* Fix merge conflict

* Address comments

* Fix unit tests

* Fix sonar issues
* ✨ Add StructuralVariant-genes endpoint

* Fix sonar issues

* Update MatchingGenePanel request to return list

* Create and use sample_derive

* Update where sample_derived is stored to fix unit test
* use clinical_data_derived instead of sample_clinical_attribute_numeric_mv and patient_clinical_attribute_numeric_mv

* use clinical_attribute_meta instead of sample_clinical_attribute_numeric_mv and patient_clinical_attribute_numeric_mv

* remove unused clinical data count methods and SQL

* fix numericalClinicalDataCountFilter

* Move CategoricalClinicalAttributeFilter to repository

* remove unused columns

* Add override to methods

---------

Co-authored-by: haynescd <[email protected]>
…0857)

* Add patient_id column to genomic_event_derived

* Update sql to convert list of patients to list of samples
* refactor to use clickhouse

* filter out empty attr values

* edit comment

* fix sonarcloud issues

* use parallel stream, shaves off 5s

* use newer mapping annotation
onursumer and others added 14 commits December 10, 2024 14:21
* fix CNA query for genomic data filter

* rename one of the cna_query statements to cna_count_query to avoid table name clash
* make sure study ids exist before using them in filter SQL

* add involved cancer studies into StudyViewFilterHelper
simplify connection checks for circleci api tests
Improve circleci api tests by removing redundant image builds
* add more unit tests with clickhouse service methods

* fix sonar issues

* fix study view service errors

---------

Co-authored-by: Bryan Lai <[email protected]>
* Use prepared statements to avoid injection attack and clickhouse native array to improve performance

* Dynamically calculate sample identifiers in study view filter helper
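
For context on the prepared-statement commit above, here is a minimal plain-JDBC sketch of the general idea: values are bound as parameters instead of being concatenated into the SQL text. The class name `SampleQuery` and the table and column names are hypothetical. The PR itself works through mapper XML files, and this sketch does not show the ClickHouse native array binding mentioned in the commit message.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class SampleQuery {
    private SampleQuery() {}

    // Builds the SQL text with one "?" placeholder per study id.
    // Table and column names are illustrative only.
    static String sqlFor(int studyIdCount) {
        if (studyIdCount < 1) {
            throw new IllegalArgumentException("at least one study id is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(studyIdCount, "?"));
        return "SELECT sample_id FROM sample WHERE study_id IN (" + placeholders + ")";
    }

    // Binds each study id as a parameter, so user input never becomes SQL text.
    static PreparedStatement prepare(Connection conn, List<String> studyIds) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(sqlFor(studyIds.size()));
        for (int i = 0; i < studyIds.size(); i++) {
            stmt.setString(i + 1, studyIds.get(i)); // JDBC parameters are 1-indexed
        }
        return stmt;
    }
}
```

With one placeholder per value the statement grows with the list, which is presumably part of why binding a single array parameter was attractive for large sample lists.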
@alisman added the RFC80 label Dec 18, 2024
Comment on lines +80 to +82
registry.addInterceptor(new WebRequestHandlerInterceptorAdapter(
new ExecuterTimeInterceptor()
)).addPathPatterns("/**");
Collaborator

@alisman do we want to keep this?

@@ -30,7 +30,7 @@ CorsConfigurationSource corsConfigurationSource() {
configuration.setAllowedHeaders(List.of("user-agent", "Origin", "Accept", "X-Requested-With","Content-Type",
"Access-Control-Request-Method","Access-Control-Request-Headers","Content-Encoding",
"X-Proxy-User-Agreement", "x-current-url"));
-configuration.setExposedHeaders(List.of("total-count", "sample-count"));
+configuration.setExposedHeaders(List.of("total-count", "sample-count", "elapsed-time"));
Collaborator

same as above. Do we want to keep this?

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

import java.util.*;
Collaborator

@onursumer Probably need to update your IDE

Member

https://www.jetbrains.com/help/idea/creating-and-optimizing-imports.html#disable-wildcards-for-class This should do it. Strange that we need to actually enter a number like 999 to prevent it from happening.

import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.servlet.mvc.method.annotation.ResponseBodyAdvice;

@ControllerAdvice
Collaborator

Add docs

alisman and others added 5 commits December 18, 2024 15:04
* Use prepared statements to avoid injection attack and clickhouse native array to improve performance

* Dynamically calculate sample identifiers in study view filter helper
* Remove CH code from legacy class and remove material_view.sql

* Update CH Test container to not use material_view.sql

---------

Co-authored-by: haynescd <[email protected]>
@alisman merged commit 39caae5 into master Dec 19, 2024
21 of 25 checks passed
9 participants