Deconvolved estimates matching known cell type proportions #67
Apologies for the delay.
There are two reasons that might have caused the underestimated cell type
fraction. The first is the reason you mentioned in the email: a sparser
scRNA-seq reference. The second is that some cell types have a lower
total transcription level, and because BayesPrism estimates the reads% from
each cell type rather than the cell count% of each cell type, such cell
types will tend to show a seemingly underestimated fraction.
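To illustrate the second point, here is a minimal sketch of how a read fraction and a cell-count fraction can diverge. It is not BayesPrism code: the function name `read_fraction_to_cell_fraction`, the two cell types, and the per-cell read totals are all hypothetical, and it assumes every cell of a given type contributes the same average number of reads.

```python
import numpy as np


def read_fraction_to_cell_fraction(read_frac, mean_reads_per_cell):
    """Convert per-cell-type read fractions into cell-count fractions.

    Assumption (hypothetical helper, not part of BayesPrism): each cell of
    type k contributes mean_reads_per_cell[k] reads on average.
    """
    read_frac = np.asarray(read_frac, dtype=float)
    mean_reads_per_cell = np.asarray(mean_reads_per_cell, dtype=float)
    # Reads divided by reads-per-cell gives a relative number of cells.
    relative_cells = read_frac / mean_reads_per_cell
    return relative_cells / relative_cells.sum()


# Made-up example: two cell types present in equal cell numbers, but the
# second type has a quarter of the total transcripts per cell.
cell_frac = np.array([0.5, 0.5])
mean_reads = np.array([20000.0, 5000.0])

# Fraction of reads each type contributes to the bulk mixture.
read_frac = cell_frac * mean_reads / (cell_frac * mean_reads).sum()

# Undo the bias using the same per-cell totals.
recovered = read_fraction_to_cell_fraction(read_frac, mean_reads)
```

In this toy case the low-transcription type contributes a much smaller share of reads than its share of cells, which is the kind of apparent underestimation described above. In practice the per-cell totals would have to be estimated, for example from the scRNA-seq reference.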
I would recommend that you look into the scRNA reference to see if this is
the case. Additionally, you may simply compute the fraction of reads
from the marker genes of each cell type in each mixture as a sanity check.
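The sanity check suggested above could be sketched roughly as follows. This is an illustration under assumptions, not part of the BayesPrism API: the helper `marker_read_fraction`, the gene names, the cell type labels, and the counts are all hypothetical, and the bulk matrix is assumed to hold raw counts with samples in rows and genes in columns.

```python
import numpy as np


def marker_read_fraction(bulk_counts, gene_names, markers_by_type):
    """Fraction of each mixture's reads that map to each cell type's markers.

    bulk_counts: samples x genes matrix of raw counts.
    gene_names: gene labels for the columns of bulk_counts.
    markers_by_type: dict mapping cell type -> list of marker gene names.
    (Hypothetical helper for illustration only.)
    """
    bulk_counts = np.asarray(bulk_counts, dtype=float)
    gene_index = {gene: i for i, gene in enumerate(gene_names)}
    totals = bulk_counts.sum(axis=1)  # total reads per mixture
    fractions = {}
    for cell_type, markers in markers_by_type.items():
        # Keep only markers that are present in the bulk matrix.
        cols = np.array([gene_index[g] for g in markers if g in gene_index],
                        dtype=int)
        fractions[cell_type] = bulk_counts[:, cols].sum(axis=1) / totals
    return fractions


# Made-up data: two mixtures, five genes, two cell types.
genes = ["marker_a1", "marker_a2", "marker_b1", "marker_b2", "housekeeping"]
bulk = [[30, 20, 1, 1, 48],
        [10, 10, 10, 10, 60]]
markers = {"SPN_a": ["marker_a1", "marker_a2"],
           "SPN_b": ["marker_b1", "marker_b2"]}

fractions = marker_read_fraction(bulk, genes, markers)
```

If a cell type's markers account for very few reads in every mixture, a low estimated fraction for that type is at least consistent with the bulk data; if the markers are well represented, the reference is a more likely culprit.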
On Wed, Nov 29, 2023 at 04:20 njrobins <***@***.***> wrote:
Hello! I recently employed BayesPrism to deconvolve a bulk RNAseq dataset
from a particular region of the embryonic mouse brain. This region (the
striatum) houses specific neuronal populations that exhibit a fairly
well-documented distribution; namely, spiny projection neurons (SPNs) make
up ~95% of neurons and ~50% of total cells in the striatum. SPNs can be
functionally divided into two subpopulations; thus, each of these
populations should constitute ~20-25% of the total cells in a given
striatal sample. Notably, this was the case in the single-cell RNAseq
dataset I used as a reference for deconvolution (see Fig. 1B in this paper
<https://www.nature.com/articles/s41598-023-36255-5>).
When I used BayesPrism to deconvolve my bulk dataset with the reference
above, one of the two SPN subpopulations was predicted to make up ~20% of
the total sample, in line with what I expected. However, the other
subpopulation was predicted to be present at a much lower proportion (<1%).
This was true across both genotypes I was comparing, suggesting it was not
a biological phenomenon attributable to my experimental manipulation.
Moreover, it held true whether I used all expressed genes (pre-filtered as
described in the BayesPrism tutorial) or selected marker genes (using
select.marker) for cell type estimation.
In my mind, this could conceivably be due to low expression or dropout of
genes that are expressed selectively in this cell type. However, in my
all-genes analysis, after filtering there are still several genes included
whose expression is reasonably selective for this cell type over all
others. And, to reiterate, the reference dataset contained the expected
proportions of both of these cell types, so, in that regard, the reference
seems unlikely to have introduced bias into the deconvolution.
Thus, my question is: is there any aspect of the BayesPrism workflow that
might tend to systematically underestimate specific cell populations? And,
if so, what would be the reason for this, and are there computational
methods that might lessen or circumvent such an issue? I am happy to
provide additional information and/or code for further clarification. I
greatly appreciate any help you can provide!
Thanks so much for your response! One of the cell types in my reference is, I believe, actually a heterogeneous mix of multiple cell types (or a single, highly plastic cell type that expresses markers of other cell lineages). Some of the markers of this population overlapped with those of my underrepresented population, and so my cells of interest were being misclassified. I now have a workaround for this that seems to have resolved the issue. Thank you again!
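One way to spot this kind of marker overlap between reference cell types is to compare their marker sets pairwise. The sketch below is only an illustration: the helper `marker_overlap`, the cell type labels, and the gene names are hypothetical, and it is not the workaround used in this thread, which was not described.

```python
from itertools import combinations


def marker_overlap(markers_by_type):
    """Shared markers and Jaccard index for every pair of cell types.

    markers_by_type: dict mapping cell type -> iterable of marker gene names.
    Returns a dict keyed by (type_a, type_b) with (shared_genes, jaccard).
    (Hypothetical helper for illustration only.)
    """
    overlaps = {}
    for a, b in combinations(sorted(markers_by_type), 2):
        set_a, set_b = set(markers_by_type[a]), set(markers_by_type[b])
        shared = set_a & set_b
        union = set_a | set_b
        jaccard = len(shared) / len(union) if union else 0.0
        overlaps[(a, b)] = (sorted(shared), jaccard)
    return overlaps


# Made-up marker lists: the "mixed" population shares two genes with SPN_b.
markers = {
    "SPN_a": ["g5"],
    "SPN_b": ["g1", "g2", "g3"],
    "mixed": ["g2", "g3", "g4"],
}

overlaps = marker_overlap(markers)
```

Pairs with a high Jaccard index are candidates for merging, re-clustering, or relabeling in the reference before rerunning the deconvolution.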