Improve count analysis of objects (or cells) with many zeros | by Daniel Manrique-Castaño | April 2024

A zero-inflated model effectively captures the nuances of data sets characterized by a preponderance of zeros. It operates by distinguishing between two distinct processes: 1) Determining whether the outcome is zero and 2) predicting values for non-zero outcomes. This dual approach is particularly suited to asking questions such as: “Are there cells present and, if so, how many?”

To handle data sets with an abundance of zeros, we employ models such as hurdle_poisson() and Zero_inflated_poissonboth designed for scenarios where standard counting models such as Poisson or negative binomial models are inadequate (3).Generally speaking, a key difference between hurdle_poisson() and Zero_inflated_poisson is that the latter incorporates an additional probability component specifically for zeros, improving its ability to handle data sets where zeros are not only common but also significant. We will see the impact these features have on our modeling strategy using brms.

Fitting a Poisson_Obstacle Model

Let's start using the hurdle_poisson() distribution in our modeling scheme:

Hurdle_Fit1 <- brm(Cells ~ Hemisphere, 
data = Svz_data, 
family = hurdle_poisson(),
# seed for reproducibility purposes
seed = 8807,
control = list(adapt_delta = 0.99),
# this is to save the model in my laptop
file    = "Models/2024-04-19_CountsZeroInflated/Hurdle_Fit1.rds",
file_refit = "never")# Add loo for model comparison
Hurdle_Fit1 <- 
add_criterion(Hurdle_Fit1, c("loo", "waic", "bayes_R2"))

Let's look at the results using the standard summary function.

summary(Hurdle_Fit1)

Given this family distribution, estimates are shown on the logarithmic scale (mu = log). In practical terms, this means that the number of cells in the contralateral subventricular zone (SVZ) can be expressed as exp(1,11) = 3.03. Similarly, the ipsilateral hemisphere is estimated to have exp(1.07) = 2.91 times the number of cells. These results align well with our expectations and offer a consistent interpretation of the cellular distribution between the two hemispheres.

Furthermore, the hu The parameter within “Family Specific Parameters” sheds light on the probability of observing zero cell counts. Indicates a 38% probability of nothing happening. This probability highlights the need for a zero-inflated model approach and justifies its use in our analysis.

To better visualize the implications of these findings, we can take advantage of the conditional_effects function. This tool in brms The package allows us to plot the estimated effects of different predictors on the response variable, providing a clear graphical representation of how the predictors influence the expected cell counts.

Hurdle_CE <- 
conditional_effects(Hurdle_Fit1)Hurdle_CE <- plot(Hurdle_CE, 
plot = FALSE)((1))
Hurdle_Com <- Hurdle_CE + 
Plot_theme +
theme(legend.position = "bottom", legend.direction = "horizontal")
Hurdle_CE_hu <- 
conditional_effects(Hurdle_Fit1, dpar = "hu")
Hurdle_CE_hu <- plot(Hurdle_CE_hu, 
plot = FALSE)((1))
Hurdle_hu <- Hurdle_CE_hu + 
Plot_theme +
theme(legend.position = "bottom", legend.direction = "horizontal")
Hurdle_Com | Hurdle_hu

Figure 5: Conditional effects of adjustment with hurdles

These graphs paint a more logical picture than our first model. The graph on the left shows the two parts of the model (“mu” and “hu”). Furthermore, if this model is adequate, we should see more aligned predictions when using pp_check:

pp_check(Hurdle_Fit1, ndraws = 100) +
labs(title = "Hurdle regression") +
theme_classic()

Figure 6: Obstacle model for subsequent predictive controls

As expected, our model predictions have a lower bound at 0.

Modeling data dispersion.

Looking at the data presented in the graph on the right of Figure 5 reveals a discrepancy between our empirical findings and our theoretical understanding of the topic. Based on established knowledge, we expect a higher probability of non-zero cell counts in the subventricular zone (SVZ) of the ipsilateral hemisphere, especially after injury. This is because the ipsilateral SVZ normally becomes a center of cellular activity, with significant cell proliferation following injury. Our data, indicating non-zero prevalent counts in this region, support this biological expectation.

However, the current model predictions do not fully align with these insights. This divergence underscores the importance of incorporating scientific understanding into our statistical models. Relying solely on standard tests without contextual adaptation can lead to misleading conclusions.

To address this, we can refine our model by specifically adjusting the hu parameter, which represents the probability of zero occurring. This allows us to more accurately reflect the expected biological activity in the SVZ of the ipsilateral hemisphere. Then we build a second obstacle model:

Hurdle_Mdl2 <- bf(Cells ~ Hemisphere, 
hu ~ Hemisphere)Hurdle_Fit2 <- brm(
formula = Hurdle_Mdl2,
data = Svz_data, 
family = hurdle_poisson(),
# seed for reproducibility purposes
seed = 8807,
control = list(adapt_delta = 0.99),
# this is to save the model in my laptop
file    = "Models/2024-04-19_CountsZeroInflated/Hurdle_Fit2.rds",
file_refit = "never")
# Add loo for model comparison
Hurdle_Fit2 <- 
add_criterion(Hurdle_Fit2, c("loo", "waic", "bayes_R2"))

Let's first see if the results graph aligns with our hypothesis:

Hurdle_CE <- 
conditional_effects(Hurdle_Fit2)Hurdle_CE <- plot(Hurdle_CE, 
plot = FALSE)((1))
Hurdle_Com <- Hurdle_CE + 
Plot_theme +
theme(legend.position = "bottom", legend.direction = "horizontal")
Hurdle_CE_hu <- 
conditional_effects(Hurdle_Fit2, dpar = "hu")
Hurdle_CE_hu <- plot(Hurdle_CE_hu, 
plot = FALSE)((1))
Hurdle_hu <- Hurdle_CE_hu + 
Plot_theme +
theme(legend.position = "bottom", legend.direction = "horizontal")
Hurdle_Com | Hurdle_hu

Figure 7: Conditional effects for obstacle setting 2

This revised modeling approach appears to be a substantial improvement. By specifically taking into account the higher probability of zero counts (~75%) in the contralateral hemisphere, the model now aligns more closely with both the observed data and our scientific knowledge. This adjustment not only reflects the expected lower cellular activity in this region, but also improves the precision of our estimates. With these changes, the model now offers a more nuanced interpretation of cellular dynamics after injury. Let's see the summary and the TRANSFORMATION FOR THE hu parameters (don't look at the others) to display them on a probability scale using the logit2prob function we created at the beginning.

logit2prob(fixef(Hurdle_Fit2))

Although the estimates for cell number are similar, the hu The parameters (on the logit scale) tell us that the probability of seeing zeros in the contralateral hemisphere is:

Instead:

It represents a drastic reduction to approximately 0.23% of the probability of observing zero cell counts in the injured (ipsilateral) hemisphere. This is a notable change in our estimates.

Now, let us explore whether a zero_inflated_poisson() The distribution family changes these ideas.

Improve count analysis of objects (or cells) with many zeros | by Daniel Manrique-Castaño | April 2024

Technical Terrence Team

A medida que crece la capacidad solar, algunas de las tierras agrícolas más productivas de Estados Unidos están en riesgo Por Reuters

Leave a Reply Cancel reply

Recommended.

Staking Could Reduce Fees, Increase Interest in Ethereum ETFs, Analyst Says

This is my plan to generate passive income in 2024!

Senator Warren criticizes Apple for shutting down Beeper's 'iMessage to Android' solution

Innovation in Padre Yermo schools: technology-driven learning

Friday the 5th: The evolution of online learning

Categories

Important Links

Improve count analysis of objects (or cells) with many zeros | by Daniel Manrique-Castaño | April 2024

Fitting a Poisson_Obstacle Model

Modeling data dispersion.

Related

Technical Terrence Team

A medida que crece la capacidad solar, algunas de las tierras agrícolas más productivas de Estados Unidos están en riesgo Por Reuters

Leave a Reply Cancel reply

Recommended.

Staking Could Reduce Fees, Increase Interest in Ethereum ETFs, Analyst Says

This is my plan to generate passive income in 2024!

Senator Warren criticizes Apple for shutting down Beeper's 'iMessage to Android' solution

Innovation in Padre Yermo schools: technology-driven learning

Friday the 5th: The evolution of online learning

Categories

Important Links

Get daily news updates to your inbox!