Linear regression is generally considered not flexible enough for nonlinear data, and in its standard form it indeed cannot capture nonlinear relationships. However, we can make it work for practically any data set by using finite normal mixtures in a regression model. In this way it becomes a very powerful machine learning tool that can be applied to almost any data set, even highly irregular ones with nonlinear dependencies between the variables.
What makes this approach particularly interesting is interpretability. Despite an extremely high level of flexibility, all detected relationships can be directly interpreted. The model is as general as a neural network, but still does not become a black box. You can read the relationships and understand the impact of individual variables.
In this post, we demonstrate how to simulate a finite mixture model for regression using Markov Chain Monte Carlo (MCMC) sampling. We will generate data with multiple components (groups) and fit a mixture model to recover these components using Bayesian inference. This process combines regression models and mixture models with MCMC techniques for parameter estimation.
We start by loading the libraries needed to work with regression models, MCMC, and multivariate distributions.
# Loading the required libraries for various functions
library("pscl") # For pscl specific functions, like regression models
library("MCMCpack") # For MCMC sampling functions, including posterior distributions
library(mvtnorm) # For multivariate normal distribution functions
- pscl: Used for various statistical functions such as regression models.
- MCMCpack: Contains functions for Bayesian inference, in particular MCMC sampling.
- mvtnorm: Provides tools for working with multivariate normal distributions.
We simulate a data set where each observation belongs to one of several groups (the components of the mixture model) and the response variable is generated using a regression model with random coefficients.
We consider a general setup for a regression model that uses G Normal mixture components.
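In equation form (a standard formulation of a finite mixture of regressions, stated here for completeness), each observation $y_i$ is drawn from one of $G$ Normal regression components:

$$f(y_i \mid x_i) = \sum_{g=1}^{G} \pi_g \, \mathcal{N}\!\left(y_i \mid x_i^{\top}\beta_g, \, \sigma_g^2\right), \qquad \sum_{g=1}^{G} \pi_g = 1,$$

where $\pi_g$ is the mixing proportion of component $g$.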
## Generate the observations
# Set the number of observations per group
N <- 1000
# Set the number of simulations (iterations of the MCMC process)
nSim <- 200
# Set the number of components in the mixture model (G is the number of groups)
G <- 3
- N: The number of observations per group.
- nSim: The number of MCMC iterations.
- G: The number of components (groups) in our mixture model.
Data simulation
Each group is modeled using a univariate regression model, where the explanatory variable (x) and the response variable (y) are simulated from normal distributions. The betas represent the regression coefficients for each group, and the sigmas represent the variance of each group.
# Set the values for the regression coefficients (betas) for each group
betas <- (1:G) * 2.5 # Sequential betas with a multiplier of 2.5: 2.5, 5.0, 7.5
# Define the variance (sigma) for each component (group) in the mixture
sigmas <- rep(1, G) # Set variance to 1 for each component
- betas: These are the regression coefficients. The coefficient of each group is assigned sequentially.
- sigmas: Represents the variance of each group in the mixture model.
In this model we allow each component of the mixture to have its own variance parameter and a set of regression parameters.
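The code that draws x and y is not shown above, so here is a minimal sketch of that step, assuming a single standard-normal predictor per group and Gaussian noise with the group's sigma; the `data` matrix it builds is the one shuffled in the next step:

# Simulate the explanatory variable and the response for each group (assumed setup)
x <- matrix(rnorm(N * G), N * G, 1) # One standard-normal predictor
y <- matrix(NA, N * G, 1) # Response variable
for (g in 1:G) {
  rows <- ((g - 1) * N + 1):(g * N) # Rows belonging to group g
  y[rows, 1] <- x[rows, 1] * betas[g] + rnorm(N, 0, sqrt(sigmas[g]))
}
data <- cbind(y, x) # Combined data matrix, shuffled below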
Group Assignment and Mixing
We then simulate the group assignment of each observation at random and shuffle the data from all components.

We augment the model with a set of component label vectors

$$z_i = (z_{i1}, \ldots, z_{iG}), \qquad z_{ig} \in \{0, 1\},$$

where $z_{ig} = 1$ implies that the $i$-th individual is drawn from the $g$-th component of the mixture. This random assignment forms the z_original vector, which represents the true group to which each observation belongs.
# Initialize the original group assignments (z_original)
z_original <- matrix(NA, N * G, 1)
# Repeat each group label N times (assign labels to each observation per group)
z_original <- rep(1:G, rep(N, G))
# Draw a random permutation of the row indices
sampled_order <- sample(nrow(data))
# Apply the permutation to the data, keeping y, x, and z_original aligned
data <- data[sampled_order, ]
y <- matrix(data[, 1], N * G, 1)
x <- matrix(data[, 2], N * G, 1)
z_original <- z_original[sampled_order]
We establish prior distributions for the regression coefficients and variances. These priors will guide our Bayesian estimation.
## Define Priors for Bayesian estimation
# Define the prior mean (muBeta) for the regression coefficients
muBeta <- matrix(0, G, 1)
# Define the prior variance (VBeta) for the regression coefficients
VBeta <- 100 * diag(G) # Large variance (100) as a prior for the beta coefficients
# Prior for the sigma parameters (variance of each component)
ag <- 3 # Shape parameter
bg <- 1/2 # Rate parameter for the prior on sigma
shSigma <- ag
raSigma <- bg^(-1)
- muBeta: The prior mean of the regression coefficients. We set it to 0 for all components.
- VBeta: The prior variance, which is large (100) to allow flexibility in the coefficients.
- shSigma and raSigma: Shape and rate parameters for the prior on the variance (sigma) of each group.
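In equation form, these choices correspond to the usual conjugate priors (a standard formulation, written out here for completeness):

$$\beta_g \sim \mathcal{N}(\mu_\beta, V_\beta), \qquad \sigma_g^2 \sim \mathcal{IG}(a_g, b_g), \qquad g = 1, \ldots, G.$$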
For the component indicators and component probabilities we consider the following prior assignment:

$$z_i \mid \pi \sim \mathcal{M}(1, \pi_1, \ldots, \pi_G), \qquad \pi \sim \mathcal{D}(\alpha_1, \ldots, \alpha_G).$$

The multinomial prior $\mathcal{M}$ is the multivariate generalization of the binomial, and the Dirichlet prior $\mathcal{D}$ is a multivariate generalization of the beta distribution.
In this section, we initialize the MCMC process by setting up matrices to store the samples of the regression coefficients, variances, mixing proportions, group indicators, and group sizes.
## Initialize MCMC sampling
# Initialize matrix to store the samples for beta
mBeta <- matrix(NA, nSim, G)
# Assign the first value of beta by drawing from its prior
for (g in 1:G) {
  mBeta[1, g] <- rnorm(1, muBeta[g, 1], sqrt(VBeta[g, g]))
}
# Initialize the sigma^2 values (variance for each component)
mSigma2 <- matrix(NA, nSim, G)
mSigma2[1, ] <- rigamma(1, shSigma, raSigma)
# Initialize the mixing proportions (pi), using a Dirichlet distribution
mPi <- matrix(NA, nSim, G)
alphaPrior <- rep(N / G, G) # Prior for the mixing proportions, uniform across groups
mPi[1, ] <- rdirichlet(1, alphaPrior)
# Storage for the sampled group indicators (z) and group sizes (ng)
z <- matrix(NA, nSim, N * G)
ng <- matrix(NA, nSim, G)
- mBeta: Matrix to store samples of the regression coefficients.
- mSigma2: Matrix to store the variances (sigma squared) of each component.
- mPi: Matrix to store the mixing proportions, initialized using a Dirichlet distribution.
Conditioning on the values of the component indicator variables z, the conditional probability that observation i belongs to component g can be expressed as

$$P(z_{ig} = 1 \mid y_i, \theta) \propto \pi_g \, \mathcal{N}\!\left(y_i \mid x_i^{\top}\beta_g, \, \sigma_g^2\right).$$
In the MCMC sampling loop, we update the group assignments (z), the regression coefficients (beta), and the variances (sigma) based on their posterior distributions. The assignment probability of each group is calculated for every observation, and a new group membership is sampled from the resulting multinomial distribution.
The following complete posterior conditionals can be obtained:

$$\beta_g \mid \theta_{-\beta_g} \sim \mathcal{N}(D_g d_g, \, D_g), \qquad D_g = \left(\frac{X_g^{\top} X_g}{\sigma_g^2} + V_\beta^{-1}\right)^{-1}, \qquad d_g = \frac{X_g^{\top} y_g}{\sigma_g^2} + V_\beta^{-1} \mu_\beta,$$

$$\sigma_g^2 \mid \theta_{-\sigma_g^2} \sim \mathcal{IG}\!\left(\frac{n_g}{2} + a_g, \; b_g + \frac{1}{2} \sum_{i:\, z_{ig} = 1} \left(y_i - x_i^{\top}\beta_g\right)^2\right),$$

$$\pi \mid \theta_{-\pi} \sim \mathcal{D}(\alpha_1 + n_1, \ldots, \alpha_G + n_G),$$

where $\theta_{-\lambda}$ denotes all parameters in our posterior except $\lambda$, $X_g$ and $y_g$ collect the observations currently assigned to the $g$-th component, and $n_g$ denotes the number of observations in the $g$-th component of the mixture.
The following Gibbs sampling algorithm cycles through these posterior conditionals in sequential order.
## Start the MCMC iterations for posterior sampling
# Loop over the number of simulations
for (i in 2:nSim) {
  print(i) # Print the current iteration number
  # For each observation, update the group assignment (z)
  for (t in 1:(N * G)) {
    probs <- rep(NA, G)
    for (g in 1:G) {
      # Likelihood of observation t under group g, weighted by the mixing proportion
      probs[g] <- dnorm(y[t, 1], x[t, ] %*% mBeta[i - 1, g],
                        sqrt(mSigma2[i - 1, g])) * mPi[i - 1, g]
    }
    # Guard against numerical underflow when all probabilities are zero
    if (all(probs == 0)) {
      probs <- probs + 1 / G
    }
    # Sample a new group assignment from the resulting multinomial distribution
    z[i, t] <- which(rmultinom(1, 1, probs / sum(probs)) == 1)
  }
  # Update the regression coefficients and variance for each group
  for (g in 1:G) {
    # Collect the observations currently assigned to group g
    Xg <- x[z[i, ] == g, , drop = FALSE]
    yg <- y[z[i, ] == g, 1]
    # Compute the posterior variance (DBeta) and mean component (dBeta) for beta
    DBeta <- solve(t(Xg) %*% Xg / mSigma2[i - 1, g] + solve(VBeta[g, g]))
    dBeta <- t(Xg) %*% yg / mSigma2[i - 1, g] + solve(VBeta[g, g]) %*% muBeta[g, 1]
    # Sample a new value for beta from the multivariate normal distribution
    mBeta[i, g] <- rmvnorm(1, DBeta %*% dBeta, DBeta)
    # Update the number of observations in group g
    ng[i, g] <- sum(z[i, ] == g)
    # Update the variance (sigma^2) for group g from its inverse gamma conditional
    mSigma2[i, g] <- rigamma(1, ng[i, g] / 2 + shSigma,
                             raSigma + 1 / 2 * sum((yg - Xg %*% mBeta[i, g])^2))
  }
  # Reorder the group labels by the size of beta to avoid label switching
  reorderWay <- order(mBeta[i, ])
  mBeta[i, ] <- mBeta[i, reorderWay]
  ng[i, ] <- ng[i, reorderWay]
  mSigma2[i, ] <- mSigma2[i, reorderWay]
  # Update the mixing proportions (pi) based on the number of observations in each group
  mPi[i, ] <- rdirichlet(1, alphaPrior + ng[i, ])
}
This code block performs the key steps of the MCMC:
- Group assignment update: For each observation, we calculate the probability that it belongs to each group and sample an updated group assignment accordingly.
- Regression coefficient update: The regression coefficients for each group are updated using the posterior mean and variance, calculated from the observations currently assigned to that group.
- Variance update: The variance of the response variable for each group is updated using the inverse gamma distribution.
Finally, we visualize the results of the MCMC sampling. We plot the posterior distribution of each regression coefficient and compare it to the true value.
# Plot the posterior distributions for each beta coefficient
par(mfrow = c(G, 1))
for (g in 1:G) {
  # Plot the density of the sampled betas, discarding the first draws as burn-in
  plot(density(mBeta[5:nSim, g]),
       main = 'True parameter (vertical) and the distribution of the samples')
  abline(v = betas[g]) # Add a vertical line at the true value of beta for comparison
}
This graph shows how the MCMC samples from the posterior distributions of the regression coefficients concentrate around the true values (betas).
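As a quick numerical check (a minimal sketch; the burn-in of 50 draws is an arbitrary choice, not part of the original analysis), we can also compare the posterior means with the true coefficients:

# Compare posterior means (after an assumed burn-in of 50 draws) with the true betas
burnIn <- 50
posteriorMeans <- colMeans(mBeta[burnIn:nSim, ])
round(rbind(true = betas, estimated = posteriorMeans), 2)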
Through this process, we demonstrate how finite normal mixtures can be used in a regression context, combined with MCMC for parameter estimation. By simulating data with known groupings and recovering the parameters using Bayesian inference, we can evaluate how well our model captures the underlying structure of the data.
Unless otherwise noted, all images are the author's.