MLE provides a framework that addresses this issue precisely. We introduce a likelihood function, which is a function that produces another function: it takes a vector of parameters, often called theta, and returns a probability density function (PDF) that depends on theta.
The probability density function (PDF) of a distribution is a function that takes a value, x, and returns the probability (strictly, the probability density) of observing that value under the distribution. Therefore, probability functions are typically written as follows:
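In one common notation, writing f for the PDF, x for the observed value, and theta for the parameter vector:

f(x; \theta)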
The value of this function indicates the probability of observing x from the distribution defined by the PDF with theta as its parameters.
The goal
When building a forecast model, we have data samples and a parameterized model, and our goal is to estimate the model parameters. In our examples, such as the regression and moving average models, these parameters are the coefficients in the respective model formulas.
The equivalent in MLE is that we have observations and a PDF for a distribution defined over a set of parameters, theta, which are unknown and not directly observable. Our goal is to estimate theta.
The MLE approach involves finding the set of parameters, theta, that maximizes the likelihood function given the observable data, x.
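In symbols, writing L for the likelihood function and a hat for the estimate (one common convention), this reads:

\hat{\theta} = \arg\max_{\theta} L(\theta; x)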
We assume that our samples, x, are drawn from a distribution with a known probability function that depends on a set of parameters, theta. Under the true values of theta, the likelihood of observing our samples x should therefore be relatively high. So, identifying the theta values that make the likelihood of our samples as large as possible should bring us close to the true values of the parameters.
Conditional probability
Note that we have not made any assumptions about the distribution (PDF) on which the probability function is based. Now, suppose our observation x is a vector (x_1, x_2, …, x_n). We will consider a probability function that represents the probability of observing x_n conditional on having already observed (x_1, x_2, …, x_{n-1}) —
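In the same notation, this conditional density can be sketched as:

f(x_n \mid x_{n-1}, \dots, x_1; \theta)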
This represents the probability of observing only x_n given the above values (and theta, the set of parameters). Now, we define the conditional probability function as follows:
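One natural way to write this definition (a sketch; the subscript c only marks the conditional version) is as the product of the one-step conditional densities:

L_c(\theta; x_1, \dots, x_n) = \prod_{t=1}^{n} f(x_t \mid x_{t-1}, \dots, x_1; \theta)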
Later we will see why it is useful to use the conditional likelihood function instead of the exact likelihood function.
The log-likelihood
In practice, it is often convenient to use the natural logarithm of the likelihood function, called the log-likelihood function:
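Writing \ell for the log-likelihood and L for the likelihood (again, just a notational convention):

\ell(\theta; x) = \ln L(\theta; x)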
This is more convenient because we often work with a likelihood function which is a joint probability function of independent variables, which translates to the product of the probability of each variable. Taking the logarithm converts this product into a sum.
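In symbols, with f_t standing for the density of the t-th variable:

\ln \prod_{t=1}^{n} f_t = \sum_{t=1}^{n} \ln f_t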
For simplicity, I will demonstrate how to estimate the most basic moving average model: MA(1):
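Written out, consistent with the simulation code further below:

x_t = \alpha + \beta \, \epsilon_{t-1} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2)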
Here, x_t represents the time series observations, alpha and beta are the model parameters to be estimated, and the epsilons are random noise terms drawn from a normal distribution with zero mean and some standard deviation, sigma, which will also be estimated. Therefore, our “theta” is (alpha, beta, sigma), which we intend to estimate.
Let's define our parameters and generate some synthetic data using Python:
import pandas as pd
import numpy as np

STD = 3.3
MEAN = 0
ALPHA = 18
BETA = 0.7
N = 1000

# simulate N noise terms and build the MA(1) series x_t = alpha + beta*e_{t-1} + e_t
df = pd.DataFrame({"et": np.random.normal(loc=MEAN, scale=STD, size=N)})
df["et-1"] = df["et"].shift(1, fill_value=0)
df["xt"] = ALPHA + (BETA * df["et-1"]) + df["et"]
Note that we have set the standard deviation of the error distribution to 3.3, with alpha at 18 and beta at 0.7. The data looks like this:
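If you want to look at the simulated series yourself, a minimal plotting snippet such as the following should do (matplotlib assumed; it is not part of the estimation itself):

import matplotlib.pyplot as plt

# plot the simulated MA(1) series
df["xt"].plot(figsize=(10, 4), title="Simulated MA(1) series")
plt.xlabel("t")
plt.show()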
Likelihood function for MA(1)
Our goal is to construct a likelihood function that addresses the question: what is the probability of observing our time series x=(x_1,…, x_n) assuming they are generated by the MA(1) process described above?
The challenge of calculating this probability lies in the mutual dependence between our samples (as is evident from the fact that both x_t and x_{t-1} depend on e_{t-1}), which makes it non-trivial to determine the joint probability of observing all samples (known as the exact likelihood).
So, as discussed above, instead of calculating the exact probability, we will work with a conditional probability. Let's start with the probability of observing a single sample given all previous samples:
This is much easier to calculate because:
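given all the previous samples (and theta), the noise terms can be recovered one by one, which is exactly what the calc_conditional_et method in the code below does. Sketching it in the notation above, with \epsilon_0 = 0:

\epsilon_t = x_t - \alpha - \beta \, \epsilon_{t-1}

so that, conditionally on the past, each observation is simply normal:

x_t \mid x_{t-1}, \dots, x_1 \sim N(\alpha + \beta \, \epsilon_{t-1}, \ \sigma^2)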
All that remains is to calculate the conditional probability of observing all samples:
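Under the normality assumption, a sketch of this product is:

L_c(\theta; x) = \prod_{t=1}^{n} \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x_t - \alpha - \beta \, \epsilon_{t-1})^2}{2\sigma^2} \right)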
Applying a natural logarithm we obtain:
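Up to the notation chosen here, this gives roughly:

\ell_c(\theta; x) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^{n} \left( x_t - \alpha - \beta \, \epsilon_{t-1} \right)^2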
which is the function we should maximize.
Code
We will use the GenericLikelihoodModel class from statsmodels for our MLE estimation implementation. As described in the tutorial on the statsmodels website, we simply need to subclass this class and implement our likelihood function calculation:
from scipy import stats
from statsmodels.base.model import GenericLikelihoodModel
import statsmodels.api as sm


class MovingAverageMLE(GenericLikelihoodModel):

    def initialize(self):
        super().initialize()
        # register the extra parameters (beta, std) in addition to the intercept
        extra_params_names = ('beta', 'std')
        self._set_extra_params_names(extra_params_names)
        self.start_params = np.array((0.1, 0.1, 0.1))

    def calc_conditional_et(self, intercept, beta):
        # recover the noise terms recursively: e_t = x_t - alpha - beta * e_{t-1}, with e_0 = 0
        df = pd.DataFrame({"xt": self.endog})
        ets = [0.0]
        for i in range(1, len(df)):
            ets.append(df.iloc[i]["xt"] - intercept - (beta * ets[i - 1]))
        return ets

    def loglike(self, params):
        # params = (alpha, beta, std): the conditional log-likelihood is the sum of
        # the log N(0, std) densities of the recovered noise terms
        ets = self.calc_conditional_et(params[0], params[1])
        return stats.norm.logpdf(
            ets,
            scale=params[2],
        ).sum()
The loglike function is the one we must implement. Given the iterated parameter values params and the dependent variable (in this case, the time series samples), which is stored in the class member self.endog, it calculates the conditional log-likelihood value, as we discussed above.
Now let's create the model and fit it to our simulated data:
df = sm.add_constant(df)  # add intercept for estimation (alpha)
model = MovingAverageMLE(df["xt"], df["const"])
r = model.fit()
r.summary()
and the output is:
And that's it! As demonstrated, MLE successfully estimated the parameters we selected for the simulation.
Estimating even a simple MA(1) model with maximum likelihood demonstrates the power of this method, which not only allows us to make efficient use of our data but also provides a solid statistical basis for understanding and interpreting the dynamics of time series data.
I hope you liked it!
Unless otherwise stated, all images are the property of the author.