Many models, such as linear regression, k-nearest neighbors, and ARIMA, are sensitive to outliers. Machine learning algorithms can overfit and may not generalize well in the presence of outliers.¹ However, the right transformation can reduce these outliers and improve the performance of your model.
Transformations for data with negative values include:
- Shifted log
- Shifted Box-Cox
- Inverse hyperbolic sine
- Sinh-arcsinh
Log and Box-Cox are effective tools when working with positive data, but inverse hyperbolic sine (arcsinh) is much more effective with negative values.
Sinh-arcsinh is even more powerful. It has two parameters that adjust the skewness and kurtosis of your data until it is almost normal. These parameters can be estimated using gradient descent. See a Python implementation at the end of this post.
The logarithmic transformation can be adapted to handle negative values with a shift term, α.
Visually, this moves the vertical asymptote of the log from 0 to α.
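Concretely, the shifted log is y = log(x - α), which is only defined for x > α, so α must be chosen below the smallest value in the data.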
Stock Price Forecast
Imagine you are building a model to predict the stock market. Hosenzade and Haratizadeh address this problem with a convolutional neural network, using a large set of feature variables that I pulled from the UC Irvine Machine Learning Repository². Below is the distribution of the volume change feature, an important technical indicator for stock market forecasting.
The quantile-quantile (QQ) plot reveals heavy right and left tails. The goal of our transformation will be to bring the tails closer to normal (the red line) so that there are no outliers.
Using an offset value of -250, I get this log distribution.
The right tail looks a little better, but the left tail still shows a deviation from the red line. Log works by applying a concave function that compresses high values and stretches low values, skewing the data to the left.
The logarithmic transformation only clears up the right tail.
While this works well for positively skewed data, it is less effective for data with negative outliers.
In stock market data, skewness is not the problem. The extreme values are on both the left and right sides. The kurtosis is high, which means both tails are heavy. A simple concave function is not equipped to handle this situation.
Box-Cox is a generalized version of log, which can also be shifted to include negative values, written as
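y = ((x - α)^λ - 1) / λ   when λ ≠ 0
y = log(x - α)   when λ = 0

(Here I'm writing the standard Box-Cox form with a shift term α added; α plays the same role as the offset used below.)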
The λ parameter controls the concavity of the transformation, allowing it to take on a variety of shapes. Box-Cox is quadratic when λ = 2, linear when λ = 1, and approaches log as λ approaches 0. This can be verified using L'Hôpital's rule.
To apply this transformation to our stock price data, I use a shift value of -250 and determine λ with scipy's boxcox function.
from scipy.stats import boxcox
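# subtract the offset (-250) so all values are positive, then let boxcox pick lambda by maximum likelihood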
y, lambda_ = boxcox(x - (-250))
The resulting transformed data looks like this:
Despite the flexibility of this transformation, it fails to reduce the tails of the stock price data. Low values of λ skew the data to the left, reducing the right tail. High values of λ skew the data to the right, reducing the left tail, but no value can reduce both simultaneously.
The hyperbolic sine function (sinh) is defined as
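sinh(x) = (e^x - e^(-x)) / 2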
and its inverse is
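arcsinh(x) = log(x + √(x² + 1))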
In this case, the inverse function is more useful because it is approximately log for large x (positive or negative) and linear for small values of x. In effect, this reduces the extremes while keeping the core values, more or less, the same.
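To see this, note that arcsinh(x) ≈ log(2x) for large positive x, arcsinh(x) ≈ -log(-2x) for large negative x, and arcsinh(x) ≈ x for x near zero.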
Arcsinh reduces both positive and negative tails.
For positive values, arcsinh is concave, and for negative values, it is convex. This change in curvature is the secret sauce that allows it to handle positive and negative extreme values simultaneously.
Using this transformation on stock data results in almost normal tails. The new data has no outliers!
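As a rough sketch, this step is a one-liner with NumPy (assuming the feature is stored in a NumPy array called x):

import numpy as np

x_transformed = np.arcsinh(x)  # elementwise; accepts negative values directly, no offset needed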
Scale matters
Consider the scale of your data before passing it to arcsinh.
For log, your choice of units is irrelevant. Dollars or cents, grams or kilograms, miles or feet: it's all the same to the log function. Scaling your inputs only shifts the transformed values by a constant.
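This follows from log(c · x) = log(c) + log(x): multiplying your data by a constant c only adds the constant log(c) to every transformed value.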
The same is not true for arcsinh. Values between -1 and 1 are left almost unchanged, while large numbers are dominated by the log. You may need to play with different scales and offsets before feeding your data into arcsinh to get a result you're happy with.
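A minimal sketch of what that tuning might look like (assuming, as before, the feature is in a NumPy array x; the scale and offset knobs here are hypothetical, not values from the article):

import numpy as np

def scaled_arcsinh(x, scale=1.0, offset=0.0):
    # shift and rescale before arcsinh: values near zero stay roughly linear,
    # values far from zero are compressed logarithmically
    return np.arcsinh((x - offset) / scale)

# try a few candidate scales and compare the resulting QQ plots
candidates = {s: scaled_arcsinh(x, scale=s) for s in (1, 10, 100)}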
At the end of the article, I implement a gradient descent algorithm in Python to estimate these transformation parameters more accurately.
Proposed by Jones and Pewsey³, the sinh-arcsinh transformation is
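f(x) = sinh(δ · arcsinh(x) - ε) / δ

(I'm writing it here in a scaled version of Jones and Pewsey's parameterization; parameterizations vary in the literature, but this is the form with the special cases described next.)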
The parameter ε adjusts the skewness of the data and δ adjusts the kurtosis³, allowing the transformation to take many forms. For example, the identity transformation f(x) = x is a special case of sinh-arcsinh with ε = 0 and δ = 1. Arcsinh is a limiting case, with ε = 0 and δ approaching zero, as can be seen using L'Hôpital's rule again.
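To make the idea of fitting ε and δ by gradient descent concrete, here is a minimal sketch. It is illustrative only, not the implementation referenced at the end of the post: it assumes the scaled parameterization above and uses a simple moment-matching objective with finite-difference gradients.

import numpy as np

def sinh_arcsinh(x, eps, delta):
    # scaled sinh-arcsinh transform: identity at eps=0, delta=1; arcsinh as delta -> 0
    return np.sinh(delta * np.arcsinh(x) - eps) / delta

def normality_loss(x, eps, delta):
    # one possible objective: drive skewness and excess kurtosis of the transformed data to zero
    z = sinh_arcsinh(x, eps, delta)
    z = (z - z.mean()) / z.std()
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    return skew ** 2 + excess_kurt ** 2

def fit_sinh_arcsinh(x, lr=0.01, steps=2000, h=1e-4):
    # plain gradient descent with central-difference gradients
    eps, delta = 0.0, 1.0
    for _ in range(steps):
        g_eps = (normality_loss(x, eps + h, delta) - normality_loss(x, eps - h, delta)) / (2 * h)
        g_delta = (normality_loss(x, eps, delta + h) - normality_loss(x, eps, delta - h)) / (2 * h)
        eps -= lr * g_eps
        delta -= lr * g_delta
    return eps, delta

The fitted eps and delta can then be plugged back into sinh_arcsinh to transform the feature.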