Many models, such as linear regression, k-nearest neighbors, and ARIMA, are sensitive to outliers. Machine learning algorithms can overfit and may not generalize well in the presence of outliers.¹ However, the right transformation can reduce these outliers and improve the performance of your model.
Transformations for data with negative values include:
- Shifted log
- Shifted Box-Cox
- Inverse hyperbolic sine
- Sinh-arcsinh
Log and Box-Cox are effective tools when working with positive data, but inverse hyperbolic sine (arcsinh) is much more effective with negative values.
Sinh-arcsinh is even more powerful. It has two parameters that adjust the skewness and kurtosis of your data until it is almost normal. These parameters can be estimated using gradient descent. See a Python implementation at the end of this post.
The logarithmic transformation can be adapted to handle negative values with a shift term, α.
Visually, this moves the vertical asymptote of the log from 0 to α.
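Concretely, the shifted log is y = log(x - α), which is only defined for x > α, so α must be chosen below the smallest value in the data.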
Stock Price Forecast
Imagine you are building a model to predict the stock market. Hosenzade and Haratizadeh address this problem with a convolutional neural network, using a large set of feature variables that I pulled from the UC Irvine Machine Learning Repository². Below is the distribution of the volume change feature, an important technical indicator for stock market forecasting.
The quantile-quantile (QQ) plot reveals heavy right and left tails. The goal of our transformation will be to bring the tails closer to normal (the red line) so that there are no outliers.
Using an offset value of -250, I get this log distribution.
The right tail looks a little better, but the left tail still shows a deviation from the red line. Log works by applying a concave function that compresses high values and stretches low values, skewing the data to the left.
The logarithmic transformation only clears up the right tail.
While this works well for positively skewed data, it is less effective for data with negative outliers.
In stock market data, skewness is not the problem. The extreme values are on both the left and right sides. The kurtosis is high, which means both tails are heavy. A simple concave function is not equipped to handle this situation.
Box-Cox is a generalized version of log, which can also be shifted to include negative values, written as
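y = ((x - α)^λ - 1) / λ   when λ ≠ 0
y = log(x - α)   when λ = 0

(Here I'm writing the standard Box-Cox form with a shift term α added; α plays the same role as the offset used below.)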
The λ parameter controls the concavity of the transformation, allowing it to take on a variety of shapes. Box-Cox is quadratic when λ = 2, linear when λ = 1, and approaches log as λ approaches 0. This can be verified using L'Hôpital's rule.
To apply this transformation to our stock price data, I use a shift value of -250 and determine λ with scipy's boxcox function.
from scipy.stats import boxcox
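# subtract the offset (-250) so all values are positive, then let boxcox pick lambda by maximum likelihood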
y, lambda_ = boxcox(x - (-250))
The resulting transformed data looks like this:
Despite the flexibility of this transformation, it fails to reduce the tails of the stock price data. Low values of λ skew the data to the left, reducing the right tail. High values of λ skew the data to the right, reducing the left tail, but no value can reduce both simultaneously.
The hyperbolic sine function (sinh) is defined as
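sinh(x) = (e^x - e^(-x)) / 2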
and its inverse is
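arcsinh(x) = log(x + √(x² + 1))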
In this case, the inverse function is more useful because it is approximately log for large x (positive or negative) and linear for small values of x. In effect, this reduces the extremes while keeping the core values, more or less, the same.
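To see this, note that arcsinh(x) ≈ log(2x) for large positive x, arcsinh(x) ≈ -log(-2x) for large negative x, and arcsinh(x) ≈ x for x near zero.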
Arcsinh reduces both positive and negative tails.
For positive values, arcsinh is concave, and for negative values, it is convex. This change in curvature is the secret sauce that allows it to handle positive and negative extreme values simultaneously.
Using this transformation on stock data results in almost normal tails. The new data has no outliers!
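As a rough sketch, this step is a one-liner with NumPy (assuming the feature is stored in a NumPy array called x):

import numpy as np

x_transformed = np.arcsinh(x)  # elementwise; accepts negative values directly, no offset needed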
Scale matters
Consider the scale of your data before passing it to arcsinh.
For log, your choice of units is irrelevant. Dollars or cents, grams or kilograms, miles or feet: it's all the same to the log function. Scaling your inputs only shifts the transformed values by a constant.
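This follows from log(c · x) = log(c) + log(x): multiplying your data by a constant c only adds the constant log(c) to every transformed value.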
The same is not true for arcsinh. Values between -1 and 1 are left almost unchanged, while large numbers are dominated by the log. You may need to play with different scales and offsets before feeding your data into arcsinh to get a result you're happy with.
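A minimal sketch of what that tuning might look like (assuming, as before, the feature is in a NumPy array x; the scale and offset knobs here are hypothetical, not values from the article):

import numpy as np

def scaled_arcsinh(x, scale=1.0, offset=0.0):
    # shift and rescale before arcsinh: values near zero stay roughly linear,
    # values far from zero are compressed logarithmically
    return np.arcsinh((x - offset) / scale)

# try a few candidate scales and compare the resulting QQ plots
candidates = {s: scaled_arcsinh(x, scale=s) for s in (1, 10, 100)}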
At the end of the article, I implement a gradient descent algorithm in Python to estimate these transformation parameters more accurately.
Proposed by Jones and Pewsey³, the sinh-arcsinh transformation is
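f(x) = sinh(δ · arcsinh(x) - ε) / δ

(I'm writing it here in a scaled version of Jones and Pewsey's parameterization; parameterizations vary in the literature, but this is the form with the special cases described next.)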
The parameter ε adjusts the skewness of the data and δ adjusts the kurtosis³, allowing the transformation to take many forms. For example, the identity transformation f(x) = x is a special case of sinh-arcsinh with ε = 0 and δ = 1. Arcsinh is a limiting case, with ε = 0 and δ approaching zero, as can be seen using L'Hôpital's rule again.
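To make the idea of fitting ε and δ by gradient descent concrete, here is a minimal sketch. It is illustrative only, not the implementation referenced at the end of the post: it assumes the scaled parameterization above and uses a simple moment-matching objective with finite-difference gradients.

import numpy as np

def sinh_arcsinh(x, eps, delta):
    # scaled sinh-arcsinh transform: identity at eps=0, delta=1; arcsinh as delta -> 0
    return np.sinh(delta * np.arcsinh(x) - eps) / delta

def normality_loss(x, eps, delta):
    # one possible objective: drive skewness and excess kurtosis of the transformed data to zero
    z = sinh_arcsinh(x, eps, delta)
    z = (z - z.mean()) / z.std()
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    return skew ** 2 + excess_kurt ** 2

def fit_sinh_arcsinh(x, lr=0.01, steps=2000, h=1e-4):
    # plain gradient descent with central-difference gradients
    eps, delta = 0.0, 1.0
    for _ in range(steps):
        g_eps = (normality_loss(x, eps + h, delta) - normality_loss(x, eps - h, delta)) / (2 * h)
        g_delta = (normality_loss(x, eps, delta + h) - normality_loss(x, eps, delta - h)) / (2 * h)
        eps -= lr * g_eps
        delta -= lr * g_delta
    return eps, delta

The fitted eps and delta can then be plugged back into sinh_arcsinh to transform the feature.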