Matrix approximation is a heavily studied subfield of data mining and machine learning. Many data analysis tasks rely on obtaining a low-rank approximation of a matrix; examples include dimensionality reduction, anomaly detection, data de-noising, clustering, and recommendation systems. In this article, we look into the problem of matrix approximation and how to compute it when the whole dataset is not available at hand.
The content of this article is partly taken from my lecture for the Stanford CS246 course. I hope you find it useful. Please find the full content here.
Most data generated on the web can be represented as a matrix in which each row is a data point. For example, in a router, every packet sent across the network is a data point that can be represented as a row in the matrix of all packets. In retail, every purchase made is a row in the matrix of all transactions.
At the same time, almost all data generated over the web is of a streaming nature, meaning it is generated by an external source at a rapid rate over which we have no control. Think of all the searches users issue on the Google search engine every second. We call this streaming data because, just like a stream, it pours in.
Some examples of typical streaming web-scale data are the following: packets passing through network routers, retail purchase transactions, and queries issued to a search engine.
Think of streaming data as a matrix A containing n rows in d-dimensional space, where typically n >> d. Often n is on the order of billions and growing.
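As a point of reference, here is a minimal sketch of what a low-rank approximation of such an n-by-d matrix looks like in the batch (non-streaming) setting, using NumPy's SVD. The matrix sizes and the target rank k are illustrative choices, not values from the article:

```python
import numpy as np

# Illustrative sizes: n data points in d-dimensional space, with n >> d
n, d, k = 10_000, 50, 5

rng = np.random.default_rng(0)
A = rng.standard_normal((n, d))  # stand-in for the full data matrix

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular triplets to form the rank-k approximation.
# By the Eckart-Young theorem this is the best rank-k approximation of A
# in both the Frobenius and the spectral norm.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

approximation_error = np.linalg.norm(A - A_k, "fro")
```

Note that this baseline requires holding all n rows in memory and making multiple passes over them, which is exactly what the streaming setting rules out.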
In the streaming model, data arrives at high speed, one row at a time, and an algorithm must process each item as it arrives, or it is lost forever.
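One well-known way to approximate a matrix under this one-pass, bounded-memory constraint is a sketching algorithm such as Frequent Directions (Liberty, 2013). The sketch below is a minimal illustrative implementation of that idea, not necessarily the method this article develops; the sketch size `ell` and the toy data are assumptions for the example:

```python
import numpy as np

def frequent_directions(rows, d, ell):
    """Maintain an ell x d sketch B of a stream of d-dimensional rows,
    so that B.T @ B approximates A.T @ A while using only O(ell * d)
    memory, regardless of how many rows the stream contains."""
    B = np.zeros((ell, d))
    for row in rows:
        zero_rows = np.where(~B.any(axis=1))[0]
        if zero_rows.size == 0:
            # Sketch is full: shrink its singular values to free up rows
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s_shrunk[:, None] * Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        # Insert the incoming row into a freed-up slot of the sketch
        B[zero_rows[0]] = row
    return B

# Toy stream: rows are consumed one at a time, never stored in full
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
B = frequent_directions(iter(A), d=10, ell=6)
covariance_error = np.linalg.norm(A.T @ A - B.T @ B, 2)
```

The key property is that the memory footprint depends only on `ell` and `d`, not on the number of rows seen, which is what makes this style of algorithm viable when n is in the billions.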