In this work, we study how well the learned weights of a neural network utilize the space available to them. This notion is related to capacity, but additionally incorporates the interaction of the network architecture with the dataset. Most learned weights appear to be full-rank and are therefore not amenable to low-rank decomposition, which misleadingly suggests that they are utilizing all the space available to them. We propose a simple data-driven transformation that projects the weights onto the subspace in which the data and the weights interact. This preserves the functional mapping of the layer while revealing its low-rank structure. We find that most models utilize only a fraction of the available space. For example, for ViT-B/16 and ViT-L/16 trained on ImageNet, the mean layer utilization is 35% and 20%, respectively. Our transformation reduces the parameter count to 50% and 25% of the original, respectively, with an accuracy drop of less than 0.2% after fine-tuning. We also show that self-supervised pre-training increases this utilization to as much as 70%, which helps explain its suitability for downstream tasks.
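To make the idea concrete, the following is a minimal NumPy sketch of one way such a data-driven projection could look; it is not the paper's exact procedure, and the layer sizes, synthetic activations, and rank threshold are all hypothetical. It projects a linear layer's weights onto the subspace spanned by its input activations, which preserves the layer's outputs on that data while exposing a low effective rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear layer y = W x with a 256-dim input and 512-dim output.
# The input activations X only occupy a ~40-dimensional subspace of R^256.
d_in, d_out, n, k_true = 256, 512, 4096, 40
X = rng.normal(size=(n, k_true)) @ rng.normal(size=(k_true, d_in))
W = rng.normal(size=(d_out, d_in))

# Data-driven basis: right singular vectors of the activations, truncated at
# their numerical rank (illustrative tolerance).
_, s, Vt = np.linalg.svd(X, full_matrices=False)
k = int((s > 1e-6 * s[0]).sum())
U_k = Vt[:k].T                                   # (d_in, k)

# Project the weights onto the subspace where the data actually lives.
# For any input x in span(U_k), W_proj @ x == W @ x, so the layer's function
# on the data is preserved, but W_proj has rank at most k.
W_proj = W @ U_k @ U_k.T

print("rank(W)      =", np.linalg.matrix_rank(W))       # 256: looks full-rank
print("rank(W_proj) =", np.linalg.matrix_rank(W_proj))  # ~40: revealed low rank
print("utilization  =", k / d_in)                       # fraction of space used
print("max output error on the data:",
      np.abs(X @ W.T - X @ W_proj.T).max())             # ~0 (numerical noise)
```

In this toy setting the weight matrix is full-rank in isolation, yet only a small fraction of its input space is ever exercised by the data; the projected weights make that low-rank interaction explicit and could subsequently be factorized to reduce parameters.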