Researchers at MIT and Brown University have conducted a groundbreaking study on the dynamics of training deep classifiers, a class of neural networks widely used for tasks such as image classification, speech recognition, and natural language processing. The study, published in the journal Research, is the first to analyze the properties that emerge during the training of deep classifiers with the square (quadratic) loss.
The study focuses mainly on two types of deep classifiers: convolutional neural networks (CNNs) and fully connected deep networks. The researchers found that deep networks trained with stochastic gradient descent (SGD), weight decay (WD) regularization, and weight normalization (WN) converge to neural collapse if trained to fit their training data. Neural collapse refers to the phenomenon in which the network maps all examples of a given class onto a single template. The researchers demonstrated that neural collapse arises from minimizing the square loss with SGD, WD, and WN.
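Neural collapse can be quantified by comparing how much features vary within each class to how far the class means are spread apart. The sketch below is our own illustration, not code from the study; the function name, metric, and synthetic data are hypothetical:

```python
import numpy as np

def neural_collapse_metric(features, labels):
    """Ratio of within-class feature variability to between-class
    variability. Values near zero indicate each class has collapsed
    onto a single template (its class mean)."""
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        cls_feats = features[labels == c]
        mu = cls_feats.mean(axis=0)
        within += np.sum((cls_feats - mu) ** 2)
        between += len(cls_feats) * np.sum((mu - global_mean) ** 2)
    return within / between

# Synthetic example: two classes whose features sit almost exactly
# on their class means, mimicking a collapsed representation.
rng = np.random.default_rng(0)
a = np.array([1.0, 0.0]) + 1e-3 * rng.standard_normal((50, 2))
b = np.array([-1.0, 0.0]) + 1e-3 * rng.standard_normal((50, 2))
features = np.vstack([a, b])
labels = np.array([0] * 50 + [1] * 50)
print(neural_collapse_metric(features, labels))  # near zero: collapsed
```

Features that remain spread out within each class would yield a much larger ratio.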
The researchers found that weight decay regularization helps prevent the network from overfitting the training data by reducing the magnitude of the weights, while weight normalization rescales the weight matrices of the network so that all layers operate at a comparable scale. The study also validates classical generalization theory, showing that its bounds are meaningful and that sparse networks such as CNNs perform better than dense networks. The authors derived new norm-based generalization bounds for CNNs with localized kernels, that is, networks with sparse connectivity in their weight matrices.
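The two mechanisms can be illustrated with a minimal NumPy sketch. This is our own simplification, not the study's implementation; the function names and hyperparameters are hypothetical:

```python
import numpy as np

def weight_decay_step(w, grad, lr=0.1, wd=0.01):
    """One SGD step with weight decay: the extra wd * w term
    shrinks the weights toward zero, limiting overfitting."""
    return w - lr * (grad + wd * w)

def weight_normalize(w):
    """Rescale a weight matrix to unit Frobenius norm so that
    different layers operate at a comparable scale."""
    return w / np.linalg.norm(w)

w = np.array([[3.0, 4.0]])
# Even with a zero gradient, weight decay alone shrinks the weights.
w_decayed = weight_decay_step(w, np.zeros_like(w))
print(np.linalg.norm(w_decayed) < np.linalg.norm(w))  # True

w_normed = weight_normalize(w)
print(np.linalg.norm(w_normed))  # ≈ 1.0
```

In practice, weight decay is usually supplied as an optimizer parameter rather than hand-coded, but the update rule is the same.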
Furthermore, the study found that a low-rank bias predicts the existence of intrinsic SGD noise in the weight matrices and in the network output, providing an intrinsic source of noise analogous to that of chaotic systems. The researchers' findings offer new insights into the properties that arise during the training of deep classifiers and may improve our understanding of why deep learning works so well.
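A low-rank bias can be observed by counting how many significant singular values a weight matrix has. The sketch below is our own illustration of that idea, not code from the paper; the function name and threshold are hypothetical:

```python
import numpy as np

def effective_rank(w, tol=1e-2):
    """Count singular values above tol * (largest singular value):
    a crude proxy for the low-rank structure of a weight matrix."""
    s = np.linalg.svd(w, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# A diagonal matrix with two nonzero entries has effective rank 2,
# well below its full dimension of 4.
w = np.diag([5.0, 3.0, 0.0, 0.0])
print(effective_rank(w))  # 2
```

Tracking this quantity over training would reveal whether SGD drives the weight matrices toward low rank, as the study describes.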
In conclusion, the study by the MIT and Brown University researchers provides crucial insights into the properties that arise during the training of deep classifiers. It validates classical generalization theory, introduces new norm-based generalization bounds for CNNs with localized kernels, and explains the roles that weight decay regularization and weight normalization play in the emergence of neural collapse. The study also found that a low-rank bias predicts the existence of intrinsic SGD noise, offering a new perspective on noise within deep neural networks. These findings could significantly advance the field of deep learning and contribute to the development of more accurate and efficient models.
Check out the Paper and the reference article. All credit for this research goes to the researchers of this project.
Niharika is a technical consulting intern at Marktechpost. She is a third-year student pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a strong interest in machine learning, data science, and artificial intelligence, and an avid reader of the latest developments in these fields.