As deep learning models grow in size and complexity, it becomes harder to articulate why and how they arrive at a given result. Researchers are exploring several directions to improve the interpretability of AI systems.
Mechanistic interpretability attempts to provide such explanations by reverse-engineering neural networks into the algorithms they implement. This strategy has proven quite effective for convolutional neural networks in image classification. Despite these achievements, the repertoire of methods for producing mechanistic explanations remains limited and poorly understood. One major hurdle is that researchers must be imaginative and diligent in testing mechanistic hypotheses.
Mechanistic theories are typically evaluated by combining evidence from numerous ad hoc tests. Because this is so costly, many approaches are tested only on simplified models, or on just a handful of non-trivial circuits in more realistic models.
A new DeepMind study proposes the TRAnsformer Compiler for RASP (Tracr), a compiler that turns human-readable code into the weights of a neural network, directly addressing the lack of ground-truth explanations. This makes it possible to build models that perform non-trivial computations with a known implementation. To measure how well various interpretability tools work, we can apply them to compiled models and compare the resulting explanations against the known ground truth.
Tracr converts Restricted Access Sequence Processing (RASP) code (a domain-specific programming language designed to define transformer computations) into weights for transformer models. The team also introduces craft, Tracr’s intermediate representation, which expresses linear algebra operations over vector spaces with named basis directions.
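For a rough sense of the workflow, here is a minimal sketch of compiling a RASP program with Tracr, based on the library’s published interface (exact module and function names may differ across versions):

```python
from tracr.rasp import rasp
from tracr.compiler import compiling

# A RASP program that outputs the sequence length at every position:
# attend uniformly to all positions and count how many were selected.
def make_length():
    all_true = rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)
    return rasp.SelectorWidth(all_true)

# Compile the program into the weights of a concrete transformer.
model = compiling.compile_rasp_to_model(
    make_length(),
    vocab={1, 2, 3},     # input token vocabulary
    max_seq_len=5,       # maximum sequence length the model supports
    compiler_bos="BOS",  # beginning-of-sequence marker token
)

# Running the compiled transformer executes the RASP program.
out = model.apply(["BOS", 1, 2, 3])
print(out.decoded)  # expected: ["BOS", 3, 3, 3]
```

The compiled model is an ordinary transformer whose forward pass carries out the RASP computation, so its weights come with a known, human-readable explanation by construction.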
The researchers use RASP to investigate edge cases, such as the same data being stored redundantly in multiple locations, focusing on how transformer models implement such computations. With Tracr, it is possible to build models in which data is encoded at a known location, which makes a proposed explanation directly checkable. They used Tracr to create models that sort a sequence of numbers, count the tokens in an input sequence, and check for balanced parentheses; these tasks are much simpler than the NLP tasks, such as summarizing text or answering questions, where decoder-only transformer models are typically employed.
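For a flavor of how such tasks look in RASP, below is a hedged sketch of a fragment for the balanced-parentheses check: it computes the running mean of a +1/−1 encoding of parentheses over each prefix, mirroring the frac_prevs pattern from Tracr’s example library (the paper’s full program also verifies the balance never dips negative, which is omitted here):

```python
from tracr.rasp import rasp

# Encode "(" as 1, ")" as -1, and anything else as 0.
value = rasp.numerical(rasp.Map(
    lambda t: 1 if t == "(" else (-1 if t == ")" else 0), rasp.tokens))

# Each position attends to itself and all earlier positions.
prefix = rasp.Select(rasp.indices, rasp.indices, rasp.Comparison.LEQ)

# Mean of the encoded values over each prefix; a balanced string ends
# with a running mean of zero (the non-negativity check along the way
# is handled separately in the full program).
running_mean = rasp.numerical(rasp.Aggregate(prefix, value, default=0))
```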
The researchers highlight potential uses for Tracr beyond its current role in evaluating interpretability tools. One example is compiling hand-coded implementations of model components and using them to replace parts of a model obtained through conventional training, which could improve overall model performance.
The researchers hope that Tracr’s adoption by the research community will help deepen our understanding of neural networks.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a strong interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advances in technology and their real-life applications.