As machine learning models get larger and more complex, they require faster and more power-efficient hardware to perform the calculations. Conventional digital computers struggle to keep up.
An analog optical neural network could perform the same tasks as a digital one, such as image classification or speech recognition, but because the computations are carried out using light instead of electrical signals, optical neural networks can run many times faster while consuming less energy.
However, these analog devices are prone to hardware errors that can make calculations less accurate. Microscopic imperfections in hardware components are one cause of these errors. In an optical neural network that has many connected components, errors can accumulate quickly.
Even with error-correction techniques, a certain amount of error is unavoidable due to fundamental properties of the devices that make up an optical neural network. A network large enough to be implemented in the real world would be too imprecise to be effective.
The MIT researchers overcame this hurdle and found a way to effectively scale an optical neural network. By adding a small hardware component to the optical switches that make up the network architecture, they can reduce even the uncorrectable errors that would otherwise accumulate in the device.
Their work could enable a super-fast, power-efficient analog neural network that can perform with the same precision as a digital one. With this technique, as an optical circuit gets larger, the amount of error in its calculations actually goes down.
“This is notable, as it runs counter to the intuition about analog systems, where larger circuits are supposed to have larger errors, so that errors set a limit on scalability. This paper allows us to address the question of the scalability of these systems with an unequivocal ‘yes,’” says lead author Ryan Hamerly, a visiting scientist in MIT’s Research Laboratory of Electronics (RLE) and Quantum Photonics Laboratory and a senior scientist at NTT Research.
Hamerly’s co-authors are graduate student Saumil Bandyopadhyay and senior author Dirk Englund, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS), leader of the Quantum Photonics Laboratory, and a member of the RLE. The research is published today in Nature Communications.
Multiplying with light
An optical neural network is made up of many connected components that function like tunable, reprogrammable mirrors. These tunable mirrors are called Mach-Zehnder interferometers (MZIs). Data for the neural network are encoded into light, which a laser fires into the optical neural network.
A typical MZI contains two mirrors and two beam splitters. Light enters the top of an MZI, where it is split into two parts that interfere with each other before being recombined by the second beam splitter and then reflected out the bottom to the next MZI in the array. Researchers can take advantage of the interference of these optical signals to perform complex linear algebra operations, known as matrix multiplication, which is how neural networks process data.
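To make the matrix picture concrete, here is a minimal numpy sketch of a single MZI’s 2×2 transfer matrix under one common convention (an input phase φ, a 50:50 splitter, an internal phase θ on one arm, and a second 50:50 splitter); the exact phase placement varies between designs, and this is illustrative code rather than the authors’ implementation. Tiling such 2×2 blocks across pairs of waveguides, as in standard Reck- or Clements-style meshes, is how the larger programmable matrix multiplications described here are built up.

```python
import numpy as np

def beam_splitter():
    """Ideal 50:50 beam splitter (one common convention)."""
    return np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)

def mzi(theta, phi):
    """2x2 MZI transfer matrix: input phase phi, splitter, internal
    phase theta on the top arm, splitter (matrices compose right to left)."""
    internal = np.diag([np.exp(1j * theta), 1.0])
    external = np.diag([np.exp(1j * phi), 1.0])
    return beam_splitter() @ internal @ beam_splitter() @ external

# Two optical amplitudes entering the device's two ports
x = np.array([1.0, 0.5])

# theta = pi gives the "bar" state: each input stays in its own port...
print(np.abs(mzi(np.pi, 0.0) @ x) ** 2)   # power ≈ [1.0, 0.25]
# ...and theta = 0 gives the "cross" state: the two ports swap.
print(np.abs(mzi(0.0, 0.0) @ x) ** 2)     # power ≈ [0.25, 1.0]
```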
But the errors that can occur in each MZI add up quickly as light moves from one device to the next. One can avoid some errors by identifying them early and adjusting the MZIs so that earlier errors are canceled by later devices in the array.
“It’s a very simple algorithm if you know what the errors are. But these errors are notoriously difficult to determine because you only have access to the inputs and outputs of your chip,” says Hamerly. “This motivated us to see whether it is possible to do error correction without calibration.”
Hamerly and his collaborators previously demonstrated a mathematical technique that went a step further. They could successfully infer the errors and tune the MZIs accordingly, but even this did not remove all of the error.
Due to the fundamental nature of an MZI, there are cases where it is impossible to adjust a device so that all light flows through the bottom port to the next MZI. If the device loses a fraction of light at each step, and the array is very large, only a small amount of power will be left in the end.
“Even with error correction, there is a fundamental limit to how good a chip can be. MZIs are physically unable to realize certain settings they need to be configured to,” he says.
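A quick numerical check illustrates where that limit comes from. In the sketch below (same illustrative convention as the earlier snippet, with the ideal 50:50 splitting angle of π/4 perturbed by a small fabrication error ε), no setting of the internal phase lets a standard MZI route all of the light out of one port; in this convention a residual of roughly sin²(2ε) always leaks out the other side. The parameter names and layout are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def splitter(alpha):
    """Beam splitter with splitting angle alpha (pi/4 would be an ideal 50:50)."""
    return np.array([[np.cos(alpha), 1j * np.sin(alpha)],
                     [1j * np.sin(alpha), np.cos(alpha)]])

def mzi(theta, alpha1, alpha2):
    """Standard MZI whose two splitters may deviate from 50:50."""
    return splitter(alpha2) @ np.diag([np.exp(1j * theta), 1.0]) @ splitter(alpha1)

eps = 0.05                          # small fabrication error on both splitters
thetas = np.linspace(0, 2 * np.pi, 1001)

# Try to reach the "cross" state: minimize the power left in the top port
leakage = min(abs(mzi(t, np.pi / 4 + eps, np.pi / 4 + eps)[0, 0]) ** 2
              for t in thetas)
print(f"best residual in the wrong port: {leakage:.2e}")   # ~1e-2, never 0
```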
So the team developed a new type of MZI. The researchers added an additional beam splitter to the end of the device, calling it the 3-MZI because it has three beam splitters instead of two. Because of the way this additional beam splitter mixes the light, it becomes much easier for an MZI to achieve the setting it needs to send all the light through its bottom port.
Importantly, the extra beam splitter is only a few micrometers in size and is a passive component, so it does not require any additional wiring. Adding a beam splitter to each MZI does not significantly change the size of the chip.
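The published 3-MZI has its own specific arrangement of fixed splitters and programmable phase shifters; the sketch below uses my own assumed layout (two tunable phases interleaved with three equally imperfect splitters, same convention as above) purely to illustrate the principle described here: the extra splitter gives the device enough freedom to reach an essentially perfect cross state even when every splitter deviates from 50:50.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def splitter(alpha):
    """Beam splitter with splitting angle alpha (pi/4 would be an ideal 50:50)."""
    return np.array([[np.cos(alpha), 1j * np.sin(alpha)],
                     [1j * np.sin(alpha), np.cos(alpha)]])

def phase(theta):
    """Phase shift theta applied to the top arm."""
    return np.diag([np.exp(1j * theta), 1.0])

eps = 0.05                       # same fabrication error as before
a = np.pi / 4 + eps              # every splitter equally imperfect

# Standard MZI (the second phase argument is simply unused here)
std = lambda t1, t2: splitter(a) @ phase(t1) @ splitter(a)
# Assumed 3-splitter layout: splitter, phase, splitter, phase, splitter
three = lambda t1, t2: splitter(a) @ phase(t2) @ splitter(a) @ phase(t1) @ splitter(a)

def cross_error(device):
    """Smallest achievable power left in the wrong port (the "cross" state)."""
    obj = lambda p: abs(device(p[0], p[1])[0, 0]) ** 2
    grid = np.linspace(0, 2 * np.pi, 41)
    best = min(product(grid, grid), key=obj)          # coarse grid search...
    return minimize(obj, x0=np.array(best), method="Nelder-Mead",
                    options={"xatol": 1e-9, "fatol": 1e-14,
                             "maxiter": 5000}).fun    # ...then local refinement

print(f"standard MZI residual:   {cross_error(std):.1e}")    # ~1e-2
print(f"three-splitter residual: {cross_error(three):.1e}")  # effectively zero
```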
Bigger chip, fewer errors
When the researchers ran simulations to test their architecture, they found that it can eliminate much of the uncorrectable error that hinders accuracy. And as the optical neural network gets larger, the amount of error in the device actually decreases, the opposite of what happens in a standard MZI device.
Using the 3-MZI, they could create a device large enough for commercial uses, with error reduced by a factor of 20, Hamerly says.
The researchers also developed a variant of the MZI design specifically for correlated errors. These occur due to manufacturing imperfections: if a chip’s thickness is slightly off, the MZIs may all be off by about the same amount, so the errors are all roughly the same. They found a way to change the configuration of an MZI to make it resistant to these types of errors. This technique also increased the bandwidth of the optical neural network, so that it can run three times faster.
Now that they have demonstrated these techniques using simulations, Hamerly and his collaborators plan to test these approaches on physical hardware and continue to move toward an optical neural network that they can implement effectively in the real world.
This research is supported, in part, by a graduate research grant from the National Science Foundation and the US Air Force Office of Scientific Research.