Can mixed-signal architectures boost artificial intelligence performance using less power?
March 14th, 2019 – By Brian Bailey, Semiconductor Engineering
If the only tool you have is a hammer, everything looks like a nail. But development of artificial intelligence (AI) applications and the compute platforms for them may be overlooking an alternative technology—analog.
The semiconductor industry has a firm understanding of digital electronics and has been very successful making it scale. It is predictable, has good yield, and while every development team would like to see improvements, the tooling and automation available for it make large problems tractable. But scaling is coming to an end, and we know that applications still hunger for more compute capability. At the same time, the power consumed by machine learning cannot be allowed to keep growing the way it has.
The industry has largely abandoned analog circuitry, except for interfacing with the real world and for communications. Analog is seen as being difficult, prone to external interference, and time-consuming to design and verify. Moreover, it does not scale without digital assistance and does not see many of the same advantages as digital when it comes to newer technologies.
And yet, analog may hold the key for the future progression of some aspects of AI.
How we got here
At the heart of AI applications is the multiply-accumulate (MAC) function, the basic step of a dot product operation. This takes two numbers, multiplies them together, and adds the result to an accumulator. The numbers are fetched from and stored to memory. Those operations are repeated many times and account for the vast majority of the time and power consumed by both learning and inferencing.
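As a rough illustration of the operation described above, the following Python sketch builds a dot product out of repeated MAC steps. The function names `mac` and `dot_product` and the sample values are chosen here for illustration only; real accelerators perform this in parallel hardware rather than in a software loop.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: multiply a by b and add the product to acc."""
    return acc + a * b


def dot_product(weights, inputs):
    """Dot product expressed as a chain of MAC steps on a single accumulator."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    acc = 0
    for w, x in zip(weights, inputs):
        acc = mac(acc, w, x)
    return acc


# Illustrative values only: 2*4 + (-1)*5 + 3*6 = 21
weights = [2, -1, 3]
inputs = [4, 5, 6]
result = dot_product(weights, inputs)
print(result)
```

Every weight and every input in the loop has to be fetched from memory, which is why the memory traffic matters as much as the arithmetic.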
One reason for the rapid growth in machine learning was the availability of GPUs. These devices, although initially intended for graphics processing, have large numbers of MACs and high-speed memory interfaces. They can perform the necessary computations much faster than a general-purpose CPU. The downside is that GPUs tend to utilize floating-point arithmetic, which is well beyond the needs of AI algorithms. Because GPUs were the platform at hand, however, most research has used floating point.
The industry is trying to pare back wasted time and power by migrating to fixed-point mathematics or modified forms of floating point that are more suited to the task. It was initially thought that 12 bits of precision were necessary, but the latest developments are pushing toward 8-bit computation. Some research is going as far as single-bit processing, showing that it reduces accuracy by only a small amount.
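To make the precision trade-off concrete, here is a minimal Python sketch that computes the same dot product three ways: in floating point, with 8-bit integers, and with single-bit (sign-only) values. The symmetric scaling scheme, the function names and the sample numbers are assumptions made for this illustration; the article does not describe any particular quantization method.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def quantized_dot(weights, inputs):
    """Dot product using integer MACs, rescaled to a float at the end."""
    qw, scale_w = quantize_int8(weights)
    qx, scale_x = quantize_int8(inputs)
    acc = sum(a * b for a, b in zip(qw, qx))  # integer-only accumulation
    return acc * scale_w * scale_x


def binarized_dot(weights, inputs):
    """Single-bit version: keep only the sign (+1 or -1) of every value."""
    def sign(v):
        return 1 if v >= 0 else -1
    return sum(sign(a) * sign(b) for a, b in zip(weights, inputs))


# Hypothetical values for illustration
weights = [0.5, -1.0, 0.25]
inputs = [1.0, 0.5, -2.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approx_8bit = quantized_dot(weights, inputs)
approx_1bit = binarized_dot(weights, inputs)
print(exact, approx_8bit, approx_1bit)
```

The 8-bit result stays close to the floating-point value, while the single-bit result keeps only coarse information such as the sign. Whether that is enough depends on the network and the task.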
The latest Google TPU, a chip targeting machine learning, contains 65,536 8-bit MAC blocks and consumes so much power that the chip has to be water-cooled. Given that technology scaling is slowing, we cannot expect to increase the number of MACs integrated onto a chip unless the number of bits is reduced further. However, even going all the way down to one bit limits the gains that can be made. Something more is needed.
Improvements can be made on traditional von Neumann architectures. “The increasing performance of microcontrollers and the proliferation of libraries and middleware to support machine learning on them has helped to enable inference engines to run farther from the cloud and closer to the edge of the network,” says Rhonda Dirvin, senior director of marketing programs for Arm’s Automotive & IoT line of business. “With this migration comes greater usefulness of the data for things like sound identification, object recognition and vibration monitoring for motor health. As the data becomes more useful, more data will be collected. Collecting the data means taking our analog world and converting it to digital, achieved via mixed-signal ICs. New signal processing features have been added to modern MCUs, allowing the processing of the signal to be done digitally on the Arm-based MCU, for example, not requiring an additional DSP for many applications.”
That requires better analog-to-digital converters (ADC). “ADCs are needed to convert the analog sensor inputs to digital signal,” says Youbok Lee, senior technical staff engineer for the Mixed Signal and Linear Devices Division of Microchip Technology. “This digital signal is then processed using an AI algorithm that makes use of digital machine learning blocks. As machine learning applications spread out, more energy efficient adaptive mixed-signal analog-front devices will be needed.”
Could analog help?
There is already proof that AI functions can be performed using orders of magnitude less power, by a system capable of solving problems far more complex than the AI systems currently being developed. That example is the mammalian brain. Even the most power-hungry example, the human brain, consumes only about 25W. The power consumption of a TPU is likely between 200W and 300W. While the TPU contains 64K processing units, the human brain contains about 86 billion neurons. We are many orders of magnitude away from what is possible. While attempting to replicate the brain is probably not the ideal path forward, it does suggest that putting all of our eggs into the digital basket may not be the most fruitful approach in the long run.
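The size of that gap can be estimated with back-of-the-envelope arithmetic. The sketch below uses the figures quoted above and assumes 250W for the TPU, the midpoint of the 200W to 300W range. It also treats one MAC block and one of the brain's 86 billion units as comparable, which is a loose assumption made only to get a sense of scale.

```python
import math

TPU_POWER_W = 250.0   # assumed midpoint of the 200W-300W estimate
TPU_UNITS = 65_536    # 8-bit MAC blocks ("64K processing units")
BRAIN_POWER_W = 25.0  # approximate figure quoted for the human brain
BRAIN_UNITS = 86e9    # about 86 billion units

tpu_w_per_unit = TPU_POWER_W / TPU_UNITS
brain_w_per_unit = BRAIN_POWER_W / BRAIN_UNITS
ratio = tpu_w_per_unit / brain_w_per_unit
orders_of_magnitude = math.log10(ratio)

print(f"TPU:   {tpu_w_per_unit * 1e3:.2f} mW per unit")
print(f"Brain: {brain_w_per_unit * 1e9:.2f} nW per unit")
print(f"Gap:   about {orders_of_magnitude:.1f} orders of magnitude")
```

On these assumptions the TPU spends a few milliwatts per unit and the brain a fraction of a nanowatt, a difference of roughly seven orders of magnitude.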
Some people in the industry agree. “The digital AI ASIC might not be the ideal solution for IoT edge computing due to its high-power consumption and form factor,” says Hiroyuki Nagashima, U.S. general manager at Alchip. “Mixed-signal machine learning, inspired by nature like the human brain, should play an important role in the future world. Are we able to build a machine that can sense, compute and learn like human brain, and only consume several watts of power? It is quite a challenge, but scientists should aim to this direction.”
We already have some of the building blocks. “Analog-dot-product can make use of using analog filters, op-amps, etc.,” says Microchip’s Lee. “For example, you can compare two signals or mix them, and you can make a decision from the results. There are many cases that analog computation is much faster than digital computation.”
Scrounging around with “older technology” helped out Mike Henry, CEO of Mythic AI, who said in an interview with Talla Inc. that “the hardware that most companies are putting out is not well matched to what these algorithms need to do. We dug up an old technology—analog computing—where we manipulate small electrical currents to do math. It has been talked about for 30 or 40 years but never successfully executed, but we believe we have come up with a working solution.”
Henry sees a mismatch between the capabilities that people need in inferencing solutions and what semiconductor companies can deliver today in a reasonable package and power budget. He believes that it will take a quantum leap in architectures to enable inferencing to be deployed more universally.
It is possible to produce a chip that follows the digital architectures but uses analog circuitry. Toshiba has produced a chip that performs MAC operations using phase-domain analog technology. It uses the phase domain of an oscillator circuit by dynamically controlling oscillation time and frequency. They claim that the technology makes it possible to collectively process multiplication, addition, and memory operations that are conventionally processed by individual digital circuits, using one-eighth of the power of digital circuits with the same area.
Several problems tend to be discussed in the context of analog and AI. They center around precision and variability. One issue with analog circuits is that they have limited precision, which is essentially set by the noise floor. Digital circuitry has no such limitation, but as the precision that AI requires falls, it is coming within the range of what analog circuitry is capable of providing.
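The link between noise and precision can be put into numbers with the standard data-converter rule of thumb that each bit of resolution corresponds to about 6.02 dB of signal-to-noise ratio, plus a 1.76 dB offset for an ideal converter driven by a full-scale sine wave. The Python sketch below applies that rule. The function names are chosen here for illustration, and the article itself quotes no noise figures.

```python
def effective_bits(snr_db):
    """Effective number of bits for a given signal-to-noise ratio in dB.

    Uses the ideal-converter relation SNR = 6.02 * N + 1.76 dB,
    which assumes a full-scale sine-wave input.
    """
    return (snr_db - 1.76) / 6.02


def required_snr_db(bits):
    """Signal-to-noise ratio in dB needed to resolve the given number of bits."""
    return 6.02 * bits + 1.76


# Precisions mentioned in the article: 12 bits, 8 bits and 1 bit
for bits in (12, 8, 1):
    print(f"{bits:2d} bits needs about {required_snr_db(bits):.1f} dB of SNR")
```

Dropping from 12 bits to 8 bits relaxes the requirement by about 24 dB, which is the sense in which lower-precision AI workloads move toward what analog circuits can deliver.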
“For applications with very high processing speed, such as autonomous cars, the key requirement is fast parallel computations,” says Lee. “Therefore, this application needs an analog front-end device with very high conversion speeds and zero-latency is highly necessary. To achieve this requirement accuracy is compromised.”
There is a lot of work that remains to be done. “Digital data is easy to copy, but analog is not,” says Alchip’s Nagashima. “Researchers have already proved the concept by using emerging memories like MRAM or RRAM. However, emerging non-volatile memory (NVM) struggles with defect rate and process variations in large arrays, which is essential for imaging tasks. Several breakthrough technologies are required, including a new type of sensor, emerging memories, and new machine learning frameworks (not CNN).”
New computation concepts are important. “The idea is that these things can perform multiply-accumulates for fully connected neural networks layers in a single timestep,” explained Geoffrey W. Burr, principal RSM at IBM Research. “What would otherwise take a million clocks on a series of processors, you can do that in the analog domain, using the underlying physics at the location of the data. That has enough seriously interesting aspects to it in time and energy that it might go someplace.”
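One common way to picture this kind of in-memory computation is a resistive crossbar, assumed here for illustration because the quote does not spell out the circuit. Each weight is stored as the conductance of a memory cell, each input is applied as a voltage on a row wire, and the current collected on each column wire is the multiply-accumulate result. The Python sketch below simulates that behavior with made-up values and ignores noise, wire resistance and device variation.

```python
def crossbar_mac(conductances, voltages):
    """Simulate the column currents of an idealized resistive crossbar.

    conductances[i][j] is the conductance (siemens) of the cell joining
    input row i to output column j; voltages[i] is the voltage on row i.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one voltage is needed per row")
    currents = []
    for j in range(n_cols):
        # Ohm's law gives each cell's current (I = G * V); Kirchhoff's
        # current law sums those currents on the shared column wire.
        currents.append(sum(conductances[i][j] * voltages[i] for i in range(n_rows)))
    return currents


# Hypothetical 2x2 array: conductances in siemens, inputs in volts
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
I = crossbar_mac(G, V)
print(I)  # column currents in amperes
```

The loops exist only because this is a simulation on a digital computer. In the physical array all of the multiplications and additions settle at once, which is the single-timestep behavior Burr describes.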
That leaves variability as the big problem. If analog circuitry is used for inferencing, the result may not be deterministic and is more likely to be affected by heat, noise or other external factors than a digital inferencing engine.
But analog could have some significant advantages in this domain. When digital goes wrong, it can go catastrophically wrong, whereas analog is able to tolerate errors much better. “Neural networks are fragile,” said Dario Gil, director of IBM Research, at a panel during the Design Automation Conference in 2018. “If you have seen the emergence of adversarial networks and how you can inject noise into the system to fool it into classifying an image or fooling it into how it detects language of a transcription, this tells you the fragility that is inherent in these systems. You can go from something looking like a bus and after noise injections it says it is a zebra. You can poison neural networks and they are subject to all sorts of attacks.”
Digital fails, analog degrades. Would that be true for analog neural networks and if so, could they actually be more trustworthy?
Rethinking the problem
AI is basically a statistical process, said participants in a recent roundtable. When stimulus is applied during learning, it is not known with certainty what the output will be. An attempt is made, by the creation of good data sets, to improve that certainty, but it can never be known exactly. Similarly, most researchers do not fully understand what is happening within a neural network. DARPA has a program to create more explainable AI systems, arguing that if you cannot explain the actions a system is taking, it is impossible to trust it.
Others believe that such a goal is not feasible, and three laws of AI have been defined. The third law states that “any system simple enough to be understandable will not be complicated enough to behave intelligently, while any system complicated enough to behave intelligently will be too complicated to understand.”
The answer probably lies somewhere in between. “We must create AI that is less of a black box,” said Gil. “It must be more explainable, [so] we have a better understanding of what is happening in the neural networks, have debuggers and can deal with errors in those networks.”
But Gil also said that this visibility may not always be necessary. “It is okay to use a black box when you are a recommender system for books, but when used in the context of high-stakes decision-making, where there is a lot of investment, or the decisions have high stakes, a total black box is mostly unacceptable to so many professions.”
Lee agrees with this. “In deep learning AI applications, we see various research reports that some forms of machine learning tasks do not require high precision. Better energy efficiency and faster speeds are more important than accuracy. For example, 1-bit processing can still obtain high accuracy. Because of this, analog techniques, which can be more energy-efficient than digital at low precision, can be used as a co-processor to accelerate workloads that are traditionally performed on digital but with up-to-5x better energy efficiencies.”
Are two kinds of systems possible? One would be a low-precision system that is fast and consumes little power, and that can call upon a higher-precision system when it lacks confidence in its results. Such an arrangement may be similar to the way the human brain functions, performing low-precision processing on the mass of data coming from the sensors and focusing closely on only a very small part of the data stream.
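The control flow of such a two-tier arrangement can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the function names, the 0.9 confidence threshold and the two toy models stand in for a low-precision analog engine and a high-precision digital one, and none of them comes from the researchers quoted here.

```python
def cascade_classify(sample, fast_model, precise_model, threshold=0.9):
    """Try the cheap model first and escalate only when it is unsure."""
    label, confidence = fast_model(sample)
    if confidence >= threshold:
        return label, "fast"
    label, _ = precise_model(sample)
    return label, "precise"


def toy_fast_model(x):
    """Stand-in for a low-precision engine: coarse cut-off at 0.5."""
    label = "high" if x >= 0.5 else "low"
    confidence = min(1.0, abs(x - 0.5) * 2)  # unsure near the cut-off
    return label, confidence


def toy_precise_model(x):
    """Stand-in for a high-precision engine: the 'true' cut-off is 0.6."""
    label = "high" if x >= 0.6 else "low"
    return label, 1.0


for sample in (0.99, 0.55, 0.02):
    print(sample, cascade_classify(sample, toy_fast_model, toy_precise_model))
```

Only the sample close to the decision boundary is passed to the expensive model, so the average cost per decision stays close to that of the cheap one whenever most inputs are easy.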
“We have been working with phase-change memory, and we have built chips with over a million PCM elements and demonstrated that you can implement deep-learning training with 500X improvements over a traditional GPU with similar levels of accuracy,” says Gil. “We also have a hybrid precision system so some of it can be at low precision but very efficient using a PCM matrix array, but you also have some high precision logic to be able to fine tune and get arbitrary precision you need for some calculations.”
Clearly some problems have to be overcome for analog AI systems to become practical. If the demand for them increases, we can feel confident that researchers and people within the industry will be able to solve those problems.
“Sometimes you have to look at alternatives,” said Burr. “When 2D flash went up to the wall, 3D flash no longer looked to be quite as hard. If we keep seeing improvement in existing technology that provide 2X here and another 2X there, then analog, in-memory computing will get pushed out, but if the next improvement is marginal, analog memory starts to look a lot more attractive. As researchers, we have to be ready when that opportunity comes around.”