EE Times reports that Graphcore of Bristol, UK has brought to market a new type of processor for AI acceleration, called the Intelligence Processing Unit (IPU). According to CEO Nigel Toon, the IPU is the most complex processor chip ever built.
Toon described it as having “just shy of 24 billion transistors on a single die, in 16 nm. Each chip delivers 125 teraFLOPS, and we can put eight of those cards into a standard 4U chassis and connect them together through IPU links.”
Toon went on to tell EE Times reporter Sally Ward-Foxton that the processors can work together as a single processing element delivering two petaFLOPS of compute, but in a different form from what exists in CPUs and GPUs, which provides a much more efficient processing platform for machine intelligence. These modules will go into servers for cloud computing, and potentially into autonomous vehicles as well.
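The quoted figures are easy to sanity-check. A minimal back-of-the-envelope sketch, assuming two IPUs per card (implied by the eight-card chassis being described elsewhere in the interview as holding 16 IPUs):

```python
# Sanity check of Graphcore's quoted throughput figures.
tflops_per_ipu = 125       # "Each chip delivers 125 teraFLOPS"
ipus_per_card = 2          # assumption: 16 IPUs across 8 cards
cards_per_chassis = 8      # "eight of those cards into a standard 4U chassis"

total_tflops = tflops_per_ipu * ipus_per_card * cards_per_chassis
total_pflops = total_tflops / 1000  # 1 petaFLOPS = 1000 teraFLOPS

print(total_pflops)  # → 2.0, matching Toon's two-petaFLOPS claim
```

The per-card IPU count is an inference from the article's numbers, not a figure Toon states directly.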
Ward-Foxton asked Toon about the IPU’s performance compared to leading GPUs on the market. His response was, “If you’re doing feed-forward convolutional neural networks used for classification of static images, GPUs do that quite well. We would be able to offer a performance advantage of two or three, sometimes five times.
“With much more complex models, those that have data passing through and then feeding back to try and understand context (conversations, for example), you’re passing the data through a number of times and you need to do that very quickly. Because all of the model is held inside our processor, on applications like that, we’re much faster than a GPU, maybe ten, twenty or fifty times faster.”
As for the IPU’s suitability for inference and training, Toon responded, “Yes, you can use the same IPU chip for inference as well as training. That was very important to us, from an architectural point of view, since as machine learning evolves, systems will be able to learn from experience.”
Toon explained that the keys to inference performance are low latency and the ability to work with small models, small batches, and trained models into which you might be trying to introduce sparsity.
He said, “We can do all these things efficiently on the IPU. So, in that 4U chassis, where you’ve got 16 IPUs all working together to do training, we could have each of those IPUs running a separate inference task, controlled by a virtual machine running on a CPU. What you end up with is a piece of hardware that you can use for training. Then, once you’ve trained the models, deploy it, but then as the models evolve and we start to want to learn from experience, the same hardware can be used to do that.”
From a technology perspective, this all sounds good. But what about the competition, especially the GPUs already on the market? Toon argued that GPUs are holding back new innovations.
He concluded by saying, “If you look at the types of models that people are working on, they are primarily working on forms of convolutional neural networks because recurrent neural networks and other kinds of structures, [such as] reinforcement learning, don’t map well to GPUs. Areas of research are being held back because there isn’t a good enough hardware platform, and that’s why we’re trying to bring IPUs to market.”