by Steven Woo
Neural networks (NNs) span a wide range of topologies and sizes. Some are relatively simple, with only two or three layers of neurons, while so-called deep neural networks may comprise more than 100 layers. The layers can also be extremely wide – with hundreds to thousands of neurons – or as narrow as a half dozen neurons. Determining the topology and size of a neural network for a specific task typically comes down to a combination of experimentation and comparison with similar solutions.
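To make the difference in scale concrete, the sketch below counts the weights and biases in a shallow versus a deep fully connected network. The layer widths are illustrative assumptions, not taken from any specific design:

```python
# Hypothetical layer widths for two topologies (illustrative only).
shallow = [784, 64, 10]            # input, one narrow hidden layer, output
deep = [784] + [512] * 8 + [10]    # a deeper stack of wide hidden layers

def parameter_count(widths):
    """Weights plus biases for a fully connected network of these widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

print(parameter_count(shallow))    # modest parameter count
print(parameter_count(deep))       # far more parameters to train and store
```

Even this toy comparison shows why deeper, wider networks put so much more pressure on memory capacity.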
Training, Not Programming
Instead of being programmed, neural networks are trained to perform certain tasks like classification and identification. The training phase involves iteratively providing examples – or training sets – to the neural network and presenting the desired outcome. In some cases, the desired outcome is determined a priori by labeling the data. For example, a neural network might be trained to identify cats and dogs from a series of images. In this case, the images in the training set are labeled as having cats or dogs, with the proper type of animal being the desired output.
During training, the characteristics of the neural network, including its weights and biases, are iteratively adjusted. This process, known as backpropagation (or backprop), uses optimization algorithms – stochastic gradient descent (SGD) is one popular method – to increase the likelihood that a given input will produce the desired outcome from the neural network. Once the neural network achieves high enough accuracy on the training set, it can be deployed in the field to perform inference on data it has never seen before.
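As a rough illustration of the idea – not any particular framework's implementation – the sketch below repeatedly applies gradient-descent updates to a single linear neuron, nudging its weights and bias until its output matches the desired target. The data and learning rate are arbitrary assumptions:

```python
import numpy as np

# Minimal sketch of gradient-descent weight updates for one linear neuron
# with a squared-error loss; values are illustrative.
rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.0     # weights and bias, adjusted during training
lr = 0.1                           # learning rate

x = np.array([1.0, -2.0, 0.5])     # one training example
target = 1.0                       # its label (the desired outcome)

for _ in range(100):
    y = w @ x + b                  # forward pass: current prediction
    error = y - target
    # Backpropagation: gradients of the loss 0.5 * error**2 w.r.t. w and b
    w -= lr * error * x
    b -= lr * error

print(float(w @ x + b))            # prediction is now very close to the target
```

Real training loops do the same thing at vastly larger scale, across millions of parameters and many layers of chained gradients.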
NN Training Hardware
Depending on application requirements, training and inference can be hosted on any number of hardware platforms, including graphics processing units (GPUs), custom application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs). Modern training sets can be enormous, especially those used for image classification in data center environments. Until recently, GPUs were the hardware of choice for training neural networks thanks to their combination of performance, power efficiency and the high memory capacity needed to hold large training sets. More recently, a resurgence of interest and funding in specialized neural network silicon has produced a number of purpose-built chips that achieve even higher levels of performance and power efficiency, making them the new solution of choice.
The performance of hardware designed for neural network training and inference depends heavily on memory bandwidth. The memory system typically holds the neural network parameters – weights and biases – along with the training data. Because the compute engines are optimized for fast computation, they continually stress the memory system as model parameters and training or inference data are fetched. During training, cycling through the training data as quickly as possible strains both memory capacity and bandwidth. For this reason, training sets and model parameters are held in local memory to avoid data transfers over the much slower PCIe interconnect.
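A back-of-the-envelope calculation helps show why bandwidth matters. Every size below is an assumption chosen for illustration, not a measurement of any particular system:

```python
# Rough estimate of memory read traffic per training epoch.
# All sizes are illustrative assumptions.
params = 25_000_000           # model weights and biases
bytes_per_value = 4           # FP32 storage
examples = 1_200_000          # images in the training set
bytes_per_example = 150_000   # ~150 KB per preprocessed image

# Each epoch reads every example once; assume the parameters are
# re-read from memory once per batch of 256 examples.
batch = 256
traffic = examples * bytes_per_example + (examples / batch) * params * bytes_per_value
print(f"{traffic / 1e12:.2f} TB of reads per epoch")
```

Even with these modest assumptions, a single epoch moves on the order of a terabyte of data, which is why local high-bandwidth memory beats shuttling data over PCIe.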
Neural Networks: Deployment and Future Evolution
While still at a relatively nascent stage, the current wave of AI innovation has produced numerous advances in both hardware and software, with a strong emphasis on co-design. Software engineers make up a significant share of modern AI teams, in some cases outnumbering their hardware counterparts. This approach supports a diverse range of AI applications and addresses a key industry demand: hardware designed to run specific software, and software designed to exploit specific hardware.
Co-design efforts have yielded interesting architecture and software innovations including reduced-precision computation, new topologies and new training algorithms that take advantage of mixed-precision to strike a balance between training speed, accuracy and power consumption. We will be taking a closer look at reduced-precision computation techniques in future blog posts.
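As a quick illustration of what reduced precision means in practice, the snippet below uses NumPy's half-precision type to show the trade-off: FP16 halves the memory and bandwidth per value, but keeps fewer significant digits and has a much smaller representable range:

```python
import numpy as np

# The same value stored in FP32 versus FP16 (half precision),
# as used in mixed-precision training.
x = np.float32(0.1234567)
half = np.float16(x)           # 16 bits: half the memory and bandwidth

print(x)                       # FP32 keeps ~7 decimal digits
print(half)                    # FP16 keeps only ~3 digits
print(np.float16(70000.0))     # beyond FP16's range, values overflow to inf
```

Mixed-precision schemes exploit this by keeping most computation in FP16 while retaining FP32 where the extra precision matters, such as for accumulating weight updates.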
Interested in reading more about machine learning and neural networks? You can browse our article archive on the subject here.