
Linley Newsletter

Cadence Mutates Its DNA to Boost AI

October 16, 2018

Author: Mike Demler

Cadence’s new DNA 100 sheds its predecessor’s reliance on fully programmable architectures, integrating purpose-built neural-network hardware. The licensable core employs a scalable compute engine that supports configurations ranging from 256 multiply-accumulators (MACs) to 4,096 MACs. In a 16nm process, the design runs at 1.0GHz, delivering up to four trillion MACs per second (MAC/s). The company plans to make the intellectual property (IP) available to lead customers in December and to offer it for general licensing in 1Q19.
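The peak-throughput figure follows directly from the MAC count and the clock rate. As a rough sanity check, the short sketch below recomputes it for the smallest and largest stated configurations; the MAC counts and 1.0GHz clock come from the article, and the rest is simple arithmetic rather than a model of the core itself.

```python
# Peak-throughput arithmetic for the DNA 100's smallest and largest configurations.
# The MAC counts and 1.0GHz clock are the figures cited above; nothing here models
# the core's actual microarchitecture.
CLOCK_HZ = 1.0e9  # 1.0GHz in a 16nm process

for macs in (256, 4096):
    peak_mac_per_s = macs * CLOCK_HZ
    print(f"{macs:>5} MACs -> {peak_mac_per_s / 1e12:.3f} trillion MAC/s peak")

# The 4,096-MAC configuration works out to ~4.1 trillion MAC/s, consistent with
# the "up to four trillion MAC/s" figure quoted above.
```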

In addition to streamlining the execution pipeline, the DNA 100 reduces storage requirements and further increases processing efficiency by compressing activation/feature maps and convolution-weight parameters. The DMA block detects zero-valued weights in the filter matrices, storing only nonzero values in the coefficient memory. It also discards zeroes from the input data, as well as those produced by the activation functions. Furthermore, neural networks frequently reuse weights, and the compiler identifies these repeated values to eliminate redundant storage.
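To make the zero-value compression idea concrete, here is a minimal sketch that stores only a filter's nonzero coefficients alongside a bitmap marking their positions. It is purely illustrative; the DNA 100's actual on-chip coefficient format is not described in the article.

```python
import numpy as np

def compress_weights(weights: np.ndarray):
    """Keep only nonzero coefficients plus a bitmap of their positions.

    Illustrative zero-value compression scheme, not Cadence's disclosed format.
    """
    flat = weights.ravel()
    mask = flat != 0          # one flag bit per coefficient
    values = flat[mask]       # only nonzero weights are stored
    return mask, values, weights.shape

def decompress_weights(mask, values, shape):
    flat = np.zeros(mask.size, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

# Example: a 3x3 filter with two-thirds of its weights pruned to zero.
w = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, -1.2],
              [0.3, 0.0, 0.0]])
mask, vals, shape = compress_weights(w)
print(vals)   # [ 0.5 -1.2  0.3] -- three stored values instead of nine
assert np.array_equal(decompress_weights(mask, vals, shape), w)
```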

The compressed feature maps and coefficient arrays work with the sparse-compute engine to reduce MAC operations to only those producing nonzero results. Tensors fetched from memory include flag bits, which indicate matrix operations the engine can skip. According to the company’s simulations, this approach effectively doubles the number of productive MAC operations per cycle for a neural network with 50% activation sparsity and 15% weight sparsity.
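The arithmetic behind that projection is easy to sanity-check. A minimal sketch, assuming zeros in the activations and weights occur independently, estimates the throughput gain from skipping MACs whose result is guaranteed to be zero:

```python
def effective_speedup(activation_sparsity: float, weight_sparsity: float) -> float:
    """Estimate the gain from skipping MACs that must produce zero.

    Assumes zero activations and zero weights are independently distributed;
    a back-of-the-envelope model, not the core's actual scheduling behavior.
    """
    nonzero_fraction = (1.0 - activation_sparsity) * (1.0 - weight_sparsity)
    return 1.0 / nonzero_fraction

# Sparsity figures cited above: 50% of activations and 15% of weights are zero.
print(f"{effective_speedup(0.50, 0.15):.2f}x")
# ~2.35x is the idealized upper bound; the company's simulations indicate
# roughly a 2x gain in productive MAC operations per cycle.
```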

But even if it falls short of the company’s lofty projections, the core’s new compression and sparsity features eliminate power wasted on useless operations, and the design delivers greater per-cycle throughput than less sophisticated accelerators. It also retains Tensilica’s traditional customization capabilities. By altering its DNA to combine programmability with dedicated hardware, Cadence has produced a deep-learning accelerator that can handle the most popular CNNs and RNNs in addition to supporting next-generation algorithms.

Subscribers can view the full article in the Microprocessor Report.

Subscribe to the Microprocessor Report and always get the full story!

