Microprocessor Report (MPR)

Videantis IP Cores Pack Lots of MACs

New v-MP6000UDX Combines Video Processing With CNN Accelerator

February 5, 2018

By Mike Demler


Videantis is a small German vendor that has for more than 10 years developed computer-vision (CV) and video-codec intellectual property (IP), which it mostly licenses into automotive cameras. By offering its new v-MP6000UDX architecture, the company aims to expand into markets such as augmented and virtual reality (AR/VR), drones, IoT, and surveillance cameras. Like its predecessors, the v-MP6000 handles CV as well as image and video processing, but it’s the company’s first design that can also serve as a programmable convolutional-neural-network (CNN) accelerator.

Earlier Videantis cores have won designs in rear- and surround-view cameras for passive driver-warning systems, in addition to automatic-emergency-braking (AEB) systems for Level 1 ADASs. The new cores, however, target higher-level ADASs and self-driving cars. Production RTL for the v-MP6000 is available now.

After spinning off in 2004 from Leibniz University in Hannover, the company has operated under the radar compared with larger CV-IP vendors, but it claims that millions of cars employ its IP and that 40 million additional orders for automotive camera systems are in the pipeline. It maintains its headquarters in Hannover, close to German carmakers and Tier One suppliers. Videantis has yet to reveal which carmakers use its IP, but it has announced design wins with ADASens (automotive cameras), Tier One leader Bosch, Ficosa (an automotive subsidiary of Panasonic), and Infineon.

Something Old and Something New

The v-MP6000 is the sixth-generation Videantis architecture, but it carries forward the two core elements of the original design. As Figure 1 shows, the foundation of the company’s IP is the v-MP media processor core and the v-SP stream processor core. The early designs employed a third-party RISC CPU as the stream processor, but in its fourth-generation v-MP4000HDX, the company began licensing both components (see MPR 11/7/05, “Videantis Chases Digital Video”). The v-MP6000 still requires a host CPU, but only for loading the drivers that distribute tasks to the v-MP and v-SP cores.

 

Figure 1. Videantis computer-vision design. Designers can configure the v-MP6000UDX using up to 256 programmable v-MP cores for accelerating CNNs and one or more fixed-function v-SP cores for bit-stream processing.

Videantis designed the v-SP to handle video-codec functions such as bit-stream packing and unpacking. The core isn’t customer programmable, but the company provides the firmware for arithmetic coding, Huffman lossless video compression, and similar tasks. It uses a RISC-like CPU with 32-bit data and instruction paths and can encode video into an output bit stream of up to 1Gbps. As an example, a 60fps 1080p RGB camera using 10-bit color produces 3.7Gbps of raw input, which the v-SP encodes and outputs at a 4:1 lossless compression ratio (about 0.9Gbps). Customers can instantiate several v-SPs to support multiple cameras and video streams. The design is compatible with a wide range of codec standards up to 60fps at 8K resolution with 10- or 12-bit color components.
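The camera arithmetic in this example is easy to check. The following Python sketch recomputes the raw bit rate from the figures quoted above and divides by four for the stated compression ratio; the function name and its parameters are ours, chosen for illustration, and are not part of any Videantis tool.

```python
def raw_video_gbps(width, height, fps, bits_per_component, components=3):
    """Uncompressed video bit rate in Gbps (1 Gbps = 1e9 bits per second)."""
    bits_per_frame = width * height * components * bits_per_component
    return bits_per_frame * fps / 1e9


# 1080p RGB camera at 60fps with 10 bits per color component (from the article)
raw = raw_video_gbps(1920, 1080, 60, 10)

# Divide by four for the compression ratio quoted in the article
encoded = raw / 4

print(f"raw: {raw:.2f} Gbps, encoded: {encoded:.2f} Gbps")
# prints "raw: 3.73 Gbps, encoded: 0.93 Gbps"
```

The encoded rate lands just under the v-SP's 1Gbps output limit, which is presumably why the example uses this camera configuration.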

The company describes the v-MP as a dual-issue VLIW core, but each of its not-so-long 64-bit instruction words can contain two vector instructions, two scalar instructions, or one of each. The core has a program counter, scalar and vector execution units, data and instruction memories, and a DMA interface. The scalar units are 32 bits wide and the vector units are 64 bits wide. Each v-MP integrates 64 single-cycle 8x8-bit MACs—an 8x increase relative to the v-MP4000HDX. The MAC units can also execute 32 single-cycle 16x16-bit MAC operations. The architecture supports multicore configurations comprising up to 256 v-MPs connected through a proprietary internal bus fabric to on-chip memory and the SoC bus.
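To put the scaling in concrete terms, the short Python sketch below multiplies the per-core MAC counts by the number of cores. The per-core figures and the 256-core ceiling come from the description above; the helper name and the sample core counts are illustrative assumptions.

```python
MACS_PER_CORE = {8: 64, 16: 32}  # operand width in bits -> single-cycle MACs per v-MP
MAX_CORES = 256                  # architectural limit cited in the article


def macs_per_cycle(cores, bits=8):
    """Total MACs per cycle for an array of v-MP cores at a given operand width."""
    if not 1 <= cores <= MAX_CORES:
        raise ValueError(f"core count must be between 1 and {MAX_CORES}")
    if bits not in MACS_PER_CORE:
        raise ValueError("only 8- and 16-bit MACs are described in the article")
    return cores * MACS_PER_CORE[bits]


for cores in (1, 8, 64, 256):
    print(f"{cores:3d} cores: {macs_per_cycle(cores, 8):6d} 8-bit MACs, "
          f"{macs_per_cycle(cores, 16):5d} 16-bit MACs per cycle")
```

A maximum configuration therefore reaches 16,384 8-bit MACs per cycle, or half that number at 16-bit precision.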

The first-generation v-MPs employed special-purpose ISAs optimized for a particular video-codec function, such as mobile video or HDTVs. The new model uses the same basic architecture, but it adds instructions for convolutional neural networks (CNNs) and other such tasks. The v-MPs are programmable to handle CV, image processing, and video compression. Designers can distribute different jobs to run in parallel across an array of cores.

The v-MPs integrate three data memories (DMEMs). The first, DMEM1, is 4KB, but DMEM2 is configurable for up to 8KB. DMEM3 allows up to 256KB of CNN weights or other local parameter storage. DMEM1 and DMEM2 provide DSP-style XY memories for feeding two operands in parallel to the execution units. The large DMEM3 supports the MAC array, reducing accesses to external memories in compute-intensive CNNs. Each core also integrates a configurable instruction memory and has its own 16-channel DMA interface. CNNs use approximately 4KB of instruction memory per v-MP, but larger CV and video-coding applications typically require 16–32KB per core.
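As a rough illustration of what the 256KB DMEM3 limit means for CNN weights, the Python sketch below estimates how many cores would be needed to hold one convolution layer's INT8 weights on chip. Videantis has not described how its tools partition weights, so the even split across cores, the function names, and the sample layer dimensions are all assumptions made for illustration; biases and activations are ignored.

```python
KB = 1024
DMEM3_MAX_BYTES = 256 * KB  # maximum DMEM3 size per v-MP (from the article)


def conv_weight_bytes(in_channels, out_channels, kernel, bytes_per_weight=1):
    """Weight storage for one square-kernel convolution layer (biases ignored)."""
    return in_channels * out_channels * kernel * kernel * bytes_per_weight


def min_cores_for_weights(weight_bytes, dmem3_bytes=DMEM3_MAX_BYTES):
    """Fewest cores whose combined DMEM3 holds the weights.

    Assumes the weights can be split evenly across cores (hypothetical).
    """
    return -(-weight_bytes // dmem3_bytes)  # ceiling division


# Hypothetical layer: 3x3 convolution, 256 input and 256 output channels, INT8
layer = conv_weight_bytes(256, 256, 3)
print(f"{layer // KB}KB of weights -> at least {min_cores_for_weights(layer)} cores")
# prints "576KB of weights -> at least 3 cores"
```

Layers that exceed the combined local storage would have to stream weights through the DMA interfaces instead.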

A Distributed Approach

The company withheld die-area information for the v-MP6000, but it says the design is small enough to integrate 10 cores on a tiny fraction of a 40nm CMOS image-sensor chip. The small size also enables designers to assign video- and vision-processing tasks to numerous parallel v-MPs. A Level 2 ADAS employing forward-looking cameras requires a design with approximately 32 cores, but the v-MP6000 can scale to multicamera Level 3 ADASs with 64 cores or support higher-level autonomous driving with up to 256 v-MP cores.

As Figure 2 shows, one v-MP core in a typical smart rearview camera handles basic post-image-signal-processor (post-ISP) chores, such as lens correction and scaling. The image-processing core passes its output to a second tier of v-MPs to perform vision processing. In parallel with that task, one v-SP and two v-MPs prepare the live video stream for transmission to the cockpit head unit.

 

Figure 2. Automotive rearview camera. This 16-core v-MP6000UDX system comprises 15 v-MPs and one v-SP. The H.264 encoder processes live video for transmission to the vehicle’s cockpit displays. Twelve v-MP cores handle computer-vision tasks such as deriving 3D structure from motion and object recognition using deep neural networks.

The example rearview automotive camera employs a total of 16 cores. A DNN running a pedestrian-detection algorithm uses eight v-MPs, comprising 512 MACs per cycle. In a 16nm process, these cores can run at 1.5GHz, producing 1,500 GOPS. In parallel, two other v-MPs can generate 3D spatial models from monocular cameras using simultaneous localization and mapping (SLAM) or structure-from-motion algorithms. Two additional cores can handle the camera calibration and check for dirty lenses.
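The core budget and the throughput figure for this example can be tallied in a few lines of Python. The allocation below restates the numbers from the text and Figure 2; counting each MAC as two operations (a multiply and an add) is the usual convention and is our assumption about how the GOPS rating was derived.

```python
# v-MP allocation in the example rearview camera (from the article)
v_mp_allocation = {
    "post-ISP image processing": 1,
    "video encode (alongside the v-SP)": 2,
    "DNN pedestrian detection": 8,
    "SLAM / structure from motion": 2,
    "calibration and dirty-lens check": 2,
}
V_SP_CORES = 1

v_mp_total = sum(v_mp_allocation.values())
print(f"{v_mp_total} v-MPs + {V_SP_CORES} v-SP = {v_mp_total + V_SP_CORES} cores")
# prints "15 v-MPs + 1 v-SP = 16 cores"


def peak_gops(cores, clock_ghz, macs_per_core=64, ops_per_mac=2):
    """Peak 8-bit throughput, assuming every MAC counts as two operations."""
    return cores * macs_per_core * ops_per_mac * clock_ghz


print(f"{peak_gops(8, 1.5):.0f} GOPS")
# prints "1536 GOPS"
```

The exact product is 1,536 GOPS, which the article rounds to 1,500. It is a peak figure that assumes all 512 MACs are busy on every cycle.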

Videantis supports the v-MP6000 IP with a library of CV kernels, including functions such as edge detection, image filtering, object detection, and pyramid generation. It provides its v-CNNDesigner tools for importing neural networks trained on Caffe/Caffe2, PyTorch, TensorFlow, TensorFlow Lite, and other popular frameworks. v-CNNDesigner analyzes data types in the source computational map and minimizes accuracy loss by optimizing the conversion from FP32/FP16 to INT8 or INT16 calculations. It also has features that minimize power consumption, and its mapping function optimizes memory allocation and task partitioning across an array of v-MPs.
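Videantis hasn't disclosed how its tool performs the floating-point-to-integer conversion. For readers unfamiliar with the general idea, the Python sketch below shows one common generic technique, symmetric per-tensor INT8 quantization, which maps the largest-magnitude weight to 127 and rounds the rest. It should not be read as the v-CNNDesigner algorithm; the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the INT8 range [-127, 127].

    Generic textbook scheme, not the (undisclosed) v-CNNDesigner method.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical FP32 weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per weight is bounded by half a quantization step
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"scale={scale:.4f}", f"worst-case error={worst:.4f}")
```

Small weights such as 0.003 collapse to zero in this scheme, which illustrates why a conversion tool must analyze value ranges, and sometimes fall back to INT16, to limit accuracy loss.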

Taking on the Field

The Videantis v-MP6000 will compete against neural-network IP cores from several larger vendors, including Ceva, Imagination, and Synopsys. Ceva recently announced its NeuPro cores, which offer the closest competition in raw MAC performance—although they top out at just 25% of the v-MP6000’s rating, as Table 1 shows. NeuPro employs an architecture combining a much more powerful SIMD/VLIW DSP with a special-purpose hardware engine that accelerates all common neural-network layers (see MPR 1/29/18, “Ceva NeuPro Accelerates Neural Nets”). It can run its own RTOS, whereas the Videantis IP requires a host CPU.

 

Table 1. Comparison of CNN accelerators. The Videantis v-MP6000UDX offers much greater scalability than competitors, but it requires a host CPU and must distribute the workload to a large array of cores. (Source: vendors)

Like the v-MP6000, NeuPro can mix 8- and 16-bit precision in the same neural network. The NeuPro Engine is better suited to applications at the high end of what Videantis addresses, including autonomous vehicles and video surveillance, than those at the lower end. Although both products integrate large local memories, Videantis customers must load data through a separate DMA interface for each core, potentially causing a memory-access bottleneck in large CNNs.

Imagination’s PowerVR 2NX cores give designers a high degree of flexibility, allowing 1–8 neural-network compute units per core, each unit capable of performing 256 8x8-bit MACs per cycle (see MPR 10/9/17, “Imagination Adds Neural-Network IP”). The 2NX uniquely supports mixing precision from 4- to 16-bit calculations, enabling designers to minimize power consumption. It runs at a maximum 1.0GHz clock frequency, however, so it can’t match the per-core performance of its competitors. The Imagination product is just a general-purpose CNN accelerator; it lacks the compute resources for other vision-processing functions that the Videantis product supports.

Like Ceva, Synopsys combines scalar CPUs, a SIMD/VLIW DSP, and special-purpose hardware in its EV6x cores (see MPR 8/7/17, “Synopsys EV6x Serves Up Big MACs”). The hardware accelerator handles both convolution and classification layers, but the company chose to split the difference between 8- and 16-bit precision by employing 12x12-bit MACs. The MACs can also perform 8-bit operations, but they waste power for such tasks. The minimum-size EV61 delivers 880 MACs, which is overkill for small networks.

A Vision Specialist

By focusing on video and camera-based vision processing, Videantis has built a small but profitable business that has grown steadily over the past 12 years. It initially addressed set-top boxes and media players, but in 2008, it became one of the first IP suppliers to enter the automotive market. Whereas larger competitors have just recently entered the ADAS and autonomous-vehicle segment, Videantis has already won designs at automotive-camera suppliers, processor vendors, and Tier One companies. It’s well established in the German automotive ecosystem, where it participates in that country’s Automotive Cluster supply network.

One challenge for Videantis is to expand its customer base beyond its homeland. The company lacks the global technical-support staff to compete with larger EDA/IP suppliers such as Cadence and Synopsys or with more-diversified IP suppliers such as Ceva and Imagination Technologies. In 3Q17, Videantis received an undisclosed amount of new funding from a German VC firm, with which it plans to add employees and expand outside Germany.

The v-MP6000UDX builds on earlier-generation Videantis vision processors, employing the same dual-core architecture founded on the v-SP and v-MP. Its massive scalability to as many as 256 cores provides far more total MAC capacity than any previously announced neural-network engine, but we expect few designs will even approach that limit. Nevertheless, the ability to combine multiple 64-MAC v-MPs offers much-finer-grain scalability than competitors, enabling designers to employ just the number they need for their application.

That granularity comes at the expense of duplicate instruction decoders and memories when using multiple cores, whereas competitors offer single-core designs with as many MACs as dozens of Videantis cores. Moving data among the v-MP6000 cores could also become a bottleneck when executing large neural networks. The company says that the internal interconnect is scalable with the core and memory-module count but withheld its bandwidth and other details.

Developers of high-level ADASs and self-driving cars have yet to reach a consensus regarding whether a central brain or a network of distributed smart sensors is the best architecture, but the v-MP6000 has the scalability to address the entire range. Combined with its technology for processing video bit streams, Videantis can offer customers a complete vision-processing subsystem that takes advantage of its core strength in automotive cameras, but the IP is also well suited to drones, video surveillance, and other applications.

Price and Availability

The v-MP6000UDX is available for licensing now. Videantis doesn’t disclose its licensing or royalty fees. For more information, access www.videantis.com/products/processor-ip.
