Microprocessor Report (MPR)

Editorial: Intel Hides Behind ICC

GCC Improves CPU Performance Comparisons

October 30, 2017

By Linley Gwennap


SPEC created a series of benchmarks to compare the performance of different systems and different processors. But since these benchmarks are built from source code, the results combine hardware performance and compiler optimizations. To isolate hardware performance, all testing should use the same compiler. Intel, however, publishes SPEC results only for its proprietary ICC compiler, refusing to release any performance data for the broadly used GCC. This position hides the progress that competitors are making: although Intel retains a performance advantage using ICC, the picture is different with GCC.

For example, per the SPEC web site, AMD’s new Epyc 7601 processor scores 2150 on SPECint2006_rate (baseline), putting it 21% ahead of Intel’s older Xeon E5-2699v4 but 26% behind the new top-of-the-line Xeon Platinum 8180 (Skylake-SP). These results were generated using ICC for the Xeon systems and AMD’s Open64 compiler for Epyc. Third-party testing with GCC, however, puts Epyc 40% ahead of the 2699v4 and about even with the Platinum processor (see MPR 10/30/17, “Epyc Benchmark Battle”).
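As a rough check on the percentages above, the following Python sketch backs out the Xeon scores they imply. It assumes that "X% ahead of B" means A = B × (1 + X/100) and that "X% behind C" means A = C × (1 − X/100); the text does not state which convention it uses, and the helper names are hypothetical. The outputs are implied values under that assumption, not published SPEC results.

```python
# Back-of-the-envelope check of the percentages quoted in the text.
# Assumption: "X% ahead of B" means A = B * (1 + X/100), and
# "X% behind C" means A = C * (1 - X/100). Helper names are hypothetical.

def implied_from_ahead(score: float, pct_ahead: float) -> float:
    """Score of the processor that `score` leads by `pct_ahead` percent."""
    return score / (1 + pct_ahead / 100)


def implied_from_behind(score: float, pct_behind: float) -> float:
    """Score of the processor that `score` trails by `pct_behind` percent."""
    return score / (1 - pct_behind / 100)


EPYC_7601 = 2150  # SPECint2006_rate (baseline), as quoted in the text

xeon_2699v4 = implied_from_ahead(EPYC_7601, 21)   # Epyc is 21% ahead
xeon_8180 = implied_from_behind(EPYC_7601, 26)    # Epyc is 26% behind

print(f"Implied Xeon E5-2699v4 score: {xeon_2699v4:.0f}")
print(f"Implied Xeon Platinum 8180 score: {xeon_8180:.0f}")
```

Under the other common reading of "26% behind" (the Xeon is 26% ahead of Epyc), the implied Platinum score would be lower, so these figures only illustrate how much weight the wording of a percentage carries.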

To obtain these results, AnandTech compiled the SPEC benchmark code using GCC version 5.4 in 64-bit mode with the -O3 flag (the highest level of general optimization) and -Ofast for additional optimizations. Even so, the results were nearly 40% lower than the official scores for the AMD processor and nearly 50% lower for the Intel processor.

One might assume the discrepancy owes largely to the different compilers, but it isn’t that simple. C’T (a German computer magazine) tested various Intel processors on the newer SPEC CPU 2017 benchmark using both ICC -O3 and GCC -O3; surprisingly, the results differed by less than 10%. But even the ICC results were 20-30% lower than Intel’s published scores for the same processors.

C’T notes that Intel uses custom libraries to accelerate certain subtests. The company also compiles some subtests in 32-bit mode to squeeze more data into the caches. Assigning threads to specific cores can boost performance by minimizing the extra cycles it can take to traverse Skylake-SP’s mesh interconnect (this effect is greater for Epyc). A host of other compiler flags (listed in Intel’s official SPEC submissions) provide additional tuning opportunities. The company spent at least three months adjusting all these factors before publishing SPEC2017 results for Skylake-SP.

Few software developers, however, spend much time optimizing performance, instead focusing on functionality and time to market. Most prefer GCC, particularly if their applications integrate open-source code. GCC code is easier to debug and isn’t tied to a single processor architecture. ICC and other optimizing compilers mainly serve in high-performance-computing (HPC) applications such as physics and chemistry modeling, deep learning, and big data, where some of the math optimizations can make a big difference.

When comparing the performance of AMD and Intel chips, the only popular test that depends on such aggressive compiler optimization is SPEC CPU. The SPEC Java benchmark runs a standard JVM, and most other server benchmarks run the same binary code on both x86 processors. Comparing the performance of non-x86 server processors is much more challenging, however, as all code must be recompiled.

To support the ongoing interest in GCC, I encourage Intel to publish its own GCC scores, which would likely be better than the results that third parties achieve. For example, AMD reported a GCC score for Epyc that’s 4% better than the AnandTech score, despite using -O2 instead of -O3. Conversely, it measured the Intel processor at a score much worse than AnandTech’s. MPR doesn’t report scores that vendors obtain for competing products, but some other publications do. By publishing its own GCC results, Intel could drive out these bad benchmarks. It has posted GCC scores in the past, but not within the last several years.

GCC provides a compiler-agnostic method of comparing processors, including those using ARM, Power, and other RISC architectures. Although GCC allows some tuning opportunities, the open-source compiler is more difficult to manipulate. By releasing GCC scores, Intel could dispel the miasma of competitors’ testing and demonstrate that the performance of its Xeon designs doesn’t depend on an optimizing compiler that relatively few customers use.
