Nehalem: The Unwritten Chapters

Despite being extremely well prepared, with Nehalem, motherboards, coolers and memory in hand well before launch, the run-up to the NDA lift of Intel's Core i7 processors was stressful. There was so much to test: multi-GPU compatibility with X58, memory controller performance, general application performance, overclocking, Hyper-Threading and more.

We're all still hard at work sorting out the details. Gary is working on an X58 motherboard roundup and has been testing 12GB memory configurations for the past several days (as well as working with board vendors to improve performance/compatibility with 12GB, but I'll let him tell you about that). Derek is working on multi-GPU performance and Kris has been working on an overclocking guide. What have I been up to? Well, I've been trying to answer a few lingering questions about Nehalem.

What I've got today are the first results of the questions I've been asking. I've spent the past week looking at power efficiency and memory latency, and talking to some of Intel's finest on the phone about Nehalem. And I'm back to report, so gather 'round for Nehalem: The Unwritten Chapters.

The Uncore
I got a little more detail from Intel on the uncore clock. Just like Phenom, Intel’s Core i7 is divided into an area called the “core” and an area called the “uncore”. The core contains the individual processor cores and their L1/L2 caches, while the uncore houses the memory controller and the shared L3 cache. In our review I mentioned that the uncore runs at 2.66GHz, which is true, but only for the Core i7-965. The Core i7-940 and 920 both run the uncore at 2.13GHz.

The uncore clock is defined by Intel just like the core clock is: Intel sets it based on yield and performance targets. As I mentioned in the launch review, the uncore clock runs at a simple multiplier of the bclk (133MHz): 20x for the i7-965 and 16x for the i7-940/920. The uncore also runs at its own voltage (1.20V) and that voltage doesn't scale up or down.
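The multiplier arithmetic is easy to check for yourself. Here's a minimal sketch in Python; the function and dictionary names are my own for illustration, and it uses the rounded 133MHz bclk figure quoted above rather than anything read from hardware:

```python
# Rounded base clock from the article; illustrative, not read from hardware.
BCLK_MHZ = 133

# Stock uncore multipliers as stated in the text.
UNCORE_MULTIPLIERS = {
    "i7-965": 20,
    "i7-940": 16,
    "i7-920": 16,
}


def uncore_clock_ghz(multiplier, bclk_mhz=BCLK_MHZ):
    """Return the uncore clock in GHz for a given bclk multiplier."""
    return multiplier * bclk_mhz / 1000.0


if __name__ == "__main__":
    for model, multiplier in UNCORE_MULTIPLIERS.items():
        clock = uncore_clock_ghz(multiplier)
        print(f"{model}: {multiplier} x {BCLK_MHZ}MHz = {clock:.2f}GHz")
```

Plugging in 20x and 16x gives the 2.66GHz and 2.13GHz figures mentioned earlier.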

On Intel’s own X58 board the uncore clock is configured on the memory settings page and is simply called UCLK.

I took the i7-965, ran it at 2.66GHz to simulate an i7-920, and varied the uncore clock to measure the impact on L3 cache and memory latency:

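| Core Clock | Uncore Clock | L3 Latency | Main Memory Latency | x264 HD Benchmark | Cinebench XCPU Benchmark |
|------------|--------------|------------|---------------------|-------------------|--------------------------|
| 2.66GHz    | 2.93GHz      | 34 cycles  | 143 cycles          | 72.8 fps          | 13456                    |
| 2.66GHz    | 2.66GHz      | 36 cycles  | 148 cycles          | 73.0 fps          | 13429                    |
| 2.66GHz    | 2.13GHz      | 41 cycles  | 159 cycles          | 72.7 fps          | 13182                    |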

At a 2.66GHz uncore clock things seem to hit a sweet spot, although the translation to real-world performance just isn't there. Perhaps in a very memory intensive test we'd see something more pronounced, but even the x264 HD encoding test showed no performance difference between the three uncore clock speeds.
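To put the gap between latency and real-world results in proportion, here's a rough Python sketch that expresses each result as a percentage change relative to the 2.13GHz uncore run. The numbers are copied straight from the table above; the variable and function names are just mine for illustration:

```python
# Each row: (uncore GHz, L3 latency in cycles, memory latency in cycles,
#            x264 HD fps, Cinebench XCPU score), copied from the table above.
RESULTS = [
    (2.93, 34, 143, 72.8, 13456),
    (2.66, 36, 148, 73.0, 13429),
    (2.13, 41, 159, 72.7, 13182),
]

LABELS = ("L3 latency", "Memory latency", "x264 HD", "Cinebench")


def percent_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    baseline = RESULTS[-1]  # the 2.13GHz uncore run
    for row in RESULTS[:-1]:
        print(f"Uncore {row[0]:.2f}GHz vs {baseline[0]:.2f}GHz:")
        for label, new, old in zip(LABELS, row[1:], baseline[1:]):
            print(f"  {label}: {percent_change(new, old):+.1f}%")
```

Going from a 2.13GHz to a 2.66GHz uncore cuts L3 latency by roughly 12%, yet Cinebench moves by only about 2% and x264 by well under 1%.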

Surprisingly enough, I couldn’t get the i7-965’s uncore to hit 3.2GHz - Vista would bluescreen before I could even get to the desktop. As the table above shows, increases in uncore frequency aren't nearly as useful as increasing the CPU frequency. Intel recognized this performance relationship as well and chose to optimize the uncore for power consumption, not clock speed, which means that the uncore won't be able to clock as high as the core itself. You could always increase the voltage a lot to try and boost uncore speed but right now it's not looking like the tradeoff would be worth it as you'd increase power quite a bit.

