PERFORMANCE ANALYSIS OF INTEL
CORE 2 DUO PROCESSOR
A Thesis
Submitted to the Graduate Faculty of the
Louisiana State University and
Agricultural and Mechanical College
in partial fulfillment of the
requirements for the degree of
Master of Science in Electrical Engineering
In
The Department of Electrical and Computer Engineering
By
Tribuvan Kumar Prakash
Bachelor of Engineering in Electronics and Communication Engineering
Visveswaraiah Technological University, Karnataka, 2004.
August 2007
Acknowledgements
I would like to express my gratitude to my advisor, Dr. Lu Peng for his guidance,
and constant motivation towards the completion of this thesis. His technical advice and
suggestions helped me to overcome hurdles and kept me enthusiastic and made this work
a wonderful learning experience.
I would like to thank my committee members Dr. David Koppelman and Dr.
Suresh Rai for taking time out of their busy schedule and agreeing to be a part of my
committee. I would like to also thank them for their valuable feedback.
I would like to thank the faculty members and Shirley and Tonya of the
Department of Electrical Engineering, for all the support and making my study at
Louisiana State University a pleasant experience.
I would like to thank my parents and sister without whom I would not have made
it to this point. I would like to thank my friends Srinath Sitaraman and Balachandran
Ramadas for their help while collecting data. I would also like to thank my roommates &
friends here at LSU and back home for all the love and unending support.
Table of Contents
List of Tables
List of Figures
Abstract
1. Introduction
1.1 Overview
1.2 Architecture of Intel Core 2 Duo
2. Performance Analysis of SPEC CPU Benchmarks Running on Intel's Core 2 Duo Processor
2.1 Overview
2.2 Methodology
2.3 Measurement Results
2.3.1 IPC and Instruction Profile
2.3.2 L1 D-Cache Misses
2.3.3 L2 Cache Misses
2.3.4 Branch Misprediction
3. Performance Comparison of Dual Core Processor Using Microbenchmarks
3.1 Overview
3.2 Architecture of Dual-Core Processors
3.2.1 Intel Pentium D 830
3.2.2 AMD Athlon 64X2
3.2.3 Processor Comparison
3.3 Methodology
3.4 Memory Bandwidth and Latency Measurements
4. Performance Comparison of Dual Core Processors Using Multiprogrammed and Multithreaded Benchmarks
4.1 Overview
4.2 Methodology
4.3 Multiprogrammed Workload Measurements
4.4 Multithreaded Program Behavior
5. Related Work
6. Conclusion
References
Vita
List of Tables
Table 1.1 Specification of Intel Core 2 Duo machine
Table 2.1 SPEC CPU2006 Integer Benchmark
Table 2.2 SPEC CPU2006 Floating Point Benchmark
Table 3.1 Specifications of the selected processors
Table 3.2 Memory operations from Lmbench
Table 3.3 Kernel operations of the STREAM and STREAM2 benchmarks
Table 4.1 Input parameters of the selected multithreaded workloads
List of Figures
Figure 1-1 Block Diagram of Intel Core 2 Duo Processor
Figure 1-2 Block Diagram of Intel Core Micro-architecture's IP Prefetcher
Figure 2-1 IPC of SPEC Benchmarks
Figure 2-2 Instruction Profile of SPEC Benchmarks
Figure 2-3 L1-D Cache Misses per 1000 instructions of SPEC Benchmarks
Figure 2-4 Sample Code of MCF Benchmark
Figure 2-5 L2 Cache Misses per 1000 instructions of SPEC Benchmarks
Figure 2-6 Sample Code of LBM Benchmark
Figure 2-7 Branch Mispredicted Per 1000 Instructions of SPEC Benchmarks
Figure 3-1 Block Diagram of Pentium D Processor
Figure 3-2 Block Diagram of AMD Athlon 64x2 Processor
Figure 3-3 Memory bandwidth collected from the lmbench suite (1 or 2 copies)
Figure 3-4 Memory load latency collected from the lmbench suite (1 or 2 copies)
Figure 3-5 Memory bandwidth and latency collected from the STREAM and STREAM2 benchmarks (1 or 2 copies)
Figure 4-1 SPEC CPU2000 and CPU2006 benchmarks execution time
Figure 4-2 Multi-programmed speedup of mixed SPEC CPU 2000/2006 benchmarks
Figure 4-3 (a) Execution time for 1-thread version of selected multithreaded programs
Figure 4-4 Throughput of SPECjbb2005 running with 1 to 8 warehouses
Abstract
With the emergence of thread-level parallelism as a more efficient method of
improving processor performance, Chip Multiprocessor (CMP) technology is being more
widely used in developing processor architectures. In addition, the widening gap between
CPU and memory speed has prompted researchers to study the performance of memory
hierarchy architectures. As part of this research, performance characterization
studies were carried out on the Intel Core 2 Duo, a power-efficient dual-core processor,
using a variety of new-generation benchmarks. This study provides a detailed analysis of
the memory hierarchy performance and the performance scalability between single-core and
dual-core processors. The behavior of the SPEC CPU2006 benchmarks running on the Intel
Core 2 Duo processor is also explained. Lastly, overall execution time and
throughput measurements using both multi-programmed and multi-threaded workloads on
the Intel Core 2 Duo processor are reported and compared with those of the Intel Pentium D
and AMD Athlon 64X2 processors. Results showed that the Intel Core 2 Duo had the
best performance for a variety of workloads due to its advanced micro-architectural
features such as the shared L2 cache, fast cache-to-cache communication and smart
memory access.
1. Introduction
1.1 Overview
This thesis analyzes the performance characteristics of the major architectural
developments employed in the 2.13 GHz Intel Core 2 Duo E6400 processor [15]. The Intel
Core 2 Duo is a high-performance, power-efficient dual-core Chip Multiprocessor
(CMP). A CMP embeds multiple processor cores on a single die to exploit thread-level
parallelism and achieve higher overall chip-level Instructions Per Cycle (IPC) [4] [14]
[15] [21]. In a multi-core, multithreaded processor chip, thread-level parallelism
combined with increased clock frequency places a higher demand on on-chip and
off-chip memory bandwidth, causing longer average memory access delays. Researchers
have shown great interest in understanding the underlying causes of these
bottlenecks in processors.
Advances in circuit integration technology and the inevitable shift from instruction-level
parallelism to thread-level parallelism for performance efficiency have made Chip
Multiprocessor (CMP), or multi-core, technology the mainstream in CPU design. As
processor architectures have evolved over time, the benchmarks used to measure the
performance of these high-performance processors have evolved with them. Many
single-threaded and multithreaded benchmarks have been defined and developed to stress
processor units to their limits. The Standard Performance Evaluation Corporation
(SPEC) is one of the non-profit organizations that have been developing benchmarks to
meet the requirements of these dynamic processor architectures for nearly a decade.
SPEC CPU2006 is a single-threaded, compute-intensive benchmark suite developed by SPEC
using the C, C++ and FORTRAN programming languages. To understand the performance of
multi-core processors completely, it is equally important to understand their behavior
while running multithreaded applications. SPEC JBB2005, lmbench, bioperf and splash2
are some of the most widely used multithreaded benchmarks for this purpose.
This thesis focuses mainly on workload characteristics, memory system
behavior and multi-thread interaction of the benchmarks. It also reports
performance measurements on the 2.13 GHz Intel Core 2 Duo E6400 [15] and compares
the results with the 3.0 GHz Intel Pentium D 830 [19] and the 2.2 GHz AMD Athlon 64X2
4400+ [2]. In contrast to existing performance evaluations [13] [26] [27], which
usually provide overall execution time and throughput, this work emphasizes
memory hierarchy performance. It reports the measured memory access latency and
bandwidth as well as cache-to-cache communication delays. It also examines the
performance scalability between single and dual cores on the three tested processors.
Summarized below are a few interesting findings based on experiments conducted
as part of this research:
- SPEC CPU2006 running on the Core 2 Duo exerts less pressure on the L1
  cache than the SPEC CPU2000 benchmarks do. However, the CPU2006
  benchmarks have larger data sets and longer execution times, resulting in
  comparatively high stress on the L2 cache.
- The cache-to-cache latency of the Core 2 Duo was measured to be 33 ns. The
  Core 2 Duo has high memory bandwidth and low latency as a result of on-chip
  access to the other core's L1 cache and the presence of aggressive memory
  dependence predictors. Its shared L2 cache generates less off-chip traffic
  than the other two processors.
- Because of its shared L2 cache, the Core 2 Duo executes all single-threaded
  workloads quickly, with execution times ranging from 56 to 1500 seconds. The
  average multi-programmed speedup for the CPU2006 and CPU2000
  benchmarks was measured at 1.76 and 1.77 respectively, which is lower
  than the ideal speedup of 2. The Core 2 Duo's speedups are constrained
  by the ability of a single-program run to use the entire shared L2 cache.
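To make the comparison against the ideal speedup of 2 concrete, the following minimal sketch computes a multi-programmed speedup. It assumes one common definition: the time to run two programs back to back on one core, divided by the time for both to finish when run concurrently on two cores. This definition, the function name and the timing values are illustrative assumptions and are not taken from the measurements in this thesis.

```python
def multiprogrammed_speedup(t_a_alone, t_b_alone, t_both_concurrent):
    """Speedup of running two programs concurrently on two cores.

    Assumed definition (illustrative, not taken from the thesis):
    sequential time on one core / concurrent time on two cores.
    """
    if min(t_a_alone, t_b_alone, t_both_concurrent) <= 0:
        raise ValueError("all times must be positive")
    return (t_a_alone + t_b_alone) / t_both_concurrent


# Hypothetical timings in seconds, chosen only for illustration.
ideal = multiprogrammed_speedup(100.0, 100.0, 100.0)     # no interference
contended = multiprogrammed_speedup(100.0, 100.0, 113.6)  # shared-resource slowdown

print(f"ideal speedup:     {ideal:.2f}")
print(f"contended speedup: {contended:.2f}")
```

With no interference the two programs finish in the time of one, giving the ideal value of 2. Any slowdown caused by sharing the L2 cache or memory bandwidth lengthens the concurrent run and pulls the speedup below 2.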
1.2 Architecture of Intel Core 2 Duo
The Intel Core 2 Duo E6400 (Figure 1-1) processor supports CMP and belongs to
Intel's mobile core family. It is implemented by placing two Intel Core architecture
cores on a single die. The design of the Intel Core 2 Duo E6400 was chosen to maximize
performance and minimize power consumption [18]. It emphasizes cache
efficiency rather than clock frequency in order to achieve high power efficiency. Although
it is clocked at a slower rate than most of its competitors, its shorter pipeline and wider
issue width compensate with a higher IPC. In addition, the Core 2 Duo
processor has more ALUs [13]. The five main features of the Intel Core 2 Duo
contributing towards its high performance are: