PERFORMANCE ANALYSIS OF INTEL
CORE 2 DUO PROCESSOR



A Thesis
Submitted to the Graduate Faculty of the
Louisiana State University and
Agricultural and Mechanical College
in partial fulfillment of the
requirements for the degree of
Master of Science in Electrical Engineering




In
The Department of Electrical and Computer Engineering




By
Tribuvan Kumar Prakash
Bachelor of Engineering in Electronics and Communication Engineering
Visveswaraiah Technological University, Karnataka, 2004.
August 2007
Acknowledgements

I would like to express my gratitude to my advisor, Dr. Lu Peng, for his guidance and
constant motivation towards the completion of this thesis. His technical advice and
suggestions helped me overcome hurdles, kept me enthusiastic and made this work a
wonderful learning experience.

I would like to thank my committee members, Dr. David Koppelman and Dr. Suresh Rai,
for taking time out of their busy schedules and agreeing to be a part of my committee.
I would also like to thank them for their valuable feedback.

I would like to thank the faculty members and Shirley and Tonya of the Department of
Electrical Engineering for all their support and for making my study at Louisiana State
University a pleasant experience.

I would like to thank my parents and sister, without whom I would not have made it to
this point. I would like to thank my friends Srinath Sitaraman and Balachandran Ramadas
for their help while collecting data. I would also like to thank my roommates and
friends here at LSU and back home for all the love and unending support.




Table of Contents

List of Tables

List of Figures

Abstract

1. Introduction
1.1 Overview
1.2 Architecture of Intel Core 2 Duo

2. Performance Analysis of SPEC CPU Benchmarks Running on Intel's Core 2 Duo Processor
2.1 Overview
2.2 Methodology
2.3 Measurement Results
2.3.1 IPC and Instruction Profile
2.3.2 L1 D-Cache Misses
2.3.3 L2 Cache Misses
2.3.4 Branch Misprediction

3. Performance Comparison of Dual Core Processor Using Microbenchmarks
3.1 Overview
3.2 Architecture of Dual-Core Processors
3.2.1 Intel Pentium D 830
3.2.2 AMD Athlon 64X2
3.2.3 Processor Comparison
3.3 Methodology
3.4 Memory Bandwidth and Latency Measurements

4. Performance Comparison of Dual Core Processors Using Multiprogrammed and Multithreaded Benchmarks
4.1 Overview
4.2 Methodology
4.3 Multiprogrammed Workload Measurements
4.4 Multithreaded Program Behavior

5. Related Work

6. Conclusion

References

Vita



List of Tables

Table 1.1 Specification of Intel Core 2 Duo machine

Table 2.1 SPEC CPU2006 Integer Benchmark

Table 2.2 SPEC CPU2006 Floating Point Benchmark

Table 3.1 Specifications of the selected processors

Table 3.2 Memory operations from Lmbench

Table 3.3 Kernel operations of the STREAM and STREAM2 benchmarks

Table 4.1 Input parameters of the selected multithreaded workloads




List of Figures

Figure 1-1 Block Diagram of Intel Core 2 Duo Processor

Figure 1-2 Block Diagram of Intel Core Micro-architecture's IP Prefetcher

Figure 2-1 IPC of SPEC Benchmarks

Figure 2-2 Instruction Profile of SPEC Benchmarks

Figure 2-3 L1-D Cache Misses per 1000 instructions of SPEC Benchmarks

Figure 2-4 Sample Code of MCF Benchmark

Figure 2-5 L2 Cache Misses per 1000 instructions of SPEC Benchmarks

Figure 2-6 Sample Code of LBM Benchmark

Figure 2-7 Branch Mispredicted Per 1000 Instructions of SPEC Benchmarks

Figure 3-1 Block Diagram of Pentium D Processor

Figure 3-2 Block Diagram of AMD Athlon 64x2 Processor

Figure 3-3 Memory bandwidth collected from the lmbench suite (1 or 2 copies)

Figure 3-4 Memory load latency collected from the lmbench suite (1 or 2 copies)

Figure 3-5 Memory bandwidth and latency collected from the STREAM and STREAM2 benchmarks (1 or 2 copies)

Figure 4-1 SPEC CPU2000 and CPU2006 benchmarks execution time

Figure 4-2 Multi-programmed speedup of mixed SPEC CPU 2000/2006 benchmarks

Figure 4-3 (a) Execution time for 1-thread version of selected multithreaded programs

Figure 4-4 Throughput of SPECjbb2005 running with 1 to 8 warehouses




Abstract
With the emergence of thread-level parallelism as a more efficient method of improving
processor performance, Chip Multiprocessor (CMP) technology is being more widely used
in processor architectures. In addition, the widening gap between CPU and memory speeds
has drawn researchers' interest in understanding the performance of memory hierarchies.
As part of this research, performance characterization studies were carried out on the
Intel Core 2 Duo, a power-efficient dual-core processor, using a variety of
new-generation benchmarks. This study provides a detailed analysis of the memory
hierarchy performance and of the performance scalability between single and dual cores.
The behavior of the SPEC CPU2006 benchmarks running on the Intel Core 2 Duo processor
is also explained. Lastly, the overall execution time and throughput measurements using
both multi-programmed and multi-threaded workloads on the Intel Core 2 Duo processor
are reported and compared to those of the Intel Pentium D and AMD Athlon 64X2
processors. The results show that the Intel Core 2 Duo had the best performance for a
variety of workloads due to its advanced micro-architectural features, such as the
shared L2 cache, fast cache-to-cache communication and smart memory access.




1. Introduction
1.1 Overview
This thesis analyzes the performance characteristics of the major architectural
developments employed in the Intel Core 2 Duo E6400 processor, which runs at 2.13 GHz
[15]. The Intel Core 2 Duo is a high-performance, power-efficient dual-core
Chip-Multiprocessor (CMP). A CMP embeds multiple processor cores on a single die to
exploit thread-level parallelism and achieve higher overall chip-level
Instructions-Per-Cycle (IPC) [4] [14] [15] [21]. In a multi-core, multithreaded
processor chip, thread-level parallelism combined with increased clock frequency places
a higher demand on on-chip and off-chip memory bandwidth, causing longer average memory
access delays. Researchers have shown great interest in understanding the underlying
causes of these bottlenecks.
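
The chip-level IPC metric mentioned above can be illustrated with a short Python
sketch. It is only an illustration under stated assumptions: the function name
chip_ipc and the counter values are hypothetical and are not measurements from this
thesis. It shows why running a second thread on a second core can raise chip-level IPC
even when neither core executes faster on its own.

```python
def chip_ipc(instructions_per_core, cycles):
    """Chip-level IPC: instructions retired by all cores over the elapsed clock cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return sum(instructions_per_core) / cycles


# Hypothetical counter readings over the same interval of one billion cycles.
single = chip_ipc([1_200_000_000], 1_000_000_000)
dual = chip_ipc([1_200_000_000, 900_000_000], 1_000_000_000)

print(f"one active core:  {single:.2f} IPC")
print(f"two active cores: {dual:.2f} IPC")
```

In this made-up example the second core retires fewer instructions than the first,
for instance because the two threads compete for memory bandwidth, yet the chip-level
IPC still rises.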

Advances in circuit integration technology and the inevitable shift from
instruction-level parallelism to thread-level parallelism for performance efficiency
have made Chip-Multiprocessor (CMP), or multi-core, technology the mainstream in CPU
design. As processor architectures have evolved, the benchmarks used to measure the
performance of these high-performance processors have evolved as well. Many
single-threaded and multithreaded benchmarks have been defined and developed to stress
the processor units to their limits. The Standard Performance Evaluation Corporation
(SPEC) is one of the non-profit organizations that have been developing benchmarks to
meet the requirements of these changing processor architectures for nearly a decade.
SPEC CPU2006 is a single-threaded, compute-intensive benchmark suite developed by SPEC,
with programs written in C, C++ and Fortran. To understand the performance of
multi-core processors completely, it is equally important to understand their behavior
while running multithreaded applications. SPECjbb2005, lmbench, bioperf and splash2 are
some of the most widely used multithreaded benchmarks for this purpose.

This thesis focuses mainly on the workload characteristics, memory system behavior and
multi-thread interaction of the benchmarks. It also reports performance measurements on
the Intel Core 2 Duo E6400 at 2.13 GHz [15] and compares the results with those of the
Intel Pentium D 830 at 3.0 GHz [19] and the AMD Athlon 64X2 4400+ at 2.2 GHz [2]. In
contrast to existing performance evaluations [13] [26] [27], which usually provide
overall execution time and throughput, this work emphasizes memory hierarchy
performance. It reports the measured memory access latency and bandwidth as well as
cache-to-cache communication delays. It also examines the performance scalability
between single and dual cores on the three tested processors.

Summarized below are a few interesting findings based on experiments conducted as part
of this research:

SPEC CPU2006 running on the Core 2 Duo exerts less pressure on the L1 cache than the
SPEC CPU2000 benchmarks do. However, the CPU2006 benchmarks have larger data sets and
longer execution times, resulting in comparatively high stress on the L2 cache.

The cache-to-cache latency of the Core 2 Duo was measured to be 33 ns. The Core 2 Duo
has high memory bandwidth and low latency as a result of on-chip access to the other L1
cache and the presence of aggressive memory dependence predictors. Its shared L2 cache
generates less off-chip traffic than the other two processors.




Because of its shared L2 cache, the execution times of all single-threaded workloads on
the Core 2 Duo are short, ranging from 56 to 1500 seconds. The average multi-programmed
speedups for the CPU2006 and CPU2000 benchmarks were measured at 1.76 and 1.77,
respectively, which is lower than the ideal speedup of 2. The Core 2 Duo's speedups are
constrained by its ability to devote the entire L2 cache to a single running program,
which two concurrently running programs must instead share.
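
The arithmetic behind two of the figures quoted above can be sketched in Python. This
is a hedged illustration, not the thesis's measurement code. It assumes that the
multi-programmed speedup is the time needed to run two programs one after the other
divided by the time needed to run them concurrently, a definition chosen because it is
consistent with the ideal value of 2. The function names and the run times are
hypothetical. Only the 33 ns latency and the 2.13 GHz clock come from the text.

```python
def multiprogrammed_speedup(t_alone_a, t_alone_b, t_together):
    """Assumed definition: back-to-back run time over concurrent run time."""
    if t_together <= 0:
        raise ValueError("t_together must be positive")
    return (t_alone_a + t_alone_b) / t_together


def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


# Hypothetical run times in seconds: each program takes 400 s alone,
# and the pair takes 455 s when run concurrently on the two cores.
ideal = multiprogrammed_speedup(400.0, 400.0, 400.0)
speedup = multiprogrammed_speedup(400.0, 400.0, 455.0)

# Cache-to-cache latency from the text (33 ns) at the E6400 clock (2.13 GHz).
cycles = ns_to_cycles(33.0, 2.13)

print(f"ideal speedup:    {ideal:.2f}")
print(f"example speedup:  {speedup:.2f}")
print(f"33 ns at 2.13 GHz: about {cycles:.0f} cycles")
```

Under this assumed definition, any slowdown caused by the two programs competing for
the shared L2 cache or the memory bus lengthens the concurrent run time and pulls the
speedup below 2.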


1.2 Architecture of Intel Core 2 Duo
The Intel Core 2 Duo E6400 processor (Figure 1-1) supports CMP and belongs to Intel's
mobile Core family. It is implemented by placing two cores based on Intel's Core
architecture on a single die. The design of the Intel Core 2 Duo E6400 was chosen to
maximize performance and minimize power consumption [18]. For high power efficiency, it
emphasizes cache efficiency rather than clock frequency. Although it is clocked at a
lower rate than most of its competitors, its shorter and wider-issue pipeline
compensates with a higher IPC. In addition, the Core 2 Duo processor has more ALUs
[13]. The five main features of the Intel Core 2 Duo that contribute to its high
performance are: