Cyrix III Processor DataBook
Socket 370 Compatible x86 CPU Featuring MMX™ and 3DNow!™ Technology
©2000 Copyright Via-Cyrix Corporation. All rights reserved. Printed in the United States of America.
Trademark Acknowledgments: Cyrix is a registered trademark of Via-Cyrix Corporation. Cyrix III is a trademark of Via-Cyrix Corporation. MMX is a trademark of Intel Corporation. All other brand or product names are trademarks of their respective companies.
Via-Cyrix Corporation 2703 North Central Expressway Richardson, Texas 75080-2010 United States of America Via-Cyrix Corporation (Cyrix) reserves the right to make changes in the devices or specifications described herein without notice. Before design-in or order placement, customers are advised to verify that the information is current on which orders or design activities are based. Via-Cyrix warrants its products to conform to current specifications in accordance with Via-Cyrix' standard warranty. Testing is performed to the extent necessary as determined by Via-Cyrix to support this warranty. Unless explicitly specified by customer order requirements, and agreed to in writing by Via-Cyrix, not all device characteristics are necessarily tested. Via-Cyrix assumes no liability, unless specifically agreed to in writing, for customers' product design or infringement of patents or copyrights of third parties arising from the use of Via-Cyrix devices. No license, either express or implied, to Cyrix patents, copyrights, or other intellectual property rights pertaining to any machine or combination of Via-Cyrix devices is hereby granted. Via-Cyrix products are not intended for use in any medical, life saving, or life sustaining system. Information in this document is subject to change without notice.
REVISION HISTORY
Date      Version
1/25/00   1.0
3/22/99   0.53
3/17/99   0.52
3/16/99   0.51
3/11/99   0.5
3/9/99    0.4
3/3/99    0.3
3/1/99    0.2
2/18/99   0.1

Revision
- Final Specs updated for production
- Changed name from MXs to Cyrix III processor.
- Pages 1-9, 1-10: Both L1 and L2 caches are unified.
- Page 1-11: Remove paragraph concerning Scratch Pad Cache Memory.
- Page 2-36 and similar pages: Replace "Don't care" with xxxxb.
- Page 2-56: Added Clock Ratio Table for BIOS Core/Bus Frequency Ratios.
- Added CPUPRES# signal.
- Changed X32 pin to GND.
- Changed Z32 pin to Vcc.
- Changed name VDD-2.5 to VCC_2.5.
- Changed AD32 pin to Vcc.
- Added IERR# signal.
- Added BRO#.
- Modified Figure 5.3 Voltage Connections.
- Changed REF7 to VREF7 in pin diagrams.
- Corrected Pages 5-1 and 5-2 Pin Assignment Diagrams.
- Corrected Pages 5-3 and 5-4 Pin Signal List.
- Added power diagram Page 5-6 Figure 5-3.
- Page 2-33: Updated and completed CPU Configuration Register Table.
- Page 2-43: Added question marks (does CCR7 exist?).
- Page 2-48: Updated RCRn bit 0 RCD/RCE.
- Page 2-53: Added Clock Ratio Table for DIR3 TYPE Field.
- Bullet Page: Added Programmable Clock/Bus Ratio, new ratio.
- Page 3-1: Made several changes to Figure 3-1.
- Page 3-2: Removed subname explanation.
- Page 3-2: Redefined ADS#.
- Page 3-3: LOCK# is now I/O.
- Page 3-5: Many changes to Table 3-2, non-supported signals.
- Page 3-9: Rewrote second paragraph right column.
- Page 3-10: Omitted table 3-6 as all signals are disconnected during RESET#.
- Page 3-13: Placed Error Phase before Snoop Phase.
- Page 3-14: Rewrote INTR explanation. Now there is one interrupt acknowledge bus cycle.
- Page 3-15: Next to last paragraph. Removed last sentence concerning FERR#.
- Page 4-2: Made many changes to Table 4-1 Pull-Up Resistors.
- Page 4-3: Table 4-2 Note 3. Removed the word "APIC".
- Page 4-4: Table 4-3, Recommended Operating Conditions for CMOS Signals. Removed VCCIO row.
- Page 4-5: Moved BSEL signals from GTL I/O to CMOS Input row. Removed LINT signals.
- Page 4-7 and 4-8: Added 433, 450, 500 MHz frequencies.
- Page 1-9: Reworded right-top paragraph concerning exclusive cache.
- Corrected BSEL0 and BSEL1 typos on page 3-1 and 4-5.
- Added 133 MHz bus on page 3-9.
- Corrected note to Figure 4-14 on page 4-12.
- Updated Figure 4-15 on page 4-11.
- Typos.
- Added 2.5x to Table 3-3.
- Updated thermal information.
- Initial Version

April 4, 2000 9:49 am
Cyrix III PROCESSOR
Socket 370 Compatible CPU MMX™ and 3DNow!™ Technology
Cyrix Processors
Introduction
- Performance Rating: PR400, PR450, PR500 and higher
Other Features
- Leverages Existing Socket 370 Infrastructure
- Integrated 8-way 256 KByte L2 Cache
- Compatible with MMX™ and 3DNow!™ Technology
- 64K 4-Way Unified Write-Back L1 Cache
- Runs Windows® 98, Windows 3.x, Windows NT, DOS, UNIX®, OS/2®, and all other x86 operating systems.
- Branch Prediction with a 512-entry BTB
- Enhanced Memory Management Unit 2 Level TLB (16 Entry L1, 384 Entry L2)
- 2.2 V Core
- Scratchpad RAM in Unified Cache
- Flexible Core/Bus Clock Ratios: 2.5x, 3x, 3.5x, 4.0x, 4.5x, 5.0x, 5.5x, 6.0x, 6.5x, 7.0x, 7.5x
- Optimized pipelining for both 32- and 16-Bit Code
- BIOS Programmable Core/Bus Clock Ratio
- High Performance dual pipeline 80-Bit FPU
- Supports bus speeds of 66, 100, and 133 MHz
Performance Features
The Cyrix III processor offers significant performance enhancements over previous generation processors in a Socket 370 compatible package. The Cyrix III includes a 64 KByte L1 cache, an integrated 256 KByte L2 cache, and has frequency scalability to 400 MHz and beyond. It is compatible with MMX and 3DNow! Technology for superior graphics performance. A new dual pipeline FPU/MMX Unit delivers superior floating point performance. The Cyrix III delivers high 32-bit and 16-bit performance while running Windows 98, 95 and 3.X, Windows NT, OS/2, DOS, UNIX, and all other x86 operating systems and applications.
The Cyrix III processor achieves top performance through the use of two optimized superpipelined execution units, two integer units, and a dual issue FPU/MMX/3DNow! unit that can execute instructions from both execution units simultaneously. The 64 K L1 cache and 256 K integrated L2 cache employ write-back technologies to make access to the code and data as fast as possible to avoid pipeline stalls. The cache supports caching SMI code and data, and can be used as scratchpad RAM by the processor.
The superpipelined architecture reduces timing constraints and increases frequency scalability. Advanced architectural technologies include register renaming, out of order completion, data dependency removal, branch prediction, and speculative execution. The pipelining and superscaling are designed to remove data dependencies and resolve conflicts to allow for a high number of instruction executions per clock cycle. This promotes the highest performance for both 32-bit and 16-bit applications.
[Block diagram: CPU Core (superpipelined integer unit, 512-entry BTB, FPU with MMX and 3DNow! extensions); Memory Management Unit (16-entry Level 1 TLB, 6-way 384-entry Level 2 TLB); Cache Unit (64-KByte 4-way unified cache, integrated 256K L2 cache); Bus Interface Unit (data D63-D0, address A31-A3, BCLK). X and Y linear and physical address buses are 32 bits wide; FPU and cache data buses are 64 bits wide.]
Table of Contents
1 ARCHITECTURE OVERVIEW
1.1 Processor Differences
1.2 Celeron™ Compatibility
1.3 Major Functional Blocks
1.4 Integer Unit
1.5 Data Bypassing
1.6 Cache Units
1.7 Memory Management Unit
1.8 Floating Point Unit
1.9 Bus Interface Unit
2 PROGRAMMING INTERFACE
2.1 Processor Initialization
2.2 Instruction Set Overview
2.3 Register Sets
2.4 System Register Set
2.5 Cyrix III Register Set
2.6 Debug Registers
2.7 Address Space
2.8 Memory Addressing Methods
2.9 Memory Caches
2.10 Interrupts and Exceptions
2.11 System Management Mode
2.12 Sleep and Halt
2.13 Protection
2.14 Virtual 8086 Mode
2.15 Floating Point Unit Operations
2.16 MMX Operations
3 Cyrix III BUS INTERFACE
3.1 Signal Description Table
3.2 Signal Descriptions
4 ELECTRICAL SPECIFICATIONS
4.1 Introduction
4.2 Electrical Ground
4.3 Power Supply Voltage Signalling
4.4 Power and Ground Connections
4.5 Gunning Transceiver Logic
4.6 Recommended Operating Conditions
4.7 Bus Signal Groups
4.8 DC Characteristics
4.9 AC Characteristics
5 MECHANICAL SPECIFICATIONS
5.1 370-Pin SPGA Package
5.2 Thermal Resistances
6 INSTRUCTION SET
6.1 Instruction Set Format
6.2 General Instruction Format
6.3 CPUID Instruction
6.4 Instruction Set Tables
6.5 FPU Instruction Clock Counts
6.6 Cyrix III Processor MMX Instruction Clock Counts
6.7 Cyrix III Processor 3DNow! Clock Counts
Product Overview
Introduction
1 ARCHITECTURE OVERVIEW
The Cyrix III processor is a 64-bit, x86 instruction set compatible processor that provides high performance in a Celeron™ compatible PGA 370 socket. The Cyrix III processor offers an enhanced super-scalar core, and a new pipelined, dual-issue MMX™ and 3DNow!™-compatible floating point unit (FPU). The Cyrix III processor can process 57 new multimedia instructions compatible with MMX™ technology. The processor contains a 64K L1 cache and a 256K L2 cache. It operates at a higher frequency, contains an enlarged cache, a two-level TLB, and an improved branch target cache.
The Cyrix III processor core is an enhanced version of a proven design that offers competitive CPU performance. It has integer and floating point execution units that are based on sixth-generation technology. The integer core contains a dual-issue, seven-stage execution pipeline and offers advanced features such as operand forwarding, branch target buffers, and extensive write buffering.
The FPU has been redesigned to provide additional buffering, reduced latency, and improved throughput up to 1 GFLOP (peak). The dual-issue FPU allows two MMX or floating point instructions to execute simultaneously.
A 64KB write-back L1 cache is accessed in a unique fashion that eliminates pipeline stalls for fetch operands that hit in the cache. Through the use of unique architectural features, the Cyrix III processor eliminates many data dependencies and resource conflicts, resulting in optimal performance for both 16-bit and 32-bit x86 software.
To provide support for multimedia operations, the cache can be turned into a scratchpad RAM memory on a line-by-line basis. The cache area set aside as scratchpad memory acts as a private memory for the CPU and does not participate in cache operations.
The on-chip FPU has been enhanced to process MMX™ and 3DNow! instructions as well as the floating point instructions. Both types of instructions execute in parallel with integer instruction processing. To facilitate FPU operations, the FPU features a 64-bit data interface, a four-deep instruction queue and a six-deep store queue.
For mobile systems and other power sensitive applications, the Cyrix III processor incorporates low power suspend mode, stop clock capability, and system management mode (SMM).
1.1 Processor Differences
Table 1-1 describes the major differences between the M II and Cyrix III processors.
Table 1-1. Cyrix III Processor vs. M II Processor
Feature                            Cyrix III Processor               M II Processor
Package/pinout                     Socket 370 SPGA                   Socket 7
Supply voltage                     Core voltage = 2.2 V              Core voltage = 2.9 V
                                   I/O reference voltage = 1.0 V     I/O voltage = 3.3 V
CPU primary cache (L1)             64 KB write-back L1 Cache         64 KB write-back Cache
Support for secondary cache (L2)   Internal L2 Cache, 256 KB         Yes (512K external typical)
MMX™ Instruction Set               Yes                               Yes
3DNow!™ Instruction Set            Yes                               No
Floating point unit                Dual-issue from Integer Unit      Single-issue from Integer Unit
4MB paging                         Yes                               No
Virtual Mode Extensions            Yes                               No
1.2 Celeron™ Compatibility
The Cyrix III processor is designed to be compatible with motherboards created for the Intel® Celeron processor with a Socket 370 footprint. However, some electrical signaling differs so that the Cyrix III can provide features not supported by the Celeron. In particular, the Cyrix III supports two unique pins: the VID[4] pin (AK36), used to signal 2.2 volt operation, and the BSEL1 pin (AK30), used with the BSEL0 pin (AJ33) to signal the system bus frequency. Conversely, a few minor Celeron signals are not supported by the Cyrix III processor: the breakpoint signals (BP[3:2] and BPM[1:0]#), the internal error signal (IERR#), the probe signals (PRDY#, PREQ#), and the thermal trip signal (THERMTRIP#). Refer to Chapter 3 of this manual for more details on Cyrix III signal descriptions. For motherboard design considerations and more details concerning Celeron compatibility, refer to the Cyrix III Board Design and AC/DC Specifications Application Note 120.
1.3 Major Functional Blocks
The Cyrix III processor consists of four major functional blocks, as shown in the overall block diagram on the first page of this manual:
· Memory Management Unit
· CPU Core
· Cache Unit
· Bus Interface Unit
The CPU contains the superpipelined integer unit, the BTB (Branch Target Buffer) unit and the FPU (Floating Point Unit).
The BIU (Bus Interface Unit) provides the interface between the external system board and the processor's internal execution units. During a memory cycle, a memory location is selected through the address lines (A[31-3]#). Data is passed from or to memory through the data lines (D[63-0]#).
Each instruction is read into the 256-Byte Instruction Line Cache. The Cache Unit stores the most recently used data and instructions to allow fast access to the information by the Integer Unit and FPU.
The CPU core requests instructions from the Cache Unit. The received integer instructions are decoded by either the X or Y processing pipelines within the superpipelined integer unit. If the instruction is an MMX or FPU instruction, it is passed to the floating point unit for processing.
Data is fetched from the 64-KB unified cache as required. If the data is not in the cache, it is accessed via the bus interface unit from main memory.
The Memory Management Unit calculates physical addresses, including addresses based on paging, and passes them to the Cache Unit and the Bus Interface Unit (BIU).
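As a rough illustration of the address and data line widths mentioned above: with 64 data lines (D[63-0]#), each bus transfer covers 8 bytes, so address lines A[31-3]# only need to identify an 8-byte-aligned quadword. The Python sketch below splits a 32-bit physical address accordingly. The function name is hypothetical, and how individual bytes within the quadword are selected on the bus is not covered in this section.

```python
def split_bus_address(phys_addr: int) -> tuple[int, int]:
    """Split a 32-bit physical address into (quadword number, byte offset)."""
    if not 0 <= phys_addr < 2**32:
        raise ValueError("expected a 32-bit physical address")
    quadword = phys_addr >> 3       # value carried on A[31-3]
    byte_offset = phys_addr & 0x7   # position within the 64-bit data bus width
    return quadword, byte_offset


print(split_bus_address(0x1234))
```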
1.4 Integer Unit
The Integer Unit (Figure 1-1) provides parallel instruction execution using two seven-stage integer pipelines. Each of the two pipelines, X and Y, can process several instructions simultaneously. The Integer Unit consists of the following pipeline stages:
· Instruction Fetch (IF)
· Instruction Decode 1 (ID1)
· Instruction Decode 2 (ID2)
· Address Calculation 1 (AC1)
· Address Calculation 2 (AC2)
· Execute (EX)
· Write-Back (WB)
The instruction decode and address calculation functions are both divided into superpipelined stages.
[Figure 1-1. Integer Unit. Instruction Fetch and Instruction Decode 1 are shared; Instruction Decode 2, Address Calculation 1, Address Calculation 2, Execution, and Write Back appear once in the X pipeline and once in the Y pipeline. Stages through Address Calculation 2 are labeled In-Order Processing, with a path to the FPU at Address Calculation 2; Execution and Write Back are labeled Out-of-Order Processing.]
1.4.1 Pipeline Stages
The Instruction Fetch (IF) stage, shared by both the X and Y pipelines, fetches 16 bytes of code from the cache unit in a single clock cycle. Within this stage, the code stream is checked for any branch instructions that could affect normal program sequencing. If an unconditional or conditional branch is detected, branch prediction logic within the IF stage generates a predicted target address for the instruction. The IF stage then begins fetching instructions at the predicted address.
The superpipelined Instruction Decode function contains the ID1 and ID2 stages. ID1, shared by both pipelines, evaluates the code stream provided by the IF stage and determines the number of bytes in each instruction. Up to two instructions per clock are delivered to the ID2 stages, one in each pipeline. The ID2 stages decode instructions and send the decoded instructions to either the X or Y pipeline for execution. The particular pipeline is chosen based on which instructions are already in each pipeline and how fast they are expected to flow through the remaining stages.
The Address Calculation function contains two stages, AC1 and AC2. If the instruction refers to a memory operand, the AC1 stage calculates a linear memory address for the instruction. The AC2 stage performs any required memory management functions, cache accesses, and register file accesses. If a floating point instruction is detected by AC2, the instruction is sent to the FPU for processing.
The Execute (EX) stage executes instructions using the operands provided by the address calculation stage. The Write-Back (WB) stage is the last IU stage. The WB stage stores execution results either to a register file within the IU or to a write buffer in the cache control unit.
1.4.2 Out-of-Order Processing
If an instruction executes faster than the previous instruction in the other pipeline, the instructions may complete out of order. All instructions are processed in order, up to the EX stage. While in the EX and WB stages, instructions may be completed out of order. If there is a data dependency between two instructions, the necessary hardware interlocks are enforced to ensure correct program execution. Even though instructions may complete out of order, exceptions and writes resulting from the instructions are always issued in program order.
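The stage layout described in sections 1.4.1 and 1.4.2 can be summarized in a small table-like model. The Python sketch below is only an illustration of which stages are shared between the X and Y pipelines and which allow out-of-order completion. The `Stage` class and helper names are hypothetical and do not correspond to anything in the hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    shared: bool     # True if one stage serves both the X and Y pipelines
    in_order: bool   # True if instructions pass through in program order


# Seven stages; IF and ID1 are shared, EX and WB may complete out of order.
STAGES = [
    Stage("IF", shared=True, in_order=True),
    Stage("ID1", shared=True, in_order=True),
    Stage("ID2", shared=False, in_order=True),
    Stage("AC1", shared=False, in_order=True),
    Stage("AC2", shared=False, in_order=True),
    Stage("EX", shared=False, in_order=False),
    Stage("WB", shared=False, in_order=False),
]


def shared_stages():
    return [s.name for s in STAGES if s.shared]


def out_of_order_stages():
    return [s.name for s in STAGES if not s.in_order]


print(shared_stages(), out_of_order_stages())
```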
1.4.3 Pipeline Selection
In most cases, instructions are processed in either pipeline and without pairing constraints on the instructions. However, certain instructions are processed only in the X pipeline:
· Branch instructions
· Floating point instructions
· Exclusive instructions
Branch and floating point instructions may be paired with a second instruction in the Y pipeline. Exclusive instructions cannot be paired with instructions in the Y pipeline. These instructions typically require multiple memory accesses. Although exclusive instructions may not be paired, hardware from both pipelines is used to accelerate instruction completion. Listed below are the Cyrix III CPU exclusive instruction types:
· Protected mode segment loads
· Special register accesses (Control, Debug, and Test Registers)
· String instructions
· Multiply and divide
· I/O port accesses
· Push all (PUSHA) and pop all (POPA)
· Intersegment jumps, calls, and returns
1.4.4 Data Dependency Solutions
When two instructions that are executing in parallel require access to the same data or register, one of the following types of data dependencies may occur:
· Read-After-Write (RAW)
· Write-After-Read (WAR)
· Write-After-Write (WAW)
Data dependencies typically force serialized execution of instructions. However, the Cyrix III CPU implements three mechanisms that allow parallel execution of instructions containing data dependencies:
· Register Renaming
· Data Forwarding
· Data Bypassing
The following sections provide detailed examples of these mechanisms.
1.4.4.1 Register Renaming
The Cyrix III CPU contains 32 physical general purpose registers. Each of the 32 registers in the register file can be temporarily assigned as one of the general purpose registers defined by the x86 architecture (EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP). For each register write operation, a new physical register is selected to allow previous data to be retained temporarily. Register renaming effectively removes all WAW and WAR dependencies. The programmer does not have to consider register renaming, as it is completely transparent to both the operating system and application software.
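To make the renaming idea concrete, the following Python sketch models eight architectural registers mapped onto 32 physical registers, with every write taking a fresh physical register. It is a toy model under stated assumptions: the `RenameFile` class, its allocation order, and the explicit `release` step are hypothetical and are not taken from the processor's actual implementation.

```python
class RenameFile:
    """Toy rename map: 8 architectural registers over 32 physical ones."""

    ARCH = ("EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP")

    def __init__(self, physical=32):
        # Initially architectural register i lives in physical register i.
        self.map = {name: i for i, name in enumerate(self.ARCH)}
        self.free = list(range(len(self.ARCH), physical))

    def read(self, reg):
        """Return the physical register currently holding reg."""
        return self.map[reg]

    def write(self, reg):
        """Select a new physical register for reg; the old copy is retained."""
        new = self.free.pop(0)
        old = self.map[reg]
        self.map[reg] = new
        return new, old

    def release(self, phys):
        """Return a physical register to the free list (assumed policy)."""
        self.free.append(phys)


rf = RenameFile()
src = rf.read("EAX")            # earlier instruction reads EAX
new, old = rf.write("EAX")      # later instruction writes EAX (WAR case)
print(src, old, new)            # the reader's copy is untouched by the write
```

Because the later write lands in a different physical register, the earlier read (WAR) and any earlier write (WAW) to the same architectural register no longer conflict with it.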
1.4.4.2 Data Forwarding
Register renaming alone cannot remove RAW dependencies. The Cyrix III CPU uses two types of data forwarding in conjunction with register renaming to eliminate RAW dependencies:
· Operand Forwarding
· Result Forwarding
Operand forwarding takes place when the first in a pair of instructions performs a move from register or memory, and the data that is read by the first instruction is required by the second instruction. The Cyrix III CPU performs the read operation and makes the data read available to both instructions simultaneously.
Result forwarding takes place when the first in a pair of instructions performs an operation (such as an ADD) and the result is required by the second instruction to perform a move to a register or memory. The Cyrix III CPU performs the required operation and stores the results of the operation to the destination of both instructions simultaneously.
Operand forwarding can only occur if the first instruction does not modify its source data. In other words, the instruction is a move type instruction (for example, MOV, POP, LEA). Operand forwarding occurs for both register and memory operands. The size of the first instruction destination and the second instruction source must match.
1.5 Data Bypassing
In addition to register renaming and data forwarding, the Cyrix III CPU implements a third data dependency-resolution technique called data bypassing. Data bypassing reduces the performance penalty of those memory data RAW dependencies that cannot be eliminated by data forwarding. Data bypassing is implemented when the first in a pair of instructions writes to memory and the second instruction reads the same data from memory. The Cyrix III CPU retains the data from the first instruction and passes it to the second instruction, thereby eliminating a memory read cycle. Data bypassing only occurs for cacheable memory locations.
1.5.1 Branch Control
Branch instructions occur on average every four to six instructions in x86-compatible programs. When the normal sequential flow of a program changes due to a branch instruction, the pipeline stages may stall while waiting for the CPU to calculate, retrieve, and decode the new instruction stream. The Cyrix III CPU minimizes the performance degradation and latency of branch instructions through the use of branch prediction and speculative execution.
1.5.1.1 Branch Prediction
The Cyrix III CPU uses a 512-entry, 4-way set associative Branch Target Buffer (BTB) to store branch target addresses. The Cyrix III CPU has a 1024-entry branch history table. During the fetch stage, the instruction stream is checked for the presence of branch instructions. If an unconditional branch instruction is encountered, the Cyrix III CPU accesses the BTB to check for the branch instruction's target address. If the branch instruction's target address is found in the BTB, the Cyrix III CPU begins fetching at the target address specified by the BTB.
In the case of conditional branches, the BTB also provides history information to indicate whether the branch is more likely to be taken or not taken. If the conditional branch instruction is found in the BTB, the Cyrix III CPU begins fetching instructions at the predicted target address. If the conditional branch misses in the BTB, the Cyrix III CPU predicts that the branch will not be taken, and instruction fetching continues with the next sequential instruction. The decision to fetch the taken or not taken target address is based on a four-state branch prediction algorithm.
Once fetched, a conditional branch instruction is first decoded and then dispatched to the X pipeline only. The conditional branch instruction proceeds through the X pipeline and is then resolved in either the EX stage or the WB stage. The conditional branch is resolved in the EX stage if the instruction responsible for setting the condition codes is completed prior to the execution of the branch. If the instruction that sets the condition codes is executed in parallel with the branch, the conditional branch instruction is resolved in the WB stage.
Correctly predicted branch instructions execute in a single core clock. If resolution of a branch indicates that a misprediction has occurred, the Cyrix III CPU flushes the pipeline and starts fetching from the correct target address. The Cyrix III CPU prefetches both the predicted and the non-predicted path for each conditional branch, thereby eliminating the cache access cycle on a misprediction. If the branch is resolved in the EX stage, the resulting misprediction latency is four cycles. If the branch is resolved in the WB stage, the latency is five cycles.
Since the target address of return (RET) instructions is dynamic rather than static, the Cyrix III CPU caches target addresses for RET instructions in an eight-entry return stack rather than in the BTB. The return address is pushed on the return stack during a CALL instruction and popped during the corresponding RET instruction.
1.5.1.2 Speculative Execution
The Cyrix III CPU is capable of speculative execution following a floating point instruction or predicted branch. Speculative execution allows the pipelines to continuously execute instructions following a branch without stalling the pipelines waiting for branch resolution. The same mechanism is used to execute floating point instructions in parallel with integer instructions.
The Cyrix III CPU is capable of up to four levels of speculation (i.e., combinations of four conditional branches and floating point operations). After generating the fetch address using branch prediction, the CPU checkpoints the machine state (registers, flags, and processor environment), increments the speculation level counter, and begins operating on the predicted instruction stream.
Once the branch instruction is resolved, the CPU decreases the speculation level. For a correctly predicted branch, the status of the checkpointed resources is cleared. For a branch misprediction, the Cyrix III processor generates the correct fetch address and uses the checkpointed values to restore the machine state in a single clock.
In order to maintain compatibility, writes that result from speculatively executed instructions are not permitted to update the cache or external memory until the appropriate branch is resolved. Speculative execution continues until one of the following conditions occurs:
1) A branch or floating point operation is decoded and the speculation level is already at four.
2) An exception or a fault occurs.
3) The write buffers are full.
4) An attempt is made to modify a non-checkpointed resource (i.e., segment registers, system flags).
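The checkpoint and speculation-level bookkeeping described above can be sketched as a small Python model. This is a minimal illustration, not the hardware mechanism: the `SpeculationTracker` class is hypothetical, the machine state is represented as a plain dictionary, and discarding all younger checkpoints on a misprediction is an assumption that the text does not spell out.

```python
MAX_SPECULATION_LEVEL = 4  # up to four levels of speculation


class SpeculationTracker:
    def __init__(self):
        self.checkpoints = []  # one saved machine state per unresolved level

    @property
    def level(self):
        return len(self.checkpoints)

    def speculate(self, machine_state):
        """Checkpoint state and raise the level; False means speculation stops."""
        if self.level >= MAX_SPECULATION_LEVEL:
            return False
        self.checkpoints.append(dict(machine_state))
        return True

    def resolve(self, predicted_correctly):
        """Resolve the oldest outstanding branch or floating point operation."""
        if not self.checkpoints:
            raise RuntimeError("nothing to resolve")
        if predicted_correctly:
            self.checkpoints.pop(0)  # checkpoint is cleared, level decreases
            return None
        restored = self.checkpoints[0]
        self.checkpoints.clear()     # assumed: younger speculative work is dropped
        return restored              # state to restore after the misprediction


tracker = SpeculationTracker()
for i in range(5):
    print(i, tracker.speculate({"EAX": i}))  # the fifth attempt is refused
```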
1.6 Cache Units
The Cyrix III CPU employs two caches, the 64KB L1 Cache and the Exclusive L2 Cache (Figure 1-2). The main cache is a 4-way set-associative 64-KB unified cache. The unified cache provides a higher hit rate than using equal-sized separate data and instruction caches. While in Cyrix SMM mode, both SMM code and data are cacheable. To avoid data conflicts, the two caches are exclusive; that is, data can be stored in either cache but not in both at the same time.
1.6.1 64-KB L1 Cache
The 64-KB unified write-back cache functions as the primary cache. Configured as a four-way set-associative cache, the cache stores up to 64 KBytes of code and data in 2048 lines. The cache is dual-ported and allows any two of the following operations to occur in parallel:
· Code fetch
· Data read (X pipeline, Y pipeline or FPU)
· Data write (X pipeline, Y pipeline or FPU)
The unified cache uses a pseudo-LRU replacement algorithm and can be configured to allocate new lines on read misses only or on read and write misses.
[Figure 1-2. Cache Unit Operations. Blocks shown: Integer Unit (IF, X pipe, Y pipe); Instruction Line Cache (256-Byte, fully associative, 8 lines); FPU; L1 Cache (64KB, 4-way set associative, 2048 lines) with Cache Tags; L2 Cache (exclusive, 256K); Memory Management Unit (TLB); Bus Interface Unit. Buses shown: instruction address, data bus, and modified X, Y physical addresses.]
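The L1 figures quoted above (64 KBytes, 2048 lines, four ways) imply a 32-byte line and 512 sets. The Python sketch below works through that arithmetic and shows one conventional way to split a 32-bit address into tag, set index, and line offset. Using the low-order address bits for the offset and index is an assumption for illustration; this section does not state how the hardware selects a set.

```python
CACHE_BYTES = 64 * 1024
LINES = 2048
WAYS = 4

LINE_BYTES = CACHE_BYTES // LINES          # bytes per cache line
SETS = LINES // WAYS                       # lines are grouped four per set
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2 of the line size
INDEX_BITS = SETS.bit_length() - 1         # log2 of the number of sets


def decompose(addr):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(LINE_BYTES, SETS, OFFSET_BITS, INDEX_BITS)
print(decompose(0x12345678))
```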
1.6.2 Exclusive L2 Cache
The exclusive 256 KB L2 cache serves as a unified secondary cache. This L2 cache is filled through L1 cache eviction. Fetches from the integer unit that do not hit in the L1 cache access the L2 cache. The L2 acts as a victim cache, saving cache lines released by the L1 cache. The L2 cache is thus referred to as an exclusive, or victim, cache, since it only contains data that is not found in the L1 cache. The L2 cache is 8-way set associative. The total effective cache of the Cyrix III CPU can be up to 320 KB (64 KB of L1 plus 256 KB of L2), since the exclusive L2 architecture ensures that no data will be in both the L1 and L2. This also eliminates the need for an L1 to L2 writeback cycle, thus improving performance. The L2 cache bus operates at the same frequency as the CPU core, delivering cache data at very high speed to the execution units.
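The exclusive (victim) relationship between the two caches can be illustrated with a deliberately simplified Python model. The sketch assumes fully associative caches with plain LRU replacement and tracks only line addresses, which is not how the real 4-way L1 and 8-way L2 are organized; the `ExclusiveHierarchy` class is hypothetical. What it does show is the property described above: a line is held in L1 or in L2, never in both.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # line address -> None, oldest entry first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)
            return "L1 hit"
        if line in self.l2:
            self.l2.pop(line)    # the line leaves L2 so it is never held twice
            result = "L2 hit"
        else:
            result = "miss"      # line would be filled from main memory
        self._fill_l1(line)
        return result

    def _fill_l1(self, line):
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)  # evict oldest L1 line
            if len(self.l2) >= self.l2_lines:
                self.l2.popitem(last=False)          # make room in L2
            self.l2[victim] = None                   # L2 is filled by L1 eviction
        self.l1[line] = None


h = ExclusiveHierarchy(l1_lines=2, l2_lines=4)
print([h.access(n) for n in (0, 1, 2, 0)])
```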
1.7 Memory Management Unit
The Memory Management Unit (MMU) translates the linear address supplied by the IU into a physical address to be used by the unified caches and the bus interface. Memory management procedures are x86 compatible, adhering to standard paging mechanisms. Within the Cyrix III CPU there are two TLBs, the main L1 TLB and the larger L2 TLB. The 16-entry L1 TLB is direct mapped and holds 42 lines. The 384-entry L2 TLB is 6-way associative and holds 384 lines. The DTE is located in memory. Cache locking is controlled through use of the RDMSR and WRMSR instructions.
[Figure: translation of a linear address through the main L1 TLB and the L2 TLB. In memory, the CR3 control register locates the Directory Table, whose entry (DTE) locates the Page Table, whose entry (PTE) gives the physical page.]
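The TLB sizes above imply some simple geometry: a 384-entry, 6-way L2 TLB has 64 sets, and a direct-mapped 16-entry L1 TLB has 16 slots. The Python sketch below derives these numbers and shows one plausible way a linear address could select a slot in each TLB. The 4-KByte page size is standard x86 paging, but indexing with the low-order bits of the page number is purely an assumption for illustration, and `tlb_indices` is a hypothetical helper.

```python
PAGE_SHIFT = 12                    # 4-KByte pages
L1_ENTRIES = 16                    # direct mapped: one candidate slot per page
L2_ENTRIES = 384
L2_WAYS = 6
L2_SETS = L2_ENTRIES // L2_WAYS    # six candidate entries per set


def tlb_indices(linear_addr):
    """Return (page number, assumed L1 TLB slot, assumed L2 TLB set)."""
    vpn = linear_addr >> PAGE_SHIFT
    return vpn, vpn % L1_ENTRIES, vpn % L2_SETS


print(L2_SETS)
print(tlb_indices(0x00403000))
```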
1.8 Floating Point Unit
The Floating Point Unit (FPU) processes floating point, MMX, and 3DNow! instructions. The FPU interfaces to the Integer Unit and the Cache Unit through a 64-bit bus. The FPU is x87 instruction set compatible and adheres to the IEEE-754 standard. Since most applications contain FPU instructions mixed with integer instructions, the FPU achieves high performance by completing integer and FPU operations in parallel.
FPU Parallel Execution
The Cyrix III processor executes integer instructions in parallel with FPU instructions. Integer instructions may complete out of order with respect to the FPU instructions. The Cyrix III processor maintains x86 compatibility by signaling exceptions and issuing write cycles in program order. FPU instructions can be dispatched from the Integer Unit's X or Y pipeline. The address calculation stage of the pipeline checks for memory management exceptions and accesses memory operands used by the FPU. If no exceptions are detected, the Cyrix III processor checkpoints the state of the CPU and, during AC2, dispatches the floating point instruction to the FPU instruction queue. The Cyrix III processor can then complete any subsequent integer instructions speculatively and out of order relative to the FPU instruction and relative to any potential FPU exceptions which may occur. As additional FPU instructions enter the pipeline, the Cyrix III processor dispatches up to eight FPU instructions to the FPU instruction queue. The Cyrix III processor continues executing speculatively and out of order, relative to the FPU queue, until the Cyrix III processor encounters one of the conditions that causes speculative execution to halt. As the FPU completes instructions, the speculation level decreases and the checkpointed resources are available for reuse in subsequent operations. The FPU also uses a set of six write buffers to prevent stalls due to speculative writes.
1.9 Bus Interface Unit
The Bus Interface Unit (BIU) provides the signals and timing required by external circuitry. The signal descriptions and bus interface timing information are provided in Chapters 3 and 4 of this manual.
2 PROGRAMMING INTERFACE
In this chapter, the internal operations of the Cyrix III CPU are described mainly from an application programmer's point of view. Included in this chapter are descriptions of processor initialization, the register set, memory addressing, various types of interrupts and the shutdown and halt process. An overview of real, virtual 8086, and protected operating modes is also included in this chapter. The FPU operations are described separately at the end of the chapter.
2.1 Processor Initialization
The Cyrix III CPU is initialized when the RESET# signal is asserted. The processor is placed in real mode and the registers listed in Table 2-1 (Page 2-16) are set to their initialized values. RESET# invalidates and disables the cache and turns off paging. When RESET# is asserted, the Cyrix III CPU terminates all local bus activity and all internal execution. During the entire time that RESET# is asserted, the internal pipelines are flushed and no instruction execution or bus activity occurs. Approximately 150 to 250 external clock cycles after RESET# is negated, the processor begins executing instructions at the top of physical memory (address location FFFF FFF0h). Typically, an intersegment JUMP is placed at FFFF FFF0h. This instruction will force the processor to begin execution in the lowest 1 MB of address space. Note: The actual time depends on the clock scaling in use. Also an additional 220 clock cycles are needed if self-test is requested.
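The reset fetch address follows from the initialized register values in Table 2-1: the hidden CS base is FFFF 0000h and EIP is 0000 FFF0h. The sketch below (Python, used only as executable pseudocode; the function names are hypothetical) adds the two and shows how little room remains before the top of the 4 GB address space, which is why an intersegment jump is normally placed there.

```python
RESET_CS_BASE = 0xFFFF0000   # CS base after RESET# (Table 2-1)
RESET_EIP = 0x0000FFF0       # EIP after RESET# (Table 2-1)


def first_fetch_address(cs_base=RESET_CS_BASE, eip=RESET_EIP):
    """Physical address of the first instruction fetched after reset."""
    return (cs_base + eip) & 0xFFFFFFFF


def bytes_to_top_of_memory(address):
    """Number of bytes from `address` to the end of the 32-bit address space."""
    return (1 << 32) - address


addr = first_fetch_address()
print(f"first fetch at {addr:08X}h, {bytes_to_top_of_memory(addr)} bytes below 4 GB")
```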
Table 2-1. Initialized Core Registers Contents
Register  Register Name                        Initialized Contents  Comments
EAX       Accumulator                          xxxx xxxxh            0000 0000h indicates self-test passed.
EBX       Base                                 xxxx xxxxh
ECX       Count                                xxxx xxxxh
EDX       Data                                 xxxx 04 [DIR0]        DIR0 = Device ID
EBP       Base Pointer                         xxxx xxxxh
ESI       Source Index                         xxxx xxxxh
EDI       Destination Index                    xxxx xxxxh
ESP       Stack Pointer                        xxxx xxxxh
EFLAGS    Flags                                0000 0002h            See Table 2-6 on page 2-26 for bit definitions.
EIP       Instruction Pointer                  0000 FFF0h
ES        Extra Segment                        0000h                 Base address set to 0000 0000h. Limit set to FFFFh.
CS        Code Segment                         F000h                 Base address set to FFFF 0000h. Limit set to FFFFh.
SS        Stack Segment                        0000h                 Base address set to 0000 0000h. Limit set to FFFFh.
DS        Data Segment                         0000h                 Base address set to 0000 0000h. Limit set to FFFFh.
FS        Extra Segment                        0000h                 Base address set to 0000 0000h. Limit set to FFFFh.
GS        Extra Segment                        0000h                 Base address set to 0000 0000h. Limit set to FFFFh.
IDTR      Interrupt Descriptor Table Register  Base = 0, Limit = 3FFh
GDTR      Global Descriptor Table Register     xxxx xxxxh, xxxxh
LDTR      Local Descriptor Table Register      xxxx xxxxh, xxxxh
TR        Task Register                        xxxxh
CR0       Machine Status Word                  6000 0010h            See Table 2-12 on page 29 for bit definitions.
CR2       Control Register 2                   xxxx xxxxh            See page 30.
CR3       Control Register 3                   xxxx xxxxh            See page 30.
CR4       Control Register 4                   0000 0000h            See Table 2-8 on page 2-30 for bit definitions.
CCR1      Configuration Control 1              00h                   See paragraph 2.5.4.1 on page 52 for bit definitions.
CCR2      Configuration Control 2              00h                   See paragraph 2.5.4.3 on page 53 for bit definitions.
CCR3      Configuration Control 3              00h                   See paragraph 2.5.4.4 on page 54 for bit definitions.
CCR7      Configuration Control 7              00h                   See paragraph 2.5.4.8 on page 58 for bit definitions.
DIR0      Device Identification 0              4xh                   Device ID and reads back initial CPU clock-speed setting.
DIR1      Device Identification 1              xxh                   Stepping and Revision ID (RO).
DR7       Debug Register 7                     0000 0400h            See (XREF) for bit definitions.
x = Undefined value
Via Confidential, Requires Non-Disclosure Agreement
2.2 Instruction Set Overview
The Cyrix III Processor instruction set can be divided into ten types of operations:
· Arithmetic
· Shift/Rotate
· Control Transfer
· Data Transfer
· Floating Point
· High-Level Language Support
· Operating System Support
· Bit Manipulation
· String Manipulation
· MMX and 3DNow! Instructions
Cyrix III Processor instructions operate on as few as zero operands and as many as three operands. An NOP instruction (no operation) is an example of a zero-operand instruction. Two-operand instructions allow the specification of an explicit source and destination pair as part of the instruction. These two-operand instructions can be divided into eight groups according to operand types:
· Register to Register
· Register to Memory
· Memory to Register
· Memory to Memory
· Register to I/O
· I/O to Register
· Immediate Data to Register
· Immediate Data to Memory
An operand can be held in the instruction itself (as in the case of an immediate operand), in one of the processor's registers or I/O ports, or in memory. An immediate operand is fetched as part of the opcode for the instruction. Operand lengths of 8, 16, 32 or 48 bits are supported as well as 64 or 80 bits associated with floating-point instructions. Operand lengths of 8 or 32 bits are generally used when executing code written for 386- or 486-class (32-bit code) processors. Operand lengths of 8 or 16 bits are generally used when executing existing 8086 or 80286 code (16-bit code). The default length of an operand can be overridden by placing one or more instruction prefixes in front of the opcode.
For example, the use of prefixes allows a 32-bit operand to be used with 16-bit code or a 16-bit operand to be used with 32-bit code. Chapter 6 of this manual lists each instruction in the Cyrix III CPU instruction set along with the associated opcodes, execution clock counts, and effects on the FLAGS register.
2.2.1 Lock Prefix
The LOCK prefix may be placed before certain instructions that read, modify, then write back to memory. The LOCK prefix can be used with the following instructions only when the result is a write operation to memory:
· Bit Test Instructions (BTS, BTR, BTC)
· Exchange Instructions (XADD, XCHG, CMPXCHG)
· One-operand Arithmetic and Logical Instructions (DEC, INC, NEG, NOT)
· Two-operand Arithmetic and Logical Instructions (ADC, ADD, AND, OR, SBB, SUB, XOR)
An invalid opcode exception is generated if the LOCK prefix is used with any other instruction or with one of the instructions above when no write operation to memory occurs (for example, when the destination is a register).
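The LOCK prefix rule in Section 2.2.1 reduces to two conditions: the instruction must be one of the listed read-modify-write instructions, and its destination must be memory. The following sketch (Python, used only as executable pseudocode; the function name and the boolean parameter are hypothetical) encodes that rule for checking an assembler listing or a disassembly.

```python
# Instructions that accept LOCK, as listed in Section 2.2.1.
LOCKABLE = frozenset({
    "BTS", "BTR", "BTC",                              # bit test
    "XADD", "XCHG", "CMPXCHG",                        # exchange
    "DEC", "INC", "NEG", "NOT",                       # one-operand arithmetic/logical
    "ADC", "ADD", "AND", "OR", "SBB", "SUB", "XOR",   # two-operand arithmetic/logical
})


def lock_prefix_valid(mnemonic, destination_is_memory):
    """True if LOCK is legal; otherwise an invalid opcode exception is expected."""
    return mnemonic.upper() in LOCKABLE and destination_is_memory


print(lock_prefix_valid("xchg", True))    # listed instruction, memory destination
print(lock_prefix_valid("add", False))    # register destination
print(lock_prefix_valid("mov", True))     # MOV is not in the list
```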
2.3 Register Sets
From the programmer's point of view, the accessible registers in the Cyrix III CPU are grouped into two sets: the application register set and the system register set. The application register set contains the registers frequently used by application programmers, and the system register set contains the registers typically reserved for use by operating system programmers. The application register set is made up of general purpose registers, segment registers, a flag register, and an instruction pointer register.
The system register set is made up of the remaining registers, which include control registers, system address registers, debug registers, configuration registers, and test registers. Each of the registers is discussed in detail in the following sections.
2.3.1 Application Register Set
The Application Register Set, as shown in Table 2-2, consists of the registers most often used by the applications programmer. These registers are generally accessible, although some bits in the Flags register are protected.
The General Purpose Register contents are frequently modified by instructions and typically contain arithmetic and logical instruction operands.
In real mode, Segment Registers contain the base address for each segment. In protected mode, the segment registers contain segment selectors. The segment selectors provide indexing for tables (located in memory) that contain the base address for each segment, as well as other memory addressing information.
The Instruction Pointer Register points to the next instruction that the processor will execute. This register is automatically incremented by the processor as execution progresses.
The Flags Register contains control bits used to reflect the status of previously executed instructions. This register also contains control bits that affect the operation of some instructions.
2.3.2 General Purpose Registers
The General Purpose Registers are divided into four data registers, two pointer registers, and two index registers as shown in Table 2-2 on page 2-20. The Data Registers are used by the applications programmer to manipulate data structures and to hold the results of logical and arithmetic operations. Different portions of a general data register can be addressed by using different names. An "E" prefix identifies a complete 32-bit register. An "X" suffix without the "E" prefix identifies the lower 16 bits of the register. The lower two bytes of a data register are addressed with an "H" suffix (identifies the upper byte) or an "L" suffix (identifies the lower byte). These _L and _H portions of the data registers act as independent registers. For example, if the AH register is written to by an instruction, the AL register bits remain unchanged.
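The naming scheme can be modeled as different views onto one 32-bit value. The sketch below (Python, used only as executable pseudocode; the class and attribute names are hypothetical) shows that writing the "H" byte leaves the "L" byte and the upper 16 bits untouched.

```python
class DataRegister:
    """Model of a 32-bit data register with 16-bit and 8-bit views."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF      # full register, e.g. EAX

    @property
    def x(self):                         # lower 16 bits, e.g. AX
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                         # bits 15:8, e.g. AH
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                         # bits 7:0, e.g. AL
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = DataRegister(0x12345678)
eax.h = 0xAB                             # write AH only
print(f"EAX={eax.e:08X}h AX={eax.x:04X}h AH={eax.h:02X}h AL={eax.l:02X}h")
```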
Table 2-2. Application Register Set: General Purpose Registers
Bits 31:0                           Bits 15:0               Bits 15:8  Bits 7:0
EAX (Extended A Register)           AX                      AH         AL
EBX (Extended B Register)           BX                      BH         BL
ECX (Extended C Register)           CX                      CH         CL
EDX (Extended D Register)           DX                      DH         DL
ESI (Extended Source Index)         SI (Source Index)
EDI (Extended Destination Index)    DI (Destination Index)
EBP (Extended Base Pointer)         BP (Base Pointer)
ESP (Extended Stack Pointer)        SP (Stack Pointer)
The Pointer and Index Registers are listed below.
SI or ESI    Source Index
DI or EDI    Destination Index
SP or ESP    Stack Pointer
BP or EBP    Base Pointer
These registers can be addressed as 16- or 32-bit registers, with the "E" prefix indicating 32 bits. The pointer and index registers can be used as general purpose registers; however, some instructions use a fixed assignment of these registers. For example, repeated string operations always use ESI as the source pointer, EDI as the destination pointer, and ECX as a counter. The instructions that use fixed registers include multiply and divide, I/O access, string operations, stack operations, loop, variable shift and rotate, and translate instructions.
The Cyrix III Processor implements a stack using the ESP register. This stack is accessed during the PUSH and POP instructions, procedure calls, procedure returns, interrupts, exceptions, and interrupt/exception returns. The Cyrix III Processor automatically adjusts the value of the ESP during operations that result from these instructions. The EBP register may be used to refer to data passed on the stack during procedure calls. Local data may also be placed on the stack and accessed with BP. This register provides a mechanism to access stack data in high-level languages.
2.3.3 Segment Registers and Selectors
Segmentation provides a means of defining data structures inside the memory space of the microprocessor. There are three basic types of segments: code, data, and stack. Segments are used automatically by the processor to determine the location in memory of code, data, and stack references. There are six 16-bit segment registers as shown in Table 2-3.
Table 2-3. Application Register Set: Segment Selector Registers
Bits 15:0
CS (Code Segment)
SS (Stack Segment)
DS (D Data Segment)
ES (E Data Segment)
FS (F Data Segment)
GS (G Data Segment)
In real and virtual 8086 operating modes, a segment register holds a 16-bit segment base. The 16-bit segment is multiplied by 16 and a 16-bit or 32-bit offset is then added to it to create a linear address. The offset size is dependent on the current address size. In real mode and in virtual 8086 mode with paging disabled, the linear address is also the physical address. In virtual 8086 mode with paging enabled, the linear address is translated to the physical address using the current page tables.
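The address arithmetic described above is a shift and an add. A minimal sketch (Python, used only as executable pseudocode; the function name and the `address_size` parameter are hypothetical):

```python
def real_mode_linear(segment, offset, address_size=16):
    """Linear address = segment * 16 + offset (real and virtual 8086 modes)."""
    offset_mask = 0xFFFF if address_size == 16 else 0xFFFFFFFF
    return ((segment & 0xFFFF) << 4) + (offset & offset_mask)


# Segment 1234h with offset 0010h
print(f"{real_mode_linear(0x1234, 0x0010):05X}h")
```

With paging disabled this value is also the physical address; in virtual 8086 mode with paging enabled it is the input to the page-table translation.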
Table 2-4. Segment Register Selection Rules
Type of Memory Reference                                      Implied (Default) Segment  Segment-Override Prefix
Code Fetch                                                    CS                         None
Destination of PUSH, PUSHF, INT, CALL, PUSHA instructions     SS                         None
Source of POP, POPA, POPF, IRET, RET instructions             SS                         None
Destination of STOS, MOVS, REP STOS, REP MOVS instructions    ES                         None
Other data references with effective address using base registers of:
  EAX, EBX, ECX, EDX, ESI, EDI                                DS                         CS, ES, FS, GS, SS
  EBP, ESP                                                    SS                         CS, DS, ES, FS, GS
In protected mode a segment register holds a Segment Selector containing a 13-bit index, a Table Indicator (TI) bit, and a two-bit Requested Privilege Level (RPL) field. The Index points into a descriptor table in memory and selects one of 8192 (2^13) segment descriptors contained in the descriptor table. A segment descriptor is an eight-byte value used to describe a memory segment by defining the segment base, the segment limit, and access control information. To address data within a segment, a 16-bit or 32-bit offset is added to the segment's base address. Once a segment selector has been loaded into a segment register, an instruction needs only to specify the segment register and the offset.
The Table Indicator (TI) bit of the selector defines which descriptor table the index points into. If TI=0, the index references the Global Descriptor Table (GDT). If TI=1, the index references the Local Descriptor Table (LDT). The GDT and LDT are described in more detail in Section 2.4.2 (Page 2-33). Protected mode addressing is discussed further in Section 2.8.2 (Page 2-78).
The Requested Privilege Level (RPL) field in a segment selector is used to determine the Effective Privilege Level of an instruction (where RPL=0 indicates the most privileged level, and RPL=3 indicates the least privileged level). If RPL requests a less privileged level than the Current Privilege Level (CPL), that is, RPL is numerically greater than CPL, the RPL level is accepted and the Effective Privilege Level is changed to the RPL value. If RPL requests a more privileged level than CPL, the CPL overrides the requested RPL and the Effective Privilege Level remains unchanged.
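The selector layout and the privilege rule can be sketched as follows (Python, used only as executable pseudocode; the names are hypothetical). The effective privilege level is taken as the numerically larger of CPL and RPL, which is the standard x86 behavior that the paragraph above describes.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # bits 15:3, one of 8192 descriptors
    ti: int      # bit 2, 0 = GDT, 1 = LDT
    rpl: int     # bits 1:0, requested privilege level


def decode_selector(selector):
    """Split a 16-bit protected mode selector into its three fields."""
    return Selector((selector >> 3) & 0x1FFF, (selector >> 2) & 1, selector & 3)


def descriptor_offset(selector):
    """Byte offset of the 8-byte descriptor inside the selected table."""
    return decode_selector(selector).index * 8


def effective_privilege_level(cpl, rpl):
    """The less privileged (numerically larger) of CPL and RPL."""
    return max(cpl, rpl)


sel = decode_selector(0x002B)
table = "LDT" if sel.ti else "GDT"
print(sel, table, descriptor_offset(0x002B), effective_privilege_level(0, sel.rpl))
```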
When a segment register is loaded with a segment selector, the segment base, segment limit and access rights are loaded from the descriptor table entry into a user-invisible or hidden portion of the segment register (i.e., cached on-chip). The CPU does not access the descriptor table entry again until another segment register load occurs. If the descriptor tables are modified in memory, the segment registers must be reloaded with the new selector values by the software. The active segment register is selected according to the rules listed in Table 2-4 and the type of instruction being currently processed. In general, the DS register selector is used for data references. Stack references use the SS register, and instruction fetches use the CS register. While some of these selections may be overridden, instruction fetches, stack operations, and the destination write operation of string operations cannot be overridden. Special segment-override instruction prefixes allow the use of alternate segment registers. These segment registers include the ES, FS, and GS registers.
2.3.4 Instruction Pointer Register
The Instruction Pointer (EIP) Register contains the offset into the current code segment of the next instruction to be executed. The register is normally incremented by the length of the current instruction with each instruction execution unless it is implicitly modified through an interrupt, exception, or an instruction that changes the sequential execution flow (for example, JMP and CALL).
Table 2-5. Application Register Set: Instruction Pointer
Bits 31:0    EIP (Extended Instruction Pointer Register)
2.3.5 Extended Flags Register
The Extended Flags Register, EFLAGS, contains status information and controls certain operations on the Cyrix III CPU. The lower 16 bits of this register are referred to as the FLAGS register, which is used when executing 8086 or 80286 code. The flag bits are listed in Table 2-6.
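A raw EFLAGS value can be split into named flags using the bit positions given in Table 2-6. The sketch below (Python, used only as executable pseudocode; the function and dictionary names are hypothetical) decodes the reset value 0000 0002h from Table 2-1, in which only reserved bit 1 is set.

```python
# Single-bit flags and their positions (Table 2-6).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18, "ID": 21,
}


def decode_eflags(value):
    """Return a dict of flag name -> value; IOPL is the 2-bit field 13:12."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 3
    return flags


def flags_register(value):
    """Lower 16 bits of EFLAGS, the FLAGS register seen by 16-bit code."""
    return value & 0xFFFF


reset = decode_eflags(0x00000002)
print([name for name, v in reset.items() if v])   # no defined flag is set at reset
```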
[Figure: EFLAGS bit layout. Bit 21 ID (Identification), bit 18 AC (Alignment Check), bit 17 VM (Virtual 8086 Mode), bit 16 RF (Resume Flag), bit 14 NT (Nested Task Flag), bits 13:12 IOPL (I/O Privilege Level), bit 11 OF (Overflow), bit 10 DF (Direction Flag), bit 9 IF (Interrupt Enable), bit 8 TF (Trap Flag), bit 7 SF (Sign Flag), bit 6 ZF (Zero Flag), bit 4 AF (Auxiliary Carry), bit 2 PF (Parity Flag), bit 0 CF (Carry Flag). A = Arithmetic Flag, D = Debug Flag, S = System Flag, C = Control Flag; 0 or 1 indicates Reserved.]
Figure 2-3. EFLAGS Register
Table 2-6. EFLAGS Register Bits
Bit    Name  Flag Type   Description
31:22  RSVD  --          Reserved -- Set to 0.
21     ID    System      Identification Bit -- The ability to set and clear this bit indicates that the CPUID instruction is supported. The ID can be modified only if the CPUID bit in CCR4 (Index E8h[7]) is set.
20:19  RSVD  --          Reserved -- Set to 0.
18     AC    System      Alignment Check Enable -- In conjunction with the AM flag in CR0, the AC flag determines whether or not misaligned accesses to memory cause a fault. If AC is set, alignment faults are enabled.
17     VM    System      Virtual 8086 Mode -- If set while in protected mode, the processor switches to virtual 8086 operation handling segment loads as the 8086 does, but generating exception 13 faults on privileged opcodes. The VM bit can be set by the IRET instruction (if current privilege level is 0) or by task switches at any privilege level.
16     RF    Debug       Resume Flag -- Used in conjunction with debug register breakpoints. RF is checked at instruction boundaries before breakpoint exception processing. If set, any debug fault is ignored on the next instruction.
15     RSVD  --          Reserved -- Set to 0.
14     NT    System      Nested Task -- While executing in protected mode, NT indicates that the execution of the current task is nested within another task.
13:12  IOPL  System      I/O Privilege Level -- While executing in protected mode, IOPL indicates the maximum current privilege level (CPL) permitted to execute I/O instructions without generating an exception 13 fault or consulting the I/O permission bit map. IOPL also indicates the maximum CPL allowing alteration of the IF bit when new values are popped into the EFLAGS register.
11     OF    Arithmetic  Overflow Flag -- Set if the operation resulted in a carry or borrow into the sign bit of the result but did not result in a carry or borrow out of the high-order bit. Also set if the operation resulted in a carry or borrow out of the high-order bit but did not result in a carry or borrow into the sign bit of the result.
10     DF    Control     Direction Flag -- When cleared, DF causes string instructions to auto-increment (default) the appropriate index registers (ESI and/or EDI). Setting DF causes auto-decrement of the index registers to occur.
9      IF    System      Interrupt Enable Flag -- When set, maskable interrupts (INTR input pin) are acknowledged and serviced by the CPU.
8      TF    Debug       Trap Enable Flag -- Once set, a single-step interrupt occurs after the next instruction completes execution. TF is cleared by the single-step interrupt.
7      SF    Arithmetic  Sign Flag -- Set equal to high-order bit of result (0 indicates positive, 1 indicates negative).
6      ZF    Arithmetic  Zero Flag -- Set if result is zero; cleared otherwise.
5      RSVD  --          Reserved -- Set to 0.
4      AF    Arithmetic  Auxiliary Carry Flag -- Set when a carry out of (addition) or borrow into (subtraction) bit position 3 of the result occurs; cleared otherwise.
3      RSVD  --          Reserved -- Set to 0.
2      PF    Arithmetic  Parity Flag -- Set when the low-order 8 bits of the result contain an even number of ones; otherwise PF is cleared.
1      RSVD  --          Reserved -- Set to 1.
0      CF    Arithmetic  Carry Flag -- Set when a carry out of (addition) or borrow into (subtraction) the most significant bit of the result occurs; cleared otherwise.
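The arithmetic flag definitions in Table 2-6 can be checked against a small model of an 8-bit ADD. This is a sketch only (Python, used as executable pseudocode; the function name is hypothetical), and it covers addition, not subtraction or the other instructions that affect these flags.

```python
def add8_flags(a, b):
    """Result and arithmetic flags of an 8-bit ADD, per the Table 2-6 definitions."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    carry_out = total >> 8                                # carry out of bit 7
    carry_into_sign = ((a & 0x7F) + (b & 0x7F)) >> 7      # carry into bit 7
    return {
        "result": result,
        "CF": carry_out,
        "PF": int(bin(result).count("1") % 2 == 0),       # even number of ones
        "AF": ((a & 0x0F) + (b & 0x0F)) >> 4,             # carry out of bit 3
        "ZF": int(result == 0),
        "SF": result >> 7,                                # high-order bit of result
        "OF": carry_into_sign ^ carry_out,                # exactly one of the two carries
    }


print(add8_flags(0x7F, 0x01))   # signed overflow: 127 + 1
print(add8_flags(0xFF, 0x01))   # unsigned carry: 255 + 1
```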
2.4 System Register Set
The system register set is used for system level programming. It consists of registers not generally used by application programmers; these registers are typically employed by system level programmers who write operating systems and memory management programs. Associated with the system register set are tables and segments which are defined in memory.
The Control Registers control certain aspects of the Cyrix III Processor such as paging, coprocessor functions, and segment protection.
The Descriptor Tables hold descriptors that manage memory segments and tables, interrupts and task switching. The tables are defined by corresponding registers.
The two Task State Segment tables, defined by the TSS register, are used to save and load the computer state when switching tasks.
The Configuration Registers are used to define the Cyrix III Processor setup, including cache management.
The ID Registers allow BIOS and other software to identify the specific CPU and stepping.
System Management Mode (SMM) control information is stored in the SMM registers.
The Debug Registers provide debugging facilities for the Cyrix III Processor and enable the use of data access breakpoints and code execution breakpoints.
The Test Registers provide a mechanism to test the contents of both the on-chip 16KB cache and the Translation Lookaside Buffer (TLB). The TLB is used as a cache for the tables that are used to translate linear addresses to physical addresses while paging is enabled.
2.4.1 Control Registers
The standard x86 Control Registers (CR0, CR2, CR3 and CR4) are shown in Table 2-7. The CRn registers are standard x86 registers and are distinct from the CCRn registers unique to Cyrix.
The CR0 register contains system control bits which configure operating modes and indicate the general state of the CPU. The lower 16 bits of CR0 are referred to as the Machine Status Word (MSW). The CR0 bit definitions are described in Table 2-12 and Table 2-13. The reserved bits in CR0 should not be modified.
A CR1 register is not defined.
When paging is enabled and a page fault is generated, the CR2 register retains the 32-bit linear address that caused the fault. When a double page fault occurs, CR2 contains the address for the second fault.
Register CR3 contains the 20 most significant bits of the physical base address of the page directory. The page directory must always be aligned to a 4 KB page boundary; therefore, the lower 12 bits of CR3 are not required to specify the base address. CR3 also contains the Page Cache Disable (PCD) and Page Write Through (PWT) bits.
Control Register CR4 (Table 2-8) controls usage of the Time Stamp Counter instruction, Debugging Extensions, Page Global Enable and the RDPMC instruction.
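Because the page directory is 4 KB aligned, its physical base address is recovered from CR3 by clearing the low 12 bits. A minimal sketch (Python, used only as executable pseudocode; the function names and the sample CR3 value are hypothetical):

```python
def page_directory_base(cr3):
    """Physical base address of the page directory (CR3 bits 31:12, 4 KB aligned)."""
    return cr3 & 0xFFFFF000


def cr3_fields(cr3):
    """PDBR plus the PCD (bit 4) and PWT (bit 3) control bits."""
    return {
        "PDBR": (cr3 >> 12) & 0xFFFFF,
        "PCD": (cr3 >> 4) & 1,
        "PWT": (cr3 >> 3) & 1,
    }


sample = 0x00123018   # illustrative value, not a documented default
print(f"{page_directory_base(sample):08X}h", cr3_fields(sample))
```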
Table 2-7. Control Registers
Register  Fields, bit 31 (left) to bit 0 (right)
CR4       RSVD | PCE | PGE | RSVD | DE | TSD | RSVD
CR3       PDBR (Page Directory Base Register) | RSVD | PCD | PWT | RSVD
CR2       PFLA (Page Fault Linear Address)
CR1       RSVD
CR0       PG | CD | NW | RSVD | AM | RSVD | WP | RSVD | NE | 1 | TS | EM | MP | PE
The lower 16 bits of CR0 form the Machine Status Word (MSW).
Table 2-8. CR4 Bit Definitions
Bit Position  Name  Function
2             TSD   Time Stamp Counter Instruction. If = 1 RDTSC instruction enabled for CPL=0 only; Reset State. If = 0 RDTSC instruction enabled for all CPL states.
3             DE    Debugging Extensions. If = 1 enables I/O breakpoints and R/W bits for each debug register are defined as: 00 - Break on instruction execution only. 01 - Break on data writes only. 10 - Break on I/O reads or writes. 11 - Break on data reads or writes but not instruction fetches. If = 0 I/O breakpoints and R/W bits for each debug register are not enabled.
7             PGE   Page Global Enable. If = 1 global page feature is enabled. If = 0 global page feature is disabled. Global pages are not flushed from TLB on a task switch or write to CR3.
8             PCE   Performance Monitoring Counter Enable. If = 1 enables execution of RDPMC instruction at any protection level. If = 0 RDPMC instruction can only be executed at protection level 0.
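The TSD and PCE rows of Table 2-8 have opposite polarity: TSD restricts RDTSC when set, while PCE relaxes RDPMC when set. The following sketch (Python, used only as executable pseudocode; the constant and function names are hypothetical) makes that explicit.

```python
CR4_TSD = 1 << 2   # Time Stamp Counter instruction control
CR4_DE = 1 << 3    # Debugging Extensions
CR4_PGE = 1 << 7   # Page Global Enable
CR4_PCE = 1 << 8   # Performance Monitoring Counter Enable


def rdtsc_allowed(cr4, cpl):
    """TSD=1 limits RDTSC to CPL 0; TSD=0 allows it at every CPL."""
    if cr4 & CR4_TSD:
        return cpl == 0
    return True


def rdpmc_allowed(cr4, cpl):
    """PCE=1 allows RDPMC at any level; PCE=0 limits it to level 0."""
    if cr4 & CR4_PCE:
        return True
    return cpl == 0


cr4 = 0x00000000   # initialized value of CR4 from Table 2-1
print(rdtsc_allowed(cr4, cpl=3), rdpmc_allowed(cr4, cpl=3))
```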
Table 2-9. CR3 Bit Definitions
Bits     Name  Description
31 - 12  PDBR  Page Directory Base Register: Identifies page directory base address on a 4KB page boundary.
11 - 5   RSVD  Reserved: Set to 0.
4        PCD   Page Cache Disable: During bus cycles that are not paged, the state of the PCD bit is reflected on the PCD pin. These bus cycles include interrupt acknowledge cycles and all bus cycles, when paging is not enabled. The PCD pin should be used to control caching in an external cache.
3        PWT   Page Write-Through: During bus cycles that are not paged, the state of the PWT bit is driven on the PWT pin. These bus cycles include interrupt acknowledge cycles and all bus cycles, when paging is not enabled. The PWT pin should be used to control write policy in an external cache.
2 - 1    RSVD  Reserved: Set to 0.
Table 2-10. CR2 Bit Definitions
Bits    Name  Description
31 - 0  PFLA  Page Fault Linear Address: With paging enabled and after a page fault, PFLA contains the linear address of the address that caused the page fault.
Table 2-11. CR1 Bit Definitions
Bit   Name  Description
31:0  RSVD  Reserved: Set to 0 (always returns 0 when read).
Table 2-12. CR0 Bit Definitions
Bit    Name  Description
31     PG    Paging Enable Bit: If PG=1 and protected mode is enabled (PE=1), paging is enabled. After changing the state of PG, software must execute an unconditional branch instruction (e.g., JMP, CALL) to have the change take effect.
30     CD    Cache Disable: If CD=1, no further cache line fills occur. However, data already present in the cache continues to be used if the requested address hits in the cache. Writes continue to update the cache and cache invalidations due to inquiry cycles occur normally. The cache must also be invalidated to completely disable any cache activity.
29     NW    Not Write-Back: If NW=1, the on-chip cache operates in write-through mode. In write-through mode, all writes (including cache hits) are issued to the external bus. If NW=0, the on-chip cache operates in write-back mode. In write-back mode, writes are issued to the external bus only for a cache miss, a line replacement of a modified line, or as the result of a cache inquiry cycle.
28:19  RSVD  Reserved: Do not modify.
18     AM    Alignment Check Mask: If AM=1, the AC bit in the EFLAGS register is unmasked and allowed to enable alignment check faults. Setting AM=0 prevents AC faults from occurring.
17     RSVD  Reserved: Do not modify.
16     WP    Write Protect: Protects read-only pages from supervisor write access. WP=0 allows a read-only page to be written from privilege level 0-2. WP=1 forces a fault on a write to a read-only page from any privilege level.
15:6   RSVD  Reserved: Do not modify.
5      NE    Numerics Exception: NE=1 to allow FPU exceptions to be handled by interrupt 16. NE=0 if FPU exceptions are to be handled by external interrupts.
4      1     Reserved: Do not attempt to modify.
3      TS    Task Switched: Set whenever a task switch operation is performed. Execution of a floating point instruction with TS=1 causes a DNA fault. If MP=1 and TS=1, a WAIT instruction also causes a DNA fault.
2      EM    Emulate Processor Extension: If EM=1, all floating point instructions cause a DNA fault 7.
1      MP    Monitor Processor Extension: If MP=1 and TS=1, a WAIT instruction causes Device Not Available (DNA) fault 7. The TS bit is set to 1 on task switches by the CPU. Floating point instructions are not affected by the state of the MP bit. The MP bit should be set to one during normal operations.
0      PE    Protected Mode Enable: Enables the segment based protection mechanism. If PE=1, protected mode is enabled. If PE=0, the CPU operates in real mode and addresses are formed as in an 8086-style CPU.
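As a cross-check of Table 2-12 against Table 2-1, the sketch below (Python, used only as executable pseudocode; the names are hypothetical) decodes the initialized CR0 value 6000 0010h. It shows CD=1 and NW=1 with PG=0 and PE=0, which matches the statement in Section 2.1 that RESET# disables the cache, turns off paging and places the processor in real mode.

```python
# Bit positions from Table 2-12 (bit 4 is fixed at 1 and is not decoded here).
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}

RESET_CR0 = 0x60000010   # initialized contents of CR0 (Table 2-1)


def decode_cr0(value):
    """Return a dict of CR0 field name -> bit value."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}


def paging_enabled(value):
    """Paging is active only when both PG and PE are set."""
    fields = decode_cr0(value)
    return bool(fields["PG"] and fields["PE"])


print(decode_cr0(RESET_CR0), paging_enabled(RESET_CR0))
```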
Table 2-13. CR0 Register EM, TS, and MP Bits Combinations
CR0 Register Bits                Instruction Type
EM (Bit 2)  TS (Bit 3)  MP (Bit 1)  WAIT     ESC
0           0           0           Execute  Execute
0           0           1           Execute  Execute
0           1           0           Execute  Fault 7
0           1           1           Fault 7  Fault 7
1           0           0           Execute  Fault 7
1           0           1           Execute  Fault 7
1           1           0           Execute  Fault 7
1           1           1           Fault 7  Fault 7
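The eight rows of Table 2-13 collapse into two boolean expressions: WAIT faults only when MP and TS are both set, and an ESC (floating point) instruction faults when either EM or TS is set. A sketch that regenerates the table (Python, used only as executable pseudocode; the function names are hypothetical):

```python
def wait_faults(em, ts, mp):
    """WAIT raises fault 7 (DNA) only when MP=1 and TS=1; EM has no effect."""
    return bool(mp and ts)


def esc_faults(em, ts, mp):
    """ESC instructions raise fault 7 when EM=1 or TS=1; MP has no effect."""
    return bool(em or ts)


def outcome(faults):
    return "Fault 7" if faults else "Execute"


print("EM TS MP  WAIT     ESC")
for em in (0, 1):
    for ts in (0, 1):
        for mp in (0, 1):
            print(f"{em}  {ts}  {mp}   {outcome(wait_faults(em, ts, mp)):8} "
                  f"{outcome(esc_faults(em, ts, mp))}")
```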
2.4.2 Descriptor Table Registers and Descriptors
Descriptor Table Registers The Global, Interrupt, and Local Descriptor Table Registers (GDTR, IDTR and LDTR), shown in Figure 2-4, are used to specify the location of the data structures that control segmented memory management. The GDTR, IDTR and LDTR are loaded using the LGDT, LIDT and LLDT instructions, respectively. The values of these registers are stored using the corresponding store instructions. The GDTR and IDTR load instructions are privileged instructions when operating in protected mode. The LDTR can only be accessed in protected mode. The Global Descriptor Table Register (GDTR) holds a 32-bit linear base address and 16-bit limit for the Global Descriptor Table (GDT). The GDT is an array of up to 8192 8-byte descriptors. When a segment register is loaded from memory, the TI bit in the segment selector chooses either the GDT or the Local Descriptor Table (LDT) to locate a descriptor. If TI = 0, the index portion of the selector is used to locate the descriptor within the GDT table. Th