CDC 6600
--------

The Control Data Corporation (CDC) 6600 computer system, introduced in 1964, had a powerful Central Processor (CP) and a multithreaded set of ten logical 12-bit Peripheral Processors (PP). The CP ran user code and the PPs ran I/O drivers and much of the operating system. I describe only the original CP instruction set here.

The programming model comprises a Program Counter and three register files:

    8 18-bit A registers for load/store addresses
    8 18-bit B registers for counters and indices
    8 60-bit X registers for integer and floating-point data

Register B0 is hard-wired to zero. Register B1 usually contains 1 by software convention.

Integer data uses the one's-complement representation of negative numbers. The floating-point representation is also one's-complement in the sense that a negative value is represented by complementing all of the bits of the corresponding positive value, not just the sign bit. The layout of a floating-point number is:

    bit 59       sign
    bits 58:48   11-bit exponent biased by 02000
    bits 47:0    48-bit explicitly normalized binary mantissa

Memory is addressed in units of 60-bit words for all purposes. Even instruction addresses are word addresses, so jump target instructions must always occupy the upper bits of a word. Instructions are encoded in either a single 15-bit parcel or a big-endian pair. A two-parcel instruction cannot span a word boundary. The compilers and assemblers must insert no-op instructions to respect these constraints, a process known as "forcing upper".

Conditional jumps use absolute destination addresses, not relative displacements. Their predicates can compare two B registers or test a single X register.

The subroutine call instruction ("Return Jump") actually modifies the program. It constructs an unconditional jump instruction back to the return address and stores this instruction as the first word of the target subroutine, and then it jumps to the next word to enter the subroutine proper.
Subroutines must therefore allocate a one-word buffer at their entry points. A subroutine return is accomplished by simply jumping to the entry point and executing the jump instruction that the Return Jump had stored there. Compilers for recursive languages (e.g., the original ETH Zuerich Pascal 6000 compiler) had to save and restore the return address, and reentrant code in later multiprocessor systems had to avoid the Return Jump mechanism altogether.

Though widely recognized as the architectural progenitor of later "RISC" load/store architectures, in which all loads and stores are explicit instructions rather than implicit in the addressing modes of computational instructions, the CDC 6600 does not actually have any pure load or store instructions! Instead, it implements loads and stores as side effects of those computational instructions that enter their results into the A registers, as follows:

    A0       is scratch and does not cause a load or store
    A1-A5    cause a load into the corresponding X1-X5
    A6,A7    cause a store from the corresponding X6 or X7

So incrementing A1 to point to the next element of an array will also cause that element to be loaded into X1. Copying a store address into A6 will then cause X6 to be stored there. (Ralph Grishman dedicated his excellent book "Assembly Language Programming for the Control Data 6000 and Cyber Series" to registers A6 and A7, "without which none of the results in this book could have been saved.")

This scheme permits very dense code. Density is an all-important goal for a CDC 6600 programmer, because the processor caches the last seven instruction words in an "instruction stack". A software package called STACKLIB comprised hand-optimized routines that fit in seven words.

One can view the load/store mechanism of the 6600 as A register computational instructions with load/store side effects, or as load/store instructions with A register computational side effects.
This latter perspective appears in later architectures, most notably POWER, which also increase code density by having "update" modes on their load/store instructions that overwrite their index registers with the effective addresses.

The instruction encoding of the CDC 6600 uses a 6-bit opcode (yielding 64 major opcodes) and three 3-bit register designators known as the "i", "j", and "k" fields. Some instructions use the 6-bit "jk" field for an immediate operand. Instructions needing a memory offset or branch target use an additional 15-bit instruction parcel, which is appended to the 3-bit "k" field to construct an 18-bit "K" field constant.

Note that the CDC 6600 instruction parcel layout corresponds well to the octal notation of the parcel: two digits of opcode followed by one digit each for "i", "j", and "k". It is not impossible to read CDC 6600 code directly from an octal dump.

The opcode table for the 6600 is remarkably concise. In octal (naturally), the instructions are encoded thus:

            +0            +1            +2            +3
    000     PS            RJ K or XJ    JP Bi+K       [a]
    004     EQ Bi,Bj,K    NE Bi,Bj,K    GE Bi,Bj,K    LT Bi,Bj,K
    010     BXi Xj        BXi Xj*Xk     BXi Xj+Xk     BXi Xj-Xk
    014     BXi -Xk       BXi -Xk*Xj    BXi -Xk+Xj    BXi -Xk-Xj
    020     LXi jk        AXi jk        LXi Bj,Xk     AXi Bj,Xk
    024     NXi Bj,Xk     ZXi Bj,Xk     UXi Bj,Xk     PXi Bj,Xk
    030     FXi Xj+Xk     FXi Xj-Xk     DXi Xj+Xk     DXi Xj-Xk
    034     RXi Xj+Xk     RXi Xj-Xk     IXi Xj+Xk     IXi Xj-Xk
    040     FXi Xj*Xk     RXi Xj*Xk     DXi Xj*Xk     MXi jk
    044     FXi Xj/Xk     RXi Xj/Xk     NO [b]        CXi Xk
    050     SAi Aj+K      SAi Bj+K      SAi Xj+K      SAi Xj+Bk
    054     SAi Aj+Bk     SAi Aj-Bk     SAi Bj+Bk     SAi Bj-Bk
    060     SBi Aj+K      SBi Bj+K      SBi Xj+K      SBi Xj+Bk
    064     SBi Aj+Bk     SBi Aj-Bk     SBi Bj+Bk     SBi Bj-Bk
    070     SXi Aj+K      SXi Bj+K      SXi Xj+K      SXi Xj+Bk
    074     SXi Aj+Bk     SXi Aj-Bk     SXi Bj+Bk     SXi Bj-Bk

    [a] 003 uses the "i" field to encode X register conditional jumps:

            +0            +1            +2            +3
    0030    ZR Xj,K       NZ Xj,K       PL Xj,K       NG Xj,K
    0034    IR Xj,K       OR Xj,K       DF Xj,K       ID Xj,K

    [b] 046000 is the canonical no-op used for "forcing upper" one parcel. "i" field values 4-7 encode the strange "Compare/Move Unit" string instructions on later models.
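
The field layout described above can be sketched in a few lines of Python. This is only an illustration of the bit positions given in the text; the names `Parcel`, `decode_parcel`, `long_constant`, and `jk_immediate` are hypothetical and do not come from any CDC tool.

```python
from typing import NamedTuple


class Parcel(NamedTuple):
    """Fields of one 15-bit parcel: 6-bit opcode, then 3-bit i, j, k."""
    opcode: int
    i: int
    j: int
    k: int


def decode_parcel(parcel: int) -> Parcel:
    """Split a 15-bit parcel into its opcode, i, j, and k fields."""
    if not 0 <= parcel < (1 << 15):
        raise ValueError("a parcel is 15 bits wide")
    return Parcel(parcel >> 9, (parcel >> 6) & 7, (parcel >> 3) & 7, parcel & 7)


def jk_immediate(parcel: int) -> int:
    """Return the 6-bit "jk" immediate (for example, a shift count)."""
    return parcel & 0o77


def long_constant(first: int, second: int) -> int:
    """Build the 18-bit K of a two-parcel instruction.

    The 3-bit k field of the first parcel supplies the high-order bits and
    the whole second parcel supplies the low-order 15 bits.
    """
    return ((first & 7) << 15) | (second & 0o77777)


# Because each field is a whole number of octal digits, the five-digit
# octal form of a parcel can be read off directly: 46000 is opcode 46
# with i = j = k = 0.
print(decode_parcel(0o46000))
```

Reading the digits of `format(parcel, "05o")` gives the same fields, which is why octal dumps of 6600 code are legible.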

The assembly language syntax for the CDC 6600 supported by the COMPASS assembler uses an initial letter ("B", "S", etc.) to represent some families of related instructions. Operator symbols like "+" and "-" in the operand field then represent the specific operation within the family.

The "BXi" instructions (010-017) are the bitwise Boolean operations:

    010   BXi Xj        a superfluous copy instruction
    011   BXi Xj*Xk     AND
    012   BXi Xj+Xk     OR
    013   BXi Xj-Xk     XOR
    014   BXi -Xk       complement (also negation)
    015   BXi -Xk*Xj    AND complement of Xk with Xj
    016   BXi -Xk+Xj    OR complement of Xk with Xj
    017   BXi -Xk-Xj    equivalence (XOR with complement)

The "S" instructions (050-077) perform 18-bit integer addition and subtraction. Results destined for X registers (the "SX" forms, 070-077) are sign-extended to 60 bits. The "SA" instructions (050-057), as noted above, also cause loads and stores as side effects if the result register is not A0.

As is nearly universal in instruction set encodings (but see MIPS and Intel x86), opcode zero is the "Program Stop" instruction. PS is a two-parcel instruction. On the CDC 6600, however, executing it is not necessarily an error condition; it is also the means by which the Central Processor (CP) requests service from the operating system code running in the PPs.

The XJ instruction, 01300, is the "exchange jump" that swaps processor state with the CP monitor program's registers in memory, if the CEJ/MEJ (Central Exchange Jump / Monitor Exchange Jump) feature is enabled. (A PP can also force the CP to do an XJ if the CP is in user mode.)

The peculiarities of the Return Jump (RJ) direct subroutine call instruction were described above. The indirect jump (JP) instruction uses a B register. Although "JP B0+K" can be used as an unconditional jump, that is better accomplished with "EQ B0,B0,K", because the RJ and JP instructions void the "instruction stack" cache. The unconditional jump written into the first word of a subroutine by RJ is an EQ, not a JP.
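
The Return Jump mechanism can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of the hardware: it assumes that the return address is the word following the one holding the RJ, and that the lower 30 bits of the stored word are zero. Memory is modeled as a dictionary keyed by word address, and all function names are hypothetical.

```python
WORD_MASK = (1 << 60) - 1
ADDRESS_MASK = 0o777777      # 18-bit K field
EQ_B0_B0 = 0o0400            # opcode 04 with i = j = 0, i.e. "EQ B0,B0,K"


def encode_return_word(return_address: int) -> int:
    """Build the word that RJ stores at the subroutine's entry point.

    "EQ B0,B0,K" is a two-parcel (30-bit) instruction. Jump targets must
    occupy the upper bits of a word, so it goes in the upper half.
    Assumption: the lower 30 bits are left as zero.
    """
    jump = (EQ_B0_B0 << 18) | (return_address & ADDRESS_MASK)
    return (jump << 30) & WORD_MASK


def return_jump(memory: dict, p: int, k: int) -> int:
    """Model "RJ K" executed from word address p; return the next address."""
    # Assumption: the subroutine returns to the word after the RJ.
    memory[k] = encode_return_word(p + 1)
    return k + 1                 # enter the subroutine proper


def decode_jump_target(word: int) -> int:
    """Recover K from a stored "EQ B0,B0,K" word."""
    upper = word >> 30
    if upper >> 18 != EQ_B0_B0:
        raise ValueError("entry word does not hold EQ B0,B0,K")
    return upper & ADDRESS_MASK


memory = {}
pc = return_jump(memory, 0o1000, 0o2000)    # call the routine at 2000
# ... the subroutine body runs from 2001, then jumps back to 2000 ...
pc = decode_jump_target(memory[0o2000])     # the stored EQ resumes the caller
```

The model makes the reentrancy problem visible: a second call to the same routine before the first returns overwrites `memory[k]`, and with it the first caller's return address.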

The first four conditional jumps (0030-0033) test an integer value in an X register for four of the six possible relations against zero:

    0030   ZR Xj,K   jump if Xj == 0
    0031   NZ Xj,K   jump if Xj != 0
    0032   PL Xj,K   jump if Xj >= 0 (sign bit clear)
    0033   NG Xj,K   jump if Xj < 0 (sign bit set)

The floating-point representation of the CDC 6600 includes infinite values and illegal ("indefinite") values. Unlike with today's IEEE infinities, using an infinite operand raises an exception on the CDC 6600. The conditional jumps IR and OR test a floating-point value and jump if it is "in range" (finite) or "out of range" (infinite), respectively. The conditional jumps DF and ID test a floating-point value and jump if it is definite or indefinite, respectively.

The "LXi" and "AXi" instructions are logical left and arithmetic right shifts. There is no logical right shift. Since the CDC 6600 uses a one's-complement representation of negative integer values, its arithmetic right shift really does work like division by a power of two. Note that "LXi jk" and "AXi jk" are the only instructions that require the same register to be used as both the operand and the result.

The shifts have interesting end-case behavior, too:

  - "LXi jk" has a six-bit shift count, and the X register size is 60 bits. A shift of 60 bits clears Xi. Shifts of 61-63 cause a left shift of 1-3.
  - "AXi jk" leaves Xi filled with copies of its sign bit if "jk" is 59 or greater.
  - "LXi Bj,Xk" and "AXi Bj,Xk" shift Xk by the amount in the low-order 6 bits of Bj. If Bj is nonnegative, they operate like "LXi jk" and "AXi jk", except that a Bj value greater than 63 causes "AXi Bj,Xk" to return zero. But if Bj is negative, each of these instructions computes instead what the other one would with the absolute value of Bj.

"MXi jk" constructs a bit mask in Xi from the count in the "jk" field. It sets the upper "jk" bits and clears the rest.

The floating-point instruction set is rather complicated and can only be summarized here.
The "FXi" forms return the most significant 48 bits of their unrounded results. The "DXi" forms return the next 48 bits, also unrounded. The "RXi" forms round their results and return the most significant 48 bits.

Floating-point numbers on the CDC 6600 have explicit normalization, unlike those of IEEE 754-1985, and are indeed not always in normalized form in the X registers. The "NXi Bj,Xk" instruction normalizes Xk into Xi and also returns the necessary shift count into Bj. The "ZXi Bj,Xk" instruction normalizes with rounding. Since floating-point addition and subtraction return unnormalized results, these normalization instructions are required after sequences of adds and subtracts. Multiplication and division are guaranteed to return a normalized result from normalized operands.

Conversions from integer to unnormalized floating-point take place through the "pack" instruction "PXi Bj,Xk". It takes an integer value in Xk and a true exponent in Bj (usually B0). Its inverse is the "unpack" instruction "UXi Bj,Xk", which returns an integer value in Xi and a shift count in Bj.

Integer multiplication and division in the earliest CDC 6000 and 7000 series machines required conversions to floating-point and the use of the floating-point multiplication and division instructions. Later machines extended the semantics of the double-precision floating-point multiplication instruction "DXi Xj*Xk" to recognize 48-bit integer operands as special cases. The COMPASS assembler accepted "IXi Xj*Xk" as a synonym if the system had this integer multiplication feature installed.

Last, the CDC 6600 supports a bit population count instruction, written as "CXi Xk".
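
The pack and unpack operations can be made concrete with a short Python sketch built only from the floating-point layout given earlier: sign in bit 59, exponent in bits 58:48 biased by 02000, 48-bit mantissa, and negative values formed by complementing all 60 bits. The functions `pack` and `unpack` are hypothetical names, not models of the full hardware behavior. The sketch deliberately accepts only nonnegative true exponents, because the text does not spell out how negative exponents are encoded, and it ignores the infinite and indefinite forms.

```python
WORD_MASK = (1 << 60) - 1
MANTISSA_MASK = (1 << 48) - 1
EXPONENT_MASK = 0o3777       # 11 bits
BIAS = 0o2000


def pack(coefficient: int, exponent: int = 0) -> int:
    """Combine an integer coefficient and a true exponent into one word.

    The result represents coefficient * 2**exponent and is generally
    unnormalized. Assumption: only exponents 0..1023 are handled here.
    """
    if abs(coefficient) > MANTISSA_MASK:
        raise ValueError("coefficient does not fit in 48 bits")
    if not 0 <= exponent <= 0o1777:
        raise ValueError("this sketch handles only exponents 0..1023")
    word = ((exponent + BIAS) << 48) | abs(coefficient)
    if coefficient < 0:
        word ^= WORD_MASK        # one's complement of the whole word
    return word


def unpack(word: int) -> tuple:
    """Split a word into (coefficient, true exponent)."""
    negative = (word >> 59) & 1
    if negative:
        word ^= WORD_MASK        # undo the complement first
    exponent = ((word >> 48) & EXPONENT_MASK) - BIAS
    coefficient = word & MANTISSA_MASK
    return (-coefficient if negative else coefficient), exponent


# The integer 1 packed with a true exponent of zero, shown in octal.
print(format(pack(1, 0), "020o"))
```

Packing with an exponent of zero, as with "PXi B0,Xk", leaves the integer's bits in place and only adds the biased exponent field, which is why the result still needs a normalize step before most arithmetic.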

As a simple example of CDC 6600 code, here is a SAXPY loop:

          DO 1 I=1,N
        1 X(I)=A*X(I)+Y(I)

            SB1   1
            SA1   X             load X1 with X(1)
            SA2   Y             load X2 with Y(1)
            SA3   A             load X3 with A
            SB2   N             trip count
    LAB1    FX4   X3*X1         A*X(I)
            SA1   A1+B1         advance A1, load next X1=X(I+1)
            FX5   X4+X2         A*X(I)+Y(I)
            SA2   A2+B1         advance A2, load next X2=Y(I+1)
            NX6   B0,X5         normalize
            SA6   A1-B1         store X6 to X(I)
            SB2   B2-B1         decrement trip count
            NE    B2,B0,LAB1

(This code example avoids a number of common assembler abbreviations used in real COMPASS and is not fully optimized.)

The code density here, eight instructions in the inner loop occupying nine 15-bit parcels (the closing NE needs a second parcel for its target address), is superb for a scalar machine even by today's standards. One might want to unroll this loop for better performance on the pipelined CDC 7600 (the 6600 had independent functional units, but they were not pipelined). Care must be taken, however, to keep the loop body small enough to fit in the register files, which are small by today's standards, as well as in the small "instruction stack".
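
To make the load and store side effects in this loop explicit, the following Python sketch mirrors the assembly instruction by instruction. It is a toy model, not an emulator: Python floats stand in for 60-bit floating-point words, so the normalize step becomes a plain copy, and the class `CentralProcessor` and function `saxpy` are hypothetical names. The trip count is passed in directly as a value.

```python
class CentralProcessor:
    """Toy model of the A, B, and X register files."""

    def __init__(self, memory):
        self.mem = memory        # list of words indexed by word address
        self.A = [0] * 8
        self.B = [0] * 8         # B[0] is never written, so it stays zero
        self.X = [0.0] * 8

    def set_a(self, i, address):
        """Model "SAi ...": write Ai, then do the implied load or store."""
        self.A[i] = address
        if 1 <= i <= 5:
            self.X[i] = self.mem[address]    # A1-A5 load into Xi
        elif i >= 6:
            self.mem[address] = self.X[i]    # A6 and A7 store from Xi


def saxpy(memory, x_addr, y_addr, a_addr, n):
    """Run the loop above on `memory`; n must be at least 1."""
    assert n >= 1
    cp = CentralProcessor(memory)
    A, B, X = cp.A, cp.B, cp.X
    B[1] = 1                         # SB1 1
    cp.set_a(1, x_addr)              # SA1 X
    cp.set_a(2, y_addr)              # SA2 Y
    cp.set_a(3, a_addr)              # SA3 A
    B[2] = n                         # SB2 N
    while True:                      # LAB1
        X[4] = X[3] * X[1]           # FX4 X3*X1
        cp.set_a(1, A[1] + B[1])     # SA1 A1+B1 (also loads the next X1)
        X[5] = X[4] + X[2]           # FX5 X4+X2
        cp.set_a(2, A[2] + B[1])     # SA2 A2+B1 (also loads the next X2)
        X[6] = X[5]                  # NX6 B0,X5 (no-op for Python floats)
        cp.set_a(6, A[1] - B[1])     # SA6 A1-B1 (stores X6 into X(I))
        B[2] = B[2] - B[1]           # SB2 B2-B1
        if B[2] == B[0]:             # NE B2,B0,LAB1 falls through
            break
    return memory


# Layout: X(1..3) at 0-2, Y(1..3) at 3-5, A at 6.
memory = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 2.0]
saxpy(memory, x_addr=0, y_addr=3, a_addr=6, n=3)
print(memory[0:3])
```

Like the assembly, the model reads one word past the end of each array on the final iteration, because the loads for the next element are issued before the trip count is tested. The word after each array must therefore be a valid address.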