Cray-2 and Cray-3
-----------------

The Cray-2 computer system was introduced by Cray Research in 1983.  It was
a shared-memory multiprocessor from the start, shipping initially with 4
high-performance vector "background processors", one simple "foreground
processor" for I/O and system controller code, and a very large shared
memory.  Only the background processor instruction set is described here.

The background processor's programming model comprises a Program Counter and:

    8 32-bit address (A) registers
    8 64-bit scalar (S) registers
    8 64-bit vector (V) registers, each with 64 elements
    1 64-bit vector mask (VM) register
    1 6-bit vector length (VL) register
    16K 64-bit Local Memory (LM) words
    1 1-bit Semaphore shared with other processors

As on the Cray-1, memory is addressed in units of 64-bit words for data
references and in units of 16-bit instruction parcels for instruction
fetches.  Instruction parcels are mapped in big-endian order into their
words and are denoted by an octal word address suffixed with a letter
'a'-'d'.

No register has special values, unlike B0 on the CDC 6600 and A0 or S0 on
the Cray-1.  Any A or S register can be tested in a conditional jump.

Only S and V registers can be loaded from or stored to real memory, so A
registers have to be copied through the S registers.  There is no path
between Local Memory and real memory, either.  Vector registers must be
used to implement block copies.

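The parcel notation described above can be illustrated with a minimal Python
sketch (this is not Cray code).  It assumes that a flat parcel address is
four times the word address plus the parcel index, which the addressing
rules imply but do not state outright.  The function names are hypothetical
and chosen only for this example.

```python
PARCELS_PER_WORD = 4   # four 16-bit parcels in one 64-bit word
PARCEL_BITS = 16


def parcel_name(word_address, parcel_index):
    """Format a parcel address as an octal word address plus 'a'-'d'."""
    if not 0 <= parcel_index < PARCELS_PER_WORD:
        raise ValueError("parcel index must be 0..3")
    return format(word_address, "o") + "abcd"[parcel_index]


def split_parcel_address(parcel_address):
    """Split a flat parcel address into (word address, parcel index).

    Assumption: parcel address = 4 * word address + parcel index.
    """
    return divmod(parcel_address, PARCELS_PER_WORD)


def extract_parcel(word, parcel_index):
    """Return one 16-bit parcel from a 64-bit word.

    Big-endian mapping: parcel 'a' (index 0) is the most significant
    16 bits of the word, parcel 'd' (index 3) the least significant.
    """
    shift = PARCEL_BITS * (PARCELS_PER_WORD - 1 - parcel_index)
    return (word >> shift) & 0xFFFF
```

For example, parcel index 2 of word 0o1234 would be written "1234c", and
its contents are bits 16-31 of that word, counting from the least
significant bit.
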
octal       assembly       description
000x00      ERR            error exit
000xjk      EXIT jk        normal exit (system call to foreground)
001000      CMR            wait until memory port quiet
002i-k      R,Ai Ak        return jump to Ak, return address to Ai
003---mn    J mn           unconditional jump
004---mn    JCS mn         jump if semaphore clear (and set it)
005---mn    JSS mn         jump if semaphore set (else set it)
006---      SSM            set semaphore
007---      CSM            clear semaphore
010--kmn    JZ Ak,mn       jump if Ak zero
011--kmn    JN Ak,mn       jump if Ak nonzero
012--kmn    JP Ak,mn       jump if Ak >= 0 (sign bit clear)
013--kmn    JM Ak,mn       jump if Ak < 0 (sign bit set)
014-j-mn    JZ Sj,mn       jump if Sj zero
015-j-mn    JN Sj,mn       jump if Sj nonzero
016-j-mn    JP Sj,mn       jump if Sj >= 0 (sign bit clear)
017-j-mn    JM Sj,mn       jump if Sj < 0 (sign bit set)
020ijk      Ai Aj+Ak       address add
021ijk      Ai Aj-Ak       address subtract
022ijk      Ai Aj*Ak       address multiply
024ij-      Ai Sj          transfer Sj to Ai
025i--      Ai VL          read vector length (mod 64)
026ijk      Ai jk,S,P      immediate load (6 bits, zero filled)
027ijk      Ai jk,S,M      immediate load (6 bits, one filled)
030--k      VM Vk,Z        test Vk for zero elements
031--k      VM Vk,N        test Vk for nonzero elements
032--k      VM Vk,P        test Vk for elements >= 0 (sign clear)
033--k      VM Vk,M        test Vk for negative elements (sign set)
034-j-      VM Sj          copy Sj to vector mask
035--0      DRI            disable memory addressing error interrupt
035--1      ERI            enable memory addressing error interrupt
035--2      DFI            disable floating-point interrupt
035--3      EFI            enable floating-point interrupt
036--k      VL Ak          set vector length [note!!]
040i--m     Ai m,P,P       immediate load (16 bits, zero filled)
041i--m     Ai m,P,M       immediate load (16 bits, one filled)
042i--mn    Ai mn,H        immediate load (32 bits)
044i--m     Ai [m]         load Ai from Local Memory (direct)
045--km     [m] Ak         store Ak to Local Memory (direct)
046i-k      Ai [Ak]        load Ai from Local Memory (indexed)
047-jk      [Ak] Aj        store Aj to Local Memory (indexed)
050i--mn    Si mn,H,P      immediate load (32 bits, zero filled)
051i--mn    Si mn,H,M      immediate load (32 bits, one filled)
052i--mn    Si mn,L        immediate load upper 32 bits (zero fill)
053i--mnop  Si mnop,F      immediate load 64 bits
054i--m     Si [m]         load Si from Local Memory (direct)
055-j-m     [m] Sj         store Sj to Local Memory (direct)
056i-k      Si [Ak]        load Si from Local Memory (indexed)
057i-k      [Ak] Si        store Si to Local Memory (indexed)
060ijk      Si (Aj,Ak)     load Si
061ijk      (Aj,Ak) Si     store Si
062i-k      Si (Ak)        load Si
063i-k      (Ak) Si        store Si
064i-kmn    Si (Ak,mn)     load Si
065i-kmn    (Ak,mn) Si     store Si
066i--mn    Si (mn)        load Si
067i--mn    (mn) Si        store Si
070ijk      Vi (Aj,Ak)     load Vi from Aj, stride Ak
071ijk      (Aj,Ak) Vi     store Vi to Aj, stride Ak
072ijk      Vi (Ak,Vj)     gather Vi from Ak, offsets Vj
073ijk      (Ak,Vj) Vi     scatter Vi to Ak, offsets Vj
074i-k      Vi [Ak]        load Vi from Local Memory, stride 1 only
075i-k      [Ak] Vi        store Vi to Local Memory, stride 1 only
076---      PASS           canonical no-op
100ijk      Si Sj&Sk       AND
101ijk      Si #Sk&Sj      AND with complement
102ijk      Si Sj\Sk       XOR
103ijk      Si Sj!Sk       OR
104ijk      Si Sj+Sk       integer add
105ijk      Si Sj-Sk       integer subtract
106ij0      Si PSj         population count
106ij1      Si QSj         parity (low bit of pop count)
107ij-      Si ZSj         leading zero count
110ijk      Si Sijk        logical right shift
112ijk      Si Si,SjAk     right shift with fill from Sj
114i--      Si VM          read vector mask
115i--      Si RT          read real-time clock
116ijk      Si jk,S,P      immediate load (6 bits, zero fill)
117ijk      Si jk,S,M      immediate load (6 bits, one fill)
120ijk      Si Sj+FSk      floating add
121ijk      Si Sj-FSk      floating subtract
122ijk      Si FIX,Sk      convert floating to integer
123ijk      Si FLT,Sk      convert integer to floating
124ijk      Si Sj*FSk      floating multiply
126ijk      Si Sj*ISk      reciprocal iteration (2-Sj*Sk)
127ijk      Si Sj*QSk      recip square root iteration (3-Sj*Sk)/2
130i-k      Si Ak          transfer Ak to Si, zero fill
131i-k      Si +Ak         transfer Ak to Si, sign extended
132ij-      Si /HSj        reciprocal approximation
133ij-      Si *QSj        reciprocal square root approximation
140ijk      Vi Sj&Vk       AND
141ijk      Vi Vj&Vk
142ijk      Vi Sj\Vk       XOR
143ijk      Vi Vj\Vk
144ijk      Vi Sj!Vk       OR
145ijk      Vi Vj!Vk
146ijk      Vi Sj!Vk&VM    merge Sj (where VM set) with Vk (where clear)
147ijk      Vi Vj!Vk&VM    merge Vj (where VM set) with Vk (where clear)
150ijk      Vi VjAk        logical right shift
152ijk      Vi Vj,VjAk     continuous right shift (fill from prior element)
154ijk      Vi Sj*FVk      floating multiply
155ijk      Vi Vj*FVk
156ijk      Vi Vj*IVk      reciprocal iteration (2-Vj*Vk)
157ijk      Vi Vj*QVk      recip square root iteration (3-Vj*Vk)/2
160ijk      Vi Sj+Vk       integer add
161ijk      Vi Vj+Vk
162ijk      Vi Sj-Vk       integer subtract
163ijk      Vi Vj-Vk
164ij0      Vi PVj         population count
164ij1      Vi QVj         parity (low bit of pop count)
165ij-      Vi ZVj         leading zero count
166i-k      Vi /HVk        reciprocal approximation
167i-k      Vi *QVk        reciprocal square root approximation
170ijk      Vi Sj+FVk      floating add
171ijk      Vi Vj+FVk
172ijk      Vi Sj-FVk      floating subtract
173ijk      Vi Vj-FVk
174i-k      Vi FIX,Vk      convert floating to integer
175i-k      Vi FLT,Vk      convert integer to floating
176ijk      Vi CI,Sj&Sk    compressed index from mask Sj, 32-bit stride Sk

The Cray-3 was a system developed but never successfully produced by Cray
Computer Corporation (1989-1995).
Its background and foreground processor instruction sets were nearly
identical to those of the Cray-2, apart from gratuitous differences in
"j"/"k" field usage in some monadic instructions, different floating-point
rounding behavior, and the addition of these "bidirectional" vector memory
reference instructions:

134ijk      Vi             load Vi from Aj, stride Ak
135ijk      Vi             store Vi to Aj, stride Ak
136ijk      Vi             gather Vi from Ak, offsets Vj
137ijk      Vi             scatter Vi to Ak, offsets Vj

These have nearly the same semantics as the 070-073 instructions, but with
a twist: a 134 or 136 load instruction can run simultaneously with a 135 or
137 store instruction.  It is up to the programmer or compiler to guarantee
that the parallel address streams are distinct.  An S register load or
store, a normal 070-073 vector reference, or a CMR instruction serves as a
memory barrier.

Cray-2 assembly language code would have assembled and run on the Cray-3
without change (had the machine worked!) but would have produced slightly
different floating-point results.

[ pmk - summarize difficulties of Local Memory ??!! ]
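
The requirement that parallel Cray-3 load and store address streams be
distinct can be sketched as a simple check in Python.  This is only an
illustration under stated assumptions: addresses are plain word addresses
with no wraparound, strides and offsets are counted in words, and the
helper names are hypothetical rather than taken from any Cray tool.

```python
def strided(base, stride, length):
    """Word addresses touched by a strided reference (134/135 style)."""
    return {base + i * stride for i in range(length)}


def gathered(base, offsets):
    """Word addresses touched by a gather/scatter (136/137 style)."""
    return {base + offset for offset in offsets}


def streams_distinct(load_addresses, store_addresses):
    """True when no word is both loaded and stored by the two streams."""
    return set(load_addresses).isdisjoint(store_addresses)
```

Under these assumptions, a 64-element unit-stride load from word 100 and a
64-element unit-stride store starting at word 164 are distinct, while a
store starting at word 163 overlaps the load in one word and would need a
memory barrier between the two references.
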