CSG 65CE02
The CSG 65CE02 is an 8/16-bit microprocessor developed by Commodore Semiconductor Group in 1988. It is a member of the MOS Technology 6502 family, developed from the CMOS WDC 65C02 released by the Western Design Center in 1983.
The 65CE02 was built on a 2 µm CMOS process instead of the original 6502's 8 µm NMOS technology, making the chip smaller and reducing its power consumption considerably. In addition to the changes made in the 65C02, the 65CE02 included improvements to the processor pipeline that allow one-byte instructions to complete in one cycle, rather than the 6502's minimum of two cycles. It also removed the one-cycle delay incurred when crossing page boundaries. These changes improved performance by as much as 25% at the same clock speed.
Other changes included the addition of a third index register, Z, along with the addition and modification of a number of instructions to use this register. The zero page, the first 256 bytes of memory that were used as pseudo-registers, could now be moved to any page in main memory using the B register. The stack register was widened from 8 to 16 bits using a similar page register, SPH, allowing the stack to be moved out of page one and to grow to larger sizes.
The 65CE02 was the basis for the system on a chip CSG 4510 that was developed for the unreleased Commodore 65. The 65CE02 was later used for the A2232 serial port card for the Amiga computer. It appears to have seen no other use.
Description
Background
The original 6502 was designed in the era before microcomputers existed, when microprocessors were used as the basis for simpler systems like smart terminals, desktop calculators and many different industrial controller systems. This was also an era when memory devices were generally based on static RAM, which was very expensive and had low memory density. For both of these reasons, the ability to handle "large" amounts of memory was not required, and many processors had operating modes that worked with small portions of a larger address space in order to offer higher performance. Such was the case in the 6502, which used the first memory page, or "zero page", to provide faster access, and the second page, "page one", to hold a 256-byte stack.
By the late 1970s, the original MOS Technology team that designed the 6502 had broken up. Bill Mensch had moved to Arizona and set up the Western Design Center to provide 6502-based design services. Around 1981, the main licensees of the 6502 design, Rockwell Semiconductor, GTE and Signetics, began a redesign effort with Mensch that led to the WDC 65C02. This was mainly a CMOS implementation of the original NMOS 6502 that used 10 to 20 times less power, but it also included a number of new instructions to help improve the code density in certain applications. New instructions included INA/DEA to increment and decrement the accumulator, STZ to write a zero to a memory location, and BRA, which was a jump with a branch-style 1-byte relative address. The 65C02 also fixed a number of minor bugs in the original 6502 design.
New features
The 65CE02 is a further improved version of the 65C02 that expands the memory model to make it more suitable for systems with large amounts of main memory. To do this, it adds an 8-bit B register, for Base Page, that offsets the zero page to any location in memory. B is set to zero on power-up or reset, so the 65CE02 initially works exactly like the 6502. If a value is placed into the B register using TAB, the zero page moves to the new location. A significant use of this feature is to let small routines that fit within the 256 bytes of a page use zero-page addressing. This makes the code smaller, because addresses no longer need a second byte, and faster, because that second byte does not have to be fetched from memory.
The 65CE02 also extends the stack from the original 256 bytes of page one to, in theory, the entire address space. It does this by adding another 8-bit register, SPH, for Stack Pointer High. Normally this works like B, offsetting the base address of the stack from page one to any selected page. The stack otherwise continues to work as before, with a maximum size of one page, 256 bytes. As with B, SPH is initialized on startup or reset, in this case to 01, so that the stack works exactly as it does on the 65C02.
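As a rough illustration of the base-page mechanism described above, the following Python sketch models how the single address byte of a zero-page instruction might be combined with B to form a full 16-bit address. The function name `base_page_address` is hypothetical, and the model deliberately ignores every other aspect of the processor.

```python
def base_page_address(b: int, operand: int) -> int:
    """Return the 16-bit address for a base-page access.

    b       -- value of the B (Base Page) register, 0..255
    operand -- the single address byte that follows the opcode, 0..255
    """
    if not (0 <= b <= 0xFF and 0 <= operand <= 0xFF):
        raise ValueError("B and the operand are both 8-bit values")
    # B supplies the high byte (the page), the operand the low byte.
    return (b << 8) | operand


# With B = 0 (the reset value) this is ordinary 6502 zero-page addressing.
print(hex(base_page_address(0x00, 0x42)))  # 0x42
# After moving the base page to page $20, the same operand reaches $2042.
print(hex(base_page_address(0x20, 0x42)))  # 0x2042
```

The instruction encoding is the same in both calls; only the value of B changes, which is why existing zero-page code keeps working after reset.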
When 16-bit operation is selected through the new "stack extend" bit in the status register, which is controlled by the new CLE/SEE instructions, the stack pointer becomes a true 16-bit value. SPH supplies the high byte and the original SP, now known as SPL for Stack Pointer Low, supplies the low byte of a 16-bit stack pointer. This allows the stack to grow much larger than the original 256 bytes, which was too small for high-level languages.
This means there are two types of stack: a 256-byte one that can be placed in any page, or a 16-bit one that can span memory. While the latter is more flexible, accesses into the stack have to construct a 16-bit address from the two registers, which takes an extra cycle and slows overall performance. Using the smaller stack, where possible, offers better performance.
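The difference between the two stack modes can be sketched with a toy model. The `Stack` class below is hypothetical and simplified: it assumes a push stores a byte and then decrements the pointer, as on the 6502, it starts SPL at $FF purely for illustration, and it models neither cycle counts nor status flags.

```python
class Stack:
    """Toy model of the two 65CE02 stack modes (not cycle- or flag-accurate)."""

    def __init__(self, sph: int = 0x01, sixteen_bit: bool = False):
        self.sph = sph & 0xFF   # page register; $01 is the reset value
        self.spl = 0xFF         # assumed starting value, for illustration only
        self.sixteen_bit = sixteen_bit
        self.memory = {}

    @property
    def pointer(self) -> int:
        # SPH is the high byte and SPL the low byte of the stack address.
        return (self.sph << 8) | self.spl

    def push(self, value: int) -> None:
        self.memory[self.pointer] = value & 0xFF
        if self.sixteen_bit:
            # 16-bit mode: a borrow out of SPL carries into SPH.
            new = (self.pointer - 1) & 0xFFFF
            self.sph, self.spl = new >> 8, new & 0xFF
        else:
            # 8-bit mode: SPL wraps around inside the page selected by SPH.
            self.spl = (self.spl - 1) & 0xFF


small = Stack(sph=0x80, sixteen_bit=False)
large = Stack(sph=0x80, sixteen_bit=True)
for i in range(256):
    small.push(i)
    large.push(i)
print(hex(small.pointer))  # 0x80ff (wrapped within page $80)
print(hex(large.pointer))  # 0x7fff (continued into page $7F)
```

After 256 pushes the 8-bit stack has wrapped back to where it started and begun overwriting itself, while the 16-bit stack has simply moved into the next page down.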
The 65CE02 also adds a new index register, Z. This is set to zero on startup or reset, meaning that its store-Z-to-memory instruction, STZ, works just like it does in the 65C02, where the same instruction means store-zero-to-memory. This allows unmodified 65C02 code to run on the 65CE02. A number of other instructions are added or modified to allow access to the Z register. Among these are LDZ to load the value from memory, TZA/TAZ to transfer the value to or from the accumulator, PHZ/PLZ to push and pull Z to and from the stack, INZ/DEZ to increment and decrement it, and CPZ to compare the value in Z to a value in memory.
The 65C02 added BRA, Branch Always, which was essentially a JMP that used a branch-style 8-bit relative address instead of an absolute 16-bit address. This could be simulated on the original 6502 using BVC, because the overflow flag that BVC tests was, for other reasons, almost always clear on the 6502. This was no longer true in the 65C02, where certain operations now correctly set this flag. For unknown reasons, the 65CE02 changed the mnemonic to BRU. It also added the BSR instruction, Branch to SubRoutine, which applies the same relative addressing to the behaviour of JSR, Jump to SubRoutine.
In addition, the 65CE02 added 16-bit addressing, or "word relative", to all of the existing branch instructions. Previously, branches could only move backward 128 locations or forward 127, based on a signed 8-bit value, the "relative address". In the 65CE02, they can move up to 32,768 locations backward or 32,767 forward by following the branch opcode with a 16-bit value. Previously, a "long branch" normally required a JMP to the 16-bit target, preceded by a short branch that skipped over those three bytes when the jump was not wanted. For instance, to branch to address $1234 if the accumulator is zero, one would write CMP #$00/BNE +3/JMP $1234, which skips over the 3-byte JMP addr if the accumulator is not zero. In the 65CE02 this can be reduced to something like CMP #$00/BEQ $0123, making the code more obvious, removing two bytes of instructions, and eliminating the cycles spent fetching and running the extra branch. However, as it still uses relative addressing, the relative address has to be calculated from the label by the programmer or assembler when converting to machine code.
Another addition was a number of "word" instructions that carry out operations on 16-bit data. These include INW/DEW to increment and decrement a value in memory, and ASW/ROW to perform an Arithmetic Shift Word or ROtate Word.
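The branch ranges discussed earlier in this section come down to whether a signed displacement fits in 8 or 16 bits. The Python sketch below shows the arithmetic an assembler might perform. The helper names and the branch location $1000 are hypothetical. Measuring the displacement from the byte after the instruction is the classic 6502 convention for 8-bit branches; applying the same convention to the word-relative form is an assumption of this sketch and should be checked against the data sheet.

```python
def branch_offset(branch_addr: int, target: int, length: int = 2) -> int:
    """Signed displacement, measured from the byte after the branch.

    length is the size of the branch instruction in bytes: 2 for the
    classic 8-bit form, 3 for the word-relative form (assumed here to
    use the same "next instruction" reference point).
    """
    return target - (branch_addr + length)


def fits(offset: int, bits: int) -> bool:
    """True if offset is representable as a signed value of that width."""
    return -(1 << (bits - 1)) <= offset <= (1 << (bits - 1)) - 1


# A hypothetical branch at $1000 trying to reach $1234.
short = branch_offset(0x1000, 0x1234, length=2)
long_ = branch_offset(0x1000, 0x1234, length=3)
print(short, fits(short, 8))    # 562 False
print(long_, fits(long_, 16))   # 561 True
```

The target is out of reach for an 8-bit branch, which is why 6502 code needed the branch-over-JMP idiom, but it fits comfortably in a 16-bit displacement.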
More minor changes include the addition of ASR to bit-shift right, a NEG A instruction that performs a two's complement negation on the accumulator, and RTN, a variation on RTS that returns to an address offset into the stack instead of at the top, avoiding the need to explicitly POP off anything the routine added while it ran. The system also added a new addressing mode that uses an address on the stack as the basis for indirect addressing.
Finally, a new four-byte instruction was added for future expansion. Although the data sheet is not clear on its ultimate purpose, it appears to be a placeholder intended to allow instructions to be passed to co-processor units, such as a memory management unit.
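The arithmetic behind two of the smaller additions mentioned above, NEG and ASR, can be sketched on 8-bit values in Python. The function names are hypothetical, status flags are not modelled, and the sketch assumes ASR is an arithmetic shift that preserves the sign bit, as its mnemonic suggests.

```python
def neg8(a: int) -> int:
    """Two's complement negation of an 8-bit value."""
    return (-a) & 0xFF


def asr8(a: int) -> int:
    """Arithmetic shift right by one: bit 7 (the sign) is preserved."""
    return ((a & 0xFF) >> 1) | (a & 0x80)


print(hex(neg8(0x01)))  # 0xff, i.e. -1 in two's complement
print(hex(asr8(0x84)))  # 0xc2, the sign bit stays set
print(hex(asr8(0x04)))  # 0x2
```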
Pipeline improvements
A major oddity of the original 6502 was that one-byte instructions like INX still took two cycles to complete. This allowed for simplifications in the pipeline system; the next byte from memory was fetched while the operation was being decoded, meaning the next byte was fetched no matter what. For most instructions, this byte would be part of an operand, which could then be immediately fed into the now-decoded instruction.
If the instruction required only one byte, the processor still read the following byte as it decoded the first. In this case the next byte was the following instruction, but the processor had no way to feed it back into the first stage of the pipeline to decode it. The fetched instruction was instead discarded and re-read to feed it into the decoder, wasting a cycle. Although this made a number of instructions slower than they could have been, this "feature" was retained in the 65C02; whether this was to preserve the pipeline's simplicity or its cycle timing is not explained in available sources.
Maintaining cycle compatibility was not a requirement for the 65CE02, and new fabrication processes made the extra circuitry in the pipeline a non-issue, so the pipeline was re-arranged to correctly handle one-byte instructions. As a result, the 65CE02 can recover faster from the engagement of the SYNC signal, which reduces the minimum instruction execution time from 2 cycles to 1 cycle. These improvements allow the 65CE02 to execute code up to 25% faster than previous 65xx models.
A further improvement concerns addressing modes that add values to produce a final address. Examples include "indexed indirect", where the value in one of the index registers is added to a base address and the instruction is then applied to the resulting address. In the original 6502, if the addition crossed a page boundary, which occurs every 256 locations, an extra cycle was needed to produce the final address. The 65CE02 removed this limitation, improving the performance of these commonly used modes.
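The page-crossing condition described above is easy to state in code. The following Python sketch, with the hypothetical helper `crosses_page`, checks whether adding an index to a base address changes the high byte of the address, which is the case that cost the original 6502 an extra cycle.

```python
def crosses_page(base: int, index: int) -> bool:
    """True if base + index lands in a different 256-byte page than base."""
    effective = (base + index) & 0xFFFF
    return (base & 0xFF00) != (effective & 0xFF00)


print(crosses_page(0x12F0, 0x0F))  # False: $12FF is still in page $12
print(crosses_page(0x12F0, 0x10))  # True: $1300 is in page $13
```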
Physical details
The 65CE02 is fabricated using 2 µm CMOS technology, allowing for lower power operation compared to previous NMOS and HMOS versions of the 65xx family. It is housed in a 40-pin DIP that is pin compatible with the 6502.
CSG 4510
The 4510 is a system-in-package variant of the 65CE02 that includes two 6526 CIA I/O port controllers and a custom MMU that expands the address space to 20 bits. It is housed in an 84-pin PLCC.
The 4510 was used in the unreleased Commodore 65 home computer and the unreleased cost-reduced revision of the Commodore CDTV.