I built a 16 bit CPU in Logisim.
It has 15 general purpose registers, 30 machine instructions and a 12 operation Arithmetic Logic Unit (ALU). The CPU can address up to 65536 16 bit words of RAM.
Here’s a picture of the CPU:

Essentially the above circuit connects the CPU CONTROL circuit to the ALU and REGISTER file. It also provides input and output to RAM and an external CLOCK.
ALU
First thing created was the ALU. This is the device within the CPU that performs arithmetic operations, such as addition and multiplication, as well as logic operations, such as shifting or Boolean operations. My ALU pictured below does 12 operations: addition, subtraction, left shift, logical right shift, arithmetic right shift, and, or, nor, exclusive or, multiplication, division and comparison.

My ALU can perform 12 operations represented by a four bit code:
0000 addition
0001 subtraction
0010 shift left logical
0011 shift right logical
0100 shift right arithmetic
0101 logical and
0110 logical or
0111 logical nor (not or)
1000 logical exclusive or
1001 multiplication
1010 division
Register File
Next I created the register file. A register file is type of very fast RAM internal to the CPU which is used as temporary storage for memory contents being operated upon. For example if the CPU was to add two numbers, those numbers would first be loaded from RAM into two registers in the register file before being sent to the ALU to perform the addition. Once the addition is complete the result would be sent back to a register before being written back to RAM. The reason for this is that RAM is typically far slower in operation compared to the register file.
My register file contains 16 16 bit registers. Register 0 is hard wired to zero (0000 0000 0000 0000).
Registers are addressed r0, r1, r2, … r15. Although all registers are general purpose, by convention some are used for fixed purposes and can be address by alternative names:
r0 - zero always zero
r1 - at reserved for assembler use only
r2 - v0 values for function returns
r3 - v1
r4 - a0 function arguments
r5 - a1
r6 - a2
r7 - s0 saved temporaries
r8 - s1
r9 - s2
r10 - t0 temporaries
r11 - t1
r12 - t2
r13 - fp frame pointer
r14 - sp stack pointer
r15 - ra return address
I stole these conventions from the MIPS architecture.
Here’s a picture of my register file. It has one write port and two read ports which means it can write one register and read two registers in a single operation.

Machine Instruction Encoding
The purpose of the CPU is to execute machine instructions stored in RAM. I needed to design an appropriate instruction set and encoding for the instructions.
Again I borrowed heavily from the MIPS architecture. I decided on encoding each machine instruction using two 16 bit words. There are two types of instructions called X and Y.
The X type encoding has five fields: INS8 DST4 SRC4 TGT4 ignored12
And the Y type encoding has four fields: INS8 DST4 SRC4 C16
The names above indicate the use of the field and the numbers above indicate the bit width of the fields.
Both X and Y type encoding have a INS (instruction) field which encodes the instruction to perform, i.e. addition, branch etc. The instruction field is eight bits in size which allows for 256 possible machine instructions, although I only needed 30.
Also common to both types are the DST (destination) and SRC (source) fields which indicate registers in the register file. The source and destination fields are each four bits in size and so can address up to 16 registers. The source field addresses one of the register whose contents is to be used in an operation. The destination field addresses a register where the result of an operation is to the written.
The X type encoding has a TGT (target) field, four bits in size which indicate a register to be used in an operation. The remaining 12 bits in an X type encoding are ignored.
The Y type encoding has a C (constant) field 16 bits in size. This field represents a 16 bit number which can be used as a memory address, branch displacement or a shift constant.
An example instruction is
add r4, r5, r6
which means add register 5 to register 6 and write the result into register 4. This instruction uses an X type encoding.
The add instruction has an encoding of 0000 0000. This encoding is commonly referred as an operation code or opcode. The complete encoding of this instruction is:
0000 0000 | 0100 | 0101 | 0110 | xxxx xxxx xxxx
Where the first 8 bits are the opcode (0000 0000 for add), the next four bits are 4 for register 4, the next four bits are 5 for register 5 and the next four are 6 for register 6. The last 12 bits are shown as X which means “don’t care” and can be either zero or one as they are ignored by the CPU.
Another instruction is:
lw r8, r12, 18fa
lw is shorthand for load word and this instruction will take the 16 bit value 18fa, add to it the contents of register 12 and use the resulting value as a memory address. The contents of the computed memory address will be written to register 8.
The lw instruction uses a Y type encoding. The lw instruction has a opcode of 0001 1001. The complete encoding is:
0001 1001 | 1000 | 1100 | 0001 1000 1111 1010
Again the first 8 bits are the lw opcode, the next four are the destination register r8, the next is the source register r12 and the last 16 bits are the constant value 18fa.
There are 30 machine instruction, each having zero to three operands. The operands are either register addresses or a constant 16 bit integer.
This is the complete instruction set:
add d, s, t - add s to t write result to d
addi d, s, C - add t to immediate C write result to d
sub d, s, t - subtract t from s write result to d
mul s, t - multiply s and t write result to hi and lo registers
div s, t - divide s by t write result to hi and lo registers
sll d, s, C - shift s left logically by constant C write result to d
srl d, s, C - shift s right logically by constant C write result to d
sra d, s, C - shift s right arithmetically by constant C write result to d
sllv d, s, t - shift s left logically by t write result to d
srlv d, s, t - shift s right logically by t write result to d
srav d, s, t - shift s right arithmetically by variable t write result to d
beq s, d, C - branch, add constant C to PC if s equal to d
bne s, d, C - branch, add constant C to PC if s not equal to d
slt d, s, t - set d to 1 if s is less than t
slti d, s, C - set d to 1 if s is less than constant C
and d, s, t - logically and s and t write the result to d
andi d, s, C - logically and s and constant C write the result to d
or d, s, t - logically or s and t write the result to d
ori d, s, C - logically or s and constant C write the result to d
xor d, s, t - logically exclusive or s and t write the result to d
nor d, s, t - logically nor s and t write the result to d
j C - jump, write constant C to the PC
jr t - jump register, write r to PC
jal d, C - jump and link (subroutine call), write PC to d, write constant C to PC
jalr d, t - jump and link register, write PC to d, write t to PC
lw d, s, C - load word, write contents of memory at address s+C to d
sw d, s, C - store word, write d to memory at address s+C
mfhi d - write HI to d
mflo d - write LO to d
halt - halt cpu
In addition to the 30 machine instructions there are 16 pseudo instructions:
nop - no operation
move d, s - write s to d
clear d - write zero to d
li d, C - load immediate, write constant C to d
bgt x, y, C - branch to C if x > y
blt x, y, C - branch to C if x < y
bge x, y, C - branch to C if x >= y
ble x, y, C - branch to C if x <= y
mul d, s, t - write lower 16 bit result of s multiplied by t into d
div d, s, t - write result of s divided by t into d
push d - push d onto the stack using register sp as the stack pointer
pop d - pop d from the stack using register sp as the stack pointer
jal C - jump and link to C using register ra implicitly
jalr t - jump and link to contents of t using register ra implicitly
not d - flip the bits of the contents d
neg d - negate the contents of d
These pseudo instructions are substituted by one or more real instructions by the assembler during assembly. For example neg d is replaced by:
nor d, d, d
addi d, d, 1
Which will flip the bits of d and add one resulting in the 2′s complement (negation) of the contents of d.
Some example code to sum numbers from 1 to 100 is:
clear t0 ; zero sum in t0
li t1, 0064 ; store 100 in t1 count
loop:
add t0, t0, t1 ; sum = sum + count
sub t1, t1, 0001 ; count = count - 1
bne t1, zero, loop ; jump back to add if t1 is not zero
halt ; sum of 1 to 100 in t0
Control Unit
Next came the most difficult step: the control unit.
Essentially a CPU is a deterministic finite state machine and the control unit is what determines the next state of the CPU based upon its current state.
The current state of the CPU is governed by a clock which pulses many times a second. Each time the clock pulses the CPU can do one operation. Modern CPUs can actually do several things per clock pulse (or clock cycle), however I decided to keep thing simple.
To execute the instructions in RAM the CPU needs to fetch the instruction, decode the instruction and only then is it able to execute the instruction. Instructions are addressed in RAM by the Program Counter (PC) register. My instructions are two 16 bits words each, so we need two additional registers to store the fetched instruction. I called these the Instruction Register (IR) and the Operand (OP).
The fetch operation requires four distinct steps:
fetch the word from memory at the address in the PC and store it in IR
add 1 to the PC register
fetch the word from memory at the address in the PC and store it in OP
add 1 to the PC register
These four steps take four clock cycles to complete. The result is that the instruction to execute is in the IR and OP registers, ready for decoding and execution.
Each component in the CPU has one or more control lines. These lines control the operation of the component. For example in the register file above you can see a WRITE input. This controls if a register will be written. The control unit is in charge of determining the values of all the control lines based upon the current clock counter and instruction.
Here’s a picture of my control unit circuit:

My control unit has three inputs:
FETCH 0 indicates if the CPU is fetching an instruction
T 0000 clock cycles elapsed since previous instruction completion
INSTRUCTION 00000 opcode from the INS8 field from the IR register
There are 17 outputs which represent control lines to other parts of the CPU.
HALT 0 halt CPU
T RESET 0 reset clock counter
FETCH 0 fetching
PC SRC 00 selector for PC src
PC WRT 00 write PC reg
MAR SRC 0 selector for MAR src
MEM WRT 0 write RAM
IR WRT 0 write IR reg
OP WRT 0 write OP reg
REG SRC 000 selector for write REG src
REG WRT 0 write REG
ALU B SRC 0 selector for ALU B src
REG SRC2 SRC 0 selector for REG reg2 src
ALU OP 0000 ALU opcode
RESULT WRT 0 write RESULT reg
LO WRT 0 write LO reg
HI WRT 0 write HI reg
These 17 control lines coordinate everything that happen in the CPU during an instruction execution.
Here is an example for the instruction lw r2, r4, babe at RAM address 0000 0000 0000 000.
This instruction lw (load word) loads a 16 bit word from RAM into a register. In this example it adds the 16 bit hexadecimal value babe (47806) to the contents of r4 and fetches the contents of RAM at that address and stores it in r2. The full machine instruction encoding is: 0001 1001 0010 0100 1011 1010 1011 1110.
Let’s assume the PC contains 0000 0000 0000 0000, the T counter is 0000 and FETCH is 1.
Because FETCH is 1 the control unit will trigger control signals to perform an instruction fetch. For my CPU the instruction fetch takes a total of 4 clock cycles. The control circuit outputs 1 on IR WRT to trigger the IR reg to write. It outputs 0 on MAR SRC to indicate that the PC reg is to be used as the RAM address src (MAR is the memory address register). This results in the 16 bits (0001 1001 0010 0100) from address 0000 0000 0000 0000 being written to the IR register. The control unit also outputs 1 on FETCH to ensure it stays in the fetch state.
When the clock again pulses the T counter counts to 0001. Now the control unit input is FETCH 1, T 1, INSTRUCTION 11001. The INSTRUCTION value is coming from the INS field of the IR register, but since the CPU is still fetching it’s ignored for now. The control unit now outputs 01 on PC WRT which selects the value to write into the PC reg. Value 01 selects PC+1 to be written to PC which results in the PC equal to 1. The control unit also outputs a 1 on FETCH as the fetch operation is not yet complete.
The clock pulses again causing T to increment to 0010. FETCH is still 1 so the control unit outputs OP WRT as 1 which causing the OP reg to be written with the contents of RAM. 0 is output on MAR SRC causing the PC reg to be used as the address src. The result is the 16 bit contents of RAM (1011 1010 1011 1110) at the address contained in the PC reg (0000 0000 0000 0001) is stored in the OP reg. 1 is also output on FETCH to continue the fetch in the next clock cycle.
Again the clock pulses and T becomes 0011. As in step 2 the PC WRT line is set to 01 which causes the PC reg to be written with the value of PC+1. This results in the PC containing 0000 0000 0000 0010 which is the address of the next instruction in RAM. The control unit also sets FETCH to 0 as the fetch operation is complete.
Now T is incremented to 0100 due the the next clock pulse. The input to the control unit is now FETCH 0, T 0100 and INSTRUCTION 11001. Because FETCH is 0 the control unit this time will enter the execution phase. It outputs 1 to RESULT WRT which triggers the RESULT reg to be written. The RESULT reg is connected to the output from the ALU, so this causes the result from the ALU operation to written to the RESULT reg. The ALU OP line is set to 0000 which is the ALU opcode for addition. The ALU B SRC is set to 1 which causing the B input of the ALU to come from the OP reg (1011 1010 1011 1110). The A input of the ALU is always connected to the SRC1 OUT of the register file. SRC1 REG of the register file is always connected to the SRC4 field of the IR reg which is 0100 (r4 from the instruction). This results in the contents of r4 and the OP reg sent to the ALU which performs an addition and send to the result to the RESULT reg which is written. FETCH is kept at 0 as the instruction is not yet complete.
Now we enter the 5th clock pulse since the beginning of this lw instruction. T is incremented to 0101, FETCH is still 0 and INSTRUCTION is still 11001. The control unit outputs MAR SRC as 1 which causes the RESULT reg to be used as the source for the RAM address. REG SRC is set to 101 which causes the data read from RAM to be sent to the DATA IN input of the register file. The DST REG input of the register file addresses the register to write and is always connected to the DST4 field of the IR reg. The control unit outputs 1 to REG WRT which causes the register file to write it’s DATA IN input to the register specified by the DST REG input (r2 in the example). The control unit also sets T RESET to 1 which triggers the T reg to reset to 0000 and sets FETCH to 1. Both of these setup for the next instruction fetch as this instruction is now complete.
Computer
Once I had the CPU I needed a computer to test it in. Here’s a picture of mine:

The computer is just the CPU with some RAM and various input/output (IO) devices.
I added a few memory mapped IO devices to the computer. A memory mapped IO device is a device that appears to the computer as a series of memory addresses. A program can read or write to those memory addresses to communicate with the IO device. For example in my computer I added a simple output screen at address fffe and ffff. To write a character to the screen a program writes it’s ASCII code the the memory address fffe. There’s circuity to detect this address and instead of writing to RAM the data is sent to the screen instead. Writing a 1 to ffff clears the screen.
I added a random number generator to address f002. A program can get a random 16 bit value every time it reads from address f002.
Here’s a program to write random characters to the screen using the above IO devices.
loop:
lw t1, zero, 0f002 ; get random num
andi t2, t1, 1f ; zero top 11 bits
addi t2, t2, 20 ; t2 between 20 and 3f (printable ASCII)
sw t2, zero, 0fffe ; write random character
j loop ; do it again forever
I wrote an assembler to take this code and assemble it into the machine instructions that cane be loaded into RAM and executed in the CPU emulated by Logisim.
Next Steps
It took about six months to create this CPU, working for a few hours each weekend.
When designing the instruction set I wanted it to be easy to program, hence the large number of registers. The four cycle instruction fetch is quite poor, so I’m going to try to remove that. I’m considering a much simpler RISC design, with a single cycle fetch, decode and execute steps. Perhaps even some form of pipe-lining or dual ALU cores, or perhaps a stack based architecture.
I also have plans of writing a Forth implementation for it and then perhaps an implementation of Lisp or C. I’d need to write an emulator for the CPU in order to simplify testing as programs running on the CPU in Logisim are quite difficult to debug.