Adding an IRQ handler to the caching, pipelined model
We've added an extra final stage to the pipeline
in which a check is made for interrupts and other
kinds of exceptions. That's the CPU5 model.
The design rationale is explained in detail
below, but for the moment just note that an
interrupt causes existing instructions to be
flushed from the pipeline and a jump to a handler
at address 0x4 to occur instead.
The Cpu5 Java code implements this model. The
interface is syntactically the same as for all the
other CPUs, but there are a few semantic
differences which will be enumerated below:
Cpu5
static void main (String args[]) | entry point; handles command line options
We have created some peripherals which will run
simultaneously in a different Java thread, talking
across to the CPU in an unpredictable manner (from
the CPU's point of view!).
The peripherals we've prepared are a screen and a
keyboard unit. They engage in IRQ-mediated
communications with the CPU. I've embedded one
screen and one keyboard together in a new console
object, an instance of the new Console5 class. The
main code in Cpu5 now takes care to embed one of
the new-style consoles instead of an old-style
console in the memory unit's address intercept
table:
cpu.memory.console = console = new Console5(cpu);
Separate threads of computation corresponding to
the screen and keyboard are launched:
Thread sh = new Thread (console.screen);
sh.start();
Thread kh = new Thread (console.keyboard);
kh.start();
and from then on it's up to the correctness of the
simulation of the IRQ-driven I/O to keep things
running smoothly.
What else is new?
The IRQ-enabled emulator's Java code is to be
found in the CPU5 class, which runs the basic
fetch decode execute pipeline. At the interface
level, there is no change:
CPU5
CPU5 () | CPU builder
void run () | fetch/decode/execute cycle
There are internal changes, however.
In particular, interrupts are initiated by a
peripheral setting a new IRQ boolean in the CPU.
There's a precise protocol involved, which I'll
describe in further detail below. When the
interrupt handler code finishes, it runs the new
MIPS 'rfe' instruction ("return from exception"),
and that completes the CPU's part in the interrupt
acknowledgment protocol. So there is a new
instruction to be handled.
In summary, the IRQ-enabled simulator's Java code
differs in the following ways from its
predecessors:
- It implements three new MIPS instructions in
total: 'rfe' (return from exception); 'mfc0'
(move from coprocessor 0, which is an extra
hardware unit in the CPU intended to help deal
with interrupts and other exceptions); 'mtc0'
(move to coprocessor 0).
- There are three new ('IRQ coprocessor')
registers, STATUS, CAUSE and EPC, involved with
these instructions.
- The register unit has been extended to deal
with the extra registers and the Decode stage
has been taught to deal with the extra
instruction formats.
- The STATUS register is used for turning
the servicing of interrupts on and off. When
bit 1 is clear, interrupts are ignored by
the processor.
- The CAUSE register is set by the CPU to
indicate the reason for which an exception
handler is being run.
- The EPC register is where the CPU saves a
copy of the current value of the PC register
while a handler is being run.
STATUS, CAUSE, EPC are registers $12, $13, $14
respectively in the interrupt coprocessor (0)
register index space. You can use the
abbreviations $status, $cause, $epc, respectively.
The mfc0 and mtc0 instructions respectively read
from and write to the new STATUS, CAUSE and EPC
registers (to/from the standard registers $0-$31),
and that's how the MIPS programmer checks and
changes them.
For example, "mfc0 $1, $status" reads from the
STATUS register to general register $1. "mtc0
$status, $1" moves data the other way.
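To make the register numbering concrete, here is a minimal sketch of a coprocessor 0 register space using the STATUS, CAUSE and EPC indices given above. The class Cop0Demo and its methods are hypothetical, written for illustration only; the simulator's own register unit is organised differently. The mtc0 method takes its operands in the order used in the example above (coprocessor register first).

```java
// Hypothetical model of the coprocessor 0 register space; the class and
// method names are illustrative only, not the simulator's own code.
class Cop0Demo {
    static final int STATUS = 12, CAUSE = 13, EPC = 14; // CP0 indices from the text

    final int[] gpr = new int[32]; // general registers $0-$31
    final int[] cp0 = new int[32]; // coprocessor 0 register index space

    // "mfc0 $rt, $rd": copy coprocessor register rd into general register rt
    void mfc0(int rt, int rd) {
        if (rt != 0) {             // $0 always reads as zero, so writes to it are dropped
            gpr[rt] = cp0[rd];
        }
    }

    // "mtc0 $rd, $rt" in the text's operand order: copy general register rt
    // into coprocessor register rd
    void mtc0(int rd, int rt) {
        cp0[rd] = gpr[rt];
    }

    public static void main(String[] args) {
        Cop0Demo cpu = new Cop0Demo();
        cpu.gpr[1] = 0x2;
        cpu.mtc0(STATUS, 1);       // mtc0 $status, $1 : set bit 1, enabling interrupts
        cpu.mfc0(2, STATUS);       // mfc0 $2, $status : read it back
        System.out.printf("$2 = 0x%x%n", cpu.gpr[2]); // prints "$2 = 0x2"
    }
}
```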
- There are two new CPU boolean instance
variables, IRQ and IACK, which serve to mediate
the IRQ protocol.
- The IRQ boolean is set by a peripheral
wishing to alert the CPU.
- The IACK boolean is set by the CPU to
indicate that it has seen the IRQ.
Any code dealing with the IRQ or IACK booleans
will be found inside a Java synchronized block, in
order to make sure that only one thread at a time
attempts to access these variables, which are
shared between the threads. The synchronization is
effected through the Clock class.
- There are two new CPU methods which
respectively lower and raise IRQ. They are meant
to be used by peripheral I/O devices to signal
to the CPU:
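public void lowerIRQ() {
    synchronized (Clock.class) {         // the lock shared by CPU and peripherals
        try {
            while (!IRQ || !IACK)        // wait until the CPU has acknowledged
                Clock.class.wait();
            // IRQ IACK
            IRQ = false;
            Clock.class.notifyAll();     // tell the CPU that IRQ has dropped
            // !IRQ IACK
            while (IACK)                 // wait for the handler to finish
                Clock.class.wait();
            // !IRQ !IACK
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}
public void raiseIRQ() {
    synchronized (Clock.class) {
        try {
            while (IRQ || IACK)          // wait until the coast is clear
                Clock.class.wait();
            // !IRQ !IACK
            IRQ = true;
            Clock.class.notifyAll();     // tell the CPU that IRQ is up
            // IRQ !IACK
            while (!IACK)                // wait for the CPU's acknowledgment
                Clock.class.wait();
            // IRQ IACK
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}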
Peripheral devices using these methods will be
held up until "the coast is clear". The IRQ
boolean cannot be set by a peripheral until it has
first been unset, for example. A peripheral device
wanting to raise IRQ via the raiseIRQ method will
be forced to wait, if either IRQ or IACK is
already set, until both become unset. Then there
is a further wait until the CPU sets IACK,
indicating that it has seen IRQ and is running or
going to run the handler. These semantics
correspond to the
way the hardware built into I/O peripherals works.
- When an I/O interrupt occurs (i.e. IRQ is set
by a peripheral) and the CPU starts to handle
it, the CAUSE register is immediately set to the
value 0. Other settings indicate other kinds of
exception, such as floating point overflows,
that the CPU deals with in similar but slightly
different ways. The code at the end of the
pipeline that sets the CAUSE register looks like
this:
int status = STATUS.read();
if ((status & 0x2) != 0 // irqs are not masked when bit 1 is set
&& IRQ // peripheral raised irq
&& !IACK // we/handler haven't dealt with it yet
) {
// set cause register to value 0
CAUSE.write(0);
}
Note the guard, which checks that bit 1 of the
STATUS register (tested with bitmask 0x2, which is
...0010 in binary: bit 1 set and all other bits
unset) is set before starting to handle the IRQ.
When bit 1 of the STATUS register is not set, we
say that interrupts are masked: they will not be
serviced. Note also that the CPU's IACK flag must
not be set for the handler to be started. If it
were set, that would indicate that the CPU was
already running the handler for an IRQ, and one
does not want to interrupt an interrupt (other
architectures can and do nest interrupts safely,
using interrupt priorities).
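The acceptance test just described can be written as a single predicate. A minimal sketch: the names IrqGuard and shouldTakeIrq are hypothetical, and only the bit-1 mask and the IRQ/IACK conditions come from the text.

```java
// Hypothetical sketch of the guard that decides whether to start the handler.
class IrqGuard {
    static final int IE_MASK = 0x2;  // bit 1 of STATUS: interrupts enabled when set

    // True when the CPU should start the IRQ handler now.
    static boolean shouldTakeIrq(int status, boolean irq, boolean iack) {
        return (status & IE_MASK) != 0   // interrupts are not masked
            && irq                       // a peripheral has raised IRQ
            && !iack;                    // no handler is already running for it
    }

    public static void main(String[] args) {
        System.out.println(shouldTakeIrq(0x2, true, false)); // true
        System.out.println(shouldTakeIrq(0x0, true, false)); // false: masked
        System.out.println(shouldTakeIrq(0x2, true, true));  // false: already acknowledged
    }
}
```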
- While handling the interrupt, the CPU turns
off further interrupts by unsetting bit 1 of the
STATUS register for the duration of the handler
call. It saves the current status flags (the
bottom 16 bits of the STATUS register) for later
by shifting them out of the way further up the
register:
STATUS.write(status << 16); // save status mask and zero current mask
We're not really cheating by implementing the
shift in Java instead of via the ALU or another
"hardware" object, because shifts are often
implemented in hardware just by connecting the
right wires together and waiting a cycle. Still, I
haven't been very exact here. There is an extra
cycle required to execute all these movements of
data on receiving the interrupt, and we really
need to modify the accounting to show it. The
pipeline will be flushed, however, as discussed
below, and that will introduce delays that are
measured by the simulation and that are even
longer than the unaccounted one cycle we are
quibbling about now. And interrupts are relatively
rare per clock on a 1GHz CPU! So whatever mistake
we have made it's going to make a difference only
on the order of one in a million in the
accounting, "long" term. Nevertheless, kudos will
go to somebody who studies the time accounting
around the treatment of interrupts in the
simulator, and who makes it better if it needs to
be.
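The save-and-mask step above, together with the matching right shift that rfe performs later, fits in two one-line functions. This is only a sketch: the class name StatusStack is hypothetical, and it assumes, as the text does, that the live mask occupies the low 16 bits. It provides a single level of saving, which is consistent with the rule that interrupts are not nested here.

```java
// Hypothetical sketch of STATUS save/restore around an interrupt.
class StatusStack {
    static final int IE_MASK = 0x2;  // bit 1: interrupts enabled

    // On taking an interrupt: move the 16-bit mask up and zero the live one.
    static int onInterrupt(int status) {
        return status << 16;
    }

    // On rfe: shift the saved mask back down (logical shift, no sign extension).
    static int onRfe(int status) {
        return status >>> 16;
    }

    public static void main(String[] args) {
        int status = IE_MASK;                  // interrupts enabled
        int inHandler = onInterrupt(status);   // interrupts now masked
        int after = onRfe(inHandler);          // original mask restored
        System.out.printf("in handler: 0x%08x enabled=%b%n", inHandler, (inHandler & IE_MASK) != 0);
        System.out.printf("after rfe:  0x%08x enabled=%b%n", after, (after & IE_MASK) != 0);
    }
}
```

The right shift uses Java's logical >>> operator so that a saved mask with bit 15 set does not sign-extend into the upper half of the register; whether the simulator's ALU op behaves the same way is an assumption.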
- The CPU also copies the PC value to the new
EPC register for safekeeping. Then it will jump
to a predefined address (0x4 in this emulation),
where some interrupt handler code will have been
placed at CPU boot time. The jump is prepared by
loading the PC with the destination address and
flushing the pipeline:
PC.write(0x4); // prepare jump to 0x4
conf0 = null; // flush pipeline
conf1 = null;
conf2 = null;
conf3 = null;
The next cycle the CPU will naturally fetch the
first word of the handler code into the pipeline
and execution of the handler will commence.
However, it is not that simple to copy the PC
value for safekeeping because the PC may have been
pre-incremented for Fetch several times after the
"current" instruction was started. More details of
how it is done will be found below. It's quite a
little saga to figure out what it should be and
you may like to consider if one perhaps needs some
dedicated extra decode hardware to do it. I don't
think so, but I may be mistaken in my by-eyeball
evaluation.
When the interrupt handler code finishes, it
executes rfe, which restores PC from EPC and
shifts the STATUS register down again, thereby
restoring its original state. The rfe instruction
is pipelined as follows:
- Fetch ...
- Decode ...
- Read EPC and STATUS registers
- Execute ALU op to shift status value right 16
bits
- Write EPC value into PC and shifted status
value into STATUS register, drop IACK if IRQ has
already been dropped or else set a flag that
indicates the CPU must drop IACK when IRQ is
eventually dropped
and rfe needs no other handling in the pipeline
beyond a flush, which must follow its write to the
PC: the prefetched instructions trailing it in the
pipeline are likely just random nonsense tagging
along beyond the end of the handler code in
memory, and they need to be purged:
if (... // Write stage termination code
|| conf3.op == J
...
|| conf3.op == RFE // flush pipeline after rfe reaches Write too!
) {
conf0 = null;
conf1 = null;
conf2 = null;
}
In case rfe had to set a flag to drop IACK later
rather than being able to drop it at once (because
IRQ is still set high by the peripheral when the
handler finishes - the CPU is generally much, much
faster than any peripheral), there is an extra
check just at the end of each and every cycle. It
checks to see if IACK should be dropped now
because IRQ has finally dropped:
synchronized (Clock.class) {             // IRQ and IACK are shared with peripheral threads
    if (!IRQ && pleaseLowerIACK) {       // carry out pending drop of IACK
        if (IACK) {
            IACK = false;
            Clock.class.notifyAll();     // tell interested threads IACK has changed
        }
        pleaseLowerIACK = false;
    }
}
Until both IRQ and IACK have dropped, no new IRQ
will be issued by any peripheral.
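The bookkeeping around pleaseLowerIACK is easier to follow as a tiny state machine. Below is a single-threaded sketch; the class IackModel and its method names are hypothetical, and there is no locking or notifyAll because only one thread is involved. Its main method replays the case where the handler finishes while the slow peripheral still holds IRQ high.

```java
// Hypothetical single-threaded model of the CPU's IACK bookkeeping.
class IackModel {
    boolean irq, iack, pleaseLowerIACK;

    // Irq stage: acknowledge a fresh IRQ and start the handler.
    void acknowledge() {
        if (irq && !iack) iack = true;
    }

    // Write stage of rfe: drop IACK now if IRQ is already down, else defer.
    void rfe() {
        if (!irq) {
            iack = false;
        } else {
            pleaseLowerIACK = true;
        }
    }

    // The check made at the end of every cycle.
    void endOfCycle() {
        if (!irq && pleaseLowerIACK) {
            iack = false;
            pleaseLowerIACK = false;
        }
    }

    public static void main(String[] args) {
        IackModel m = new IackModel();
        m.irq = true;       // peripheral raises IRQ
        m.acknowledge();    // IACK goes up, handler starts
        m.rfe();            // handler ends while IRQ is still high: drop deferred
        m.endOfCycle();     // IACK stays up
        m.irq = false;      // peripheral finally lowers IRQ
        m.endOfCycle();     // pending drop is carried out
        System.out.println("IACK = " + m.iack); // prints "IACK = false"
    }
}
```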
- The CPU checks for interrupts at all, and
perhaps performs the actions detailed above, only
immediately after some instruction has finished
Write.
As was commented at the start of this section,
that is accomplished by introducing an extra,
final, pipeline stage called Irq. The stage checks
for IRQ having been raised by some peripheral,
performing the actions described in the paragraphs
above. Entry of an instruction into the Irq
processing final stage is guarded by a check of
the IRQ value to see if it is even plausible that
an IRQ might need processing now, otherwise the
stage is skipped:
if ((STATUS.read() & 0x2) == 0
|| !IRQ
|| IACK
)
return;
Why only check for IRQs after some instruction
has exited the Write stage? And what does the CPU
need to do that is based on the instruction rather
than being completely generic? Why look at the
instruction at all?
Firstly, the CPU cannot attend to an interrupt
while the pipeline contains only partially
completed instructions.
If we were to try that we'd find that we really
wouldn't know what program address to return to
with the rfe instruction, because it's not yet
certain which if any of the jumps or branches in
the pipeline at the time of the interrupt will be
executed or not.
Worse, the value of the PC when the interrupt
occurs is that corresponding to the instruction
being pre-fetched into the pipeline, which is not
the same as the next instruction that has yet to
complete.
It's just not going to work.
You may wish to do a cleverer analysis than I,
but I've settled for not handling an IRQ at all
until some instruction has just completed for
sure, so we know what the next instruction is
going to be. And what is it going to be? It
depends on the instruction just completed! That
instruction needs to be examined, and that's why
there is a final Irq stage. It is precisely to
look at the just-completed instruction and set up
the value of the PC to be saved according to what
it is.
If the just-completed instruction is a jump (or
rfe itself), then the next instruction has to be
from the value of PC just set by the jump (or rfe)
in the Write stage.
Likewise if the completed instruction is a branch
that succeeded.
In all other cases the next instruction should be
the instruction that comes 4 bytes after the
address of the one that just completed.
Here's the code that sets the EPC in the Irq
stage. It performs the instruction-based analysis
detailed just above:
if (
(conf4.op == J
|| conf4.op == JAL
|| (conf4.op == 0 && (conf4.func == ALU_JALR || conf4.func == ALU_JR))
|| conf4.op == RFE
|| (s.z != 0 &&
(conf4.op == BEQZ
|| conf4.op == BEQ
|| conf4.op == BNEZ
|| conf4.op == BNE
|| conf4.op == BLTZ
|| conf4.op == BGEZ
|| conf4.op == BLE
|| conf4.op == BGT
)
)
)
) { // for jump or successful branch ...
EPC.write(PC.read()); // the PC just set in Write is what to come back to
} else { // in all other cases ...
EPC.write(conf4.pc); // want to come back to this instr's PC+4
}
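Stripped of the opcode list, the EPC decision above has just two outcomes. A minimal sketch, assuming a simplified hypothetical Kind classification in place of the simulator's opcode and function-code tests, and assuming, as the code comment above suggests, that conf4.pc already holds the completed instruction's PC+4:

```java
// Hypothetical sketch of the Irq stage's EPC choice; Kind is a simplified
// stand-in for the simulator's opcode and function-code tests.
class EpcChooser {
    enum Kind { JUMP, RFE, BRANCH, OTHER }

    // pcAfterWrite: the PC value left by the instruction's Write stage
    // nextPc:       the completed instruction's own PC+4 (conf4.pc in the text)
    static int chooseEpc(Kind kind, boolean branchTaken, int pcAfterWrite, int nextPc) {
        boolean redirected = kind == Kind.JUMP           // j, jal, jr, jalr
                || kind == Kind.RFE                      // rfe itself
                || (kind == Kind.BRANCH && branchTaken); // successful branch
        return redirected ? pcAfterWrite : nextPc;
    }

    public static void main(String[] args) {
        // taken branch at 0x100 to 0x400: return to the branch target
        System.out.printf("0x%x%n", chooseEpc(Kind.BRANCH, true, 0x400, 0x104));
        // ordinary instruction at 0x100, PC already prefetched ahead to 0x110
        System.out.printf("0x%x%n", chooseEpc(Kind.OTHER, false, 0x110, 0x104));
    }
}
```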
- The CPU also sets the IACK boolean early in
the Irq stage to indicate to peripherals that it
has now seen the IRQ flag and is definitely
going to be executing the handler. Here's the
code:
synchronized (Clock.class) {
    if (IRQ && !IACK) {
        IACK = true;
        Clock.class.notifyAll();   // wake peripherals waiting on IACK
    }
}
The notifyAll call is the mechanism used to tell
all interested peripherals running in other
threads (in the simulation!) that the IACK flag
has just changed. Look in the Java API
documentation for java.lang.Object to see how
notifyAll and wait work.
- The peripheral that raised IRQ may afterwards
choose to lower IRQ whenever it likes. But it
won't do so until it has seen IACK.
- The CPU will lower IACK again when the handler
finishes, in the execution of the rfe
instruction, or later still if IRQ has still not
yet been lowered by the peripheral.
- When both IRQ and IACK have been lowered, the
same peripheral or another may raise another
IRQ.
With the pipeline flushed, the PC set to the
handler address, EPC containing the return
address, STATUS shifted up 16 bits to blank the
current mask, IACK set, the next CPU cycle will
start with the fetch of the first word of the
handler code into the now empty pipeline.
It's hard to see what one could do to assuage the
pain of the pipeline flush, because any
instruction may be followed by an IRQ handler
sequence without warning, so one can't prefetch
the forthcoming handler sequence into the pipeline
at the Fetch stage. Can you find any ideas out
there on the Web? All that occurs to me is to let
the pipeline drain naturally before starting the
handler. Or start the handler at the next jump or
branch instruction, since the pipeline would have
been flushed there too.
- Note that the sequence of states CPU and
peripheral jointly pass through is always
!IRQ !IACK; IRQ !IACK; IRQ IACK; !IRQ IACK; !IRQ !IACK
The second transition marks the handler start.
The handler finishes on the final transition. The
CPU will not start handling another interrupt
while the handler is running. It will not handle
another interrupt until the sequence of states
comes back to the start again.
Putting it another way, the cycle of events is
always:
- peripheral sets IRQ
- CPU sets IACK and starts handler
- peripheral unsets IRQ
- CPU finishes handler and unsets IACK
The cycle starts from the situation in which IRQ
and IACK are unset, and terminates with them unset
again. Note that the peripheral controls the IRQ
setting and the CPU controls the IACK setting.
The peripheral device must not deassert IRQ
before it has seen the processor raise IACK (or
else the interrupt may be missed by the CPU).
The CPU must not drop IACK before it has seen the
peripheral drop IRQ, even if the handler has
finished (or else the acknowledgment may be missed
by the peripheral).
The peripheral must not raise IRQ again until it
has seen the processor drop IACK (or else the CPU
may see it as continuing to assert the last
interrupt and thus miss the new one).
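The four rules above can be checked by running both sides of the protocol in separate threads. The sketch below is a self-contained model, not the simulator's code: it uses a private lock object in place of Clock.class, its class and method names are hypothetical, and the handler is collapsed into the CPU's wait for IRQ to drop. It records each state the pair enters after the initial !IRQ !IACK.

```java
import java.util.ArrayList;
import java.util.List;

// Two-thread model of the IRQ/IACK handshake. A private lock object stands
// in for the simulator's Clock.class; all names here are hypothetical.
class Handshake {
    private final Object lock = new Object();
    private boolean irq, iack;                           // guarded by lock
    private final List<String> log = new ArrayList<>();  // states entered, in order

    private void record() {                              // call with lock held
        log.add((irq ? "IRQ" : "!IRQ") + " " + (iack ? "IACK" : "!IACK"));
    }

    // Peripheral side: the raiseIRQ() then lowerIRQ() sequence from the text.
    void peripheralSignal() throws InterruptedException {
        synchronized (lock) {
            while (irq || iack) lock.wait();   // wait until the coast is clear
            irq = true;  record(); lock.notifyAll();
            while (!iack) lock.wait();         // must not drop IRQ before IACK is seen
            irq = false; record(); lock.notifyAll();
            while (iack) lock.wait();          // must not re-raise before !IACK
        }
    }

    // CPU side: acknowledge, (run the handler), drop IACK only after IRQ drops.
    void cpuService() throws InterruptedException {
        synchronized (lock) {
            while (!irq || iack) lock.wait();  // wait for a fresh IRQ
            iack = true;  record(); lock.notifyAll();
            while (irq) lock.wait();           // must not drop IACK before IRQ drops
            iack = false; record(); lock.notifyAll();
        }
    }

    // Runs the given number of complete handshakes and returns the state log.
    static List<String> run(int rounds) {
        Handshake h = new Handshake();
        Thread cpu = new Thread(() -> {
            try { for (int i = 0; i < rounds; i++) h.cpuService(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread device = new Thread(() -> {
            try { for (int i = 0; i < rounds; i++) h.peripheralSignal(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        cpu.start();
        device.start();
        try {
            cpu.join();
            device.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (h.lock) {
            return new ArrayList<>(h.log);
        }
    }

    public static void main(String[] args) {
        run(2).forEach(System.out::println);
    }
}
```

Whatever the thread scheduling, each round should log IRQ !IACK, IRQ IACK, !IRQ IACK, !IRQ !IACK in that order, because every transition waits for the one before it.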
Peripherals
What do peripherals do and how do they work with
the IRQ model protocol introduced above?
Peripherals are like small dumb CPUs in at least
one way: they run a continuous cycle, like a CPU.
But it's one in which they "do their own thing"
and also occasionally try to tell the CPU about
it by raising an IRQ. The IRQ means "you (the CPU)
can learn something by looking here now".
A screen
Consider first a screen I/O peripheral. We've
written it as a Screen class which implements the
Runnable interface so it can be launched as a
separate thread in Java. That means that it has a
"main" routine called run:
Screen implements Runnable
Screen ( ) | constructor
void run ( ) | runs the screen action loop
int print (char) | sends a character to the screen buffer for later printing, returns number accepted
int available () | returns number of characters it is still possible to write to screen buffer
int signalled () | returns number of IRQs signalled to but not yet acknowledged by CPU
int printed () | returns number of characters printed since last IRQ
int dropped () | returns number of characters dropped since last IRQ
The run method contains the screen's working
cycle. It's a loop that only stops when the CPU
stops the clock. Until then it does nothing but
wait on an event from the CPU clock, likely a
clock tick forward, but possibly also a
notification from the CPU on a change of value in
the CPU's IACK variable or its STATUS register.
When the notification reaches it, it runs its
private output method, which may move either 0 or
1 characters out of its internal buffer and onto
the screen's physical display area:
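public void run () {
    while (Clock.running()) {
        synchronized (Clock.class) {     // wait() requires holding the Clock.class lock
            try {
                Clock.class.wait();      // wait for a clock tick or other notification
            } catch (InterruptedException e) {
                return;                  // thread asked to stop
            }
        }
        output(); // perform the screen's next action - 0 or 1 chars printed
    }
}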
Characters are accumulated in the screen's
internal buffer as code executing in the CPU
writes to a specific memory address
PUTCHAR_ADDRESS and that eventually translates to
a call to the screen's print method via the CPU's
iobus.
The screen's internal (circular) buffer is 128
bytes in length. It is implemented via an index
front that points to the front of the buffer,
which holds the character due to be printed next,
and an integer count that says how many characters
there are in the buffer, from front onwards,
waiting to be printed.
Thus the last character in the buffer is at
position (front + count - 1) % 128, counting from
0, and the first character is at position
front % 128. That modular arithmetic calculation
of the index position is what circularity means!
Position 128 is the same as position 0 in the
buffer.
If one kept adding to the end of the buffer
without cease (and the buffer were to accept that)
one would eventually overrun the front of the
buffer again, like a snake catching up with its
own tail.
Characters are taken from the front of the buffer
by the screen hardware and sent to the physical
display. Each character printed increments front
and decrements count, internally.
Any character sent to the screen for printing
gets added at the end of the buffer, in the (front
+ count) % 128 th position. Each character
added to the buffer increments count.
The screen object also contains an integer pend
which counts the number of characters it has
printed from its buffer but has not yet told the
CPU that it has printed, via IRQ. Every time a
character is printed to the physical display, pend
is incremented. Every time an IRQ is raised, pend
is set to zero again. In all usual circumstances,
pend will be 0 or 1.
There is also an integer count tot for the number
of IRQs the screen has sent out but not yet
received an IACK in return for. Each time the
screen sets IRQ it increments tot. Each time the
screen receives IACK, it decrements tot. Again, in
all usual circumstances the value of tot will be 0
or 1.
And finally there is a count of the number of
characters discarded since the last IRQ because
the buffer was already full when they were added.
It is the instance variable discard.
You might like to consider a slightly different
version of the screen printer, one which never
suffers from the full buffer problem (incrementing
discard every time that happens), but which
instead simply silently overwrites the oldest
unprinted character still in the buffer when it
has to. I'm not sure how relatively pleasant or
unpleasant that would be for someone programming
for the revised screen hardware. See below.
Programming for the screen
The plan is that the console object to which the
memory unit redirects those single byte writes
which are addressed to PUTCHAR_ADDRESS should call
the screen's print method. This corresponds to a
physically wired connection in the hardware. The
screen's print method will place the character on
its buffer, if there is room, incrementing count.
If there is no room in the buffer, it will discard
it, incrementing discard.
Some time afterwards, the screen may issue an
IRQ. The IRQ usually will signify "I have printed
it", but detail can be obtained by interrogating
the screen from the interrupt handler.
Blow-by-blow data on what has happened is
available for inspection via the control port(s).
If the character was printed, pend will have been
incremented (and count decremented again back down
to its previous level). If the screen is still
waiting to print it, count will have been
incremented and pend not. If the character did not
even make it into the buffer, the discard count
alone will have been incremented.
The wise and conservative programmer should plan
on writing program code which sends a single
character to the console, then waits (possibly
doing something else useful if the programmer is
kind to the sensibilities of the person sitting at
the keyboard) for an IRQ in response. The handler
should then check with the console control port(s)
to see that the character has been printed. Then
the programmer should arrange that the code sends
another character. If not printed, the wait should
be continued or the character should be resent
according to circumstances. And so on.
A more foolhardy and adventurous programmer may
choose to send more than one character blind. So
long as the programmer arranges that not more than
128 characters are sent before an IRQ has been
received, and carefully counts the number of
characters reported received and those reported
printed in the IRQ handler code and balances that
against the characters sent so that no more than
128 are outstanding at any time, everything should
work fine. But it's quite a delicate piece of
programming. I would be the conservative
programmer! There's a real danger that we might be
sending too fast for the screen to notice because
of the greatly different clock rates between the
two pieces of equipment, and what do we do if we
see from the totals that some have been missed?
Which ones are they?
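The counting the adventurous programmer has to do amounts to a sliding window of at most 128 outstanding characters. A hypothetical sketch: the FlowWindow class and its methods are illustrative, and the handler is assumed to feed it the screen's printed and dropped counts on each IRQ. It guarantees only that the buffer is never overfilled; it cannot answer the question of which characters were dropped.

```java
// Hypothetical sender-side bookkeeping for sending characters "blind".
class FlowWindow {
    static final int WINDOW = 128;  // screen buffer size from the text
    int sent, printed, dropped;

    // Characters sent but not yet reported printed or dropped.
    int outstanding() {
        return sent - printed - dropped;
    }

    boolean maySend() {
        return outstanding() < WINDOW;
    }

    void onSend() {
        sent++;
    }

    // Called from the IRQ handler with the counts the screen reports.
    void onIrq(int newlyPrinted, int newlyDropped) {
        printed += newlyPrinted;
        dropped += newlyDropped;
    }

    public static void main(String[] args) {
        FlowWindow w = new FlowWindow();
        while (w.maySend()) w.onSend();  // fill the window
        System.out.println("outstanding " + w.outstanding());
        w.onIrq(1, 0);                   // one character reported printed
        System.out.println("may send again: " + w.maySend());
    }
}
```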
The totally incautious programmer will send
characters without attempting to control the flow
at all according to the data received during IRQs,
and in consequence many characters will arrive at
a filled buffer and be discarded by the screen
before ever being printed, resulting in documents
which look lk thi ne.
This last option is effectively what we have
programmed for you as a base standard for the
handler. Tough luck. See below.
Screen internals
Here's the implementation of the screen's print
method. It's not where the difficulty lies. It
just fills in to the buffer as best it is able:
int available() {
return buffer.length - count; // amount of room in buffer
}
int print(char c) {
if (buffer.length <= count) {
discard++;
return 0;
}
buffer[(front + count++) % buffer.length] = (byte)c;
return 1; // return number of characters accepted
}
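The available method returns how many characters
may be accepted in the immediate future for
printing without mishap. The two methods play
roughly the roles of Java's System.out.print and
of the available method on Java input streams,
though in Java print returns nothing and available
counts bytes waiting to be read rather than room
left for writing.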
The tricky code is the screen's private output
method. This method runs continuously in parallel
in the screen's own independent Java thread in
this simulation, and reflects what happens inside
the chunk of machinery that is a console on your
desk.
If the screen thread has some characters in the
internal buffer ("count > 0") it prints the
character at the front of the buffer to the
display. That's visible at the start of the
routine:
private void output() {
if (count > 0) { // if there are chars in the buffer ...
System.out.print (
(char) buffer[front++ % buffer.length]); // print the front char to media
count--; // one less char in buffer
pend++; // one more unsignalled char printed
}
if (pend > 0 || discard > 0) { // if we have something to report
tot++; // we signal IRQ and wait for IACK
cpu.raiseIRQ();
// IRQ IACK
cpu.lowerIRQ(); // we lower IRQ and wait for !IACK
// !IRQ !IACK
pend = 0; // we reset pend, discard counts, etc.
discard = 0;
tot--;
}
}
public int signalled() { // handler uses after IRQ signalled ...
return tot; // return # issued IRQs still outstanding
}
public int printed() { // handler uses after IRQ signalled ...
return pend; // return # unsignalled characters printed
}
public int dropped() { // handler uses after IRQ signalled ...
return discard; // return # unsignalled chars dropped
}
If there are now (or were already) characters
printed to the display and unsignalled to the CPU
via an IRQ ("pend > 0"), then the routine tries
to issue an IRQ. Ditto if a character has been
dropped ("discard > 0"). Something interesting
has happened and the CPU needs to be told.
The screen may now be held up trying to raise IRQ
until no other peripheral has control of the IRQ
line. When the screen eventually gets control of
the line and raises IRQ, it waits for the CPU to
acknowledge by raising IACK. Then it lowers IRQ,
since it is sure the CPU has seen it. It waits
until the CPU has lowered IACK before resetting
pend, discard, etc., because the IRQ handler is
notionally reading them until it finishes, which
is signalled by the CPU lowering IACK.
What happens in summary is
- screen's print cycle prints a character from
buffer to display
- screen's print cycle waits until it is free
to raise IRQ with CPU, and then raises it
- screen's print cycle waits for a while
because IRQ is now up and IACK is not yet up
- CPU notices IRQ and sets IACK
- screen's print cycle notices IACK and lowers
IRQ
- CPU runs IRQ handler which reads pend, etc.
from screen
- screen's print cycle waits while IACK is
still set
- CPU IRQ handler finishes and lowers IACK,
releasing the screen to continue its print
cycle from the first step
- code running in CPU notices data left by
handler saying one character has been printed
- code in CPU writes another character to
screen buffer via memory-mapped print method and
cycle starts again
So each character sent via the CPU to the screen
like this generates its own IRQ. The screen data
is quiescent while the CPU checks it after getting
the IRQ.
There are things wrong with this screen design.
For one thing, the screen data is not merely
quiescent but the screen itself is totally
paralyzed while the CPU's handler runs! I'll ask
you to help develop better designs or to
strengthen this design so that it always works
well, whatever the context. It could continue
printing while being interrogated by the CPU (it
needs to snapshot its state and present it to the
CPU as a stable historical image while being
interrogated and the real state can continue
developing meanwhile).
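One way to realise that snapshot idea is for the screen to freeze a copy of its counters at the moment it raises IRQ and keep counting into fresh live counters, so printing never has to stop. The sketch below is hypothetical (class and method names invented for illustration), and synchronized methods stand in for whatever latching the hardware would really do.

```java
// Hypothetical screen counters with a frozen snapshot for the IRQ handler.
class SnapshotScreen {
    static final class Report {
        final int printed, dropped;
        Report(int printed, int dropped) {
            this.printed = printed;
            this.dropped = dropped;
        }
    }

    private int pend, discard;                   // live counters, keep changing
    private Report snapshot = new Report(0, 0);  // stable image for the handler

    synchronized void charPrinted() { pend++; }
    synchronized void charDropped() { discard++; }

    // At IRQ time: freeze the current counts and restart the live ones.
    synchronized Report takeSnapshot() {
        snapshot = new Report(pend, discard);
        pend = 0;
        discard = 0;
        return snapshot;
    }

    // What the handler would read through a control port.
    synchronized Report report() { return snapshot; }

    public static void main(String[] args) {
        SnapshotScreen s = new SnapshotScreen();
        s.charPrinted();
        s.takeSnapshot();   // IRQ raised: handler will see 1 printed
        s.charPrinted();    // screen keeps printing meanwhile
        System.out.println("handler sees " + s.report().printed + " printed");
    }
}
```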
Running the screen
The screen works. You can test it out like this:
% java CPU/Cpu5 -q hello_mips32 handler_mips32
Hello world
Running without the "-q" flag reveals the
episodes in which the (simplistic, do-nothing)
handler code in handler_mips32 (see below) is
being executed:
...
247: 0.000001189s: 0x80030000: lui $v1, -20480
248: 0.000001193s: 0x80030004: sb $a0, 0($v1) # write to screen data port
# \n is printed to display
249: 0.000001209s: 0x00000004: sll $zero, $zero, 0 # handler for IRQ received
250: 0.000001213s: 0x00000008: sll $zero, $zero, 0
251: 0.000001214s: 0x0000000c: sll $zero, $zero, 0
252: 0.000001217s: 0x00000010: rfe # return from handler
253: 0.000001232s: 0x80030008: jr $ra # return from printchar routine
...
There was a large hiatus of about 16 clock cycles
as the pipeline was flushed and before the first
handler code instruction at address 0x4 completed.
5 clock cycles of that would be the time taken for
the fetch, decode, etc. of the instruction, but
there is still a large pause to be explained! I
expect it is the program cache thrashing. The
address ranges of the program and the interrupt
handler overlap modulo 32768, which is the size of
the program cache. Indeed, the final lines of
debug output from "-d" show considerably degraded
program cache performance:
prog cache read hits 326/498, write hits 0/0
The figures for CPU4 and earlier model CPUs show
320/370 hits. Now only 6 more instructions are
being read successfully first time from cache,
while 128 more instruction reads are made in
total, and the misses (instructions that have to
be fished for in memory because they are not in
the cache when needed) have risen from 50 to 172.
We have only introduced 4 new instructions by way
of the handler code, 3 of which are nops.
Undoubtedly the cache is thrashing. Try moving the
program's virtual address placement!
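To see the collision, assume the program cache is direct-mapped, so that an address's cache position is simply the address modulo 32768. The hypothetical helper below computes that position. It shows the program's first words at 0x80030000 landing on the same slots as the handler at 0x4, and a program moved up by 0x4000 landing well clear of them. The real cache's line size and organisation may differ.

```java
// Hypothetical slot calculation for a direct-mapped 32 KB program cache.
class CacheIndex {
    static final long CACHE_SIZE = 32768;  // program cache size from the text

    // Position of an address within the cache, assuming direct mapping.
    static long slot(long address) {
        return address % CACHE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println("program 0x80030004 -> slot " + slot(0x80030004L));
        System.out.println("handler 0x00000004 -> slot " + slot(0x4L));
        System.out.println("moved   0x80034004 -> slot " + slot(0x80034004L));
    }
}
```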
The rfe causes another pipeline flush, and this
time 17 clock cycles are lost before the first
"ordinary" code instruction completes. Some of
that is perhaps simulation fluff, but it can still
be recognized that IRQ handling causes significant
pipeline stalls to occur, and it's good from the
CPU's point of view that IRQs are relatively
infrequent, peripherals being so much slower than
the CPU as a rule.
Interrupt Handler code
For the tests we've prepared interrupt
handler code that does almost nothing at all - it
contains just the single MIPS assembler
instruction 'rfe' ("return from exception") - and
the simulator loads it at the 0x4 memory address
where interrupt handler code is expected to be
found. The handler code is compiled via
% mips-gcc -DMIPS -mips1 -mabi=32 -c handler_mips32.s
% mips-ld -Ttext 0x4 -e handler -o handler_mips32 handler_mips32.o
and the handler_mips32.s file contains the
single-instruction assembler code for 'rfe':
.text
.align 2
.globl handler
.ent handler
.type handler, @function
handler:
.set nomips16
.frame $fp,8,$31 # vars= 0, regs= 1/0, args= 0, gp= 0
.mask 0x40000000,-4
.fmask 0x00000000,0
rfe
.end handler
.ident "hand-written interrupt handler"
resulting in a handler_mips32 file which
disassembles as follows:
00000004 <_ftext>:
4: 00000000 nop
8: 00000000 nop
c: 00000000 nop
00000010 <handler>:
10: 42000010 rfe
There are three no-ops preceding the single rfe
instruction in the code that will eventually be
loaded at address 0x4.
More sophisticated IRQ handlers should do more,
such as checking first which peripheral caused the
IRQ being handled!
Improving the screen
There are two aspects to improving the screen:
- improving the handler;
- augmenting the screen hardware.
A better IRQ handler should check which
peripheral squawked via dedicated "hardware"
connections to all peripherals. That's one basis
for suggesting that the hardware needs improving.
According to the protocol followed, only one
peripheral can send an IRQ at a time and the CPU
will maintain IACK while the handler is running,
thus keeping other peripherals from sending IRQ, so
there is in principle no danger of confusion about
which peripheral has sent IRQ while the handler is
running. But how to find it out? The handler code
has to use only legitimate MIPS instructions. Its
interrogations of peripherals for "who squawked"
must be done via memory address accesses.
Conclusion: the CPU's memory unit has to be
reprogrammed to map more addresses to peripheral
control ports. In particular, the memory unit
needs to hook up at least one address to the
screen's signalled method and hence its tot
variable, which indicates whether this screen was
the one that sent the IRQ that has not yet been
acknowledged.
The handler should read the screen's tot
variable via the control port. A positive value
signals "yes, it's me who signalled IRQ".
The handler should use another mapping to access
the screen's printed method and pend variable in
order to determine how many characters have been
printed since the last IRQ.
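Here's a minimal sketch of how the memory unit might route loads from two extra control-port addresses to the screen's signalled and printed methods. The ScreenPorts interface, the MemoryUnitSketch class and both port addresses are hypothetical illustrations chosen for this example, not part of the CPU5 code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical view of the screen: just the two queries the handler needs.
interface ScreenPorts {
    int signalled(); // tot: positive if this screen sent the unacknowledged IRQ
    int printed();   // pend: characters printed since the last IRQ
}

// Hypothetical memory-unit fragment: loads from registered port addresses are
// answered by a peripheral; every other address goes to ordinary RAM.
class MemoryUnitSketch {
    static final int SCREEN_TOT_PORT  = 0xFFFF0010; // hypothetical address
    static final int SCREEN_PEND_PORT = 0xFFFF0014; // hypothetical address

    private final Map<Integer, IntSupplier> ports = new HashMap<>();
    private final int[] ram = new int[1024];        // toy word-addressed RAM

    MemoryUnitSketch(ScreenPorts screen) {
        ports.put(SCREEN_TOT_PORT, screen::signalled);
        ports.put(SCREEN_PEND_PORT, screen::printed);
    }

    int loadWord(int addr) {
        IntSupplier port = ports.get(addr);
        if (port != null) {
            return port.getAsInt();                 // peripheral control port
        }
        return ram[(addr >>> 2) % ram.length];      // ordinary memory access
    }
}
```

A handler would then load from SCREEN_TOT_PORT to ask "was it you?", and from SCREEN_PEND_PORT to learn how many characters were printed.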
The ports for these mappings need to be set up.
As a simplification, reading from a single
control port could return a 32-bit number
composed of 5 bits each from the screen's tot,
pend, count, etc. instance variables.
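One possible layout for that combined control word, as a sketch: tot in bits 0-4, pend in bits 5-9 and count in bits 10-14, each clamped to the 0..31 range that a 5-bit field can hold. The ControlWord class and this particular bit assignment are assumptions made for illustration.

```java
// Hypothetical control-word layout (an assumption, not CPU5's):
// bits 0-4 = tot, bits 5-9 = pend, bits 10-14 = count, bits 15-31 unused.
final class ControlWord {
    private static final int MASK = 0x1F; // one 5-bit field

    private ControlWord() {}

    // Clamp to 0..31 so a large counter cannot spill into the next field.
    private static int field(int v) {
        return Math.max(0, Math.min(v, MASK));
    }

    static int pack(int tot, int pend, int count) {
        return field(tot) | (field(pend) << 5) | (field(count) << 10);
    }

    // Decoding, as the handler would do with shifts and masks.
    static int tot(int word)   { return word & MASK; }
    static int pend(int word)  { return (word >>> 5) & MASK; }
    static int count(int word) { return (word >>> 10) & MASK; }
}
```

With this layout the handler needs a single load from the control port, followed by a shift and a mask per field.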
A keyboard
The IRQ-driven model also contains a Keyboard
class. An IRQ-driven keyboard is inserted into the
console component.
Like the Screen component, the Keyboard class
implements the Runnable interface so it can be
launched as a separate thread. We've really only
sketched out the code, and it should not be
considered complete as it is. You will want to
test and perfect it.
Keyboard implements Runnable
Keyboard () | constructor
void run () | main keyboard loop
int read (byte[], int, int) | read bytes into the array at the given offset, up to the number requested; return the number actually read
int available () | return the number of characters held in the keyboard buffer
The keyboard object runs a continuous cycle,
reading characters typed on the console into an
internal buffer. Each tick of the system clock
prompts the code to check the console for more
typed characters:
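public void run () {
  while (Clock.running()) {
    try {
      synchronized (Clock.class) { // wait() must hold this monitor
        Clock.class.wait();        // until the clock's notifyAll() signals a new tick
      }
    } catch (InterruptedException e) {
      return;                      // interrupted: stop the keyboard loop
    }
    input();
  }
}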
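The run loop depends on the clock calling notifyAll() on Clock.class at every tick. A bare wait() can also wake spuriously, or miss a tick that happened just before it started waiting. Here is a minimal sketch of a clock that avoids both by counting ticks; TickClock and its methods are hypothetical and are not the simulator's Clock class.

```java
// Hypothetical stand-in for the simulator's Clock. Every static synchronized
// method locks TickClock.class, the same monitor the waiters wait on.
final class TickClock {
    private static long ticks = 0;
    private static boolean running = true;

    private TickClock() {}

    static synchronized boolean running() { return running; }

    static synchronized long ticks() { return ticks; }

    // Called once per simulated cycle by whoever drives the clock.
    static synchronized void tick() {
        ticks++;
        TickClock.class.notifyAll();   // wake every thread waiting for a tick
    }

    static synchronized void stop() {
        running = false;
        TickClock.class.notifyAll();   // let waiters see running == false
    }

    // Block until the tick count passes 'seen' (or the clock stops) and return
    // the new count. The while loop absorbs spurious wakeups, and a tick that
    // arrived before the call is still noticed because the counter moved.
    static synchronized long awaitTickAfter(long seen) throws InterruptedException {
        while (running && ticks <= seen) {
            TickClock.class.wait();
        }
        return ticks;
    }
}
```

With something like this, the keyboard loop would remember the last tick it saw and call seen = TickClock.awaitTickAfter(seen) before each input().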
The private input method reads 0 or 1 characters
at a time into the internal keyboard buffer. The
code closely parallels the printer output method,
down to the instance variables.
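private void input() {
  try {
    // read at most one character into the next free slot of the circular buffer
    if (count < buffer.length && System.in.available() > 0
        && System.in.read(buffer, (front + count) % buffer.length, 1) == 1) {
      count++;
      pend++;
    }
  } catch (java.io.IOException e) {
    return;           // console read failed; try again on the next tick
  }
  if (pend > 0) {
    tot++;            // flag this keyboard as the IRQ source
    cpu.raiseIRQ();
    // IRQ IACK
    cpu.lowerIRQ();
    // !IRQ !IACK
    tot--;
    pend = 0;
  }
}
public int available() {
  return count;
}
public int signalled() {
  return tot;
}
public int received() {
  return pend;
}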
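The comments in input() record the handshake states: IRQ IACK after raiseIRQ(), then !IRQ !IACK after lowerIRQ(). Here is a sketch of one way such a four-phase handshake could be built from Java monitors. IrqLine, acknowledge() and release() are hypothetical names, and the real CPU5 code may well organise this differently; releasing IACK only after rfe, once IRQ has dropped, is our reading of the rule that the CPU maintains IACK while the handler runs.

```java
// Hypothetical model of the single IRQ/IACK line pair as a four-phase handshake:
//   raiseIRQ(): IRQ goes up, then wait for IACK      -> state IRQ IACK
//   lowerIRQ(): IRQ goes down, then wait for !IACK   -> state !IRQ !IACK
// The CPU keeps IACK up until its handler has returned and IRQ has dropped,
// so only one peripheral is served at a time.
final class IrqLine {
    private boolean irq = false;
    private boolean iack = false;

    // ---- peripheral side ----
    synchronized void raiseIRQ() throws InterruptedException {
        while (irq || iack) {
            wait();          // another peripheral is being served: wait our turn
        }
        irq = true;
        notifyAll();
        while (!iack) {
            wait();          // wait for the CPU's acknowledgment
        }
    }

    synchronized void lowerIRQ() throws InterruptedException {
        irq = false;
        notifyAll();
        while (iack) {
            wait();          // wait until the CPU has finished with the handler
        }
    }

    // ---- CPU side ----
    synchronized boolean pending() {
        return irq && !iack; // an IRQ that has not been acknowledged yet
    }

    synchronized void acknowledge() {
        iack = true;         // call when the CPU takes the interrupt and jumps to 0x4
        notifyAll();
    }

    synchronized void release() throws InterruptedException {
        while (irq) {
            wait();          // after rfe: wait for the peripheral to drop IRQ
        }
        iack = false;
        notifyAll();
    }
}
```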
The external functionality is provided by the
available and read methods, which work much like
System.in.available and System.in.read
respectively, except that read never blocks: it
returns 0 when the buffer is empty. The read
method reads off the front of the input buffer
into an array supplied by the programmer:
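public int read(byte data[], int offset, int len) {
  int n = 0;
  // copy from the front of the circular buffer until len is met or it runs dry
  while (len > 0 && available() > 0) {
    data[offset + n++] = buffer[front];
    front = (front + 1) % buffer.length; // wrap, so front never overflows
    count--;
    len--;
    pend++;
  }
  return n;
}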
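To check the front/count arithmetic in isolation, here is a stand-alone sketch of the same circular buffer with no CPU or console attached. KeyRing and its offer method are hypothetical; offer plays the part of the storing half of input().

```java
// Hypothetical stand-alone copy of the keyboard's circular buffer logic.
final class KeyRing {
    private final byte[] buffer;
    private int front = 0;  // index of the oldest unread byte
    private int count = 0;  // number of unread bytes

    KeyRing(int capacity) {
        buffer = new byte[capacity];
    }

    // Like input(): store one byte behind the unread ones. When the buffer is
    // full the byte is refused (input() would leave it unread in System.in).
    boolean offer(byte b) {
        if (count == buffer.length) {
            return false;
        }
        buffer[(front + count) % buffer.length] = b;
        count++;
        return true;
    }

    int available() {
        return count;
    }

    // Like read(): never blocks, returns how many bytes were actually copied.
    int read(byte[] data, int offset, int len) {
        int n = 0;
        while (len > 0 && count > 0) {
            data[offset + n++] = buffer[front];
            front = (front + 1) % buffer.length;
            count--;
            len--;
        }
        return n;
    }
}
```

For example, with capacity 3, after offering 'a', 'b' and 'c' a fourth byte is refused; reading two bytes returns "ab", and the next byte offered wraps round to slot 0.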
The console unit calls the keyboard read method
for single-byte reads. Program code need only
read from the console's memory-mapped keyboard
data address to receive a character from the
input buffer, or 0 if there was none. A prior
IRQ from the keyboard will have given the IRQ
handler precise information about how many
characters have been supplied and are available
for reading, how many have been dropped, etc. It
is the programmer's responsibility to write
handler code that maintains the proper
accounting: the IRQ handler should fill a program
buffer, and the program code will later
interrogate that buffer.
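The single-byte read on the console side might look something like this sketch. ByteSource stands in for the keyboard's read method, and both ByteSource and KeyboardDataPort are hypothetical names. Note how "no character" and a typed NUL both come back as 0, which is why a keyboard that could deliver code 0 would confuse a polling program.

```java
// Hypothetical stand-in for the keyboard: same contract as Keyboard.read.
interface ByteSource {
    int read(byte[] data, int offset, int len);
}

// Hypothetical console-side adapter: a program load from the keyboard data
// address asks the keyboard for exactly one byte.
final class KeyboardDataPort {
    private final ByteSource keyboard;
    private final byte[] one = new byte[1];

    KeyboardDataPort(ByteSource keyboard) {
        this.keyboard = keyboard;
    }

    // Value a program sees when it loads from the keyboard data address.
    synchronized int load() {
        int got = keyboard.read(one, 0, 1);
        return got == 1 ? (one[0] & 0xFF) : 0; // 0 means "nothing waiting"
    }
}
```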
But I've been very lax and supplied an IRQ
handler that does nothing of the kind.
Still, so long as the keyboard cannot supply
characters with a code of 0 and you don't type too
fast, polling the keyboard data port with the
keyboard and handler I've supplied is a
rough-and-ready, mostly-works way of discovering
input characters! Please feel completely free to
experiment and tear down and replace any of my
ramshackle construction. You'll find much more
sophisticated peripheral designs than mine on the
Web and in your course books. In particular, I
think that the first part of the keyboard's input
method, which reads from System.in, should be in a
separate thread so it can't be blocked waiting on
the CPU's acknowledgment via IACK. Not that any
human is likely to be able to type faster than a
CPU runs, but still ...
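Here is a sketch of that suggestion, assuming a bounded queue between two threads: a reader thread that only moves bytes from an InputStream (System.in in the simulator) into the queue, and the clock-driven keyboard loop, which drains it through poll(). The TypeAhead class is hypothetical. The reader blocks only itself, so an unacknowledged IRQ can no longer hold up the collection of typed characters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical type-ahead reader, run in its own thread.
final class TypeAhead implements Runnable {
    private final InputStream in;
    private final BlockingQueue<Byte> queue;

    TypeAhead(InputStream in, int capacity) {
        this.in = in;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    @Override
    public void run() {
        try {
            int b;
            while ((b = in.read()) != -1) { // blocks this reader thread only
                queue.put((byte) b);        // waits here while the queue is full
            }
        } catch (IOException | InterruptedException e) {
            // stream failed or thread interrupted: stop collecting
        }
    }

    // For the keyboard's per-tick input(): next byte, or -1 if none typed.
    int poll() {
        Byte b = queue.poll();
        return b == null ? -1 : (b & 0xFF);
    }
}
```

In input(), the System.in.available/System.in.read pair would then be replaced by a call to poll(), treating -1 as "nothing typed".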
Exercises with the IRQ-enabled model
Here are some suggestions for getting to know the
IRQ-enabled CPU5 processor model.
- Comment the IRQ-related parts of the CPU5 Java
class source code.
- Transfer the IRQ facility (rfe, mfc0, mtc0
commands and IRQ coprocessor registers) to the
simpler Java simulators' source code.
- Augment the screen interface by mapping the
screen signalled and printed methods to a single
control port address available through the
memory unit. Change the handler to check the
control port to see whether the screen raised
the IRQ and, if so, add the printed count to a
running total at a fixed place in memory.
- Change the CPU and the screen/keyboard to use
one unique interrupt line for each peripheral,
replacing the CPU's single IRQ and IACK booleans
with two vectors of 16 booleans each. That makes
it unnecessary for the handler to spend time
figuring out which peripheral caused the
interrupt.
- Get the keyboard code working well.
- Google for "MIPS syscall" and figure out how
to get the very roughly sketched-in syscall
functionality working in the CPU.
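For the one-line-per-peripheral exercise, here is a sketch of what the two 16-entry vectors might look like. The IrqVector class, its method names and the fixed lowest-number-first priority are assumptions, one design among several: the CPU would take the interrupt reported by nextPending(), acknowledge that line, and release it after rfe.

```java
// Hypothetical sketch: IRQ and IACK as two vectors of 16 lines, indexed by
// peripheral number, so the source of an interrupt is known without polling.
final class IrqVector {
    static final int LINES = 16;
    private final boolean[] irq = new boolean[LINES];
    private final boolean[] iack = new boolean[LINES];

    // Peripheral side: each peripheral owns one line number.
    synchronized void raise(int line) { irq[line] = true; }
    synchronized void lower(int line) { irq[line] = false; }
    synchronized boolean acknowledged(int line) { return iack[line]; }

    // CPU side: the lowest-numbered raised, unacknowledged line wins (fixed
    // priority, an assumption); -1 means nothing is pending.
    synchronized int nextPending() {
        for (int i = 0; i < LINES; i++) {
            if (irq[i] && !iack[i]) {
                return i;
            }
        }
        return -1;
    }

    synchronized void acknowledge(int line) { iack[line] = true; }
    synchronized void release(int line)     { iack[line] = false; }
}
```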