jMIPS

an open source MIPS processor in Java


Adding an IRQ handler to the caching, pipelined model

We've added an extra final stage to the pipeline in which a check is made for interrupts and other kinds of exceptions. That's the CPU5 model.

The design rationale is explained in detail below, but for the moment just note that an interrupt causes existing instructions to be flushed from the pipeline and a jump to a handler at address 0x4 to occur instead.

The Cpu5 Java code invokes this model. The interface is syntactically the same as for all the other CPUs, but there are a few semantic differences, which are enumerated below:

Cpu5
static void main (String args[]) entry point; handles command line options

We have created some peripherals which will run simultaneously in a different Java thread, talking across to the CPU in an unpredictable manner (from the CPU's point of view!).

The peripherals we've prepared are a screen and a keyboard unit. They engage in IRQ-mediated communications with the CPU. I've embedded one screen and one keyboard together in a new console object instantiating a new Console5 class. The main code in Cpu5 now takes care to embed one of the new-style consoles instead of an old-style console in the memory unit's address intercept table:

        cpu.memory.console = console = new Console5(cpu);

Separate threads of computation corresponding to the screen and keyboard are launched:

        
        Thread sh = new Thread (console.screen);
        sh.start();
        Thread kh = new Thread (console.keyboard);
        kh.start();

and from then on it's up to the correctness of the simulation of the IRQ-driven I/O to keep things running smoothly.

What else is new?

The IRQ-enabled emulator's Java code is to be found in the CPU5 class, which runs the basic fetch decode execute pipeline. At the interface level, there is no change:

CPU5
CPU5 () CPU builder
void run () fetch/decode/execute cycle

There are internal changes, however.

In particular, interrupts are initiated by a peripheral setting a new IRQ boolean in the CPU. There's a precise protocol involved, which I'll describe in further detail below. When the interrupt handler code finishes, it runs the new MIPS 'rfe' instruction ("return from exception"), which completes the CPU's interrupt-acknowledgment part of the protocol. So there is a new instruction to be handled.

In summary, the IRQ-enabled simulator's Java code differs in the following ways from its predecessors:

  • It implements three new MIPS instructions in total: 'rfe' (return from exception); 'mfc0' (move from coprocessor 0, which is an extra hardware unit in the CPU intended to help deal with interrupts and other exceptions); 'mtc0' (move to coprocessor 0).
  • There are three new ('IRQ coprocessor') registers, STATUS, CAUSE and EPC, involved with these instructions.
  • The register unit has been extended to deal with the extra registers and the Decode stage has been taught to deal with the extra instruction formats.
    • The STATUS register is used for turning the servicing of interrupts on and off. When bit 1 is clear, interrupts are ignored by the processor.
    • The CAUSE register is set by the CPU to indicate the reason for which an exception handler is being run.
    • The EPC register is where the CPU saves a copy of the current value of the PC register while a handler is being run.

STATUS, CAUSE, EPC are registers $12, $13, $14 respectively in the interrupt coprocessor (0) register index space. You can use the abbreviations $status, $cause, $epc, respectively.

The mfc0 and mtc0 instructions respectively read from and write to the new STATUS, CAUSE and EPC registers (to/from the standard registers $0-$31), and that's how the MIPS programmer checks and changes them.

For example, "mfc0 $1, $status" reads from the STATUS register to general register $1. "mtc0 $1, $status" moves data the other way (in standard MIPS assembler syntax the general register is the first operand of both instructions).
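
As a minimal sketch of the register traffic just described (not the actual jMIPS code: the class Cop0Sketch, its fields and its method signatures are invented for illustration), the two moves amount to copies between two small register arrays:

```java
// Illustrative sketch only: names here are assumptions, not taken from the jMIPS source.
class Cop0Sketch {
    static final int STATUS = 12, CAUSE = 13, EPC = 14;  // coprocessor 0 register indices

    final int[] gpr = new int[32];    // general registers $0-$31
    final int[] cp0 = new int[32];    // coprocessor 0 register index space

    // mfc0: copy a coprocessor 0 register into a general register
    void mfc0(int gprIndex, int cp0Index) {
        if (gprIndex != 0)            // $0 is hard-wired to zero on MIPS
            gpr[gprIndex] = cp0[cp0Index];
    }

    // mtc0: copy a general register into a coprocessor 0 register
    void mtc0(int gprIndex, int cp0Index) {
        cp0[cp0Index] = gpr[gprIndex];
    }

    public static void main(String[] args) {
        Cop0Sketch r = new Cop0Sketch();
        r.cp0[STATUS] = 0x2;          // interrupts enabled (bit 1 set)
        r.mfc0(1, STATUS);            // read STATUS into $1
        r.gpr[1] &= ~0x2;             // clear bit 1 in the general register
        r.mtc0(1, STATUS);            // write $1 back to STATUS: interrupts now masked
        System.out.println("STATUS = 0x" + Integer.toHexString(r.cp0[STATUS]));
    }
}
```

The read-modify-write sequence in main is the usual way a MIPS program would flip the interrupt mask bit, since STATUS can only be reached through these two instructions.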

  • There are two new CPU boolean instance variables, IRQ and IACK, which serve to mediate the IRQ protocol.
    • The IRQ boolean is set by a peripheral wishing to alert the CPU.
    • The IACK boolean is set by the CPU to indicate that it has seen the IRQ.

Any code dealing with the IRQ or IACK booleans will be found inside a Java synchronized block, in order to make sure that only one thread at a time attempts to access these variables, which are shared between the threads. The synchronization is effected through the Clock class.

  • There are two new CPU methods which respectively lower and raise IRQ. They are meant to be used by peripheral I/O devices to signal to the CPU:
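    public void lowerIRQ() {
        synchronized (Clock.class) {
            try {
                while (!IRQ || !IACK)
                    Clock.class.wait();               // wait on the monitor we actually hold

                // IRQ IACK
                IRQ = false;
                Clock.class.notifyAll();

                // !IRQ IACK
                while (IACK)
                    Clock.class.wait();
                // !IRQ !IACK
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // stop waiting, keep the interrupt flag
            }
        }
    }
    public void raiseIRQ() {
        synchronized (Clock.class) {
            try {
                while (IRQ || IACK)
                    Clock.class.wait();

                // !IRQ !IACK
                IRQ = true;
                Clock.class.notifyAll();

                // IRQ !IACK
                while (!IACK)
                    Clock.class.wait();
                // IRQ IACK
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // stop waiting, keep the interrupt flag
            }
        }
    }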

Peripheral devices using these methods will be held up until "the coast is clear". The IRQ boolean cannot be set by a peripheral until it has first been unset, for example. A peripheral device wanting to raise IRQ via the raiseIRQ method will be forced to wait if IRQ is already set until both it and IACK become unset first. Then there is a further wait until the CPU sets IACK, indicating that it has seen IRQ and is running or going to run the handler. This semantics corresponds to the way the hardware built into I/O peripherals works.
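
To see the handshake run end to end, here is a small self-contained sketch. It is written under assumptions: a private lock object stands in for Clock.class, the "CPU" is a plain loop rather than the jMIPS pipeline, and the names IrqHandshakeSketch, serviceOne and run are invented for the illustration.

```java
// Sketch only: a stand-in for the CPU/peripheral handshake, not the jMIPS classes.
class IrqHandshakeSketch {
    private final Object lock = new Object();   // plays the role of Clock.class
    private boolean irq, iack;
    int handled;                                // interrupts fully serviced by the "CPU"

    void raiseIRQ() throws InterruptedException {
        synchronized (lock) {
            while (irq || iack) lock.wait();    // wait for !IRQ !IACK
            irq = true;
            lock.notifyAll();
            while (!iack) lock.wait();          // wait for the CPU to acknowledge
        }
    }

    void lowerIRQ() throws InterruptedException {
        synchronized (lock) {
            while (!irq || !iack) lock.wait();  // wait for IRQ IACK
            irq = false;
            lock.notifyAll();
            while (iack) lock.wait();           // wait for the handler to finish
        }
    }

    // CPU side: acknowledge one IRQ, "run the handler", drop IACK once IRQ is low.
    void serviceOne() throws InterruptedException {
        synchronized (lock) {
            while (!irq) lock.wait();           // IRQ !IACK
            iack = true;
            lock.notifyAll();                   // IRQ IACK: handler starts
            while (irq) lock.wait();            // !IRQ IACK: peripheral has let go
            handled++;
            iack = false;
            lock.notifyAll();                   // !IRQ !IACK: back to the start
        }
    }

    // Runs n complete handshakes between a peripheral thread and the calling thread.
    static int run(int n) {
        IrqHandshakeSketch s = new IrqHandshakeSketch();
        Thread peripheral = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) { s.raiseIRQ(); s.lowerIRQ(); }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        peripheral.start();
        try {
            for (int i = 0; i < n; i++) s.serviceOne();
            peripheral.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.handled;
    }

    public static void main(String[] args) {
        System.out.println(run(5) + " interrupts serviced");
    }
}
```

Each side only ever writes its own flag (the peripheral writes irq, the CPU writes iack), which is why neither can race past the other: every wait is for a change that only the other party can make.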

  • When an I/O interrupt occurs (i.e. IRQ is set by a peripheral) and the CPU starts to handle it, the CAUSE register is immediately set to the value 0. Other settings indicate other kinds of exception, such as floating point overflows, that the CPU deals with in similar but slightly different ways. The code at the end of the pipeline that sets the CAUSE register looks like this:
    int status = STATUS.read();

    if ((status & 0x2) != 0                    // irqs are not masked when bit 1 is set
    &&  IRQ                                    // peripheral raised irq
    && !IACK                                   // we/handler haven't dealt with it yet
    ) {
                                               // set cause register to value 0
        CAUSE.write(0);
    }

Note the guard, which checks that bit 1 of the STATUS register is set before starting to handle the IRQ (the bitmask 0x2 is ...0010 in binary, with bit 1 set and all other bits unset). When bit 1 of the STATUS register is not set, we say that interrupts are masked. They will not be serviced.

Note too that the CPU's IACK flag must not be set for the handler to be started. If it were set, it would indicate that the CPU was already running the handler for an IRQ, and one does not want to interrupt an interrupt (other architectures do so safely, using interrupt priorities).

  • While handling the interrupt, the CPU turns off further interrupts by unsetting bit 1 of the STATUS register for the duration of the handler call. It saves the current status flags (the bottom 16 bits of the STATUS register) for later by shifting them out of the way further up the register:
  STATUS.write(status << 16);                // save status mask and zero current mask


We're not really cheating by implementing the shift in Java instead of via the ALU or another "hardware" object, because shifts are often implemented in hardware just by connecting the right wires together and waiting a cycle. Still, I haven't been very exact here. There is an extra cycle required to execute all these movements of data on receiving the interrupt, and we really need to modify the accounting to show it. The pipeline will be flushed, however, as discussed below, and that will introduce delays that are measured by the simulation and that are even longer than the unaccounted one cycle we are quibbling about now. And interrupts are relatively rare per clock on a 1GHz CPU! So whatever mistake we have made it's going to make a difference only on the order of one in a million in the accounting, "long" term. Nevertheless, kudos will go to somebody who studies the time accounting around the treatment of interrupts in the simulator, and who makes it better if it needs to be.
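
A worked example of the shift may help. The sketch below is illustrative only: the class and method names are invented, and the right shift used on return is assumed to be a logical (unsigned) one, which the text does not specify.

```java
// Sketch only: shows the effect of the STATUS shift on the interrupt mask bit.
class StatusShiftSketch {
    static final int IRQ_ENABLE = 0x2;                    // bit 1 of STATUS

    static int onInterrupt(int status) {                  // on taking the interrupt
        return status << 16;                              // save flags, zero the current mask
    }

    static int onRfe(int status) {                        // on return from exception
        return status >>> 16;                             // assumption: logical shift back down
    }

    static boolean irqsEnabled(int status) {
        return (status & IRQ_ENABLE) != 0;
    }

    public static void main(String[] args) {
        int status = 0x0003;                              // bit 1 set: interrupts enabled
        int inHandler = onInterrupt(status);              // 0x00030000
        int restored = onRfe(inHandler);                  // 0x0003 again
        System.out.println(Integer.toHexString(inHandler) + " enabled=" + irqsEnabled(inHandler));
        System.out.println(Integer.toHexString(restored) + " enabled=" + irqsEnabled(restored));
    }
}
```

Whatever was in the top 16 bits before the interrupt is lost on the way up, so the scheme only round-trips because the live flags occupy the bottom 16 bits.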

  • The CPU also copies the PC value to the new EPC register for safekeeping. Then it will jump to a predefined address (0x4 in this emulation), where some interrupt handler code will have been placed at CPU boot time. The jump is prepared by loading the PC with the destination address and flushing the pipeline:
  PC.write(0x4);                                     // prepare jump to 0x4
  conf0 = null;                                      // flush pipeline
  conf1 = null;
  conf2 = null;
  conf3 = null;

On the next cycle the CPU will naturally fetch the first word of the handler code into the pipeline and execution of the handler will commence.

However, it is not that simple to copy the PC value for safekeeping because the PC may have been pre-incremented for Fetch several times after the "current" instruction was started. More details of how it is done will be found below. It's quite a little saga to figure out what it should be and you may like to consider if one perhaps needs some dedicated extra decode hardware to do it. I don't think so, but I may be mistaken in my by-eyeball evaluation.

When the interrupt handler code finishes, it executes rfe, which restores PC from EPC and shifts the STATUS register down again, thereby restoring its original state. The rfe instruction is pipelined as follows:

  1. Fetch ...
  2. Decode ...
  3. Read EPC and STATUS registers
  4. Execute ALU op to shift status value right 16 bits
  5. Write EPC value into PC and shifted status value into STATUS register, drop IACK if IRQ has already been dropped or else set a flag that indicates the CPU must drop IACK when IRQ is eventually dropped

and rfe needs no other handling in the pipeline beyond a flush that must follow on its writing the PC - the prefetched instructions trailing it in the pipeline are likely just random nonsense tagging on beyond the end of the handler code in memory, and they need to be purged:

  if (...                                  // Write stage termination code
  || conf3.op == J
     ...
  || conf3.op == RFE                       // flush pipeline after rfe reaches Write too!
  ) {
      conf0 = null;
      conf1 = null;
      conf2 = null;
  }

In case rfe had to set a flag to drop IACK later rather than being able to drop it at once (because IRQ is still set high by the peripheral when the handler finishes - the CPU is generally much, much faster than any peripheral), there is an extra check just at the end of each and every cycle. It checks to see if IACK should be dropped now because IRQ has finally dropped:

  if (!IRQ && pleaseLowerIACK) {              // carry out pending drop of IACK
      if (IACK) {
          IACK = false;
          notifyAll();                        // tell interested threads IACK has changed
      }
      pleaseLowerIACK = false;
  }

Until both IRQ and IACK have dropped no new IRQ will be issued by any peripheral.
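
The deferred drop can be followed step by step in a single-threaded sketch. It is a simplification under assumptions: thread notification is left out, and the class DeferredIackSketch with its rfe and endOfCycle methods is invented for illustration.

```java
// Sketch only: the "drop IACK now or later" decision, without any threading.
class DeferredIackSketch {
    boolean irq, iack, pleaseLowerIACK;

    // Write stage of rfe: drop IACK at once if IRQ is already low, otherwise defer.
    void rfe() {
        if (!irq) {
            iack = false;
        } else {
            pleaseLowerIACK = true;
        }
    }

    // Check made at the end of every cycle: carry out a pending drop of IACK.
    void endOfCycle() {
        if (!irq && pleaseLowerIACK) {
            iack = false;
            pleaseLowerIACK = false;
        }
    }

    public static void main(String[] args) {
        DeferredIackSketch cpu = new DeferredIackSketch();
        cpu.irq = true;                 // peripheral is slow: IRQ still high
        cpu.iack = true;                // handler has been running
        cpu.rfe();                      // handler ends, drop must be deferred
        cpu.endOfCycle();
        System.out.println("IRQ still high, IACK=" + cpu.iack);
        cpu.irq = false;                // peripheral finally lowers IRQ
        cpu.endOfCycle();
        System.out.println("IRQ low, IACK=" + cpu.iack);
    }
}
```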

  • The CPU checks for interrupts, and performs the actions detailed above if one is pending, only immediately after some instruction has finished Write.

As was commented at the start of this section, that is accomplished by introducing an extra, final, pipeline stage called Irq. The stage checks for IRQ having been raised by some peripheral, performing the actions described in the paragraphs above. Entry of an instruction into the Irq processing final stage is guarded by a check of the IRQ value to see if it is even plausible that an IRQ might need processing now, otherwise the stage is skipped:

        if ((STATUS.read() & 0x2) == 0
        ||  !IRQ
        ||  IACK
        )
            return;

Why only check for IRQs after some instruction has exited the Write stage? And what does the CPU need to do that is based on the instruction rather than being completely generic? Why look at the instruction at all?

Firstly, the CPU cannot attend to an interrupt while the pipeline contains only partially completed instructions.

If we were to try that we'd find that we really wouldn't know what program address to return to with the rfe instruction, because it's not yet certain which if any of the jumps or branches in the pipeline at the time of the interrupt will be executed or not.

Worse, the value of the PC when the interrupt occurs is that corresponding to the instruction being pre-fetched into the pipeline, which is not the same as the next instruction that has yet to complete.

It's just not going to work.

You may wish to do a cleverer analysis than I, but I've settled for not handling an IRQ at all until some instruction has just completed for sure, so we know what the next instruction is going to be. And what is it going to be? It depends on the instruction just completed! That instruction needs to be examined, and that's why there is a final Irq stage. It is precisely to look at the just-completed instruction and set up the value of the PC to be saved according to what it is.

If the just-completed instruction is a jump (or rfe itself), then the next instruction has to be from the value of PC just set by the jump (or rfe) in the Write stage.

Likewise if the completed instruction is a branch that succeeded.

In all other cases the next instruction should be the instruction that comes 4 bytes after the address of the one that just completed.

Here's the code that sets the EPC in the Irq stage. It performs the instruction-based analysis detailed just above:

    if (
       (conf4.op == J
    ||  conf4.op == JAL
    || (conf4.op == 0 && (conf4.func == ALU_JALR || conf4.func == ALU_JR))
    ||  conf4.op == RFE
    ||  (s.z != 0 &&
           (conf4.op == BEQZ
         || conf4.op == BEQ
         || conf4.op == BNEZ
         || conf4.op == BNE
         || conf4.op == BLTZ
         || conf4.op == BGEZ
         || conf4.op == BLE
         || conf4.op == BGT
           )
        )
       )
    ) {                            // for jump or successful branch ...
        EPC.write(PC.read());      //   the PC just set in Write is what to come back to
    } else {                       // in all other cases ...
        EPC.write(conf4.pc);       //   want to come back to this instr's PC+4 
    }
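
The same decision can be stated as a small pure function, which makes the three cases easy to try out. This is a sketch under assumptions: the Kind enum and the parameter names are invented, and instrAddr is taken to be the address of the instruction that just completed.

```java
// Sketch only: which address should be saved in EPC when an IRQ is taken.
class EpcSketch {
    enum Kind { JUMP, RFE, BRANCH, OTHER }

    // instrAddr:    address of the instruction that has just completed Write
    // pcAfterWrite: value held in PC once that instruction's Write stage is done
    static int returnAddress(Kind kind, boolean branchTaken, int instrAddr, int pcAfterWrite) {
        boolean redirected = kind == Kind.JUMP
                          || kind == Kind.RFE
                          || (kind == Kind.BRANCH && branchTaken);
        // After a redirect, PC holds the real next instruction. Otherwise PC has run
        // ahead with prefetching, so the successor is computed from the instruction itself.
        return redirected ? pcAfterWrite : instrAddr + 4;
    }

    public static void main(String[] args) {
        // An ordinary instruction at 0x80030004 with the PC already prefetched ahead:
        int epc = returnAddress(Kind.OTHER, false, 0x80030004, 0x80030014);
        System.out.println("EPC = 0x" + Integer.toHexString(epc));
    }
}
```
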
  • The CPU also sets the IACK boolean early in the Irq stage to indicate to peripherals that it has now seen the IRQ flag and is definitely going to be executing the handler. Here's the code:
        if (IRQ && !IACK) {
            IACK = true;
            notifyAll();
        }

The notifyAll is the mechanism used to tell all interested peripherals running in other threads (in the simulation!) that the IACK flag has just changed. Look in the Java lang API documentation to see how it and wait work.

  • The peripheral that raised IRQ may afterwards choose to lower IRQ whenever it likes. But it won't do so until it has seen IACK.
  • The CPU will lower IACK again when the handler finishes, in the execution of the rfe instruction, or later still if IRQ has still not yet been lowered by the peripheral.
  • When both IRQ and IACK have been lowered, the same peripheral or another may raise another IRQ.

With the pipeline flushed, the PC set to the handler address, EPC containing the return address, STATUS shifted up 16 bits to blank the current mask, and IACK set, the next CPU cycle will start with the fetch of the first word of the handler code into the now empty pipeline.

It's hard to see what one could do to assuage the pain of the pipeline flush, because any instruction may be followed by an IRQ handler sequence without warning, so one can't prefetch the forthcoming handler sequence into the pipeline at the Fetch stage. Can you find any ideas out there on the Web? All that occurs to me is to let the pipeline drain naturally before starting the handler. Or start the handler at the next jump or branch instruction, since the pipeline would have been flushed there too.

  • Note that the sequence of states CPU and peripheral jointly pass through is always
  !IRQ !IACK;    IRQ !IACK;    IRQ IACK;    !IRQ IACK;    !IRQ !IACK

The second transition marks the handler start. The handler finishes on the final transition. The CPU will not start handling another interrupt while the handler is running. It will not handle another interrupt until the sequence of states comes back to the start again.

Putting it another way, the cycle of events is always:

  1. peripheral sets IRQ
  2. CPU sets IACK and starts handler
  3. peripheral unsets IRQ
  4. CPU finishes handler and unsets IACK

The cycle starts from the situation in which IRQ and IACK are unset, and terminates with them unset again. Note that the peripheral controls the IRQ setting and the CPU controls the IACK setting.

The peripheral device must not deassert IRQ before it has seen the processor raise IACK (or else the interrupt may be missed by the CPU).

The CPU must not drop IACK before it has seen the peripheral drop IRQ, even if the handler has finished (or else the acknowledgment may be missed by the peripheral).

The peripheral must not raise IRQ again until it has seen the processor drop IACK (or else the CPU may see it as continuing to assert the last interrupt and thus miss the new one).
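
The three rules above, together with the rule that the CPU only raises IACK in answer to an IRQ, leave exactly one legal transition out of each state. A minimal sketch of a checker for that follows; the class ProtocolSketch, the Event names and the two-bit state encoding are all invented for illustration.

```java
// Sketch only: the four-phase IRQ/IACK protocol as a tiny state machine.
class ProtocolSketch {
    enum Event { PERIPHERAL_RAISES_IRQ, CPU_RAISES_IACK, PERIPHERAL_LOWERS_IRQ, CPU_LOWERS_IACK }

    // State is two bits: bit 0 = IRQ, bit 1 = IACK.
    static int step(int state, Event e) {
        boolean irq = (state & 1) != 0, iack = (state & 2) != 0;
        switch (e) {
            case PERIPHERAL_RAISES_IRQ: if (!irq && !iack) return 1; break;  // -> IRQ !IACK
            case CPU_RAISES_IACK:       if (irq && !iack)  return 3; break;  // -> IRQ IACK
            case PERIPHERAL_LOWERS_IRQ: if (irq && iack)   return 2; break;  // -> !IRQ IACK
            case CPU_LOWERS_IACK:       if (!irq && iack)  return 0; break;  // -> !IRQ !IACK
        }
        throw new IllegalStateException(e + " is not allowed in state " + describe(state));
    }

    static String describe(int state) {
        return ((state & 1) != 0 ? "IRQ" : "!IRQ") + " " + ((state & 2) != 0 ? "IACK" : "!IACK");
    }

    // True if the whole event sequence obeys the protocol, starting from the given state.
    static boolean isLegal(int start, Event... events) {
        int state = start;
        try {
            for (Event e : events) state = step(state, e);
        } catch (IllegalStateException ex) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int state = 0;
        for (Event e : Event.values()) {          // the enum is declared in protocol order
            state = step(state, e);
            System.out.println(e + " -> " + describe(state));
        }
    }
}
```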

Peripherals

What do peripherals do and how do they work with the IRQ model protocol introduced above?

Peripherals are like small dumb CPUs in at least one way: they run a continuous cycle, like a CPU. But it's one in which they "do their own thing" and also occasionally try and tell the CPU about it by raising an IRQ. The IRQ means "you (the CPU) can learn something by looking here now".

A screen

Consider first a screen I/O peripheral. We've written it as a Screen class which implements the Runnable interface so it can be launched as a separate thread in Java. That means that it has a "main" routine called run:

Screen implements Runnable
Screen ( ) Constructor
void run ( ) runs the screen action loop
int print (char) sends a character to the screen buffer for later printing, returns number accepted
int available () returns number of characters it is still possible to write to screen buffer
int signalled () returns number of IRQs signalled to but not yet acknowledged by CPU
int printed () returns number of characters printed since last IRQ
int dropped () returns number of characters dropped since last IRQ

The run method contains the screen's working cycle. It's a loop that only stops when the CPU stops the clock. Until then it does nothing but wait on an event from the CPU clock, likely a clock tick forward, but possibly also a notification from the CPU on a change of value in the CPU's IACK variable or its STATUS register. When the notification reaches it, it runs its private output method, which may move either 0 or 1 characters out of its internal buffer and onto the screen's physical display area:

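 public void run () {

     while (Clock.running()) {
         synchronized (Clock.class) {    // wait() needs the monitor of the object waited on
             try {
                 Clock.class.wait();     // wait for a clock tick or other notification
             } catch (InterruptedException e) {
                 return;                 // interrupted: shut the screen thread down
             }
         }
         output();               // perform the screen's next action - 0 or 1 chars printed
     }
 }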

Characters are accumulated in the screen's internal buffer as code executing in the CPU writes to a specific memory address PUTCHAR_ADDRESS and that eventually translates to a call to the screen's print method via the CPU's iobus.

The screen's internal (circular) buffer is some 128 bytes in length. It is implemented via an index front that points to the front of the buffer, which is the first character due to be printed next, and an integer count that says how many characters there are in the buffer beyond front waiting to be printed.

Thus the last character in the buffer is the (front + count - 1) % 128 th, counting from 0. The first character in the buffer is the front % 128 th. That modular arithmetic calculation for the index position is what circularity means! The 128th character is also the 0th character in the buffer. Here's a picture:

Image:Circbuf.jpg

If one kept adding to the end of the buffer without cease (and the buffer were to accept that) one would eventually overrun the front of the buffer again, like a snake catching up with its own tail.

Characters are taken from the front of the buffer by the screen hardware and sent to the physical display. Each character printed increments front and decrements count, internally.

Any character sent to the screen for printing gets added at the end of the buffer, in the (front + count) % 128 th position. Each character added to the buffer increments count.
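
Here is the front/count arithmetic in isolation, as a minimal sketch. The class RingBufferSketch and its add and take methods are invented for illustration, and a buffer of 4 is used in the example instead of 128 so that the wrap-around is easy to see.

```java
// Sketch only: a circular byte buffer indexed by front and count.
class RingBufferSketch {
    final byte[] buffer;
    int front;          // index of the character due to be printed next
    int count;          // number of characters waiting, starting at front

    RingBufferSketch(int size) {
        buffer = new byte[size];
    }

    // Add at the end of the buffer; returns false (character dropped) when full.
    boolean add(char c) {
        if (count == buffer.length) return false;
        buffer[(front + count) % buffer.length] = (byte) c;
        count++;
        return true;
    }

    // Take from the front of the buffer; the caller must check count > 0 first.
    char take() {
        char c = (char) buffer[front];
        front = (front + 1) % buffer.length;   // wrap round to slot 0 after the last slot
        count--;
        return c;
    }

    public static void main(String[] args) {
        RingBufferSketch rb = new RingBufferSketch(4);
        for (char c : "abcd".toCharArray()) rb.add(c);
        System.out.println("add to full buffer accepted? " + rb.add('e'));
        System.out.println("took " + rb.take());
        rb.add('e');                           // lands in slot 0, behind 'd' in slot 3
        while (rb.count > 0) System.out.print(rb.take());
        System.out.println();
    }
}
```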

The screen object also contains an integer pend which counts the number of characters it has printed from its buffer but has not yet told the CPU that it has printed, via IRQ. Every time a character is printed to the physical display, pend is incremented. Every time an IRQ is raised, pend is set to zero again. In all usual circumstances, the value will be 0 or 1 for pend.

There is also an integer count tot for the number of IRQs the screen has sent out but not yet received an IACK in return for. Each time the screen sets IRQ it increments tot. Each time the screen receives IACK, it decrements tot. Again, in all usual circumstances the value of tot will be 0 or 1.

And finally there is a count of the number of characters discarded since the last IRQ because the buffer was already full when they were added. It is the instance variable discard.

You might like to consider a slightly different version of the screen printer, one which never suffers from the full buffer problem (incrementing discard every time that happens), but which instead simply silently overwrites the oldest unprinted character still in the buffer when it has to. I'm not sure how relatively pleasant or unpleasant that would be for someone programming for the revised screen hardware. See below.

Programming for the screen

The plan is that the console object to which the memory unit redirects those single byte writes which are addressed to PUTCHAR_ADDRESS should call the screen's print method. This corresponds to a physically wired connection in the hardware. The screen's print method will place the character on its buffer, if there is room, incrementing count. If there is no room in the buffer, it will discard it, incrementing discard.

Some time afterwards, the screen may issue an IRQ. The IRQ usually will signify "I have printed it", but detail can be obtained by interrogating the screen from the interrupt handler. Blow-by-blow data on what has happened is available for inspection via the control port(s). If the character was printed, pend will have been incremented (and count decremented again back down to its previous level). If the screen is still waiting to print it, count will have been incremented and pend not. If the character did not even make it into the buffer, the discard count alone will have been incremented.

The wise and conservative programmer should plan on writing program code which sends a single character to the console, then waits (possibly doing something else useful if the programmer is kind to the sensibilities of the person sitting at the keyboard) for an IRQ in response. The handler should then check with the console control port(s) to see that the character has been printed. Then the programmer should arrange that the code sends another character. If not printed, the wait should be continued or the character should be resent according to circumstances. And so on.

A more foolhardy and adventurous programmer may choose to send more than one character blind. So long as the programmer arranges that not more than 128 characters are sent before an IRQ has been received, and carefully counts the number of characters reported received and those reported printed in the IRQ handler code and balances that against the characters sent so that no more than 128 are outstanding at any time, everything should work fine. But it's quite a delicate piece of programming. I would be the conservative programmer! There's a real danger that we might be sending too fast for the screen to notice because of the greatly different clock rates between the two pieces of equipment, and what do we do if we see from the totals that some have been missed? Which ones are they?
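
The bookkeeping the adventurous programmer needs is essentially a sliding window. The sketch below shows it in Java rather than in MIPS handler code, purely to make the counting clear; the class SendWindowSketch and its methods are invented, and it assumes the handler can read reliable printed and dropped counts from the screen at each IRQ.

```java
// Sketch only: sender-side accounting that keeps at most `capacity` characters outstanding.
class SendWindowSketch {
    final int capacity;     // size of the screen buffer (128 in the text)
    int outstanding;        // characters sent but not yet reported printed or dropped

    SendWindowSketch(int capacity) {
        this.capacity = capacity;
    }

    boolean canSend() {
        return outstanding < capacity;
    }

    void sent() {           // call after each character written to the screen port
        outstanding++;
    }

    // Call from the IRQ handler with the counts read from the screen.
    // Returns the number of characters that will need resending.
    int onIrq(int printed, int dropped) {
        outstanding -= printed + dropped;
        return dropped;
    }

    public static void main(String[] args) {
        SendWindowSketch w = new SendWindowSketch(2);   // tiny window for the demo
        w.sent();
        w.sent();
        System.out.println("can send a third? " + w.canSend());
        w.onIrq(1, 0);                                  // screen reports one printed
        System.out.println("can send now? " + w.canSend());
    }
}
```

With a capacity of 1 this degenerates into the conservative programmer's scheme: one character, one IRQ, then the next character.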

The totally incautious programmer will send characters without attempting to control the flow at all according to the data received during IRQs, and in consequence many characters will arrive at a filled buffer and be discarded by the screen before ever being printed, resulting in documents which look lk thi ne.

This last option is effectively what we have programmed for you as a base standard for the handler. Tough luck. See below.

Screen internals

Here's the implementation of the screen's print method. It's not where the difficulty lies. It just fills in to the buffer as best it is able:

 int available() {

     return buffer.length - count;                        // amount of room in buffer
 }
 int print(char c) {

     if (buffer.length <= count) {
         discard++;
         return 0;
     }
     buffer[(front + count++) % buffer.length] = (byte)c;
     return 1;                                           // return number of characters accepted
 }


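The available method returns how many characters may be accepted in the immediate future for printing without mishap. The print method plays much the same role as Java's System.out.print. System.out itself has no available method; the nearest standard analogue is InputStream.available, which reports bytes ready to read rather than room left to write.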

The tricky code is the screen's private output method. This method runs continuously in parallel in the screen's own independent Java thread in this simulation, and reflects what happens inside the chunk of machinery that is a console on your desk.

If the screen thread has some characters in the internal buffer ("count > 0") it prints the character at the front of the buffer to the display. That's visible at the start of the routine:

    private void output() {

        if (count > 0) {                                     // if there are chars in the buffer ...
            System.out.print (
                (char) buffer[front]);                       //   print the front char to media
            front = (front + 1) % buffer.length;             //   advance front, wrapping so it never overflows
            count--;                                         //   one less char in buffer
            pend++;                                          //   one more unsignalled char printed
        }
        
        if (pend > 0 || discard > 0)  {                      // if we have something to report
            tot++;                                           //   we signal IRQ and wait for IACK
            cpu.raiseIRQ();
            // IRQ IACK
            cpu.lowerIRQ();                                  //   we lower IRQ and wait for !IACK
            // !IRQ !IACK
            pend = 0;                                        //   we reset pend, discard counts, etc.
            discard = 0;
            tot--;
        }
    }

    public int signalled() {                                 // handler uses after IRQ signalled ... 

        return tot;                                          // return # issued IRQs still outstanding
    }
    public int printed() {                                   // handler uses after IRQ signalled ...

        return pend;                                         // return # unsignalled characters printed
    }
    public int dropped() {                                   // handler uses after IRQ signalled ...

        return discard;                                      // return # unsignalled chars dropped
    }

If there are now (or were already) characters printed to the display and unsignalled to the CPU via an IRQ ("pend > 0"), then the routine tries to issue an IRQ. Ditto if a character has been dropped ("discard > 0"). Something interesting has happened and the CPU needs to be told.

The screen may now be held up trying to raise IRQ until no other peripheral has control of the IRQ line. When the screen eventually gets control of the line and raises IRQ, it waits for the CPU to acknowledge by raising IACK. Then it lowers IRQ, since it is sure the CPU has seen it. It waits until the CPU has lowered IACK before resetting pend, discard, etc., because the IRQ handler is notionally reading them until it finishes, which is signalled by the CPU lowering IACK.

What happens in summary is

  • screen's print cycle prints a character from buffer to display
  • screen's print cycle waits until it is free to raise IRQ with CPU, and then raises it
  • screen's print cycle waits for a while because IRQ is now up and IACK is not yet up
  • CPU notices IRQ and sets IACK
  • screen's print cycle notices IACK and lowers IRQ
  • CPU runs IRQ handler which reads pend, etc. from screen
  • screen's print cycle waits while IACK is still set
  • CPU IRQ handler finishes and lowers IACK, releasing screen to continue its print cycle from the first step
  • code running in CPU notices data left by handler saying one character has been printed
  • code in CPU writes another character to screen buffer via memory-mapped print method and cycle starts again

So each character sent via the CPU to the screen like this generates its own IRQ. The screen data is quiescent while the CPU checks it after getting the IRQ.

There are things wrong with this screen design. For one thing, the screen data is not merely quiescent but the screen itself is totally paralyzed while the CPU's handler runs! I'll ask you to help develop better designs or to strengthen this design so that it always works well, whatever the context. It could continue printing while being interrogated by the CPU (it needs to snapshot its state and present it to the CPU as a stable historical image while being interrogated and the real state can continue developing meanwhile).

Running the screen

The screen works. You can test it out like this:

% java CPU/Cpu5 -q hello_mips32 handler_mips32
Hello world

Running without the "-q" flag reveals the episodes in which the (simplistic, do-nothing) handler code in handler_mips32 (see below) is being executed:

...
247:    0.000001189s:   0x80030000:     lui $v1, -20480
248:    0.000001193s:   0x80030004:     sb $a0, 0($v1)            # write to screen data port
                                                                  # \n is printed to display
249:    0.000001209s:   0x00000004:     sll $zero, $zero, 0       # handler for IRQ received
250:    0.000001213s:   0x00000008:     sll $zero, $zero, 0
251:    0.000001214s:   0x0000000c:     sll $zero, $zero, 0
252:    0.000001217s:   0x00000010:     rfe                       # return from handler
253:    0.000001232s:   0x80030008:     jr $ra                    # return from printchar routine
...

There was a large hiatus of about 16 clock cycles as the pipeline was flushed and before the first handler code instruction at address 0x4 completed. 5 clock cycles of that would be the time taken for the fetch, decode, etc. of the instruction, but there is still a large pause to be explained! I expect it is the program cache thrashing. The address ranges of the program and the interrupt handler overlap modulo 32768, which is the size of the program cache. Indeed, the final lines of debug output from "-d" show considerably degraded program cache performance:

prog cache read hits 326/498, write hits 0/0

The figures for CPU4 and earlier model CPUs show 320/370 hits. Now 128 more instruction reads are being made, of which only 6 are satisfied first time from the cache; the other 122 have to be fished for in memory because they are not in the cache at the time they are needed. Yet we have only introduced 4 new instructions by way of the handler code, 3 of which are nops. Undoubtedly the cache is thrashing. Try moving the program's virtual address placement!
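
The overlap is easy to check by hand. The sketch below assumes a direct-mapped cache in which an address's slot is simply its value modulo the cache size (the mapping policy is my assumption, not something stated here), and uses the addresses from the trace above. The class and method names are invented.

```java
// Sketch only: where program and handler addresses land in a 32768-byte cache.
class CacheIndexSketch {
    static final int CACHE_BYTES = 32768;       // program cache size quoted in the text

    // Offset within the cache, assuming a direct-mapped cache (address modulo size).
    static int slot(int address) {
        return address & (CACHE_BYTES - 1);     // size is a power of two, so mask == modulo
    }

    static double hitRate(int hits, int reads) {
        return 100.0 * hits / reads;
    }

    public static void main(String[] args) {
        System.out.println("sb at 0x80030004 -> slot " + slot(0x80030004));
        System.out.println("handler at 0x4   -> slot " + slot(0x4));
        System.out.printf("CPU4 hit rate: %.1f%%%n", hitRate(320, 370));
        System.out.printf("CPU5 hit rate: %.1f%%%n", hitRate(326, 498));
    }
}
```

Under that assumption the printchar routine at 0x80030000 and the handler at 0x4 compete for the same few slots, so each evicts the other every time a character is printed.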

The rfe causes another pipeline flush, and this time about 15 clock cycles are lost before the first "ordinary" code instruction completes, going by the timestamps in the trace above. Some of that is perhaps simulation fluff, but it can still be recognized that IRQ handling causes significant pipeline stalls to occur, and it's good from the CPU's point of view that IRQs are relatively infrequent, peripherals being so much slower than the CPU as a rule.

Interrupt Handler code

For the tests we've prepared interrupt handler code that does almost nothing at all - it contains just the single MIPS assembler instruction 'rfe' ("return from exception") - and the simulator loads it at the 0x4 memory address where interrupt handler code is expected to be found. The handler code is compiled via

%  mips-gcc -DMIPS -mips1 -mabi=32 -c handler_mips32.s
%  mips-ld -Ttext 0x4 -e handler -o handler_mips32 handler_mips32.o

and the handler_mips32.s file contains the single-instruction assembler code for 'rfe':

        .text
        .align  2

        .globl  handler
        .ent    handler
        .type   handler, @function

handler:
        .set    nomips16
        .frame  $fp,8,$31               # vars= 0, regs= 1/0, args= 0, gp= 0
        .mask   0x40000000,-4
        .fmask  0x00000000,0

        rfe

        .end    handler

        .ident  "hand-written interrupt handler"

resulting in a handler_mips32 file which disassembles as follows:

00000004 (_ftext):
   4:   00000000        nop
   8:   00000000        nop
   c:   00000000        nop

00000010 (handler):
  10:   42000010        rfe

There are three no-ops preceding the single rfe instruction in the code that will eventually be loaded at address 0x4.

More sophisticated IRQ handlers should do more, such as checking first which peripheral caused the IRQ being handled!

Improving the screen

There are two aspects to improving the screen:

  • improving the handler;
  • augmenting the screen hardware.

A better IRQ handler should check which peripheral squawked via dedicated "hardware" connections to all peripherals. That's one basis for suggesting that the hardware needs improving.

According to the protocol followed, only one peripheral can send an IRQ at a time and the CPU will maintain IACK while the handler is running, thus keeping other peripherals from sending IRQ, so there is in principle no danger of confusion about which peripheral has sent IRQ while the handler is running. But how is the handler to find out which one it was? The handler code has to use only legitimate MIPS instructions, so its interrogations of peripherals for "who squawked" must be done via memory address accesses.

Conclusion: the CPU's memory unit has to be reprogrammed to map more addresses to peripheral control ports. In particular, the memory unit needs to hook up at least one address to the screen's signalled method and hence its tot variable, which indicates when this screen was the one which sent the IRQ that has not yet been acknowledged.

The handler should interrogate the screen tot variable via the control port. A positive return signals "yes, it's me who signalled IRQ".

The handler should use another mapping to access the screen's printed method and pend variable in order to determine how many characters have been printed since the last IRQ.

The ports for these mappings need to be set up.
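As a rough picture of what setting up such ports might look like, here is a sketch of an address intercept table for reads. The class PortMap, the method names and both port addresses are invented for illustration (the trace above only shows the screen data port at 0xb0000000); the real memory unit will differ in detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical sketch of an address intercept table for control ports.
class PortMap {

    static final int SCREEN_SIGNALLED = 0xb0000004;  // assumed address
    static final int SCREEN_PRINTED   = 0xb0000008;  // assumed address

    private final Map<Integer, IntSupplier> readPorts = new HashMap<>();

    // hook a peripheral method up to a memory address
    void mapRead(int address, IntSupplier port) {
        readPorts.put(address, port);
    }

    // a load from an intercepted address asks the peripheral instead of RAM
    int read32(int address) {
        IntSupplier port = readPorts.get(address);
        if (port != null) return port.getAsInt();
        return 0;                                    // stand-in for an ordinary memory read
    }

    public static void main(String[] args) {
        int[] tot = {1}, pend = {3};                 // stand-ins for the screen's counters
        PortMap memory = new PortMap();
        memory.mapRead(SCREEN_SIGNALLED, () -> tot[0]);
        memory.mapRead(SCREEN_PRINTED,   () -> pend[0]);
        System.out.println(memory.read32(SCREEN_SIGNALLED));
        System.out.println(memory.read32(SCREEN_PRINTED));
    }
}
```

With something like this in place the handler needs nothing more exotic than a lw from the control port address to ask the screen "was it you?".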

As a simplification, a good suggestion is that reading from a single control port should return a 32-bit number composed of 5 bits each from the screen's tot, pend, count, etc. instance variables.
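One possible layout for such a control word is sketched below. The bit positions (tot in bits 0-4, pend in bits 5-9, count in bits 10-14) and the class name ControlWord are my assumptions; any fixed layout that the handler and the memory unit agree on would do.

```java
// Sketch of one possible control-word layout; the bit positions are an assumption.
class ControlWord {

    static final int FIELD_BITS = 5;
    static final int FIELD_MASK = (1 << FIELD_BITS) - 1;   // 0x1f

    // bits 0-4: tot, bits 5-9: pend, bits 10-14: count
    static int pack(int tot, int pend, int count) {
        return (tot & FIELD_MASK)
             | (pend & FIELD_MASK) << FIELD_BITS
             | (count & FIELD_MASK) << (2 * FIELD_BITS);
    }

    // extract field number 'index' (0 = tot, 1 = pend, 2 = count)
    static int field(int word, int index) {
        return (word >>> (index * FIELD_BITS)) & FIELD_MASK;
    }

    public static void main(String[] args) {
        int word = pack(1, 3, 7);
        System.out.printf("0x%08x%n", word);
        System.out.println(field(word, 0) + " " + field(word, 1) + " " + field(word, 2));
    }
}
```

Note that a 5-bit field silently wraps at 32, so a peripheral that can accumulate more than 31 pending characters between IRQs needs wider fields or a saturating count. On the MIPS side the handler would unpick the word with srl and andi.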


A keyboard

The IRQ-driven model also contains a Keyboard class. An IRQ-driven keyboard is inserted into the console component.

Like the Screen component, the Java code implements the Runnable interface so it can be launched as a separate thread. We've really only sketched out the code, and it should not be considered complete as it is. You will want to test and perfect it.

Keyboard implements Runnable
Keyboard () constructor
void run () main keyboard loop
int read (byte[], int, int) read bytes into array at offset for number requested, return number actually got
int available () return number of chars hoarded in keyboard buffer

The keyboard object runs a continuous cycle reading characters typed on the console into an internal buffer. The passage of time in the system clock causes the code to check the console for more typed characters:

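    public void run () {

        while (Clock.running()) {
            try {
                synchronized (Clock.class) {    // wait() needs the monitor
                    Clock.class.wait();         // wait for a new clock tick
                }
            } catch (InterruptedException e) {
                break;                          // interrupted: stop the keyboard
            }
            input();
        }
    }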

The private input method reads 0 or 1 characters at a time into the internal keyboard buffer. The detail of the code is entirely comparable with the printer output method, down to the instance variables.

    private void input() {

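        try {
            // accept at most one typed character per tick, if the buffer has room
            if (count < buffer.length && System.in.available() > 0
                && System.in.read (buffer, (front + count) % buffer.length, 1) > 0) {
                count++;
                pend++;
            }
        } catch (java.io.IOException e) {
            // treat a failed console read as "nothing typed this tick"
        }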

        if (pend > 0) {
            
            tot++;

            cpu.raiseIRQ();
            // IRQ IACK

            cpu.lowerIRQ();
            // !IRQ !IACK

            tot--;
            pend = 0;
        }
    }
 
    public int available() {

        return count;
    }
    public int signalled() {

        return tot;
    }
    public int received() {

        return pend;
    }

The external functionality is provided by the available and read methods, which work just like System.in.available and System.in.read respectively. The read method reads off the front of the input buffer into an array supplied by the programmer:

    public int read(byte data[], int offset, int len) {

        int n = 0;

        while (len > 0 && available() > 0) {
            data[offset + n++] = buffer[front++ % buffer.length];
            count--;
            len--;
            pend++;
        }
        return n;
    }

The console unit accesses the keyboard read method for single-byte reads. The program code need only read from the memory mapping for the console keyboard data to receive a character from the input buffer, or 0 if there was none. A prior IRQ from the keyboard will have made available precise information to the IRQ handler about how many characters have been supplied and are available for reading, how many have been dropped, etc. It is the programmer's responsibility to write handler code which maintains the proper accounting. The IRQ handler will fill a program buffer and the program code will later interrogate that buffer.

But I've been very lax and supplied an IRQ handler that does nothing of the sort. Still, so long as the keyboard cannot supply characters with a code of 0 and you don't type too fast, polling the keyboard data port works fine with the keyboard and handler I've supplied as a rough-and-ready mostly-works way of discovering input characters! Please feel completely free to experiment and tear down and replace any of my ramshackle construction. You'll find much more sophisticated peripheral designs than mine on the Web and in your course books. I particularly think that the first part of the keyboard's input method code, reading from System.in, should be in a separate thread so it can't be blocked waiting on the CPU's acknowledgment via IACK. Not that any human is likely to be able to type faster than a CPU runs, but still ...
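Here is a minimal sketch of that separate-thread idea. The class TypedInput and its poll method are hypothetical and not part of the supplied Keyboard class: a reader thread does nothing but pull bytes off the console into a thread-safe queue, and the per-tick input method would call poll instead of touching System.in itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: console reading split off into its own thread.
class TypedInput implements Runnable {

    private final InputStream in;
    private final ConcurrentLinkedQueue<Byte> typed = new ConcurrentLinkedQueue<>();

    TypedInput(InputStream in) { this.in = in; }

    // blocks on the console only; never waits for the CPU
    public void run() {
        try {
            int c;
            while ((c = in.read()) != -1) {
                typed.add((byte) c);
            }
        } catch (IOException e) {
            // console closed: just stop reading
        }
    }

    // called once per clock tick: next typed byte, or -1 if none yet
    int poll() {
        Byte b = typed.poll();
        return b == null ? -1 : (b & 0xff);
    }

    public static void main(String[] args) throws InterruptedException {
        // a canned stream stands in for System.in so the demo terminates
        TypedInput t = new TypedInput(new ByteArrayInputStream(new byte[] {'h', 'i'}));
        Thread reader = new Thread(t);
        reader.start();
        reader.join();
        System.out.println((char) t.poll());
        System.out.println((char) t.poll());
        System.out.println(t.poll());
    }
}
```

In the emulator you would construct it with System.in and leave the reader thread running; the IRQ handshake then stays in the tick-driven thread, where a slow IACK can delay delivery to the CPU but cannot stall reading from the console.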

Exercises with the IRQ-enabled model

Here are some suggestions for getting to know the IRQ-enabled CPU5 processor model.

  • Comment the IRQ-enabled bits of the CPU5 Java class source code in the IRQ-enabled emulator.
  • Transfer the IRQ facility (rfe, mfc0, mtc0 commands and IRQ coprocessor registers) to the simpler Java simulator source codes.
  • Augment the screen interface by mapping the screen signalled and printed methods to a single control port address available through the memory unit. Change the handler to check via the control port whether it has been signalled by the screen and, if so, to sum the printed result into a fixed place in memory.
  • Change the CPU and the screen/keyboard to use one unique interrupt line for each peripheral, replacing the CPU's single IRQ and IACK booleans with two vectors of 16 booleans each. That makes it unnecessary for the handler to spend time figuring out which peripheral caused the interrupt.
  • Get the keyboard code working well.
  • Google for "MIPS syscall" and figure out how to get the very roughly sketched-in syscall functionality working in the CPU.
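
As a starting point for the exercise on per-peripheral interrupt lines, the two vectors of 16 booleans might be organized along the following lines. This is only a sketch: the class IrqLines, its method names and the lowest-number-wins priority rule are my assumptions, not something the CPU5 code does at present.

```java
// Hypothetical sketch of per-peripheral interrupt lines.
class IrqLines {

    static final int LINES = 16;
    private final boolean[] irq  = new boolean[LINES];   // raised by peripherals
    private final boolean[] iack = new boolean[LINES];   // raised by the CPU

    // peripheral side: signal on its own dedicated line
    synchronized void raise(int line) { irq[line] = true; }

    // peripheral side: withdraw the request once it has been serviced
    synchronized void lower(int line) { irq[line] = false; iack[line] = false; }

    // CPU side: acknowledge the lowest-numbered unacknowledged request;
    // returns its line number, or -1 if nothing is waiting
    synchronized int acknowledgeNext() {
        for (int i = 0; i < LINES; i++) {
            if (irq[i] && !iack[i]) {
                iack[i] = true;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        IrqLines lines = new IrqLines();
        lines.raise(3);                               // say, the keyboard
        lines.raise(1);                               // say, the screen
        System.out.println(lines.acknowledgeNext());  // line 1 wins
        System.out.println(lines.acknowledgeNext());  // then line 3
        System.out.println(lines.acknowledgeNext());  // nothing left
    }
}
```

The line number returned tells the CPU directly which peripheral squawked, so it could be used to select a handler without any polling of control ports.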