Answer:
Delayed branching is common in classic pipelined RISC architectures
such as MIPS and SPARC. Originally, pipelined architectures simply flushed
the prefetched contents of the pipeline upon encountering a branch.
But RISC designers realized that by the time a branch instruction
is decoded and the decision to branch is made, one or more instructions
following the branch instruction have already been decoded and partially
executed. They could therefore simplify the pipeline design and
increase performance by always executing the next instruction or two
following the branch instruction, regardless of whether the branch
is taken. The positions these subsequent instructions occupy are called
delay slots. If you are programming in assembly language on a processor
with delayed branching, you will need to know how many delay slots
exist.
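
To make the delay-slot behavior concrete, here is a minimal Python sketch
of a toy machine. The instruction encoding, the function names
trace_delayed_branch and make_program, and the instruction names are all
assumptions made for illustration; they do not model any real processor.
The sketch only shows that the instruction after a branch runs whether or
not the branch is taken.

```python
def trace_delayed_branch(program, delay_slots=1):
    """Return the names of the instructions executed, in order.

    Toy encoding (an assumption of this sketch, not a real ISA):
      ("op", name)               an ordinary instruction
      ("branch", target, taken)  jump to index `target` if `taken` is True
    Delay slots are assumed not to contain branches themselves.
    """
    trace = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "branch":
            _, target, taken = instr
            trace.append("branch")
            # The delay-slot instructions run whether or not the branch is taken.
            for slot in program[pc + 1 : pc + 1 + delay_slots]:
                trace.append(slot[1])
            pc = target if taken else pc + 1 + delay_slots
        else:
            trace.append(instr[1])
            pc += 1
    return trace


def make_program(taken):
    """Build a five-instruction toy program with a branch at index 1."""
    return [
        ("op", "add"),
        ("branch", 4, taken),
        ("op", "sub"),    # delay slot: always executed
        ("op", "mul"),    # skipped only when the branch is taken
        ("op", "store"),  # branch target
    ]


print(trace_delayed_branch(make_program(True)))
# ['add', 'branch', 'sub', 'store']
print(trace_delayed_branch(make_program(False)))
# ['add', 'branch', 'sub', 'mul', 'store']
```

In both runs "sub" appears right after the branch, which is the behavior
you must account for when writing assembly for such a processor.
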
Initially, you may want to code your assembly language loops with
NOP instructions in the delay slots. Then, you can take a second pass
at the code, moving instructions from above the branch instruction
into the delay slots, provided the branch condition does not depend
on them. Once you become more comfortable with the concept,
you can skip the first step.
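
The two-pass approach can be sketched with a small Python simulation of a
machine that has one delay slot. Everything here is hypothetical: the
opcodes addi, bnez, and nop, the register names, and the run function are
invented for this illustration. The first program puts a NOP in the delay
slot; the second moves an instruction that the branch does not depend on
from above the branch into the slot.

```python
def run(program, regs):
    """Run a toy program on a machine with one branch delay slot.

    Returns (final registers, number of instructions executed).
    Assumes no branch is ever placed in a delay slot.
    """
    regs = dict(regs)
    executed = 0
    pc = 0
    branch_target = None  # set while a taken branch waits on its delay slot
    while pc < len(program):
        instr = program[pc]
        executed += 1
        next_pc = pc + 1
        if branch_target is not None:
            # This instruction sits in the delay slot; jump once it is done.
            next_pc = branch_target
            branch_target = None
        op = instr[0]
        if op == "addi":      # rd = rs + immediate
            _, rd, rs, imm = instr
            regs[rd] = regs[rs] + imm
        elif op == "bnez":    # branch to target if rs is nonzero
            _, rs, target = instr
            if regs[rs] != 0:
                branch_target = target
        elif op != "nop":
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc
    return regs, executed


# First pass: NOP in the delay slot.
first_pass = [
    ("addi", "acc", "acc", 5),
    ("addi", "i", "i", -1),
    ("bnez", "i", 0),
    ("nop",),                   # delay slot
]

# Second pass: the update of acc, which the branch does not depend on,
# is moved from above the branch into the delay slot.
second_pass = [
    ("addi", "i", "i", -1),
    ("bnez", "i", 0),
    ("addi", "acc", "acc", 5),  # delay slot now does useful work
]

start = {"acc": 0, "i": 3}
print(run(first_pass, start))   # ({'acc': 15, 'i': 0}, 12)
print(run(second_pass, start))  # ({'acc': 15, 'i': 0}, 9)
```

Both versions leave the same values in the registers, but the second
executes three fewer instructions over three iterations because no slot
is wasted on a NOP. The decrement of i cannot be moved, since the branch
tests i.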