Hardware Design and Verification

Computer Architecture Questions and Answers

Q What is the difference between write-through and write-back caches? What are the advantages and disadvantages?

Write-Through Cache: In a write-through cache, every write to the cache is also written to main memory. This is simple to design, as memory is always up to date with respect to the cache, but it comes with the drawback that memory bandwidth is consumed on every write.

Write-Back Cache: In a write-back cache, a write updates only the cache; the write to main memory is deferred until the cache line is evicted or discarded from the cache. Write-back caches are better in terms of memory bandwidth, as data is written back only when needed. The complexity lies in maintaining coherent data when multiple caches in the system can hold the same address, since memory may not always have the latest data.
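The distinction can be seen in a toy model like the sketch below: a single-line cache where the only difference between the two policies is when main memory gets updated. All names (ToyCache, write_back_policy, etc.) are illustrative and not taken from any real design.

```cpp
// Toy single-line cache illustrating the two write policies.
// All names here (ToyCache, write_back_policy, etc.) are illustrative only.
#include <cstdint>
#include <iostream>
#include <vector>

struct ToyCache {
    std::vector<uint32_t>& mem;   // backing "main memory"
    uint32_t addr = 0;            // address currently cached
    uint32_t data = 0;            // cached value
    bool valid = false, dirty = false, write_back_policy = false;

    ToyCache(std::vector<uint32_t>& m, bool wb) : mem(m), write_back_policy(wb) {}

    void write(uint32_t a, uint32_t v) {
        evict_if_needed(a);
        addr = a; data = v; valid = true;
        if (write_back_policy)
            dirty = true;          // defer the memory update until eviction
        else
            mem[a] = v;            // write-through: update memory on every write
    }

    void evict_if_needed(uint32_t a) {
        if (valid && addr != a && dirty) {
            mem[addr] = data;      // write-back: memory updated only on eviction
            dirty = false;
        }
    }
};

int main() {
    std::vector<uint32_t> mem(16, 0);
    ToyCache wt(mem, /*write_back=*/false), wb(mem, /*write_back=*/true);
    wt.write(1, 0xAA);             // memory[1] updated immediately
    wb.write(2, 0xBB);             // memory[2] still stale until line 2 is evicted
    std::cout << mem[1] << " " << mem[2] << "\n";  // prints 170 0
    wb.write(3, 0xCC);             // evicts line 2, memory[2] now written back
    std::cout << mem[2] << "\n";   // prints 187
}
```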

Q What is the difference between an inclusive and exclusive cache?

Inclusive and exclusive properties apply to designs that have multiple levels of caches (for example, L1, L2 and L3 caches). If every address present in the L1 (Level 1) cache is guaranteed to also be present in the L2 (Level 2) cache, then the L1 cache is called strictly inclusive. If every address is guaranteed to be in at most one of the L1 and L2 caches and never in both, then the caches are called exclusive. One advantage of exclusive caches is that the multiple levels of caches can together store more data. One advantage of an inclusive cache is that, in a multiprocessor system, when a cache line has to be removed from a processor's cache, only the L2 cache needs to be checked, whereas with exclusive caches the line has to be checked for presence in both the L1 and L2 caches.

Q What are the different algorithms used for cache line replacement in a set-associative cache?

1) LRU (Least Recently Used) Algorithm: This algorithm keeps track of when each cache line is used by associating "age bits" with the cache lines, and discards the least recently used line when a replacement is needed (a minimal sketch of LRU follows this list).

2) MRU (Most Recently Used) Algorithm: This is the opposite of LRU: the line that was most recently used is the one that gets replaced.

3) PLRU (Pseudo LRU) Algorithm: This is similar to LRU, except that instead of keeping full age bits per line (which is costly for larger and more highly associative caches), only one or two bits are implemented per line to approximate usage.

4) LFU (Least Frequently Used) Algorithm: This algorithm keeps track of how often each line is accessed and replaces the line that has been used the fewest times.
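As a rough illustration of LRU, the sketch below models one set of a set-associative cache with a recency-ordered list; the class name, way count, and tags are made up for the example, and real hardware would track age bits rather than maintain linked lists.

```cpp
// Minimal LRU replacement for one set of a set-associative cache (illustrative).
#include <cstdint>
#include <iostream>
#include <list>
#include <unordered_map>

class LruSet {
    size_t ways_;
    std::list<uint64_t> order_;                    // front = most recently used tag
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> lookup_;
public:
    explicit LruSet(size_t ways) : ways_(ways) {}

    // Returns true on hit; on miss, inserts the tag and evicts the LRU way if full.
    bool access(uint64_t tag) {
        auto it = lookup_.find(tag);
        if (it != lookup_.end()) {                 // hit: move line to MRU position
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (order_.size() == ways_) {              // miss on a full set: evict LRU way
            lookup_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(tag);
        lookup_[tag] = order_.begin();
        return false;
    }
};

int main() {
    LruSet set(2);                                 // 2-way set
    std::cout << set.access(0xA) << set.access(0xB)
              << set.access(0xA)                   // hit, A becomes MRU
              << set.access(0xC)                   // miss, evicts B (the LRU way)
              << set.access(0xB) << "\n";          // miss again; prints 00100
}
```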

 

Q What is the cache coherency problem?

In shared-memory multiprocessor (SMP) systems where multiple processors have their own caches, multiple copies of the same data (the same address) can exist in different caches simultaneously. If each processor is allowed to update its cache freely, the processors can end up with an inconsistent view of memory. This is known as the cache coherency problem. For example, if two processors are allowed to write values to the same address, then reads of that address on different processors might see different values.

Q What is the difference between snoop based and directory based cache coherency protocol?

1) Snoop-based Coherence Protocol: In a snoop-based coherence protocol, a request for data from a processor is sent to all other processors that are part of the shared system. Every other processor snoops this request, checks whether it has a copy of the data, and responds accordingly. In this way every processor helps maintain a coherent view of the memory.

2) Directory-based Coherence Protocol: In a directory-based coherence protocol, a directory is used to track which processors are accessing and caching which addresses. A processor making a new request checks this directory to learn whether any other agent holds a copy, and can then send a point-to-point request to that agent to get the latest copy of the data.

Q What is the MESI protocol?

The MESI protocol is the most commonly used cache coherency protocol in designs with multiple write-back caches. MESI stands for the four states (Modified, Exclusive, Shared, Invalid) that are tracked per cache line in all the caches and are used to respond to snoop requests. These states can be explained as below:

1) M (Modified): This state indicates that the cache line data is modified (dirty) with respect to the data in main memory.

2) E (Exclusive): This state indicates that the cache line data is clean with respect to memory but is present only in this cache. The exclusive property allows the processor that owns this cache to write to the line without notifying the other caches.

3) S (Shared): This state indicates that the cache line data is present in multiple caches with the same value and is clean with respect to memory. Because the line is shared across caches, the protocol does not allow a write to this cache line until the other copies have been invalidated.

4) I (Invalid): This state indicates that the cache line is invalid and does not hold any valid data.

A cache can service a read request when the cache line is in any state other than Invalid. A cache can service a write request only when the cache line is in the Modified or Exclusive state.
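A simplified next-state function for a single cache line, covering only the common local and snooped events, might look like the sketch below; the event names and the assumption that a local write always succeeds (after any needed invalidations) are simplifications for illustration.

```cpp
// Simplified MESI next-state function for one cache line (sketch, not a full protocol).
#include <iostream>

enum class Mesi { M, E, S, I };

// Events: local read/write by this cache, or a snooped read/write from another cache.
enum class Event { LocalRead, LocalWrite, SnoopRead, SnoopWrite };

Mesi next_state(Mesi s, Event e, bool other_caches_have_copy) {
    switch (e) {
    case Event::LocalRead:
        if (s == Mesi::I)                       // miss: fill as S if shared elsewhere, else E
            return other_caches_have_copy ? Mesi::S : Mesi::E;
        return s;                               // M/E/S can service reads directly
    case Event::LocalWrite:
        return Mesi::M;                         // S and I must first invalidate other copies (RFO)
    case Event::SnoopRead:
        if (s == Mesi::M || s == Mesi::E)       // supply data, drop to Shared
            return Mesi::S;
        return s;
    case Event::SnoopWrite:
        return Mesi::I;                         // another cache wants ownership
    }
    return s;
}

int main() {
    Mesi line = Mesi::I;
    line = next_state(line, Event::LocalRead, /*other_caches_have_copy=*/false); // -> E
    line = next_state(line, Event::LocalWrite, false);                           // -> M
    line = next_state(line, Event::SnoopRead, false);                            // -> S
    std::cout << (line == Mesi::S) << "\n";                                      // prints 1
}
```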

Q What are the MESIF and MOESI protocols?

These are two extensions of the MESI protocol. They introduce two new states, "F" and "O", which are explained below:

1) F (Forward): The F state is a specialized form of the S state and indicates that a cache should act as the designated responder, forwarding data for any requests for the given line. If multiple caches in the system hold the same line in the S state, one of them is designated to hold it in the F state and forwards data for new requests from a different processor. The protocol ensures that if any cache holds a line in the S state, at most one cache holds the same line in the F state. This state helps reduce traffic to memory because, without the F state, even if a cache line is in the S state in multiple caches, none of them can forward the data to a different processor requesting a read or write. (Note that an S-state line in a cache can only service that same processor's reads.)

2) O (Owned): The O state is a special state that was introduced to move modified (dirty) data around the different caches in the system without needing to write it back to memory. A line can transition from the M state to the O state if the line is also shared with other caches, which can keep the line in the S state. The O state helps defer writing the modified data back to memory until it is really needed.

Q What is an RFO?

RFO stands for Read For Ownership. It is an operation in a cache coherency protocol that combines a read with an invalidate broadcast. It is issued by a processor trying to write to a cache line that is in the Shared or Invalid state, and it causes all other processors to set the state of that cache line to I (Invalid). A read-for-ownership transaction is a read operation with the intent to write to that memory address; hence the operation is exclusive. It brings the data into the requesting cache and invalidates all other processor caches that hold the address.
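The sketch below models the effect of an RFO in a toy multi-cache system: every other cache's copy of the line is invalidated and the writer's copy becomes Modified. The structures and names are illustrative only.

```cpp
// Sketch of a read-for-ownership (RFO): the writer takes ownership of the line
// and every other cache's copy is invalidated. Illustrative names and structures.
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

enum class State { M, E, S, I };

struct Cache { std::map<uint64_t, State> lines; };

void rfo(std::vector<Cache>& caches, size_t writer, uint64_t addr) {
    for (size_t i = 0; i < caches.size(); ++i) {
        if (i == writer) continue;
        auto it = caches[i].lines.find(addr);
        if (it != caches[i].lines.end())
            it->second = State::I;               // broadcast invalidate to other holders
    }
    caches[writer].lines[addr] = State::M;       // writer now owns the line exclusively
}

int main() {
    std::vector<Cache> caches(3);
    caches[0].lines[0x40] = State::S;            // line shared in caches 0 and 1
    caches[1].lines[0x40] = State::S;
    rfo(caches, /*writer=*/0, 0x40);             // cache 0 wants to write
    std::cout << (caches[1].lines[0x40] == State::I) << "\n";  // prints 1
}
```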

Q What is the concept of Virtual memory?

Virtual memory is a memory management technique that gives a processor the view of a large, contiguous virtual address space even if the actual physical memory is smaller. The operating system manages the virtual address spaces and the movement of memory between secondary storage (like a disk) and physical main memory. Address translation hardware in the CPU, often referred to as the memory management unit (MMU), translates virtual addresses to physical addresses. This address translation uses the concept of paging, where a contiguous block of memory addresses (known as a page) is the unit of mapping between virtual memory and actual physical memory.

Q What is the difference between a virtual memory address and a physical memory address?

The address used by a software program or process to access memory locations in its address space is known as a virtual address. The operating system, along with hardware, translates this into another address that can be used to actually access the main memory location in DRAM; this address is known as the physical address. The address translation is done using the concept of paging, and if the main memory (DRAM) does not hold the location, the data is moved from secondary memory (like a disk) to main memory with OS assistance.

Q What is the concept of paging?

All virtual memory implementations divide a virtual address space into pages, which are blocks of contiguous virtual memory addresses. A page is the minimum granularity at which memory is moved between secondary storage and physical memory when managing virtual memory. Pages on most computer systems are at least 4 kilobytes in size, and some architectures also support large page sizes (like 1 MB or 4 MB) when much larger regions of real memory need to be mapped. Page tables are used to translate the virtual addresses seen by the application into physical addresses; a page table is a data structure that stores the virtual-to-physical translation details for the pages in memory.
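A minimal translation sketch, assuming 4 KB pages and a single-level page table (real MMUs use multi-level tables), is shown below; the names and the page-table layout are illustrative.

```cpp
// Sketch of virtual-to-physical translation with 4 KB pages and a single-level
// page table. Real MMUs use multi-level tables; the layout here is illustrative.
#include <cstdint>
#include <iostream>
#include <unordered_map>

constexpr uint64_t kPageSize = 4096;                 // 4 KB page
constexpr uint64_t kOffsetMask = kPageSize - 1;

// Page table: virtual page number -> physical frame number.
using PageTable = std::unordered_map<uint64_t, uint64_t>;

// Returns true and fills phys_addr if the page is mapped, false on a page fault.
bool translate(const PageTable& pt, uint64_t virt_addr, uint64_t& phys_addr) {
    uint64_t vpn = virt_addr / kPageSize;            // virtual page number
    uint64_t offset = virt_addr & kOffsetMask;       // offset within the page
    auto it = pt.find(vpn);
    if (it == pt.end()) return false;                // not mapped: page fault
    phys_addr = it->second * kPageSize + offset;
    return true;
}

int main() {
    PageTable pt = {{0x12345, 0x00042}};             // one mapped page
    uint64_t pa;
    if (translate(pt, 0x12345ABC, pa))
        std::cout << std::hex << pa << "\n";         // prints 42abc
    std::cout << translate(pt, 0xDEAD0000, pa) << "\n";  // unmapped: prints 0
}
```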

Q What is a TLB (Translation Lookaside Buffer)?

A TLB is a cache that stores recent translations of virtual memory addresses to physical memory addresses, so that they can be retrieved quickly later. If a program accesses a virtual address and a match is found in the TLB, the physical address is retrieved from the TLB (like a cache hit) and main memory does not need to be accessed for the translation. Only if the translation is not present in the TLB does the processor need to perform memory accesses to walk the page tables for the address translation, which takes several cycles to complete.
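The sketch below illustrates this lookup order, assuming a fully associative, unbounded TLB in front of a single-level page table; both structures and all names are simplifications for illustration.

```cpp
// Sketch of a TLB in front of the page-table walk: a hit returns the cached
// translation, a miss walks the page table and fills the TLB. Illustrative only.
#include <cstdint>
#include <iostream>
#include <unordered_map>

constexpr uint64_t kPageShift = 12;                     // 4 KB pages

struct Mmu {
    std::unordered_map<uint64_t, uint64_t> page_table;  // VPN -> frame number
    std::unordered_map<uint64_t, uint64_t> tlb;         // small cache of recent VPNs
    uint64_t walks = 0;                                  // counts slow page-table walks

    uint64_t translate(uint64_t va) {
        uint64_t vpn = va >> kPageShift;
        auto hit = tlb.find(vpn);
        if (hit == tlb.end()) {                          // TLB miss: walk the page table
            ++walks;
            hit = tlb.emplace(vpn, page_table.at(vpn)).first;
        }
        return (hit->second << kPageShift) | (va & 0xFFF);
    }
};

int main() {
    Mmu mmu;
    mmu.page_table[0x1] = 0x80;
    mmu.translate(0x1004);                               // miss: one page walk
    mmu.translate(0x1008);                               // hit: no additional walk
    std::cout << mmu.walks << "\n";                      // prints 1
}
```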

Q What is meant by a page fault?

When a program accesses a memory page that is mapped into its virtual address space but is not currently loaded into main memory, the memory management unit (MMU) hardware raises an exception. This exception is called a page fault, and the operating system handles it by bringing the page into main memory.

Q If a CPU is busy executing a task, how can we stop it and run another task?

Program execution on a CPU can be interrupted using external interrupt sources. The interrupt handler can then save the current task's context and switch the CPU to executing another task.

Q What are interrupts and exceptions and how are they different?

An interrupt is an asynchronous event, typically generated by external hardware (an I/O device or other peripheral), and is not synchronized with instruction execution boundaries. For example, an interrupt can come from a keyboard, a storage device, or a USB port. Interrupts are serviced after the currently executing instruction is complete, and the CPU then jumps to execution of the interrupt service routine.

Exceptions are synchronous events generated when the processor detects a predefined condition while executing an instruction. For example, when a program encounters a divide-by-zero or an undefined instruction, it generates an exception. Exceptions are further divided into three types, and how the program flow is altered depends on the type:

1) Faults: Faults are detected and serviced by the processor before the faulting instruction is allowed to complete, so the instruction can typically be restarted after the fault is handled.

2) Traps: Traps are serviced after the instruction causing the trap. The most common trap is a user-defined interrupt used for debugging.

3) Aborts: Aborts are used only to signal severe system problems when execution cannot continue any longer.

Q What is a vectored interrupt?

A vectored interrupt is an interrupt in which the interrupting device directs the processor to the correct interrupt service routine by sending, along with the interrupt, a code (vector) that is unique to that interrupt source. With non-vectored interrupts, the first-level interrupt service routine has to read interrupt status registers to decode which of the possible interrupt sources caused the interrupt, and then decide which specific interrupt service routine to execute.
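A rough software-side sketch of vectored dispatch is shown below: the vector number supplied by the device indexes directly into a table of service routines. The vector numbers and handler names are made up for the example.

```cpp
// Sketch of vectored interrupt dispatch: the device-supplied vector number
// indexes directly into a table of service routines (no status-register polling).
#include <array>
#include <cstdint>
#include <iostream>

using Isr = void (*)();                       // interrupt service routine

void timer_isr()    { std::cout << "timer\n"; }
void keyboard_isr() { std::cout << "keyboard\n"; }
void default_isr()  { std::cout << "spurious\n"; }

std::array<Isr, 256> vector_table;            // one entry per vector number

void dispatch(uint8_t vector) {
    vector_table[vector]();                   // jump straight to the right ISR
}

int main() {
    vector_table.fill(default_isr);
    vector_table[0x20] = timer_isr;           // hypothetical vector assignments
    vector_table[0x21] = keyboard_isr;
    dispatch(0x21);                           // device sent vector 0x21 -> prints "keyboard"
}
```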

Q What are the different techniques used to improve performance of instruction fetching from memory?

1) Instruction Cache and Pre-fetch: An instruction cache together with a prefetch algorithm keeps fetching instructions ahead of the actual decode and execute phases, which hides the memory latency of the instruction fetch stage.

2) Branch Prediction and Branch Target Prediction: Branch prediction helps predict whether a conditional branch will be taken based on its history, and branch target prediction helps predict the branch target before the processor has computed it. This minimizes instruction fetch stalls, as the fetch logic can keep fetching instructions along the predicted path.

Q What is meant by a superscalar pipelined processor?

A superscalar pipelined design uses instruction-level parallelism to enhance processor performance. Using this technique, a processor can execute more than one instruction per clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. If the processor can execute "N" instructions in parallel in a cycle, it is called N-way superscalar.

Q What is the difference between in-order and out-of-order execution?

In-Order Execution: In this model, instructions are always fetched, executed, and completed in the order in which they appear in the program. In this mode of execution, if one instruction stalls, all the instructions behind it also stall.

Out-of-Order Execution: In this model, instructions are fetched in program order, their execution can happen in any order, and their completion again happens in program order. The advantage of this model is that if one instruction stalls, independent instructions behind it can still execute, speeding up the overall execution of the program.

Q What is the difference between a conditional branch and unconditional branch instruction?

A branch instruction is used to switch program flow from current instruction to a different instruction sequence. A branch instruction can be a conditional branch instruction or an unconditional branch instruction.

Unconditional Branch Instruction: A branch instruction is called unconditional if it always results in branching.

Example: jump <offset> is an unconditional branch, as executing it always causes the instruction sequence to continue from the <offset> address.

Conditional Branch Instruction: A branch instruction is called conditional if it may or may not cause branching, depending on some condition.

Example: beq ra, rb, <offset> is a conditional branch instruction that checks whether the two source registers (ra and rb) are equal; if they are equal, it jumps to the <offset> address. If they are not equal, the instruction sequence continues in order after the branch instruction.

Q What is branch prediction and branch target prediction?

A branch predictor is a design block that tries to predict the outcome of a branch so that the correct instruction sequence can be pre-fetched into the instruction cache, avoiding stalls in instruction execution after a branch instruction is encountered in the program. A branch predictor predicts whether a conditional branch will be taken or not taken.

A branch target predictor is different: it predicts the target of a taken conditional branch or of an unconditional branch instruction before the target has been computed by the execution unit of the processor.
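As an illustration, the sketch below implements a simple 2-bit saturating-counter (bimodal) branch predictor indexed by the program counter; the table size and indexing are arbitrary choices, and real predictors typically also use branch history.

```cpp
// Sketch of a 2-bit saturating-counter branch predictor indexed by PC.
// Table size and indexing are illustrative; real predictors also use history.
#include <cstdint>
#include <iostream>
#include <vector>

class BimodalPredictor {
    std::vector<uint8_t> counters_;                  // 0,1 = predict not-taken; 2,3 = taken
public:
    explicit BimodalPredictor(size_t entries) : counters_(entries, 1) {}

    bool predict(uint64_t pc) const {
        return counters_[pc % counters_.size()] >= 2;
    }
    void update(uint64_t pc, bool taken) {           // counters saturate at 0 and 3
        uint8_t& c = counters_[pc % counters_.size()];
        if (taken && c < 3) ++c;
        if (!taken && c > 0) --c;
    }
};

int main() {
    BimodalPredictor bp(1024);
    uint64_t pc = 0x400123;
    // A loop branch that keeps being taken: after two updates the counter
    // saturates toward "taken" and the prediction follows.
    bp.update(pc, true);
    bp.update(pc, true);
    std::cout << bp.predict(pc) << "\n";             // prints 1 (predict taken)
}
```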

Q What is meant by memory mapped I/O?

Memory-mapped I/O (MMIO) is a method of performing input/output (I/O) between a CPU and an I/O or peripheral device. The CPU uses the same address bus to access both memory and I/O devices (the registers inside an I/O device or any memory inside the device). In the system address map, a region of addresses is reserved for the I/O device, and when the CPU accesses this region, the corresponding I/O device monitoring the address bus responds to the access. For example, if a CPU has a 32-bit address bus, it can access addresses from 0 to 2^32 − 1, and within this range we can reserve some addresses (say from 0 to 2^10 − 1) for one or more I/O devices.
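The address-decode idea can be sketched as below: a single address space where a reserved low window is routed to a device register instead of to RAM. The window size, device behavior, and names are illustrative.

```cpp
// Sketch of the address decoding behind memory-mapped I/O: the CPU issues one
// kind of access, and the interconnect routes a reserved window to a device
// register instead of to RAM. Window size and device behavior are illustrative.
#include <cstdint>
#include <iostream>
#include <vector>

constexpr uint32_t kIoBase = 0;            // first 2^10 addresses reserved for I/O
constexpr uint32_t kIoSize = 1u << 10;

struct Bus {
    std::vector<uint8_t> ram = std::vector<uint8_t>(1 << 20, 0);
    uint8_t device_reg = 0;                // a single device register in the I/O window

    void write(uint32_t addr, uint8_t v) {
        if (addr < kIoBase + kIoSize)
            device_reg = v;                // routed to the device, not to RAM
        else
            ram[addr] = v;
    }
    uint8_t read(uint32_t addr) const {
        return (addr < kIoBase + kIoSize) ? device_reg : ram[addr];
    }
};

int main() {
    Bus bus;
    bus.write(0x004, 0x5A);                // lands in the device register
    bus.write(0x800, 0x77);                // ordinary RAM access
    std::cout << std::hex << int(bus.read(0x004)) << " "
              << int(bus.read(0x800)) << "\n";   // prints 5a 77
}
```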
