Cache memory is intended to give access speeds close to those of the fastest memory, while at the same time providing a large memory size at a low cost.
There are a large number of cache implementations, but these basic design elements serve to classify cache architectures: cache size; mapping function; replacement algorithm; write policy; line size; number of caches.
The principle behind cache memory is to let the processor access memory as fast as possible while still allowing for large memory storage at a low cost.
The design constraints on a computer's memory can be summed up by three questions:
How much?
How fast?
How expensive?
Unfortunately, these three pull against each other: the faster the memory, the higher the cost per bit; the greater the storage capacity, the lower the cost per bit, but the slower the access time.
Since we can't make very cheap memory that is fast and has a huge amount of storage space, we use what is called a memory hierarchy to solve this problem.
Memory Hierarchy
First Level: Registers
Second Level: Cache
Third Level: Main Memory
Fourth Level: Magnetic Disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM
Fifth Level: Magnetic Tape, MO, WORM
Going down the hierarchy, the trend is: decreasing cost per bit; increasing storage capacity; increasing memory access time; decreasing frequency of access by the processor.
Using this model, the smaller, faster, more expensive memories are supplemented by larger, slower, cheaper memories. In order for this system to work, it is important that the processor accesses the larger memories less frequently than the smaller ones.
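The effect of access frequency can be put into numbers. The sketch below assumes a simple two-level model that the text does not spell out: every reference first checks the fast memory and, on a miss, also pays the access time of the slow memory. The function name and the timing figures are illustrative assumptions, not measurements.

```python
def average_access_time(hit_ratio, fast_time, slow_time):
    """Average time per reference in an assumed two-level memory.

    hit_ratio: fraction of references satisfied by the fast memory (0.0 to 1.0)
    fast_time: access time of the smaller, faster memory
    slow_time: access time of the larger, slower memory
    A miss is assumed to cost fast_time + slow_time.
    """
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * fast_time + miss_ratio * (fast_time + slow_time)


# Illustrative figures only: fast memory = 1 time unit, slow memory = 100.
for ratio in (0.50, 0.90, 0.99):
    print(ratio, average_access_time(ratio, 1, 100))
```

With these made-up figures the average falls from roughly 51 time units to roughly 2 as the share of references served by the fast memory rises from 50% to 99%, which is why the larger memory must be accessed rarely.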
When a program executes, it typically contains loops and subroutines, so the processor's memory references, for both instructions and data, tend to cluster. This principle is known as locality of reference. It is what allows the larger memories to be accessed less frequently by the processor: data is usually transferred from the larger memory to the smaller memory in ‘blocks’, which makes it more probable that the next instructions or data needed are already available in the smaller memory.
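A small simulation can illustrate how block transfers exploit locality. This is a minimal sketch under simplifying assumptions: the smaller memory is treated as unbounded (nothing is ever evicted), and the names hit_ratio and loop_trace are hypothetical, introduced only for this example.

```python
def hit_ratio(addresses, words_per_block):
    """Fraction of references already present in the smaller memory.

    On a miss the whole block containing the address is loaded.
    Assumption: the smaller memory never runs out of room.
    """
    loaded_blocks = set()
    hits = 0
    for address in addresses:
        block = address // words_per_block  # block that holds this word
        if block in loaded_blocks:
            hits += 1
        else:
            loaded_blocks.add(block)        # miss: fetch the whole block
    return hits / len(addresses)


# A loop that walks 64 consecutive words, executed 10 times.
loop_trace = list(range(64)) * 10

print(hit_ratio(loop_trace, words_per_block=1))  # one word per transfer
print(hit_ratio(loop_trace, words_per_block=8))  # eight words per transfer
```

On a single pass over 64 consecutive words, transferring one word at a time gives no hits at all, while 8-word blocks give 56 hits out of 64. Repeating the loop raises the ratio further in both cases, since the words are already in the smaller memory.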
The structure of cache memory is related to that of main memory. Main memory is made up of a number of addressable words. The maximum number of words is determined by the number of address bits, such that the number of addressable words = 2^(number of address bits).
The words in main memory are split into a number of blocks of a fixed number of words each. So, the number of blocks = the number of addressable words ÷ the number of words per block.
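The two formulas above can be checked with a short calculation. The figures used here (16 address bits, 4 words per block) are assumptions chosen for illustration, and the function names are hypothetical.

```python
def addressable_words(address_bits):
    """Number of addressable words = 2 ** (number of address bits)."""
    return 2 ** address_bits


def number_of_blocks(address_bits, words_per_block):
    """Number of blocks = addressable words / words per block.

    Assumes words_per_block divides the number of words exactly,
    which holds when it is a power of two.
    """
    return addressable_words(address_bits) // words_per_block


# Illustrative figures: 16 address bits, 4 words per block.
print(addressable_words(16))    # 65536 words
print(number_of_blocks(16, 4))  # 16384 blocks
```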
The cache consists of a number of lines. Each line contains a tag of a few bits in length and a block of words corresponding to the size of the blocks in main memory, such that the line size = tag length + (the number of words per block × the number of bits per word).
The number of lines in cache is considerably smaller than the number of blocks in main memory. Because of this, an individual line cannot be permanently dedicated to a block in main memory. Therefore, each line has a tag, indicating which block in main memory it is currently storing.
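The line-size formula, and the reason a line cannot be dedicated to one block, can be worked through with some assumed figures. Everything numeric below (8-bit tag, 4 words per block, 32-bit words, 128 lines, 16384 blocks) is hypothetical, as are the function names.

```python
def line_size_bits(tag_bits, words_per_block, bits_per_word):
    """Line size = tag length + (words per block * bits per word)."""
    return tag_bits + words_per_block * bits_per_word


def blocks_per_line(num_blocks, num_lines):
    """Average number of main memory blocks competing for each cache line."""
    return num_blocks / num_lines


# Illustrative figures only.
print(line_size_bits(tag_bits=8, words_per_block=4, bits_per_word=32))  # 136 bits
print(blocks_per_line(num_blocks=16384, num_lines=128))                 # 128.0
```

With these figures each line would, on average, have to stand in for 128 different blocks over time, so the tag is needed to record which one it holds at the moment.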
Cache Read Operation
When a cache hit occurs, the address buffer is disabled and the data is taken from the cache to the processor. When a cache miss occurs, the desired address is loaded onto the system bus and the data is returned via the data buffer to both the processor and the cache simultaneously.
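The read operation can be sketched in software. The text does not say which mapping function is used, so this example assumes direct mapping (each block can go in exactly one line); the class name DirectMappedCache and its fields are hypothetical. The sketch models only the hit/miss logic and the tags, not the buses or buffers, and on a miss it fills the line first and then returns the word, rather than delivering to the processor and the cache at the same instant.

```python
class DirectMappedCache:
    """Toy read-only cache in front of a list that stands in for main memory."""

    def __init__(self, main_memory, num_lines, words_per_block):
        self.main_memory = main_memory
        self.num_lines = num_lines
        self.words_per_block = words_per_block
        self.tags = [None] * num_lines    # None marks an empty line
        self.blocks = [None] * num_lines  # the block of words held by each line
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number = address // self.words_per_block
        offset = address % self.words_per_block
        line = block_number % self.num_lines  # assumed direct mapping
        tag = block_number // self.num_lines  # identifies the block in that line

        if self.tags[line] == tag:
            # Cache hit: the word comes straight from the cache.
            self.hits += 1
        else:
            # Cache miss: fetch the whole block from main memory,
            # store it in the line, and record its tag.
            self.misses += 1
            start = block_number * self.words_per_block
            self.blocks[line] = self.main_memory[start:start + self.words_per_block]
            self.tags[line] = tag
        return self.blocks[line][offset]


# Illustrative use: 64 words of memory, 4 lines of 4 words each.
memory = list(range(100, 164))
cache = DirectMappedCache(memory, num_lines=4, words_per_block=4)
cache.read(5)   # miss: block 1 is loaded into line 1
cache.read(6)   # hit: same block
cache.read(21)  # miss: block 5 also maps to line 1 and replaces block 1
print(cache.hits, cache.misses)
```

The third read shows why the tag matters: two different blocks share line 1, and the tag is the only thing that tells the cache which of them is currently stored there.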