Wow, those 20 MB are quite a lot.
The 15 MB for the six-core part are already huge, too.
How much of the die area actually goes to the cache?
You have to put that in perspective: the LLC also holds the contents of the first- and second-level caches.
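A quick back-of-the-envelope sketch of that point: with an inclusive LLC, the private L1/L2 contents are duplicated in the LLC, so they add no distinct capacity, while in an idealized exclusive hierarchy the sizes simply add up. The per-core sizes and core count below are hypothetical example values, not figures from this thread.

```python
def unique_capacity_kib(llc_kib, per_core_private_kib, cores, inclusive):
    """Distinct data the whole hierarchy can hold, in KiB (idealized).

    Inclusive LLC: every line in a private L1/L2 is also in the LLC,
    so the private caches add no extra distinct capacity.
    Exclusive LLC: private and shared contents never overlap, so they add up.
    """
    if inclusive:
        return llc_kib
    return llc_kib + cores * per_core_private_kib


# Hypothetical example: 8 cores, each with 32 KiB L1I + 32 KiB L1D + 256 KiB L2,
# sharing a 20 MiB (20480 KiB) LLC.
private = 32 + 32 + 256
print(unique_capacity_kib(20480, private, 8, inclusive=True))   # 20480
print(unique_capacity_kib(20480, private, 8, inclusive=False))  # 23040
```

Real designs sit in between (e.g. "mostly inclusive" or non-inclusive victim caches), so treat these as the two bounds.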
2.5.1 L1 Instruction Cache
The out-of-order execution engine of AMD Family 15h processors contains a 64-Kbyte, 2-way set-associative L1 instruction cache. Each line in this cache is 64 bytes long. However, only 32 bytes are fetched in every cycle. Functions associated with the L1 instruction cache are instruction loads, instruction prefetching, instruction predecoding, and branch prediction. Requests that miss in the L1 instruction cache are fetched from the L2 cache or, subsequently, from the L3 cache or system memory. On misses, the L1 instruction cache generates fill requests to a naturally aligned 64-byte line containing the instructions and the next sequential line of bytes (a prefetch). Because code typically exhibits spatial locality, prefetching is an effective technique for avoiding decode stalls. Cache-line replacement is based on a least-recently-used replacement algorithm. Predecoding begins as the L1 instruction cache is filled. Predecode information is generated and stored alongside the instruction cache. This information is used to help efficiently identify the boundaries between variable-length AMD64 instructions.
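To make the numbers above concrete, here is a toy model (a sketch, not AMD's implementation) of a 64 KB, 2-way set-associative cache with 64-byte lines: 65536 / (64 × 2) = 512 sets, LRU replacement within a set, and a next-sequential-line fill on every miss. The names `split_address` and `ICache` are made up for illustration, and the model does not track how real hardware updates LRU state for prefetched lines.

```python
from collections import OrderedDict

LINE_BYTES = 64
WAYS = 2
SIZE_BYTES = 64 * 1024
NUM_SETS = SIZE_BYTES // (LINE_BYTES * WAYS)  # 512 sets


def split_address(addr):
    """Split a byte address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES
    line = addr // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS, offset


class ICache:
    def __init__(self):
        # One OrderedDict per set: keys are tags, order = LRU (first) .. MRU (last).
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def _fill(self, line):
        tag, idx = line // NUM_SETS, line % NUM_SETS
        ways = self.sets[idx]
        if tag in ways:
            ways.move_to_end(tag)
            return
        if len(ways) == WAYS:
            ways.popitem(last=False)  # evict the least recently used way
        ways[tag] = True

    def fetch(self, addr):
        """Return True on a hit. On a miss, fill the line and the next sequential line."""
        line = addr // LINE_BYTES
        tag, idx = line // NUM_SETS, line % NUM_SETS
        ways = self.sets[idx]
        if tag in ways:
            ways.move_to_end(tag)  # mark as most recently used
            return True
        self._fill(line)
        self._fill(line + 1)  # next-line prefetch
        return False


c = ICache()
print(c.fetch(0x1000))  # False: cold miss, fills 0x1000 and 0x1040
print(c.fetch(0x1040))  # True: brought in by the next-line prefetch
```

Three addresses exactly 32 KB apart map to the same set, so the third one evicts the least recently used of the first two.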
2.5.2 L1 Data Cache
The AMD Family 15h processor contains a 16-Kbyte, 4-way predicted L1 data cache with two 128-bit ports. This is a write-through cache that supports up to two 128-bit loads per cycle. It is divided into 16 banks, each 16 bytes wide. In addition, the L1 cache is protected from single-bit errors through the use of parity. There is a hardware prefetcher that brings data into the L1 data cache to avoid misses. The L1 data cache has a 4-cycle load-to-use latency. Only one load can be performed from a given bank of the L1 cache in a single cycle.
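The bank rule can be illustrated with a small sketch. The guide only says there are 16 banks, each 16 bytes wide; the assumption here is plain low-order interleaving (bank = address bits 7:4), and `bank_of` / `same_cycle_ok` are hypothetical helper names. Any special cases the real hardware might have are ignored.

```python
BANK_BYTES = 16
NUM_BANKS = 16


def bank_of(addr):
    """Bank index for a byte address, assuming 16-byte banks interleaved by address."""
    return (addr // BANK_BYTES) % NUM_BANKS


def same_cycle_ok(addr_a, addr_b):
    """Simplified rule: two loads can go in the same cycle only if they hit different banks."""
    return bank_of(addr_a) != bank_of(addr_b)


print(same_cycle_ok(0x000, 0x010))  # True: banks 0 and 1
print(same_cycle_ok(0x000, 0x100))  # False: 256 bytes apart, both in bank 0
```

Under this assumption, two streams whose addresses differ by a multiple of 256 bytes would collide on every access.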
2.5.3 L2 Cache
The AMD Family 15h processor has one shared L2 cache per compute unit. This full-speed on-die L2 cache is mostly inclusive relative to the L1 cache. The L2 is a write-through cache. Every time a store is performed in a core, that address is written into both the L1 data cache of the core the store belongs to and the L2 cache (which is shared between the two cores). The L2 cache has an 18-20 cycle load-to-use latency. Size and associativity of the AMD Family 15h processor L2 cache are implementation dependent. See the appropriate BIOS and Kernel Developer’s Guide for details.
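The store path described above (every store goes into the core's own L1 and into the L2 shared by both cores of a compute unit) can be sketched as a toy model with dictionaries. `WriteThroughL1` is a made-up name; the sketch deliberately ignores cache lines, replacement and coherence (a real design must keep the other core's L1 copy from going stale).

```python
class WriteThroughL1:
    """Toy model of one core's write-through L1 in front of a shared L2 (both dicts)."""

    def __init__(self, shared_l2):
        self.lines = {}
        self.l2 = shared_l2

    def store(self, addr, value):
        # Write-through: every store updates this core's L1 *and* the shared L2.
        self.lines[addr] = value
        self.l2[addr] = value

    def load(self, addr):
        if addr in self.lines:
            return self.lines[addr]
        value = self.l2[addr]  # L1 miss: fetch from the shared L2
        self.lines[addr] = value
        return value


l2 = {}
core0 = WriteThroughL1(l2)
core1 = WriteThroughL1(l2)
core0.store(0x40, 7)
print(core1.load(0x40))  # 7: the other core finds the value in the shared L2
```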
2.5.4 L3 Cache
The AMD Family 15h processor supports a maximum of 8 MB of L3 cache per die, distributed among four L3 sub-caches which can each be up to 2 MB in size. The L3 cache is considered a non-inclusive victim cache architecture optimized for multi-core AMD processors. Only L2 evictions cause allocations into the L3 cache. Requests that hit in the L3 cache can either leave the data in the L3 cache, if it is likely the data is being accessed by multiple cores, or remove the data from the L3 cache (and place it solely in the L1 cache, creating space for other L2 victim/copy-backs), if it is likely the data is only being accessed by a single core. Furthermore, the L3 cache of the AMD Family 15h processor features a number of micro-architectural improvements that enable higher bandwidth.
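The victim-cache policy boils down to two rules: allocation happens only on L2 evictions, and a hit either keeps the line (likely shared) or hands it over and frees the slot (likely single-core). A minimal sketch, where `VictimL3` is a made-up name and the `likely_shared` flag stands in for whatever heuristic the hardware actually uses, which the guide does not spell out:

```python
class VictimL3:
    """Toy non-inclusive victim L3: filled only by L2 evictions."""

    def __init__(self):
        self.lines = {}  # addr -> data

    def on_l2_eviction(self, addr, data):
        # The only way a line gets allocated in this L3.
        self.lines[addr] = data

    def lookup(self, addr, likely_shared):
        """Return the data on a hit, None on a miss.

        likely_shared is a hypothetical stand-in for the hardware heuristic:
        keep the line for other cores, or hand it over and free the slot.
        """
        if addr not in self.lines:
            return None  # miss: the request continues to system memory
        if likely_shared:
            return self.lines[addr]
        return self.lines.pop(addr)


l3 = VictimL3()
l3.on_l2_eviction(0x80, "A")
print(l3.lookup(0x80, likely_shared=True))   # A, line stays in L3
print(l3.lookup(0x80, likely_shared=False))  # A, line removed from L3
print(l3.lookup(0x80, likely_shared=False))  # None, now a miss
```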
THAT SUCKS!!! "See the appropriate BIOS and Kernel Developer’s Guide for details."
Now I know why I spent almost an entire rainy Sunday on that thing.
I have to admit, though, that the cache principle still isn't really clear to me; I need more information.