Although Moore’s law continues to provide increasing transistor counts, the limited
on-chip power budget restricts the percentage of active transistors [Venkatesh et al.
2010; Esmaeilzadeh et al. 2011; Taylor 2012; Goulding-Hotta et al. 2012; Allred et al.
2012]. In recent years, an increasing fraction of these transistors has been invested in the
large last-level caches (LLCs) used to bridge the gap between fast CPU cores and
slow off-chip memory. Specifically, LLCs occupy as much as 50% of the chip
area and account for a significant fraction of the chip’s leakage power [Kurd et al.
2010; Naffziger et al. 2006; Wendel et al. 2010; Wilkerson et al. 2010]. As shown in
Figure 1, a 16MB LLC consumes about 27% of on-chip power in a 16-core system, with
leakage power dominating the LLC’s power consumption. Hence, managing the power
consumption of LLCs has become an important design issue for future CMPs.
The LLC’s high leakage power stems from its large size, which in turn results
from conservative design-time choices intended to accommodate most applications’
memory footprints. However, not all workloads running on CMPs need the entire
cache during their execution. Figure 2(a) illustrates the variable sensitivity of workloads
to changes in LLC capacity on a 16-core system. The x-axis shows multiprogrammed
workloads composed of benchmarks with different capacity demands (see
Section 6 for workload and simulation details). For example, workloads LL1 and
LL2 do not benefit from a larger capacity, while the performance of TH1 and TH2
improves significantly when a larger LLC is employed. Further, the required cache
size may also vary with different program phases, as shown in Figure 2(b). When
the required cache size is smaller, some parts of the LLC can be disabled to reduce
leakage power. In Figure 2(a), for example, if a 5% performance degradation is acceptable,
more than half of the LLC can be disabled to save power in all but two
workloads.
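As a rough, back-of-envelope illustration of the scale of these savings (assuming, for the sake of the estimate, that LLC leakage power scales linearly with the enabled cache fraction and dominates the LLC’s power budget, as Figure 1 suggests), disabling half of an LLC that consumes 27% of on-chip power would save on the order of

\[
\Delta P \;\approx\; \tfrac{1}{2} \times 0.27\, P_{\text{chip}} \;\approx\; 13\%\ \text{of total on-chip power.}
\]

The actual savings depend on how the dynamic and leakage components split, and on any extra misses induced by the smaller effective capacity.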