Design and Implementation of Snoop Filters for Chip Multiprocessors

Speaker:  Valentina Salapura – NY, United States
Topic(s):  Applied Computing


As multi-core processors evolve, coherence traffic between cores is becoming a problem for both performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that squashes snoop requests for addresses known not to be in the cache significantly improves performance for caches that cannot perform normal load and snoop lookups simultaneously. In addition, reducing snoop lookups yields power savings.

The Blue Gene/P chip multiprocessor (CMP) scales node performance using a multi-core system-on-a-chip design. While in the past large symmetric multiprocessor (SMP) designs were sized to handle large amounts of coherence traffic, many modern CMP designs find this cost prohibitive in terms of area, power dissipation, and design complexity. To ensure high efficiency of each quad-processor node in Blue Gene/P, taming the coherence cost of traditional SMP designs was a key requirement.

The Blue Gene/P chip multiprocessor exploits a novel way of reducing coherence cost by filtering useless coherence actions. Each processor core is paired with a snoop filter that identifies and discards unnecessary coherence requests before they can reach the processor cores. Removing unnecessary lookups reduces the interference of invalidate requests with L1 data cache accesses, and reduces power by eliminating expensive tag array accesses. This approach results in improved power and performance characteristics.

The Blue Gene/P snoop filters combine stream registers and snoop caches to capture both the locality of snoop addresses and their streaming behavior. Simulations of SPLASH-2 benchmarks illustrate tradeoffs and strengths of these two techniques. Their combination is shown to be most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines. This translates into an average performance improvement of almost 20% for the NAS benchmarks running on an actual Blue Gene/P system. Hardware power and performance measurements demonstrate the effectiveness of snoop filters.
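
To make the combination of stream registers and a snoop cache more concrete, the following Python sketch models a per-core filter in software. It is a toy illustration under stated assumptions, not the Blue Gene/P hardware: the class name `SnoopFilter`, its methods, the register allocation policy, and the LRU replacement of the snoop cache are all hypothetical choices made here. It also assumes that every snoop is an invalidate request and that addresses are cache-line numbers. Each stream register holds a base/mask pair describing a superset of the lines the local core has loaded, and the snoop cache remembers lines that were recently invalidated and not reloaded since.

```python
from collections import OrderedDict


class SnoopFilter:
    """Toy software model of a per-core snoop filter (illustrative only)."""

    def __init__(self, num_stream_regs=4, snoop_cache_lines=8):
        self.num_stream_regs = num_stream_regs
        self.snoop_cache_lines = snoop_cache_lines
        # Each stream register is a [base, mask] pair. A line "matches" when
        # it agrees with base on every bit that is set in mask.
        self.stream_regs = []
        # Lines recently invalidated by a forwarded snoop, kept in LRU order.
        self.snoop_cache = OrderedDict()

    def _may_be_cached(self, line):
        # Conservative check: True for every line the core has loaded,
        # and possibly for some lines it has not.
        return any((line & mask) == (base & mask)
                   for base, mask in self.stream_regs)

    def local_load(self, line):
        """Record that the local core brought `line` into its L1 cache."""
        # The line may be cached again, so forget any earlier invalidation.
        self.snoop_cache.pop(line, None)
        if self._may_be_cached(line):
            return
        if len(self.stream_regs) < self.num_stream_regs:
            # Free register: track this line exactly (mask -1 = all bits).
            self.stream_regs.append([line, -1])
            return
        # No free register: widen the closest one by clearing the mask bits
        # in which the new line differs from that register's base.
        reg = min(self.stream_regs,
                  key=lambda r: bin((r[0] ^ line) & r[1]).count("1"))
        reg[1] &= ~(reg[0] ^ line)

    def snoop(self, line):
        """Return True if the invalidate must be forwarded to the L1 cache."""
        if line in self.snoop_cache:
            # Already invalidated and not reloaded since: safe to discard.
            self.snoop_cache.move_to_end(line)
            return False
        if not self._may_be_cached(line):
            # No stream register covers the line: it was never loaded.
            return False
        # Forward the invalidate and remember that the line is now gone.
        self.snoop_cache[line] = True
        if len(self.snoop_cache) > self.snoop_cache_lines:
            self.snoop_cache.popitem(last=False)  # evict least recently used
        return True
```

In this model, a snoop is discarded only when the filter can show that the line is absent, so filtering never hides a needed invalidate. Widening a mask makes a register less precise but keeps it safe. The sketch never resets its stream registers, so their precision only degrades over time; a practical design would need some way to clear them.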

About this Lecture

Number of Slides:  na
Duration:  60-90 minutes
Languages Available:  English
Last Updated: 
