The compute market has always been hungry for memory bandwidth, particularly for high-performance applications in servers and datacenters. In recent years, the explosion in core counts per socket has further accentuated this need. Despite progress in DDR speeds, the available bandwidth per core has unfortunately not scaled correspondingly.

Industry stakeholders have been attempting to address this by building additional technology on top of existing widely-adopted memory standards. With DDR5, there are currently two technologies attempting to increase the peak bandwidth beyond the official speeds. In late 2022, SK hynix introduced MCR-DIMMs designed to operate with specific Intel server platforms. JEDEC - the standards-setting body - has also developed specifications for MR-DIMMs with a similar approach. Both build upon existing DDR5 technology by combining multiple ranks to improve peak bandwidth and latency.

How MR-DIMMs Work

The MR-DIMM standard is conceptually simple - multiple ranks of memory devices on the module operate at standard DDR5 speeds with a data buffer in front. The buffer operates at 2x the speed on the host interface side, allowing for essentially double the transfer rates. The challenges obviously lie in operating the logic in the host memory controller at the higher speed and keeping the power consumption / thermals in check.

The first version of the JEDEC MR-DIMM standard specifies speeds of 8800 MT/s, with the next generation at 12800 MT/s. JEDEC also has a clear roadmap for this technology, keeping it in sync with the improvements in the DDR5 standard.
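To put the multiplexing scheme and the JEDEC speed grades in perspective, the back-of-the-envelope sketch below works out the peak-bandwidth arithmetic, assuming a standard 64-bit (8-byte) data path per DIMM. It is only an illustration of the math described above, not vendor-measured figures.

```python
# Minimal sketch of MR-DIMM peak-bandwidth arithmetic, assuming a 64-bit
# (8-byte) data path per DIMM. Speeds are the JEDEC grades quoted in the
# article; the calculation itself is illustrative, not vendor data.

def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for a given transfer rate."""
    return transfer_rate_mt_s * bus_width_bytes / 1000

dram_rank_speed = 4400            # each rank behind the buffer runs at DDR5 speed
host_speed = 2 * dram_rank_speed  # the buffer muxes two ranks -> 8800 MT/s to the host

print(f"Per rank:  {peak_bandwidth_gb_s(dram_rank_speed):6.1f} GB/s")  # 35.2 GB/s
print(f"Host side: {peak_bandwidth_gb_s(host_speed):6.1f} GB/s")       # 70.4 GB/s
print(f"Next gen:  {peak_bandwidth_gb_s(12800):6.1f} GB/s")            # 102.4 GB/s
```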

Micron MR-DIMMs - Bandwidth and Capacity Plays

Micron and Intel have been working closely over the last few quarters to bring the former's first-generation MR-DIMM lineup to the market. Intel's Xeon 6 Family with P-Cores (Granite Rapids) is the first platform to bring MR-DIMM support at 8800 MT/s on the host side. Micron's standard-sized MR-DIMMs (suitable for 1U servers) and TFF (tall form-factor) MR-DIMMs (for 2U+ servers) have been qualified for use with the platform.

The benefits offered by MR-DIMMs are evident from the JEDEC specifications, allowing for increased data rates and system bandwidth, along with improvements in latency. On the capacity side, the additional ranks on the modules have enabled Micron to offer a 256 GB capacity point. It must be noted that some vendors are also using TSV (through-silicon via) technology to increase the per-package capacity at standard DDR5 speeds, but this adds cost and complexity that are largely absent in the MR-DIMM manufacturing process.
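For a sense of how extra ranks translate into capacity, here is a hypothetical sketch. The x4 device organization with 16 data devices per rank assumed below is a common DDR5 layout chosen purely for illustration; it is not a statement of Micron's actual module design.

```python
# Hypothetical sketch of how module capacity scales with rank count and die
# density. The x4, 16-data-devices-per-rank organization is assumed for
# illustration and is not Micron's published MR-DIMM layout.

def module_capacity_gb(ranks: int, data_devices_per_rank: int, die_density_gbit: int) -> int:
    """Usable (non-ECC) capacity in GB of a module built from x4 DRAM devices."""
    return ranks * data_devices_per_rank * die_density_gbit // 8

# Four ranks of 32 Gbit dies reach the 256 GB capacity point mentioned above.
print(module_capacity_gb(ranks=4, data_devices_per_rank=16, die_density_gbit=32))  # 256
```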

The tall form-factor (TFF) MR-DIMMs have a larger surface area compared to the standard-sized ones. For the same airflow configuration, this allows the DIMM to have a better thermal profile. This provides benefits for energy efficiency as well by reducing the possibility of thermal throttling.

Micron is launching a comprehensive lineup of MR-DIMMs in both standard and tall form-factors today, with multiple DRAM densities and speed options as noted above.

MR-DIMM Benefits - Intel Granite Rapids Gets a Performance Boost

Micron and Intel hosted a media / analyst briefing recently to demonstrate the benefits of MR-DIMMs for Xeon 6 with P-Cores (Granite Rapids). Using a 2P configuration with 96-core Xeon 6 processors, benchmarks for different workloads were run with both 8800 MT/s MR-DIMMs and 6400 MT/s RDIMMs. The chosen workloads are notorious for being limited by memory bandwidth.

OpenFOAM is a widely-used CFD workload that benefits from MR-DIMMs. For the same memory capacity, the 8800 MT/s MR-DIMM shows a 1.31x speedup based on higher average bandwidth and IPC improvements, along with lower last-level cache miss latency.

The performance benefits are particularly evident with more cores participating in the workload.

Apache Spark is a commonly used big-data platform operating on large datasets. Depending on the exact dataset, the performance benefits of MR-DIMMs can vary. Micron and Intel used a 2.4 TB dataset from Intel's HiBench benchmark suite, showing a 1.2x speedup at the same capacity and a 1.7x speedup with doubled-capacity TFF MR-DIMMs.

Avoiding the need to push data back to permanent storage also contributes to the speedup.

The higher speed offered by MR-DIMMs also helps in AI inferencing workloads, with Micron and Intel showing a 1.31x inference performance improvement along with reduced time to first token for a Llama 3 8B model. Obviously, purpose-built inferencing solutions based on accelerators will perform better. However, this was offered as a demonstration of the type of CPU workloads that can benefit from MR-DIMMs.

As the adage goes, there is no free lunch. At 8800 MT/s, MR-DIMMs are definitely going to guzzle more power compared to 6400 MT/s RDIMMs. However, the faster completion of workloads means that the energy consumption for a given workload will be lower for the MR-DIMM configurations. We would have liked Micron and Intel to quantify this aspect for the benchmarks presented in the demonstration. Additionally, Micron indicated that the energy efficiency (in terms of picojoules per bit transferred) is largely similar for the 6400 MT/s RDIMMs and 8800 MT/s MR-DIMMs.
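As a rough illustration of why a faster but higher-power configuration can still consume less energy per job, consider the sketch below. The wattages and runtime are made-up placeholders rather than numbers from the briefing; only the 1.31x speedup figure comes from the benchmarks discussed above.

```python
# Rough illustration of energy per workload: energy = average power x runtime.
# Power and runtime values are made-up placeholders, not measured figures from
# the Micron / Intel briefing; the 1.31x speedup is the figure quoted above.

def energy_kj(avg_power_w: float, runtime_s: float) -> float:
    """Energy consumed by one workload run, in kilojoules."""
    return avg_power_w * runtime_s / 1000

rdimm_energy  = energy_kj(avg_power_w=700, runtime_s=1000)         # hypothetical 6400 MT/s baseline
mrdimm_energy = energy_kj(avg_power_w=760, runtime_s=1000 / 1.31)  # higher power, 1.31x faster

print(f"RDIMM run:   {rdimm_energy:.0f} kJ")   # 700 kJ
print(f"MR-DIMM run: {mrdimm_energy:.0f} kJ")  # ~580 kJ -> lower energy despite higher power
```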

Key Takeaways

The standardization of MR-DIMMs by JEDEC allows multiple industry stakeholders to participate in the market. Customers are not vendor-locked and can compare and contrast options from different vendors to choose the best fit for their needs.

At Computex, we saw MR-DIMMs from ADATA on display. As a Tier-2 vendor without its own DRAM fab, ADATA's play is on cost, with the possibility of the DRAM dies being sourced from different fabs. The MR-DIMM board layout is dictated by JEDEC specifications, and this gives Tier-2 vendors pricing flexibility. Modules are also built based on customer orders. Micron, on the other hand, has a more comprehensive portfolio of SKUs for different use-cases, with the pros and cons of vertical integration in the picture.

Micron is also not the first to publicly announce MR-DIMM sampling. Samsung announced their own lineup (based on 16 Gb DRAM dies) last month. It must be noted that Micron's MR-DIMM portfolio uses 16 Gb, 24 Gb, and 32 Gb dies fabricated in its 1β technology. While Samsung's process for the 16 Gb dies used in their MR-DIMMs is not known, Micron believes that their MR-DIMM technology will provide better power efficiency compared to the competition while also offering customers a wider range of capacities and configurations.

5 Comments

  • ballsystemlord - Thursday, July 18, 2024

    I'm a bit confused as to why they only validated Granite Rapids. Is Zen not worth it? Or maybe Granite Rapids was just the fastest to verify and they're working on other server CPUs?
  • ganeshts - Friday, July 19, 2024

    The host controller needs to support the high-speed memory interface - Granite Rapids is first off the blocks with DDR5-8800 support (which is why it is able to support these MRDIMMs).

    I suspect other server CPU vendors will follow suit in supporting these high-speed DIMMs. Currently, I am not aware of any other server CPU with this support.
  • ballsystemlord - Friday, July 19, 2024

    Thanks. Even without Zen, it was still odd to me that no other Intel products were validated. I did not realize that the memory controllers of CPUs were not necessarily up to par with the latest high speed RAM.
  • Dolda2000 - Saturday, July 20, 2024

    I must admit I don't get the point of MR-DIMMs. We obviously already have chips that manage 8000 MT/s. I thought the whole challenge of DDR speeds in servers is about the memory controllers and mainboard routing, rather than the chips on the sticks, and I don't see how MR-DIMMs would alleviate those issues.
  • ballsystemlord - Monday, July 22, 2024

    Think of it this way, if each DDR RAM chip has to run at 1/2 the speed of the DRAM bus then you can use much slower chips for your DIMMs.
    Of course the other things you mention matter, but we're also apparently reaching the point where the RAM chip's analog front end is having trouble keeping up.
