
Thoughts on Pools

·655 words·4 mins
Author
Grant Mackey
CTO
Academic Corner - This article is part of a series.
Part 3: This Article

These days the line between industry and academia is heavily blurred in computing. As CXL matures, the need for an ‘academic corner’ on this blog will likely diminish, but for now we’re going to spend some time here and there talking about cool CXL-related and CXL-adjacent papers and articles. Follow the anthropomorphic rabbit at the unsettling typewriter and let’s dive in:
Dynamic Capacity Service for Improving CXL Pooled Memory Efficiency

Summary

The authors from SK hynix introduce a dynamic capacity service (DCS) and a hardware engine for CXL pooled memory that let connected hosts dynamically allocate and deallocate memory on demand.

The article states that two challenges with DRAM utilization in servers today are DRAM BW bottlenecks and stranded DRAM, the latter often a result of overprovisioning capacity to maximize DRAM BW.

The authors claim that CXL and pooled memory will implicitly address these challenges: CXL will add DRAM channels to a system, and pools of shared memory devices will resolve local DRAM stranding.

What makes this paper stand out?

The authors plumb a steel-thread example from the application layer down to hardware to demonstrate that a DCS can help address DRAM BW and stranding issues:

  • An FPGA implementation of a multi-headed Type 3 CXL memory device that implements their DCS engine and memory sanitization functionality.
  • CXL pooled memory detection and management: a device driver that detects CXL memory pools and registers pooled memory with the Linux kernel as system memory rather than as an external device.
  • A CXL mailbox mechanism that lets hosts request memory regions from the DCS engine in the Type 3 device.
  • A modified, CXL-aware version of Kubernetes that can dynamically allocate and deallocate CXL pool resources (an admitted hack by the authors, but still very cool).
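To make the allocate/release flow a bit more concrete, here is a minimal Python sketch of what a DCS engine might do with host requests, assuming fixed-size extents and a simple free list. Everything in it (`DcsEngine`, `request_add`, `request_release`, the 256 MiB extent size, the 16-byte buffer standing in for DRAM contents) is hypothetical. It is not the authors' design and not the CXL mailbox command set; it only illustrates capacity moving between hosts and being sanitized before reuse.

```python
from dataclasses import dataclass, field

EXTENT_BYTES = 256 * 1024 * 1024  # hypothetical allocation granularity (256 MiB)
SIM_BYTES = 16  # tiny buffer standing in for each extent's DRAM contents


@dataclass
class DcsEngine:
    """Toy dynamic capacity engine in a hypothetical multi-headed Type 3 device."""

    total_extents: int
    free: list = field(default_factory=list)   # extent indices available to any host
    owner: dict = field(default_factory=dict)  # extent index -> owning host
    data: dict = field(default_factory=dict)   # extent index -> simulated contents

    def __post_init__(self):
        self.free = list(range(self.total_extents))
        self.data = {ext: bytearray(SIM_BYTES) for ext in self.free}

    def _check_owner(self, host, ext):
        if self.owner.get(ext) != host:
            raise PermissionError(f"{host} does not own extent {ext}")

    def request_add(self, host, n):
        """Handle a host's request for n more extents (all-or-nothing)."""
        if n > len(self.free):
            raise MemoryError(f"pool exhausted: {host} asked for {n}, {len(self.free)} free")
        granted = [self.free.pop() for _ in range(n)]
        for ext in granted:
            self.owner[ext] = host
        return granted

    def request_release(self, host, extents):
        """Return extents to the pool, zeroing them so the next owner sees no stale data."""
        for ext in extents:
            self._check_owner(host, ext)
        for ext in extents:
            del self.owner[ext]
            self.data[ext][:] = bytes(SIM_BYTES)  # sanitize before reuse
            self.free.append(ext)

    def write(self, host, ext, payload):
        self._check_owner(host, ext)
        chunk = payload[:SIM_BYTES]
        self.data[ext][: len(chunk)] = chunk

    def read(self, host, ext):
        self._check_owner(host, ext)
        return bytes(self.data[ext])

    def capacity_bytes(self, host):
        return sum(h == host for h in self.owner.values()) * EXTENT_BYTES


engine = DcsEngine(total_extents=4)
a = engine.request_add("host-a", 2)          # host A grows its memory
engine.write("host-a", a[0], b"host-a secret")
engine.request_release("host-a", a)          # host A shrinks; extents are sanitized
b = engine.request_add("host-b", 2)          # host B receives the same extents
```

In this toy model, host B picks up exactly the extents host A released but reads only zeros from them, which is why sanitization on release matters in a pool shared by multiple hosts.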

How is this any different from [insert idea]?

Not much. This article primarily explains some of the what and how of CXL, then discusses the proof-of-concept DCS system and its FPGA implementation.

The SK hynix Niagara CXL platform at the SK hynix booth during SC'23. I think this is the platform the authors used for this work.

Workloads considered

The DCS cluster provisions Kubernetes with two worker nodes and uses PageRank as its benchmark application. On the DCS-enabled k8s cluster, all pods can run simultaneously, while the baseline cluster must wait for some pods to complete before scheduling the remaining work; the authors report this as ~25% greater memory utilization than the baseline system.
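The scheduling difference can be illustrated with a toy first-fit placement model in Python. The node sizes, pod sizes, and pool size below are made up for illustration and are not taken from the paper, and this is not the authors' modified Kubernetes scheduler. The idea is simply that a pod that fits in no single node's local DRAM either waits (baseline) or borrows the shortfall from the shared pool (DCS).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    local_free: int  # GiB of local DRAM not yet promised to a pod


def schedule(pods, nodes, pool_free=0):
    """Place (pod, need_gib) pairs in order; return (placements, waiting).

    placements maps pod -> (node name, GiB from local DRAM, GiB from the pool).
    With pool_free=0 this behaves like the baseline cluster.
    """
    placements, waiting = {}, []
    for pod, need in pods:
        # Prefer a node that can satisfy the pod entirely from local DRAM.
        node = next((n for n in nodes if n.local_free >= need), None)
        if node is not None:
            node.local_free -= need
            placements[pod] = (node.name, need, 0)
            continue
        # Otherwise pick the node with the most local headroom and borrow
        # the shortfall from the shared pool, if the pool can cover it.
        node = max(nodes, key=lambda n: n.local_free)
        shortfall = need - node.local_free
        if shortfall <= pool_free:
            placements[pod] = (node.name, node.local_free, shortfall)
            pool_free -= shortfall
            node.local_free = 0
        else:
            waiting.append(pod)
    return placements, waiting


def two_workers():
    # Hypothetical cluster: two workers with 64 GiB of local DRAM each.
    return [Node("worker-1", 64), Node("worker-2", 64)]


pods = [("pagerank-1", 40), ("pagerank-2", 40), ("pagerank-3", 40)]
baseline, baseline_wait = schedule(pods, two_workers())
dcs, dcs_wait = schedule(pods, two_workers(), pool_free=32)
print("baseline waiting:", baseline_wait)
print("DCS placements:", dcs)
```

With these numbers, the baseline leaves `pagerank-3` waiting even though 24 GiB sits unused on each worker (stranded, since no single node can fit 40 GiB). With a 32 GiB pool, `pagerank-3` runs on `worker-1` using 24 GiB of local DRAM plus 16 GiB of pooled memory.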

Take-aways for me

The authors provide a set of ‘lessons learned’ from their demo:

  • “Offlining of a memory section fails frequently”
    • Migrating live memory pages and taking memory offline is hard, lol. The authors take a counter-based approach to selecting victim pages when migrating memory and offlining swaths of it. There’s no commentary on the success rate, but it seems to get them through their demo?
  • “CXL pooled memory can be used as a slow-tier memory for multiple fast-tier memory nodes”
    • Linux kernel 5.15 and later can demote pages from DRAM to other attached memory tiers rather than paging them out to disk. CXL pooled memory can take advantage of this and serve as an additional tier before pages hit disk.
  • “To enable QoS, a user needs a knob to control the capacity ratio between local DRAM and CXL pooled memory”
    • The authors implement a single knob for QoS adjustment that sets limits on local DRAM and CXL memory per container/instance/process/user-space thing.
  • “Kubernetes manages system memory as a fixed resource”
    • The authors hacked on Kubernetes a bit so that kubelets update the memory.max parameter as CXL resources are attached and detached. That way, containers aren’t prematurely killed for things like out-of-memory (OOM) errors.
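The last two lessons fit together: a per-container knob splits a memory budget between local DRAM and CXL, and the container’s cgroup v2 `memory.max` is raised or lowered as pooled capacity comes and goes. Below is a minimal Python sketch of that bookkeeping. `split_budget` and `update_memory_max` are hypothetical helpers standing in for the authors’ kubelet changes; only the `memory.max` file name is a real cgroup v2 interface. Note that `memory.max` caps total usage and does not control which tier pages land on, so enforcing the split itself would need something else (e.g., NUMA memory policy), which this sketch does not model.

```python
import tempfile
from pathlib import Path

GIB = 1024 ** 3


def split_budget(total_bytes, local_ratio):
    """Hypothetical QoS knob: share of a container's budget served from local DRAM."""
    if not 0.0 <= local_ratio <= 1.0:
        raise ValueError("local_ratio must be between 0 and 1")
    local_limit = int(total_bytes * local_ratio)
    return local_limit, total_bytes - local_limit


def update_memory_max(cgroup_dir, local_limit, cxl_limit, cxl_attached):
    """Rewrite memory.max when CXL capacity is attached to or detached from the node.

    The container may use its local share plus whatever CXL capacity is currently
    attached, up to its CXL share, so it is not OOM-killed at a stale ceiling.
    """
    new_max = local_limit + min(cxl_attached, cxl_limit)
    (Path(cgroup_dir) / "memory.max").write_text(f"{new_max}\n")
    return new_max


# A temporary directory stands in for a real container cgroup directory.
with tempfile.TemporaryDirectory() as cgroup:
    local, cxl = split_budget(8 * GIB, local_ratio=0.75)  # 6 GiB local, 2 GiB CXL
    before = update_memory_max(cgroup, local, cxl, cxl_attached=0)
    after = update_memory_max(cgroup, local, cxl, cxl_attached=4 * GIB)
    written = (Path(cgroup) / "memory.max").read_text()
```

Capping the CXL contribution at the container’s share keeps one container from absorbing every extent the pool attaches to the node, which is roughly the role the authors’ QoS knob plays.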

Figures 4, 5, and 7 were the most interesting for me.

How do I do this?

Hi there, we’re Jackrabbit Labs (JRL for short). We build open source CXL things and then help you use them. If you find yourself scratching your head on day 2 of your CXL deployment, please reach out. We’d love to get to know you.
