These days the line between industry and academia is heavily blurred for computing. As CXL becomes more mature, the need for an ‘academic corner’ will likely diminish on this blog, but for now we’re going to spend some time here and there talking about cool papers and articles that are CXL-related and CXL-adjacent. Follow the anthropomorphic rabbit at the unsettling typewriter and let’s dive in:
Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters
Summary #
Memory fragmentation caused by unmovable pages grows as server uptime increases and creates a bottleneck in the allocation of contiguous physical memory, especially for large (1GB) pages. Contiguitas proposes a new memory management scheme in which the host OS divides the physical memory space into two regions, one for movable and one for unmovable allocations, with a floating demarcation line between the two that shifts as host workload needs vary. The work also contains a proposal for allowing unmovable pages to be relocated transparently via a novel hardware mechanism.
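To make the motivation concrete, here is a minimal toy sketch of why confining unmovable pages to one region helps. It is not the paper's implementation: the one-character-per-frame layout, the 16-frame memory, and the function names `largest_reclaimable_run` and `segregate` are all assumptions made up for illustration. The idea is that movable pages can be migrated out of the way, so only unmovable pages truly break up a contiguous range.

```python
def largest_reclaimable_run(frames: str) -> int:
    """Longest run of page frames that contains no unmovable page.

    Toy encoding (an assumption for this sketch):
      'U' = unmovable page, 'M' = movable page, '.' = free frame.
    Movable pages can be migrated elsewhere, so only 'U' ends a run.
    """
    best = current = 0
    for frame in frames:
        current = 0 if frame == "U" else current + 1
        best = max(best, current)
    return best


def segregate(frames: str) -> tuple[str, int]:
    """Return the same page counts laid out as two regions, plus the boundary.

    Unmovable pages are packed at the low addresses and everything else
    sits above the boundary. The boundary is simply the first frame of the
    movable region; in this toy it would shift up or down as the number of
    unmovable pages changes.
    """
    n_unmovable = frames.count("U")
    n_movable = frames.count("M")
    n_free = len(frames) - n_unmovable - n_movable
    layout = "U" * n_unmovable + "." * n_free + "M" * n_movable
    return layout, n_unmovable


# A long-running host where unmovable pages have landed all over memory.
mixed = "M.U..M.U.M..U..M"
layout, boundary = segregate(mixed)

print("mixed layout:     ", mixed, "->", largest_reclaimable_run(mixed))
print("segregated layout:", layout, "->", largest_reclaimable_run(layout))
print("boundary frame:   ", boundary)
```

In the mixed layout, three scattered unmovable pages cap the best contiguous range at 4 of 16 frames, while the segregated layout leaves 13 frames available. A real 1GB page needs 262,144 contiguous 4KB frames, so the same effect at scale is what makes huge-page allocation fail on fragmented hosts.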
For Meta’s workloads, essentially web server and cache/CDN-type services, this approach sees performance improvements in both lightly and highly fragmented memory spaces (2-9% and 7-18%, respectively) for about 25% of all of Meta’s servers. If you were to assume that Meta deploys 200k servers globally, that is 50k affected servers, and at the top-end 18% gain this could reduce their fleet deployment by up to 9k servers without sacrificing performance. A CAPEX reduction of what, tens of millions of dollars at a few thousand dollars per server? And a power savings of single-digit megawatts, which over five years is like low eight figures in power costs? Not bad for large-scale deployments of servers.
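The back-of-the-envelope math can be sketched as a few lines of Python. The 200k fleet size, the 25% affected share, and the 18% top-end gain come from the text above; the 500 W per server and $0.07 per kWh are purely assumed inputs for illustration, as is the simplification that an x% performance gain lets you retire x% of the affected servers. The function name `fleet_savings` is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def fleet_savings(total_servers: int, affected_fraction: float, perf_gain: float,
                  watts_per_server: float, usd_per_kwh: float, years: int) -> dict:
    """Rough estimate of servers, power, and energy cost saved.

    Assumes (as a simplification) that a performance gain of `perf_gain`
    on the affected servers removes the same fraction of those servers.
    """
    affected = total_servers * affected_fraction
    servers_saved = affected * perf_gain
    megawatts = servers_saved * watts_per_server / 1e6
    megawatt_hours = megawatts * HOURS_PER_YEAR * years
    energy_cost_usd = megawatt_hours * 1000 * usd_per_kwh  # 1 MWh = 1000 kWh
    return {
        "servers_saved": servers_saved,
        "megawatts": megawatts,
        "megawatt_hours": megawatt_hours,
        "energy_cost_usd": energy_cost_usd,
    }


# Watts per server and electricity price are assumptions, not measured values.
estimate = fleet_savings(
    total_servers=200_000,
    affected_fraction=0.25,
    perf_gain=0.18,
    watts_per_server=500.0,
    usd_per_kwh=0.07,
    years=5,
)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

With these assumed inputs the sketch lands on about 9,000 servers, 4.5 MW of continuous draw, and roughly $13.8M in energy over five years. Swapping in the low-end 2% gain drops that to about 1,000 servers, so the range is wide.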
What makes this paper stand out? #
Contiguitas proposes a hardware mechanism to manage transparent migration of ‘unmovable pages’ in memory. This feels very much like a mechanism that a CXL-attached device could service in a straightforward way… and if you spend 5 minutes on the internet you find ‘TPP: Transparent Page Placement for CXL-Enabled Tiered Memory, ASPLOS 2023.’
How is this any different from [insert idea]? #
This is a well-loved space, and the authors provide a ton of references. I thought, though, that Linux added OS support for migration of kernel/driver pages back in the 4.x kernel, 2015 timeframe? It would be really cool to hear the authors come talk about how their HW-enabled transparent migration differs from, or is better than, a software implementation. As cool a use case as carving up some T2 device space for this proposal sounds, if it’s just as straightforward to let the host OS handle it without major performance impact, then the value drops.
Workloads considered #
Web server and web caching workloads, specifically NGINX and memcached.
Take-aways for me #
The paper provides some good insight and real-world data from one of the world’s largest website and CDN providers, Meta. Readers can learn what challenges other web providers like Akamai, Cloudflare, Google (to a certain extent), and others face when servicing large volumes of cacheable web data.
Additionally, if readers are unfamiliar with TLB and TLB shootdown topics, the references are of high value.
Figures 5-8 and 12 were the most interesting for me.
How do I do this? #
Hi there, we’re Jackrabbit Labs (JRL for short). We build open source CXL things then help you use them. If you’re finding yourself scratching your head on day 2 of your CXL deployment, please reach out. We’d love to get to know you.