Kove Redefines AI Infrastructure: Software-Defined Memory Unlocks Scalable KV Capabilities for Next-Generation Inference
At AI Infra Summit, Kove demonstrated how Software-Defined Memory enables memory-bound KV workloads to scale well beyond local DRAM while maintaining like-local latency, a critical capability as AI inference grows increasingly memory-constrained.
Watch the full AI Infra 2025 Keynote
The Compute Race Has Left Memory Behind
The AI infrastructure narrative in 2025 has been dominated by compute. Meta, NVIDIA, and AWS are racing to deliver faster accelerators, larger GPU clusters, and new memory hierarchies. Benchmarks like MLPerf continue to push theoretical ceilings ever higher.
But enterprises running real workloads know the truth:
AI isn't compute-bound anymore; it's memory-bound.
- Most latency stalls originate in memory, not compute.
- GPUs frequently idle while waiting for data.
- DRAM remains rigid, tied to individual servers, and chronically underutilized.
The result is an expensive paradox:
Organizations buy more compute, more GPUs, and more hardware, yet workloads still slow down because memory hasn’t kept pace.
At AI Infra Summit 2025, Kove CEO John Overton introduced a fundamentally different paradigm: Software-Defined Memory (Kove:SDM™) — a platform that virtualizes DRAM across servers into a unified, elastic memory pool with latency equivalent to local DRAM, even when memory is served from across the data center.
Kove:SDM™ delivers the layer of AI infrastructure that has been missing: elastic, scalable, like-local memory that eliminates DRAM ceilings.
Why Memory Has Become the Bottleneck
AI workloads are scaling in every dimension: context length, model size, concurrency, and real-time demands. Compute is no longer the limiting factor; memory is.
Traditional DRAM is:
- Locked to a single server, unable to be pooled or shared.
- Provisioned for peak demand, stranding capacity and leaving utilization low.
- Inflexible, requiring oversized, high-memory servers that sit idle most of the time.
This leads to pervasive inefficiencies:
- Training pipelines fragment to fit memory constraints.
- Inference pipelines stall when memory limits are reached.
- Enterprises overspend on hardware yet still hit DRAM ceilings.
The next major unlock isn’t more compute. It’s removing memory constraints altogether.
The Inference Memory Problem: KV-Style Access Patterns
Today’s inference engines — including vLLM, SGLang, TensorRT-LLM, and other large-context architectures — rely heavily on KV-style access patterns.
Even though inference engines like vLLM do not run on Redis or Valkey, the technical access pattern is the same:
- High-velocity key/value lookups
- Strict latency sensitivity
- Performance collapse when memory limits are reached
- KV datasets growing rapidly as context windows expand
When inference workloads outgrow DRAM, CPU-side KV data spills into slower tiers or requires recomputation, which wastes GPU cycles and inflates infrastructure cost.
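To make the cost of that spill concrete, here is a minimal Python sketch, with hypothetical names and sizes rather than code from vLLM or Kove, of a bounded CPU-side KV cache: hits are cheap in-memory lookups, while misses force the kind of recomputation that wastes GPU cycles.

```python
import time

# Illustrative only: a bounded CPU-side KV cache.
# On a hit, precomputed data is reused from DRAM; on a miss
# (evicted or never cached), the value must be recomputed.

CACHE_CAPACITY = 10_000          # stand-in for the local DRAM ceiling
kv_cache: dict[str, bytes] = {}  # key -> cached value

def recompute(key: str) -> bytes:
    """Stand-in for re-running prefill/attention work."""
    time.sleep(0.005)            # simulated recompute cost
    return key.encode() * 16

def lookup(key: str) -> bytes:
    value = kv_cache.get(key)
    if value is not None:
        return value             # cheap path: served from memory
    value = recompute(key)       # expensive path: compute redone
    if len(kv_cache) < CACHE_CAPACITY:
        kv_cache[key] = value    # only cache while headroom exists
    return value
```

Raising CACHE_CAPACITY, which is what pooled memory does for the real systems discussed below, shifts traffic from the expensive miss path back to the cheap hit path.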
Expanding and sustaining DRAM-class performance for KV-style workloads is now one of the biggest unlocks for scalable inference.
This is where Kove:SDM™ shines.
Benchmarking SDM: Redis & Valkey as Proxies for KV Performance at Scale
During AI Infra Summit, Kove shared benchmark results using Redis and Valkey, not because hyperscalers use them directly inside inference engines like vLLM, but because they are:
- Widely adopted
- Well-understood
- Highly memory-bound
- Latency-sensitive
- Excellent proxies for evaluating KV-style workload behavior under DRAM pressure
These systems represent a clean, industry-recognized way to demonstrate how Kove:SDM™ handles the very access patterns that dominate inference.
Key Insight:
If Kove:SDM™ can sustain DRAM-class latency and stability running KV workloads far beyond local DRAM limits, it can also sustain larger CPU-side KV structures for inference frameworks without requiring tiering or recomputation.
Benchmark Highlights
Redis Benchmark (General KV Workload Scaling)
Environment: Redis OSS v7.2.4 on Oracle Cloud Infrastructure
Results demonstrated that Kove:SDM™ enabled:
- Workloads approximately 5x larger than the server’s physical DRAM
- Latency equivalent to or better than local memory in most operations
- Stable throughput even as working sets expanded significantly
Why this matters:
Redis runs in-memory and has well-understood performance properties. Its performance under SDM validates that memory pooling can dramatically expand DRAM-limited workloads without sacrificing latency.
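Kove has not published its benchmark harness here, but as a rough sketch of how a KV workload of this shape can be generated and timed with the standard redis-py client (the endpoint, key count, and value size below are placeholders, not the summit configuration):

```python
import os
import time
import redis  # pip install redis

# Placeholder endpoint; the published tests ran Redis OSS v7.2.4 on OCI.
r = redis.Redis(host="redis.example.internal", port=6379)

VALUE_SIZE = 1024       # bytes per value (placeholder)
NUM_KEYS = 5_000_000    # sized to exceed local DRAM in a real test
SAMPLE_EVERY = 100_000

payload = os.urandom(VALUE_SIZE)

# Load phase: grow the keyspace well past local DRAM capacity.
for i in range(NUM_KEYS):
    r.set(f"key:{i}", payload)

# Read phase: spot-check GET latency across the expanded working set.
for i in range(0, NUM_KEYS, SAMPLE_EVERY):
    start = time.perf_counter()
    r.get(f"key:{i}")
    elapsed_us = (time.perf_counter() - start) * 1e6
    print(f"key:{i}  GET latency: {elapsed_us:.1f} µs")
```

Because Kove:SDM™ requires no application changes, a script like this runs unchanged whether Redis is confined to local DRAM or drawing on the pooled memory tier.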
Valkey Benchmark (Relevant KV Pattern for Inference)
Environment: Valkey v8.0.2 in an Oracle RoCE test environment
Results demonstrated:
- Support for workloads nearly 5x larger than local DRAM
- Latency consistent with local DRAM behavior
- Stable throughput as dataset size scaled
Why this matters:
Valkey’s performance under SDM confirms that large, latency-sensitive KV datasets can operate at DRAM-equivalent performance even when the memory footprint far exceeds local server capacity.
These results apply directly to the KV-style access behavior seen in modern inference systems, even though the systems themselves differ.
Why It Matters for the Future of Inference
As John Overton emphasized:
“Every recompute avoided is GPU time returned to the business. Memory limits create structural waste. Removing those limits creates structural efficiency.”
— John Overton, CEO of Kove
The takeaway is clear:
- Redis proves SDM can scale KV workloads.
- Valkey proves SDM maintains DRAM-class performance across expanded memory footprints.
- Together, they demonstrate the ability to support larger CPU-side KV datasets, a key component of high-throughput, large-context inference engines.
This is the foundation for scaling AI inference sustainably.
The Business Impact
Memory ceilings inflate cost across the entire AI stack. Kove:SDM™ reverses that dynamic.
Enterprises typically see:
- Annual savings at large scale
- Reduced hardware spend by delaying server refresh cycles
- Lower power and cooling costs by eliminating memory overrun conditions
Why Now
Inference demand is accelerating faster than compute supply:
- Context windows are growing into the hundreds of thousands of tokens
- Inference costs already surpass training costs in many organizations
- DRAM density and pricing can’t keep up with model growth
- GPUs remain underutilized because they wait on memory, not compute
Without a new memory architecture, inference scaling becomes unsustainable.
With Kove:SDM™, organizations can:
- Scale workloads 5x larger on the same servers
- Reduce GPU idle time by eliminating memory-induced recompute
- Lower cost per token and per inference
- Operate within existing power, space, and cost envelopes
Frequently Asked Questions (FAQs)
Q: What is the memory bottleneck in AI?
A: AI models generate enormous datasets that outstrip the capacity of server DRAM. When memory fills up, data is evicted, forcing recomputation or disk spills. This slows workloads and wastes expensive GPU cycles.
Q: What is KV Cache and why is it important for inference?
A: The KV cache stores the key/value tensors computed for previous tokens during inference. By reusing this data, models avoid recomputation and generate responses faster. If the cache is too small, entries are evicted and GPUs must redo the work.
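For readers who want to see the mechanism, here is a minimal, framework-agnostic sketch of KV caching during autoregressive decoding (illustrative only, not vLLM's or any production engine's implementation):

```python
import numpy as np

d_model = 64
k_cache: list[np.ndarray] = []   # one key vector per processed token
v_cache: list[np.ndarray] = []   # one value vector per processed token

def decode_step(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Store this token's key/value once...
    k_cache.append(k)
    v_cache.append(v)
    K = np.stack(k_cache)        # (tokens_so_far, d_model)
    V = np.stack(v_cache)
    # ...and attend over everything cached so far instead of
    # recomputing keys/values for the whole sequence.
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Example: decode three tokens with random projections.
rng = np.random.default_rng(0)
for _ in range(3):
    q, k, v = rng.standard_normal((3, d_model))
    out = decode_step(q, k, v)
```

If cached entries no longer fit in memory and are evicted, the corresponding keys and values must be recomputed from the original tokens, which is the GPU waste described above.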
Q: Why benchmark Redis and Valkey?
A: They are industry-standard KV systems and excellent proxies for stress-testing memory-bound KV access patterns. The results transfer to AI inference because the underlying memory behaviors are similar.
Redis has long been the world’s most popular in-memory KV store. Valkey is an open-source fork optimized for performance and integrated into AI inference frameworks like vLLM via LMCache. Benchmarks on Redis prove the general case; Valkey benchmarks show direct relevance to inference workloads.
Q: How does Kove:SDM™ improve Redis and Valkey?
A: By pooling DRAM across servers, SDM enables Redis and Valkey to handle 5x larger workloads at stable latency. This expands KV Cache capacity, reducing recomputes and improving inference throughput.
Q: How does Kove:SDM™ help inference engines?
A: By removing DRAM ceilings for CPU-side KV structures, SDM reduces recomputation, lowers GPU idle time, and sustains throughput as context windows grow.
Q: Is Kove:SDM™ available today?
A: Yes. It runs on existing x86 servers. No rewrites, no kernel changes, and no new hardware required.
Q: How is Kove:SDM™ different from other approaches (like CXL or storage tiering)?
A:
- CXL: Hardware-based and adds latency.
- Storage tiering (NVIDIA, DDN, Weka, etc.): Offloads cache to SSD/storage, which is slower than DRAM.
- Kove:SDM™: Software-only and available today; pools DRAM across servers at latency equal to or better than local DRAM, indistinguishable from local memory.