Performance

O_DIRECT - The Problem That Grew Up With Multi-Threading

Introduction: A Problem Hiding in Plain Sight

Direct I/O (O_DIRECT) has been a contentious feature in Linux since its introduction. Back in 2002, Linus Torvalds famously described the interface as designed “by a deranged monkey on some serious mind-controlling substances.” Yet for years it continued to work, mostly. Applications used it, databases relied on it, and virtual machines benefited from its zero-copy performance.

But something fundamental has changed. As modern software has embraced multi-threading at every level, from applications to filesystems within the kernel itself, a problem that was once manageable has become critical. The truth is stark: with O_DIRECT, the kernel transfers data directly to or from your buffer, and there is no way to guarantee that nobody else (another thread, for example) will touch that buffer while the operation is in flight.

eBPF UDP Load Balancer with Weighted Round-Robin

Introduction

I’ve been working on a new project that requires high-performance UDP load balancing with dynamic weight adjustment. Traditional userspace load balancers introduce latency that’s unacceptable for our use case, so I decided to implement a kernel-level solution using eBPF (extended Berkeley Packet Filter).

The result is ebpflb_udp_wrr, an eBPF-based UDP load balancer that distributes incoming UDP traffic to local listeners using a weighted round-robin algorithm.
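To make "weighted round-robin" concrete, here is a minimal userspace sketch of one common variant, smooth weighted round-robin. It is an illustration under assumptions, not the code from ebpflb_udp_wrr: the `struct listener` layout and the `wrr_pick` function are hypothetical names, and the project may well use a different WRR variant. In an eBPF program this state would live in a BPF map rather than a plain array, and concurrent updates would need extra care.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-listener state; not the layout used by ebpflb_udp_wrr. */
struct listener {
    int weight;   /* configured share of traffic; 0 or less disables it */
    int current;  /* running score used by smooth weighted round-robin */
};

/*
 * Pick the next listener using smooth weighted round-robin:
 *   1. add each listener's weight to its running score,
 *   2. choose the listener with the highest score (first one wins ties),
 *   3. subtract the sum of all active weights from the winner's score.
 * Weights may be changed between calls; the total is recomputed each time.
 * Returns the chosen index, or -1 if no listener has a positive weight.
 */
int wrr_pick(struct listener *ls, size_t n)
{
    assert(ls != NULL || n == 0);

    int total = 0;
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (ls[i].weight <= 0)
            continue;                 /* skip disabled listeners */
        ls[i].current += ls[i].weight;
        total += ls[i].weight;
        if (best < 0 || ls[i].current > ls[best].current)
            best = (int)i;
    }
    if (best >= 0)
        ls[best].current -= total;
    return best;
}
```

With three listeners weighted 5, 1, and 1, seven consecutive calls return indices 0, 0, 1, 0, 2, 0, 0: the heavy listener gets five of every seven packets, but its turns are interleaved with the others instead of arriving in one burst.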

Why eBPF and XDP?

eBPF lets us extend kernel functionality without writing kernel modules or modifying the kernel source. Combined with XDP (eXpress Data Path), it lets us process packets at the earliest possible point in the networking stack, right when they arrive at the network interface, which minimizes latency.
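The kind of work done at that early hook can be sketched in plain userspace C. The example below mimics the `data` / `data_end` bounds-checking pattern that XDP programs follow, deciding whether a raw Ethernet frame is an IPv4 UDP packet for a given port. It is only a sketch: `classify_frame`, the `verdict` enum, and `service_port` are hypothetical names, not part of ebpflb_udp_wrr. A real XDP program would take a `struct xdp_md *` context, return codes such as `XDP_PASS`, and be compiled for the BPF target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts, loosely analogous to XDP return codes. */
enum verdict { VERDICT_PASS = 0, VERDICT_BALANCE = 1 };

/* Read a big-endian (network order) 16-bit field. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/*
 * Return VERDICT_BALANCE for IPv4/UDP frames whose destination port is
 * `service_port`; return VERDICT_PASS for everything else.
 * Each header is length-checked before it is read, the same discipline
 * the eBPF verifier demands of packet accesses.
 */
enum verdict classify_frame(const uint8_t *data, const uint8_t *data_end,
                            uint16_t service_port)
{
    assert(data != NULL && data_end >= data);
    size_t len = (size_t)(data_end - data);

    /* Ethernet header: 14 bytes, EtherType at offset 12 (0x0800 = IPv4). */
    if (len < 14 || load_be16(data + 12) != 0x0800)
        return VERDICT_PASS;

    /* IPv4 header: at least 20 bytes; low nibble of byte 0 is the length
     * in 32-bit words. */
    const uint8_t *ip = data + 14;
    size_t ip_len = len - 14;
    if (ip_len < 20 || (ip[0] >> 4) != 4)
        return VERDICT_PASS;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < 20 || ip_len < ihl)
        return VERDICT_PASS;
    if (ip[9] != 17)                      /* protocol 17 = UDP */
        return VERDICT_PASS;
    if (load_be16(ip + 6) & 0x1fff)       /* non-first fragment: no UDP header */
        return VERDICT_PASS;

    /* UDP header: 8 bytes, destination port at offset 2. */
    if (ip_len - ihl < 8)
        return VERDICT_PASS;
    const uint8_t *udp = ip + ihl;
    return load_be16(udp + 2) == service_port ? VERDICT_BALANCE
                                              : VERDICT_PASS;
}
```

Anything the function does not recognise falls through to `VERDICT_PASS`, mirroring the usual approach of handing unrelated traffic back to the normal networking stack untouched.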