Zero-allocation HTTP servers in Ruby terms

The question

A Hacker News thread discussed Anil Madhavapeddy's httpz, a high-performance HTTP/1.1 parser written using OxCaml. The original article describes httpz as a parser that aims for no major heap allocation, and very little minor heap allocation, by using OxCaml features such as unboxed types, local allocations, and mutable local variables.¹

The Hacker News comment that triggered this explanation was about a "zero allocation HTTP server." In practice, that does not mean "the program never allocates anything." It means the server is designed so that the steady-state request path avoids new per-request heap objects where possible.

The core idea

Most HTTP servers receive bytes from a socket, parse those bytes into a request, route the request, and write a response. A straightforward implementation creates many temporary objects: strings for method names, strings for paths, hashes for headers, arrays for parsed fields, wrapper objects for slices, and so on.

That is convenient, but it creates garbage. In a garbage-collected runtime, garbage is not free. It must eventually be discovered, traced, swept, compacted, or otherwise accounted for by the runtime. Ruby exposes this directly through its GC module, which exists to control and inspect Ruby's garbage collector.²
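
A rough way to see that cost from inside Ruby is to count objects allocated around a block of code. The sketch below relies on GC.stat(:total_allocated_objects); the helper name allocations_during is hypothetical, and the counter is process-wide, so the numbers are estimates rather than exact measurements.

```ruby
# Hypothetical helper: counts objects allocated while the block runs.
# GC.stat(:total_allocated_objects) is a process-wide counter, so other
# threads can inflate the result; treat it as an estimate.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

line = "GET /users HTTP/1.1"

# Splitting builds an Array plus one String per token.
split_cost = allocations_during { line.split(" ") }

# Reading one byte compares integers and builds no substring.
scan_cost = allocations_during { line.getbyte(0) == 71 } # 71 is "G".ord

puts "split: #{split_cost} objects, getbyte: #{scan_cost} objects"
```

The exact counts vary by Ruby version, so the comparison between the two numbers is more informative than either number alone.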

A zero-allocation style tries to keep the hot path closer to this shape:

socket bytes -> reusable buffer -> integer spans -> route -> response

Instead of turning GET /users HTTP/1.1 into new Ruby strings immediately, the parser records where each part lives inside the buffer:

method: offset 0, length 3
path:   offset 4, length 6

Those offset/length pairs are often called spans, slices, or views, depending on the ecosystem.
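
In Ruby terms, a span is just a pair of integers, and a String is produced only when something asks for one. A minimal sketch, with the offsets hard-coded to match the example above rather than computed by a parser:

```ruby
# A binary buffer holding one request line.
buf = "GET /users HTTP/1.1\r\n".b

# Spans are plain integers; nothing is copied out of the buffer yet.
method_off, method_len = 0, 3
path_off, path_len = 4, 6

# Inspect in place: check a single byte without building a substring.
starts_with_slash = buf.getbyte(path_off) == 47 # 47 is "/".ord

# Materialize a String only when a caller actually needs one.
method_name = buf.byteslice(method_off, method_len)
path = buf.byteslice(path_off, path_len)
```

byteslice is used instead of [] so that the offsets are always interpreted as byte positions, whatever the buffer's encoding.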

The OxCaml angle

OxCaml matters because it gives the programmer language-level tools for representing these tiny pieces of parser state without ordinary heap allocation. Madhavapeddy's article uses a 32KB input buffer and narrow offset/length representations. The post shows an unboxed record of two 16-bit fields for a span-like value, because 16-bit positions are enough inside a 32KB request-header buffer.¹
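
Ruby has no unboxed records, but a loose analogue exists: small Integers are immediate values in CRuby, so two 16-bit fields packed into one Integer do not become a separate heap object. The sketch below is an approximation of the idea, not a port of the httpz type; the PackedSpan module and its method names are hypothetical.

```ruby
# Hypothetical helpers: pack a 16-bit offset and a 16-bit length into one Integer.
module PackedSpan
  MASK = 0xFFFF

  # Returns a single Integer carrying both fields.
  def self.pack(off, len)
    unless off.between?(0, MASK) && len.between?(0, MASK)
      raise ArgumentError, "span out of 16-bit range"
    end
    (off << 16) | len
  end

  # Upper 16 bits hold the offset.
  def self.offset(span)
    span >> 16
  end

  # Lower 16 bits hold the length.
  def self.length(span)
    span & MASK
  end
end

span = PackedSpan.pack(4, 6)
off = PackedSpan.offset(span)
len = PackedSpan.length(span)
```

The trade-off is readability: every consumer has to unpack the value, which is exactly the bookkeeping a typed unboxed record does for you.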

The result is not merely fewer objects. It is a different lifetime model: if the request state lives on the call stack, then ending a connection can be as simple as returning from the function that handled it. The original article explicitly connects this to low garbage collector activity in the steady state.¹

Where Ruby differs

Ruby is object-heavy by design. A String is an object, and Ruby's own documentation defines it as an object containing an arbitrary sequence of bytes.³ That makes Ruby expressive, but it also means you should not expect OxCaml-style stack allocation or unboxed records from normal Ruby code.

Still, the technique can be approximated. You can reduce allocations by reusing buffers, scanning bytes directly, avoiding regular expressions on the hot path, and deferring substring creation until the application truly needs a Ruby object.
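
Buffer reuse is the easiest of these to sketch. IO#readpartial accepts an output buffer as its second argument and fills that String instead of returning a fresh one. In the example below, IO.pipe stands in for a client socket so the code runs on its own; the 4096-byte size is an arbitrary assumption.

```ruby
# IO.pipe stands in for a client socket so the sketch is self-contained.
reader, writer = IO.pipe

# One buffer, created up front and reused for every read.
buf = String.new(capacity: 4096, encoding: Encoding::BINARY)
buf_id = buf.object_id

writer.write("GET /a HTTP/1.1\r\n\r\n")
reader.readpartial(4096, buf) # fills buf in place
first = buf.getbyte(5)        # the byte after "GET /"

writer.write("GET /b HTTP/1.1\r\n\r\n")
reader.readpartial(4096, buf) # overwrites the same String object
second = buf.getbyte(5)

writer.close
reader.close
```

This guarantees that the same Ruby String object is reused across reads. Whether the runtime also reuses the underlying memory is an implementation detail this sketch does not depend on.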

The Ruby shape

Here is a deliberately small parser sketch. It parses the request line and returns integer spans into the original buffer. It avoids creating substrings while parsing.

module SpanHTTP
  CR = "\r".ord
  LF = "\n".ord
  SP = " ".ord

  def self.match_token?(buf, off, len, token)
    return false unless len == token.bytesize

    i = 0
    while i < len
      return false unless buf.getbyte(off + i) == token.getbyte(i)
      i += 1
    end

    true
  end

  def self.parse_request_line(buf)
    i = 0

    method_off = i
    i += 1 while buf.getbyte(i) && buf.getbyte(i) != SP
    return nil unless buf.getbyte(i) == SP
    method_len = i - method_off
    i += 1

    path_off = i
    i += 1 while buf.getbyte(i) && buf.getbyte(i) != SP
    return nil unless buf.getbyte(i) == SP
    path_len = i - path_off
    i += 1

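    # Skip the HTTP version. Require a terminating CRLF so that an
    # incomplete request line yields nil instead of a bogus result.
    terminated = false
    while (b = buf.getbyte(i))
      if b == CR && buf.getbyte(i + 1) == LF
        i += 2
        terminated = true
        break
      end
      i += 1
    end
    return nil unless terminated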

    {
      method_off: method_off,
      method_len: method_len,
      path_off: path_off,
      path_len: path_len,
      next_i: i
    }
  end
end

buf = +"GET /users HTTP/1.1\r\nHost: example.test\r\n\r\n"
buf.force_encoding(Encoding::BINARY)

req = SpanHTTP.parse_request_line(buf)

if req && SpanHTTP.match_token?(buf, req[:method_off], req[:method_len], "GET")
  # Route using spans, or allocate a String only at the boundary where it is needed.
end

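This is not truly zero-allocation Ruby. The hash returned by parse_request_line is allocated on every call, the "GET" literal allocates a new String each time it is evaluated unless frozen string literals are enabled, and the buffer itself is a Ruby String. The module and its constants are objects too, though they are created once rather than per request. But the example shows the important part: do not allocate a new string for every parsed token unless you have to.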
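
One way to remove the per-call hash is to keep a single mutable state object per connection and refill it for each request. The sketch below is an assumption about how that could look, not a tuned implementation; the RequestLine class and its parse! method are hypothetical names.

```ruby
# Hypothetical reusable parser state: one instance per connection,
# refilled for each request instead of returning a new Hash.
class RequestLine
  SP = 32 # " ".ord
  CR = 13 # "\r".ord
  LF = 10 # "\n".ord

  attr_reader :method_off, :method_len, :path_off, :path_len, :next_i

  def initialize
    @method_off = @method_len = @path_off = @path_len = @next_i = 0
  end

  # Returns true and fills the fields on success, false otherwise.
  # The fields are only meaningful after a true result.
  def parse!(buf)
    i = 0
    i += 1 while (b = buf.getbyte(i)) && b != SP
    return false unless b == SP && i > 0
    @method_off = 0
    @method_len = i
    i += 1

    start = i
    i += 1 while (b = buf.getbyte(i)) && b != SP
    return false unless b == SP && i > start
    @path_off = start
    @path_len = i - start
    i += 1

    # Skip the HTTP version up to and including the CRLF.
    while (b = buf.getbyte(i))
      if b == CR && buf.getbyte(i + 1) == LF
        @next_i = i + 2
        return true
      end
      i += 1
    end
    false
  end
end

state = RequestLine.new # allocated once, then reused
buf = "GET /users HTTP/1.1\r\nHost: example.test\r\n\r\n".b
ok = state.parse!(buf)
```

The cost is that the state is now shared and mutable: anything that needs the parsed fields after the next parse! call has to copy them out first.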

Pipelining complicates it

The hard part is not parsing GET /. The hard part is lifetime.

Suppose one TCP connection sends several HTTP/1.1 requests back-to-back. If the server starts processing request 1, then reads request 2, then request 3, the parser has to remember each request's method, path, headers, and body state. If those fields are merely offsets into one reusable buffer, the server cannot overwrite that buffer until all referenced data is finished.

That gives the implementation three choices:

1. Do not support HTTP/1.1 pipelining. Handle one request at a time per connection, then reuse the buffer.
2. Copy retained data. Allocate strings or structs for the parts that must outlive the current read.
3. Use a pool or ring buffer. Preallocate storage for multiple in-flight requests and manage lifetimes manually.
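
The lifetime hazard, and the copying option from the list above, can be shown in a few lines. This is an illustrative sketch with hard-coded spans and made-up request paths, not a pipelining implementation:

```ruby
# One reusable buffer shared by consecutive requests on a connection.
buf = String.new(capacity: 64)

# Request 1 arrives; the parser keeps only a span for its path.
buf.replace("GET /one HTTP/1.1\r\n\r\n")
path_off, path_len = 4, 4

# Copy option: materialize what must outlive this read.
retained = buf.byteslice(path_off, path_len)

# Request 2 reuses the buffer before request 1 has been handled.
buf.replace("PUT /two HTTP/1.1\r\n\r\n")

# The old span still "works", but it now reads request 2's bytes.
stale = buf.byteslice(path_off, path_len)
```

The span itself never becomes invalid in any way Ruby can detect. It silently refers to different data, which is why the lifetime rule has to be enforced by the server's design rather than by the language.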

The Hacker News discussion around httpz touched exactly this issue: asynchronous or pipelined work makes zero-allocation claims more subtle, because suspended request state must live somewhere.⁴

What zero allocation means

In practical server engineering, "zero allocation" usually means zero allocation on the steady-state hot path, not metaphysical purity. A server may still allocate at startup, when accepting connections, when crossing into application code, when logging, when handling errors, or when storing long-lived request state.

The useful question is not "does this allocate ever?" It is:

Does this allocate per request?
Does this allocate per header?
Does this allocate per byte scanned?
Does it allocate only at explicit lifetime boundaries?
Can the allocation behavior be bounded under load?

Ruby takeaways

For Ruby, the realistic lesson is not "write a perfect zero-allocation HTTP server." It is to be deliberate about allocation in hot paths.

Use a binary string buffer. Prefer getbyte when scanning protocol bytes. Represent temporary parser state as offsets and lengths. Avoid creating substrings for data you only need to compare. Convert spans into Ruby strings only once you cross into application logic and truly need normal Ruby objects.

That approach fits Ruby's strengths without pretending Ruby is OxCaml, Rust, or C. You keep the high-level language, but you borrow a systems-programming habit: make object lifetimes explicit.

Where this leaves Ruby

A zero-allocation HTTP server is mostly a discipline around buffers, spans, and lifetimes. OxCaml makes that discipline unusually direct because it can place small parser structures outside the ordinary heap. Ruby cannot reproduce that model exactly, but the same mental model still helps: parse bytes in place, avoid temporary strings, and allocate only when the request data must survive beyond the current buffer.

Sources

1. Anil Madhavapeddy, "My (very) fast zero-allocation webserver using OxCaml," 1 February 2026. https://anil.recoil.org/notes/oxcaml-httpz
2. Ruby documentation, GC module. https://docs.ruby-lang.org/en/master/GC.html
3. Ruby documentation, String class. https://docs.ruby-lang.org/en/3.1/String.html
4. Hacker News discussion, "My fast zero-allocation webserver using OxCaml." https://news.ycombinator.com/item?id=46854534