Apple Silicon GPU memory limits, wired RAM, and practical testing

1. The original command

The command is:

sudo sysctl iogpu.wired_limit_mb

On Apple Silicon Macs, this reads the current limit, in megabytes, on how much memory the Apple GPU driver may wire. Reading the value does not require sudo; only the write form does.

The related write form is:

sudo sysctl iogpu.wired_limit_mb=8192

That sets the GPU wired-memory cap to 8192 MB until reboot or until it is changed again. A value of 0 means macOS uses its default heuristic.

Plain English: this setting does not create VRAM. It changes how much unified memory the GPU is allowed to lock as non-pageable memory.
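The read and write forms above can be wrapped in a few lines of Ruby. This is a minimal sketch, assuming `sysctl` prints either `iogpu.wired_limit_mb: 8192` or, with `-n`, the bare number. The helper names are illustrative and not part of any existing tool.

```ruby
# Hypothetical helpers; the names are illustrative only.

# Parse output such as "iogpu.wired_limit_mb: 8192" (or the bare "8192"
# printed by `sysctl -n`) into an Integer number of megabytes.
def parse_wired_limit_mb(text)
  value = text.to_s.strip.split(":").last.to_s.strip
  raise ArgumentError, "unexpected sysctl output: #{text.inspect}" unless value.match?(/\A\d+\z/)
  value.to_i
end

# Read the current limit. Reading does not need sudo.
def read_wired_limit_mb
  parse_wired_limit_mb(`sysctl -n iogpu.wired_limit_mb`)
end

# Build the argv for the write form; the caller decides whether to run it,
# for example with system(*set_wired_limit_command(8192)).
def set_wired_limit_command(mb)
  raise ArgumentError, "limit must be a non-negative integer" unless mb.is_a?(Integer) && mb >= 0
  ["sudo", "sysctl", "iogpu.wired_limit_mb=#{mb}"]
end
```

Keeping the parser separate from the shell call makes it possible to check the parsing logic on a machine that does not have the `iogpu` key at all.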

2. Why wired memory matters

Apple Silicon uses unified memory. CPU and GPU share the same physical RAM. There is no separate VRAM pool like on a discrete GPU.

The relevant term is wired memory. Wired memory cannot be paged out to disk. Once the GPU wires a large amount of memory, the rest of macOS has less room to maneuver.

This makes iogpu.wired_limit_mb a safety and capacity knob:

Higher values allow larger GPU-resident working sets.
Lower values stop the GPU from locking too much memory.
Neither setting changes the amount of physical RAM in the machine.

3. Does lowering the limit give the OS more RAM?

In one narrow sense, yes. If the GPU is under sustained high load and tries to wire a lot of memory, a lower cap prevents it from going past that ceiling. On a 16 GB Mac, setting the cap to 8192 MB means the GPU cannot wire more than about 8 GB.

But this is not the same as gaining RAM. GPU memory is not pre-reserved at boot. It grows when workloads need it. Lowering the cap only matters when a GPU-heavy app tries to exceed the cap.
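The arithmetic behind that distinction can be sketched in Ruby. This is a simplification under stated assumptions: GPU-wired memory is modeled as the smaller of demand and cap, whereas a real workload that exceeds the cap may fail or fall back instead of being clamped. The function names are illustrative.

```ruby
# Worst-case memory left for everything except GPU-wired buffers.
# A cap of 0 defers to the macOS default, which this sketch cannot model.
def worst_case_non_gpu_mb(physical_mb, cap_mb)
  raise ArgumentError, "cap of 0 defers to the macOS default" if cap_mb.zero?
  physical_mb - [cap_mb, physical_mb].min
end

# GPU-wired memory in this simplified model: it grows on demand, up to the cap.
def gpu_wired_mb(demand_mb, cap_mb)
  [demand_mb, cap_mb].min
end

puts worst_case_non_gpu_mb(16 * 1024, 8192)  # 8192
puts gpu_wired_mb(3000, 8192)                # 3000: the cap has no effect
puts gpu_wired_mb(10_000, 8192)              # 8192: the cap binds
```

The middle case is the common one: as long as demand stays below the cap, lowering the cap changes nothing.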

Scenario	Effect of a lower cap
Normal desktop use	Usually no visible effect
Moderate GPU use	May reduce GPU headroom with little system benefit
Heavy GPU use	Can preserve OS headroom and reduce worst-case pressure
Large LLM or Metal workload	May cause earlier allocation failure or reduced offload

The asymmetry is important:

Raising the limit removes an artificial ceiling.
Lowering the limit adds an artificial ceiling.

For someone who does not care about games or maximum GPU performance, lowering the cap can still be useful as a stability bias.

4. Why local LLM tools use this setting

Tools such as LM Studio, Ollama, and llama.cpp use Metal on Apple Silicon. When they offload model layers to the GPU, they allocate Metal buffers backed by unified memory.

These tools can show a GPU/RAM split because they track their own allocations. They are not reading a global macOS "VRAM used" counter. They know:

which tensors they placed in CPU memory,
which tensors they placed in Metal buffers,
how many bytes those allocations require.

For LLMs, increasing iogpu.wired_limit_mb can allow more layers, a larger context, or a larger model configuration to fit in the Metal working set.
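The bookkeeping described above can be illustrated with a small Ruby sketch. It does not reflect how LM Studio, Ollama, or llama.cpp are implemented; the `Tensor` struct, the placement symbols, and the sizes are all hypothetical and exist only to show how a per-placement total can be derived from a tool's own allocation records.

```ruby
# Hypothetical tensor records; real tools keep their own internal bookkeeping.
Tensor = Struct.new(:name, :bytes, :placement) # placement is :cpu or :metal

# Sum bytes per placement and convert to megabytes.
def placement_totals_mb(tensors)
  totals = Hash.new(0)
  tensors.each { |t| totals[t.placement] += t.bytes }
  totals.transform_values { |bytes| (bytes / 1024.0 / 1024.0).round(1) }
end

# True if the Metal-side total fits under the given wired limit.
def fits_under_limit?(tensors, limit_mb)
  placement_totals_mb(tensors).fetch(:metal, 0.0) <= limit_mb
end

# Made-up sizes for illustration only.
tensors = [
  Tensor.new("layer.0", 512 * 1024 * 1024, :metal),
  Tensor.new("layer.1", 512 * 1024 * 1024, :metal),
  Tensor.new("embeddings", 256 * 1024 * 1024, :cpu)
]

placement_totals_mb(tensors).each { |place, mb| puts "#{place}: #{mb} MB" }
puts "fits under 5632 MB: #{fits_under_limit?(tensors, 5632)}"
```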

Risk: setting the value too high can leave too little memory for macOS and normal apps. The result can be severe memory pressure, swap churn, freezes, or reboots.

5. How to monitor the effect

macOS does not provide a clean equivalent of nvidia-smi for Apple Silicon GPU memory. Practical monitoring is indirect.

Signal	Command / tool	Meaning
Wired memory	`vm_stat`, Activity Monitor	Trend of all wired memory (kernel, drivers, and GPU buffers), not GPU alone
Compressed memory	`vm_stat`	System is squeezing memory to avoid swap
Swap used	`sysctl vm.swapusage`	Memory pressure has spilled to disk
GPU activity	`sudo powermetrics --samplers gpu_power`	GPU activity and power draw, not memory usage
Accurate per-app GPU allocations	Xcode Instruments / Metal tools	Best for Metal app profiling, not global monitoring
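The swap signal is easy to read programmatically. The sketch below assumes `sysctl vm.swapusage` prints a line of the form `vm.swapusage: total = 2048.00M  used = 1024.50M  free = 1023.50M  (encrypted)`. The `K` and `G` suffix handling is a defensive assumption, and the function name is illustrative.

```ruby
# Multipliers to convert a suffixed value to megabytes.
UNIT_TO_MB = { "K" => 1.0 / 1024, "M" => 1.0, "G" => 1024.0 }.freeze

# Parse `sysctl vm.swapusage` output into { total:, used:, free: } in MB.
def parse_swapusage(text)
  fields = text.scan(/(total|used|free) = ([\d.]+)([KMG])/)
  fields.map { |name, number, unit| [name.to_sym, number.to_f * UNIT_TO_MB.fetch(unit)] }.to_h
end

sample = "vm.swapusage: total = 2048.00M  used = 1024.50M  free = 1023.50M  (encrypted)"
usage = parse_swapusage(sample)
puts "swap used: #{usage[:used]} MB of #{usage[:total]} MB"
```

On a live system the same function can be fed the output of the real command, for example `parse_swapusage(`sysctl vm.swapusage`)`.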

A minimal Ruby loop can sample wired memory once per second:

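# Page size in bytes from vm_stat's header line; falls back to the 16 KB
# pages used on Apple Silicon if the header cannot be parsed.
PAGE_SIZE = `vm_stat`[/page size of (\d+) bytes/, 1]&.to_i || 16 * 1024

# Number of wired pages currently reported by vm_stat.
def wired_pages
  out = `vm_stat`
  line = out.lines.find { |l| l.start_with?("Pages wired down") }
  raise "no 'Pages wired down' line in vm_stat output" unless line
  # The value ends with a period ("123456."); to_i ignores it.
  line.split(":").last.strip.to_i
end

# Convert a page count to megabytes.
def to_mb(pages)
  pages * PAGE_SIZE / 1024.0 / 1024.0
end

# Print wired memory and its change once per second until interrupted.
prev = wired_pages
loop do
  sleep 1
  current = wired_pages
  puts "wired: #{to_mb(current).round(1)} MB (delta #{to_mb(current - prev).round(1)} MB)"
  prev = current
end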

6. Testing an 8 GB Mac-like GPU limit

An 8 GB Apple Silicon Mac does not literally have a fixed 5.6 GB VRAM pool. Still, a practical approximation is to set the GPU wired limit to around 5.5-6 GB.

sudo sysctl iogpu.wired_limit_mb=5632

That tests whether a workload behaves under a GPU ceiling similar to an 8 GB machine.

To make the test more realistic on a 16 GB Mac, add system memory pressure by allocating RAM. Otherwise, the CPU side still has far more room than an actual 8 GB machine.
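A memory hog for this purpose can be very small. The sketch below is one possible approach, not the script used for the tests described here: it fills 1 MB chunks with pseudo-random bytes, on the assumption that hard-to-compress data puts more real pressure on the system than zero-filled pages would. The hog's memory is still pageable, so it adds pressure without removing RAM from the machine. Function names are illustrative.

```ruby
# Allocate roughly `megabytes` MB as 1 MB chunks of pseudo-random bytes.
# The returned array must stay referenced for as long as pressure is wanted.
def allocate_hog(megabytes, seed: 1)
  rng = Random.new(seed)
  Array.new(megabytes) { rng.bytes(1024 * 1024) }
end

# Total size of the hog in whole megabytes.
def hog_size_mb(hog)
  hog.sum(&:bytesize) / (1024 * 1024)
end

# Small demonstration; a real test on a 16 GB Mac might use 6 * 1024.
hog = allocate_hog(4)
puts "holding #{hog_size_mb(hog)} MB"
```

To release the memory, drop the reference (`hog = nil`) and let the process exit or call `GC.start`.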

7. Ruby scripts for measurement and reporting

The testing script evolved into a small benchmark harness with these goals:

test multiple iogpu.wired_limit_mb values,
optionally allocate a configurable amount of RAM,
sample wired memory, compressed memory, swap, free memory, and pageouts,
restore the original GPU limit when finished,
write raw CSV, summary CSV, JSON, and a Markdown report.
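The restore-on-exit goal is the one most worth getting right, because a crash mid-test would otherwise leave the machine running with a test limit. A minimal sketch, assuming the limit is read and written through two injected callables (hypothetical here, so the example runs without sudo), uses `ensure`:

```ruby
# Run the block with a temporary GPU limit and always restore the original.
# reader and writer are injected callables so the sketch can run without sudo;
# a real harness would back them with sysctl calls.
def with_gpu_limit(limit_mb, reader:, writer:)
  original = reader.call
  writer.call(limit_mb)
  yield
ensure
  # original is nil only if the initial read itself failed.
  writer.call(original) unless original.nil?
end

# In-memory stand-in for the real sysctl value.
state = { limit: 0 }
reader = -> { state[:limit] }
writer = ->(mb) { state[:limit] = mb }

with_gpu_limit(5632, reader: reader, writer: writer) do
  puts "during: #{state[:limit]}"  # during: 5632
end
puts "after: #{state[:limit]}"     # after: 0
```

Because the restore sits in `ensure`, it also runs when the block raises or the run is interrupted with Ctrl-C.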

Example run:

ruby gpu_limit_report_local.rb \
  --limits-mb 5632,6144,7168,8192 \
  --hog-gb 6 \
  --duration 120 \
  --interval 1 \
  --warmup 8 \
  --report-prefix 16gb-test \
  --output-dir .

The reporting script should write to the current directory by default, with an optional --output-dir for explicit output placement.

A second script interprets the generated JSON or summary CSV:

ruby interpret_gpu_limit_report.rb --input 16gb-test-YYYYMMDD-HHMMSS.json

The interpretation should prefer values that keep compressed memory and swap stable during the user's normal workload. For a 16 GB Mac used mainly for browser, editor, terminal, Ruby, and occasional AI work, useful test values are:

5632 MB
6144 MB
6656 MB
7168 MB
8192 MB
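One way to turn "keep compressed memory and swap stable" into a rule is sketched below. It is not the logic of interpret_gpu_limit_report.rb, which is not shown here. The field names, the thresholds, and the choice to prefer the highest stable limit are all assumptions, and the sample rows are invented for illustration rather than measured.

```ruby
# Keep only rows whose swap and compressed-memory growth stay under thresholds.
# Field names and default thresholds are assumptions for this sketch.
def stable_limits(rows, max_swap_delta_mb: 64, max_compressed_delta_mb: 512)
  rows.select do |r|
    r[:swap_delta_mb] <= max_swap_delta_mb &&
      r[:compressed_delta_mb] <= max_compressed_delta_mb
  end
end

# Highest limit that stayed stable, or nil if none did.
def recommend_limit(rows, **thresholds)
  best = stable_limits(rows, **thresholds).max_by { |r| r[:limit_mb] }
  best && best[:limit_mb]
end

# Invented example rows, not real measurements.
rows = [
  { limit_mb: 5632, swap_delta_mb: 0,   compressed_delta_mb: 120 },
  { limit_mb: 6656, swap_delta_mb: 10,  compressed_delta_mb: 300 },
  { limit_mb: 8192, swap_delta_mb: 900, compressed_delta_mb: 2000 }
]
puts recommend_limit(rows).inspect
```

A more conservative rule would pick the lowest stable limit instead; which one is appropriate depends on how much GPU capacity the normal workload needs.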

8. Practical conclusion

The original command is useful, but not as a generic "make my Mac faster" tweak.

Best mental model: iogpu.wired_limit_mb controls how much unified memory the GPU may wire. Raising it helps large Metal workloads such as local LLMs. Lowering it can preserve system headroom under extreme GPU load, at the cost of GPU capacity.

For a 16 GB Mac where game performance does not matter, a conservative daily default around 5.5-6.5 GB is defensible. The right value is not theoretical; it should be chosen from observed compression, swap, and responsiveness under real workloads.