1. The original command
The command is:
```sh
sudo sysctl iogpu.wired_limit_mb
```
On Apple Silicon Macs, this reads the current limit for how much memory the Apple GPU driver is allowed to wire, measured in megabytes.
The related write form is:
```sh
sudo sysctl iogpu.wired_limit_mb=8192
```
That sets the GPU wired-memory cap to 8192 MB until reboot or until it is changed again. A value of 0 means macOS uses its default heuristic.
Plain English: this setting does not create VRAM. It changes how much unified memory the GPU is allowed to lock as non-pageable memory.
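To check the value from a script, a small Ruby sketch can read it with sysctl -n, which prints only the value, alongside hw.memsize, the sysctl that reports physical RAM in bytes. The helper names describe_limit and read_sysctl_int are hypothetical. The sketch assumes both keys are readable without sudo on your macOS version.

```ruby
# Sketch: show the GPU wired-memory cap next to physical RAM.
# Assumes macOS on Apple Silicon with the iogpu.wired_limit_mb sysctl present.

# Turn raw sysctl values into a readable line; 0 means "macOS default".
def describe_limit(limit_mb, memsize_bytes)
  ram_mb = memsize_bytes / (1024 * 1024)
  if limit_mb.zero?
    "iogpu.wired_limit_mb = 0 (macOS default heuristic); RAM = #{ram_mb} MB"
  else
    pct = (100.0 * limit_mb / ram_mb).round(1)
    "iogpu.wired_limit_mb = #{limit_mb} MB (#{pct}% of #{ram_mb} MB RAM)"
  end
end

# Read one integer sysctl with `sysctl -n`; raises if the key is unknown.
def read_sysctl_int(name)
  out = IO.popen(["sysctl", "-n", name], &:read)
  raise "sysctl #{name} failed" unless $?.success?
  Integer(out.strip)
end

# Only query the live system on macOS, so the helpers can be loaded elsewhere.
if RUBY_PLATFORM.include?("darwin") && $PROGRAM_NAME == __FILE__
  begin
    puts describe_limit(read_sysctl_int("iogpu.wired_limit_mb"),
                        read_sysctl_int("hw.memsize"))
  rescue StandardError => e
    warn "could not read sysctls: #{e.message}"
  end
end
```

On a 16 GB Mac with the cap set to 8192 MB, describe_limit reports the cap as 50.0% of 16384 MB.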
2. Why wired memory matters
Apple Silicon uses unified memory. CPU and GPU share the same physical RAM. There is no separate VRAM pool like on a discrete GPU.
The relevant term is wired memory. Wired memory cannot be paged out to disk or compressed. Once the GPU wires a large amount of memory, everything else in macOS has to fit into the remaining RAM, so compression and swap kick in sooner.
This makes iogpu.wired_limit_mb a safety and capacity knob:
- Higher values allow larger GPU-resident working sets.
- Lower values stop the GPU from locking too much memory.
- Neither setting changes the amount of physical RAM in the machine.
3. Does lowering the limit give the OS more RAM?
In one narrow sense, yes. If the GPU is under sustained high load and tries to wire a lot of memory, a lower cap prevents it from going past that ceiling. On a 16 GB Mac, setting the cap to 8192 MB means the GPU cannot wire more than about 8 GB.
But this is not the same as gaining RAM. GPU memory is not pre-reserved at boot. It grows when workloads need it. Lowering the cap only matters when a GPU-heavy app tries to exceed the cap.
| Scenario | Effect of a lower cap |
|---|---|
| Normal desktop use | Usually no visible effect |
| Moderate GPU use | May reduce GPU headroom with little system benefit |
| Heavy GPU use | Can preserve OS headroom and reduce worst-case pressure |
| Large LLM or Metal workload | May cause earlier allocation failure or reduced offload |
The asymmetry is important:
- Raising the limit removes an artificial ceiling.
- Lowering the limit adds an artificial ceiling.
For someone who does not care about games or maximum GPU performance, lowering the cap can still be useful as a stability bias.
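A deliberately simplified model makes the asymmetry concrete. It is an assumption, not how the kernel actually accounts memory. A workload asks for some amount of GPU-wired memory, the GPU wires up to the cap, and anything beyond the cap is refused. The demand figures and the raised 12288 MB cap are hypothetical.

```ruby
# Toy model of a wired-memory cap (a simplification, not kernel behavior):
# the GPU wires what the workload asks for, up to the cap; the rest is refused
# and would surface as an allocation failure or reduced GPU offload.
def gpu_outcome(demand_mb, cap_mb)
  wired = [demand_mb, cap_mb].min
  { wired_mb: wired, refused_mb: demand_mb - wired }
end

RAM_MB = 16 * 1024 # a 16 GB Mac

# Hypothetical GPU demands for three kinds of use, under a lowered and a raised cap.
{ "desktop" => 1_000, "heavy GPU" => 9_000, "large LLM" => 12_000 }.each do |label, demand|
  [8192, 12_288].each do |cap|
    r = gpu_outcome(demand, cap)
    # "rest of system" ignores kernel and other wired memory.
    puts format("%-9s cap %5d MB: GPU wires %5d MB, refused %4d MB, rest of system %5d MB",
                label, cap, r[:wired_mb], r[:refused_mb], RAM_MB - r[:wired_mb])
  end
end
```

In this model the desktop case is identical under both caps, which matches the first row of the table. Only the large demand differs. There, the 8192 MB cap refuses 3808 MB of GPU memory and leaves that same 3808 MB to the rest of the system.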
4. Why local LLM tools use this setting
Tools such as LM Studio, Ollama, and llama.cpp use Metal on Apple Silicon. When they offload model layers to the GPU, they allocate Metal buffers backed by unified memory.
These tools can show a GPU/RAM split because they track their own allocations. They are not reading a global macOS "VRAM used" counter. They know:
- which tensors they placed in CPU memory,
- which tensors they placed in Metal buffers,
- how many bytes those allocations require.
For LLMs, increasing iogpu.wired_limit_mb can allow more layers, a larger context, or a larger model configuration to fit in the Metal working set.
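As a rough illustration of why a higher cap lets more layers or a longer context fit, the sketch below makes three assumptions: equally sized layers, a per-layer KV-cache cost that grows with context length, and a fixed overhead for scratch buffers. All names and numbers are hypothetical. llama.cpp, Ollama and LM Studio use their own, more detailed estimates.

```ruby
# Hypothetical offload estimate: how many of n_layers fit under a Metal budget.
# Each offloaded layer costs its weights plus its share of the KV cache
# (which grows with context length); fixed_mb covers scratch buffers.
def layers_that_fit(n_layers:, weights_mb_per_layer:, kv_mb_per_layer:, fixed_mb:, limit_mb:)
  budget = limit_mb - fixed_mb
  return 0 if budget <= 0
  per_layer = weights_mb_per_layer + kv_mb_per_layer
  [(budget / per_layer.to_f).floor, n_layers].min
end

# Example: a 32-layer model with 120 MB of weights per layer (made-up numbers).
[4096, 5632, 8192].each do |limit|
  short_ctx = layers_that_fit(n_layers: 32, weights_mb_per_layer: 120,
                              kv_mb_per_layer: 16, fixed_mb: 512, limit_mb: limit)
  long_ctx = layers_that_fit(n_layers: 32, weights_mb_per_layer: 120,
                             kv_mb_per_layer: 64, fixed_mb: 512, limit_mb: limit)
  puts "limit #{limit} MB: #{short_ctx}/32 layers (short context), #{long_ctx}/32 (long context)"
end
```

With these made-up numbers, raising the limit from 4096 MB to 5632 MB moves the short-context case from 26 to all 32 layers. The long-context case only reaches full offload at 8192 MB.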
Risk: setting the value too high can leave too little memory for macOS and normal apps. The result can be severe memory pressure, swap churn, freezes, or reboots.
5. How to monitor the effect
macOS does not provide a clean equivalent of nvidia-smi for Apple Silicon GPU memory. Practical monitoring is indirect.
| Signal | Command / tool | Meaning |
|---|---|---|
| Wired memory | vm_stat ("Pages wired down"), Activity Monitor | GPU and kernel pinned memory trend |
| Compressed memory | vm_stat ("Pages occupied by compressor") | System is squeezing memory to avoid swap |
| Swap used | sysctl vm.swapusage | Memory pressure has spilled to disk |
| GPU activity | sudo powermetrics --samplers gpu_power | GPU busy level and power; does not report memory usage |
| Accurate per-app GPU allocations | Xcode Instruments / Metal tools | Best for Metal app profiling, not global monitoring |
A minimal Ruby loop can sample wired memory once per second:
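```ruby
# Sample "Pages wired down" from vm_stat once per second (stop with Ctrl-C).

# vm_stat's header reports the page size ("page size of 16384 bytes" on
# Apple Silicon); fall back to 16 KiB if the header cannot be parsed.
PAGE_SIZE = `vm_stat`[/page size of (\d+) bytes/, 1]&.to_i || 16 * 1024

# Return the current "Pages wired down" count.
def wired_pages
  line = `vm_stat`.lines.find { |l| l.start_with?("Pages wired down") }
  raise "vm_stat output has no 'Pages wired down' line" unless line
  # The value ends with a period ("123456."); to_i stops at the first non-digit.
  line.split(":").last.strip.to_i
end

def to_mb(pages)
  pages * PAGE_SIZE / 1024.0 / 1024.0
end

$stdout.sync = true
trap("INT") { puts; exit }
prev = wired_pages
loop do
  sleep 1
  current = wired_pages
  puts "wired: #{to_mb(current).round(1)} MB (delta #{to_mb(current - prev).round(1)} MB)"
  prev = current
end
```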
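The same vm_stat output also carries the free, compressed and pageout counters from the table, and sysctl vm.swapusage reports swap. Below is a sketch of parsers for both. It assumes the field names "Pages free", "Pages occupied by compressor" and "Pageouts", and the "used = ...M" format of vm.swapusage, so check them against your macOS version. Pageouts is a cumulative counter, so compare successive samples rather than reading a single value.

```ruby
# Parse vm_stat text into MB figures; unknown fields default to 0.
def parse_vm_stat(text)
  page_size = text[/page size of (\d+) bytes/, 1]&.to_i || 16 * 1024
  counts = {}
  text.each_line do |line|
    # Matches lines like 'Pages free:   12345.' and '"Translation faults": 67.'
    next unless line =~ /\A"?([^":]+)"?:\s+(\d+)\.?\s*\z/
    counts[Regexp.last_match(1).strip] = Regexp.last_match(2).to_i
  end
  to_mb = ->(key) { counts.fetch(key, 0) * page_size / 1024.0**2 }
  {
    free_mb: to_mb.call("Pages free"),
    wired_mb: to_mb.call("Pages wired down"),
    compressed_mb: to_mb.call("Pages occupied by compressor"),
    pageouts: counts.fetch("Pageouts", 0) # cumulative since boot
  }
end

# Parse 'vm.swapusage: total = 2048.00M  used = 512.25M  free = ...' into MB.
def parse_swapusage(text)
  m = text.match(/used = ([\d.]+)([MG])/)
  return 0.0 unless m
  m[1].to_f * (m[2] == "G" ? 1024 : 1)
end

# Take one live sample on macOS only.
if RUBY_PLATFORM.include?("darwin") && $PROGRAM_NAME == __FILE__
  sample = parse_vm_stat(`vm_stat`)
  sample[:swap_used_mb] = parse_swapusage(`sysctl vm.swapusage`)
  puts sample.map { |k, v| "#{k}=#{v.is_a?(Float) ? v.round(1) : v}" }.join(" ")
end
```

Keeping the parsers free of shell calls means they can be checked against saved vm_stat output before running them in a sampling loop.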
6. Testing an 8 GB Mac-like GPU limit
An 8 GB Apple Silicon Mac does not literally have a fixed 5.6 GB VRAM pool. Still, a practical approximation of its GPU ceiling is a wired limit of around 5.5-6 GB, which is 5632-6144 MB at 1024 MB per GB.
```sh
sudo sysctl iogpu.wired_limit_mb=5632
```
That shows how a workload behaves under a GPU ceiling similar to an 8 GB machine's.
To make the test more realistic on a 16 GB Mac, add system memory pressure by allocating RAM. Otherwise, the CPU side still has far more room than an actual 8 GB machine.
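One way to add that pressure is a separate process that allocates a fixed amount of memory and holds it. The sketch below is hypothetical; the file name ram_hog.rb and the 256 MiB chunk size are assumptions. It fills memory with random bytes because macOS compresses memory under pressure. A buffer of zeros or any other constant would compress to almost nothing, so the hog would take far less RAM than intended.

```ruby
# Hypothetical RAM hog: hold N GiB of incompressible data until Ctrl-C.
# Usage: ruby ram_hog.rb 6
MIB = 1024 * 1024

# Build total_mib MiB of data as chunk_mib-sized strings. Each chunk repeats
# one 1 MiB random block, so every page holds random (incompressible) bytes,
# and String#* writes every byte, so the memory is actually touched.
def allocate_incompressible(total_mib, chunk_mib: 256, rng: Random.new)
  block = rng.bytes(MIB)
  full, rest = total_mib.divmod(chunk_mib)
  chunks = Array.new(full) { block * chunk_mib }
  chunks << block * rest if rest.positive?
  chunks
end

if $PROGRAM_NAME == __FILE__ && ARGV.first
  hog = allocate_incompressible((Float(ARGV.first) * 1024).round)
  puts "holding #{hog.sum(&:bytesize) / MIB} MiB; press Ctrl-C to release"
  begin
    sleep
  rescue Interrupt
    puts "\nreleased"
  end
end
```

Splitting the allocation into chunks avoids asking Ruby for one multi-gigabyte string. Holding the array in a live local variable keeps the garbage collector from freeing it while the test runs.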
7. Ruby scripts for measurement and reporting
The testing script evolved into a small benchmark harness with these goals:
- test multiple iogpu.wired_limit_mb values,
- optionally allocate a configurable amount of RAM,
- sample wired memory, compressed memory, swap, free memory, and pageouts,
- restore the original GPU limit when finished,
- write raw CSV, summary CSV, JSON, and a Markdown report.
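The "restore the original GPU limit" goal maps naturally onto Ruby's ensure clause. The wrapper reads the current value, applies a test value and runs the measurement. It then puts the original back even if the measurement raises. This is a sketch, not the harness's actual code. The command runner is injected, an assumed design choice, so the logic can be exercised without sudo.

```ruby
# Default runner: execute a command without a shell and return its stdout.
COMMAND_RUNNER = lambda do |*args|
  out = IO.popen(args, &:read)
  raise "command failed: #{args.join(' ')}" unless $?.success?
  out
end

# Apply limit_mb for the duration of the block, then restore the original value.
def with_wired_limit(limit_mb, run: COMMAND_RUNNER)
  original = Integer(run.call("sysctl", "-n", "iogpu.wired_limit_mb").strip)
  run.call("sudo", "sysctl", "iogpu.wired_limit_mb=#{Integer(limit_mb)}")
  yield
ensure
  # original is nil if the initial read failed; then there is nothing to restore.
  run.call("sudo", "sysctl", "iogpu.wired_limit_mb=#{original}") if original
end

# Usage: ruby with_limit.rb 5632 ruby my_workload.rb   (workload is hypothetical)
if $PROGRAM_NAME == __FILE__ && ARGV.size >= 2
  limit, *command = ARGV
  with_wired_limit(limit) { system(*command) }
end
```

Restoring the exact value read at the start, including 0 for the default heuristic, means the machine ends up where it began rather than at some assumed default.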
Example run:
```sh
ruby gpu_limit_report_local.rb \
  --limits-mb 5632,6144,7168,8192 \
  --hog-gb 6 \
  --duration 120 \
  --interval 1 \
  --warmup 8 \
  --report-prefix 16gb-test \
  --output-dir .
```
The reporting script should write to the current directory by default, with an optional --output-dir for explicit output placement.
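Below is a minimal sketch of that option handling with Ruby's standard OptionParser. The flag names come from the example run above. The defaults, the assumption that --warmup is in seconds and the report file suffixes are guesses, not the harness's documented behavior.

```ruby
require "optparse"

# Hypothetical option parsing for the harness flags shown in the example run.
def parse_options(argv)
  opts = { limits_mb: [], hog_gb: 0.0, duration: 120, interval: 1.0,
           warmup: 0, report_prefix: "gpu-limit", output_dir: "." }
  OptionParser.new do |o|
    o.on("--limits-mb LIST", Array) { |v| opts[:limits_mb] = v.map { |s| Integer(s) } }
    o.on("--hog-gb GB", Float) { |v| opts[:hog_gb] = v }
    o.on("--duration SECONDS", Integer) { |v| opts[:duration] = v }
    o.on("--interval SECONDS", Float) { |v| opts[:interval] = v }
    o.on("--warmup SECONDS", Integer) { |v| opts[:warmup] = v } # unit assumed
    o.on("--report-prefix NAME") { |v| opts[:report_prefix] = v }
    o.on("--output-dir DIR") { |v| opts[:output_dir] = v }
  end.parse!(argv)
  opts
end

# Timestamped report path inside output_dir (the current directory by default).
def report_path(opts, suffix, time = Time.now)
  stamp = time.strftime("%Y%m%d-%H%M%S")
  File.join(opts[:output_dir], "#{opts[:report_prefix]}-#{stamp}.#{suffix}")
end

if $PROGRAM_NAME == __FILE__ && ARGV.any?
  opts = parse_options(ARGV)
  now = Time.now # one timestamp shared by every report file of a run
  %w[raw.csv summary.csv json md].each { |suffix| puts report_path(opts, suffix, now) }
end
```

Passing one timestamp to every report_path call keeps the raw CSV, summary CSV, JSON and Markdown files of a run grouped under the same name.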
A second script interprets the generated JSON or summary CSV:
```sh
ruby interpret_gpu_limit_report.rb --input 16gb-test-YYYYMMDD-HHMMSS.json
```
The interpretation should prefer values that keep compressed memory and swap stable during the user's normal workload. For a 16 GB Mac used mainly for browser, editor, terminal, Ruby, and occasional AI work, useful test values are:
- 5632 MB
- 6144 MB
- 6656 MB
- 7168 MB
- 8192 MB
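One way to encode that preference is sketched below. It is an assumption, not the interpreter script's actual logic. A limit counts as stable when compressed memory and swap grew by less than some tolerance over the run. The rule then picks the highest stable limit, since that keeps the most GPU headroom without destabilizing the rest of the system. The Row fields and the tolerances are hypothetical and would need to be mapped onto the real summary CSV columns.

```ruby
# Hypothetical summary row: growth over a test run at one limit.
Row = Struct.new(:limit_mb, :compressed_growth_mb, :swap_growth_mb)

# Highest limit whose compression and swap growth stayed within tolerance;
# nil when no tested limit was stable.
def recommend_limit(rows, max_compressed_growth_mb: 256, max_swap_growth_mb: 64)
  stable = rows.select do |r|
    r.compressed_growth_mb <= max_compressed_growth_mb &&
      r.swap_growth_mb <= max_swap_growth_mb
  end
  stable.max_by(&:limit_mb)&.limit_mb
end

# Made-up results for the five test values above.
rows = [
  Row.new(5632, 40, 0), Row.new(6144, 90, 0), Row.new(6656, 180, 16),
  Row.new(7168, 600, 128), Row.new(8192, 1200, 512)
]
puts "recommended: #{recommend_limit(rows).inspect} MB"
```

With these made-up rows the rule picks 6656 MB. Choosing the lowest stable limit instead would bias further toward system headroom. Which rule fits depends on how often GPU-heavy work actually happens.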
8. Practical conclusion
The original command is useful, but not as a generic "make my Mac faster" tweak.
Best mental model: iogpu.wired_limit_mb controls how much unified memory the GPU may wire. Raising it helps large Metal workloads such as local LLMs. Lowering it can preserve system headroom under extreme GPU load, at the cost of GPU capacity.
For a 16 GB Mac where game performance does not matter, a conservative daily default around 5.5-6.5 GB is defensible. The right value is not theoretical; it should be chosen from observed compression, swap, and responsiveness under real workloads.