
Go's Scheduler: How the G-M-P Model Works

Published: at 02:50 AM


The Core Problem: Why Are OS Threads Too Heavy?

In languages like Java or C++, applications typically map threads directly 1:1 to operating system (OS) threads. This is problematic for two main reasons:

  1. Memory Footprint: Each OS thread reserves a fixed-size stack, typically around 1MB to 2MB. Spawning 10,000 threads would therefore reserve 10GB to 20GB of memory just for their stacks.
  2. Context Switching Cost: When the CPU switches from one thread to another, the kernel must save the current thread's registers and restore the next one's. This switch happens in kernel space and takes roughly 1,000 to 2,000 nanoseconds (1 to 2 μs), which adds up quickly under heavy concurrency.

Go uses an M:N scheduler. It maps N lightweight Goroutines onto M heavy OS threads.

| Feature | OS Threads (1:1 Model) | Goroutines (M:N Model) |
| --- | --- | --- |
| Stack Size | ~1MB - 2MB (fixed size) | ~2KB (grows & shrinks dynamically) |
| Creation Cost | High (requires OS kernel allocation) | Very low (user-space allocation) |
| Switching Speed | Slow (~1-2 μs, enters OS kernel) | Fast (~100-200 ns, stays in Go runtime) |
| Managed By | Operating system kernel | Go runtime scheduler |

The Evolution of Go Stacks: Solving the “Hot Split”

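To make Goroutines so cheap, Go had to solve a major stack memory problem: how to start every Goroutine with a tiny stack yet let it grow on demand. Before Go 1.3, Go used segmented stacks. It allocated a new stack segment whenever a function call ran out of room and freed that segment when the call returned. If the segment boundary fell inside a hot loop, the program allocated and freed a segment on every iteration. This slowdown is known as the “hot split” problem.

Go 1.3 replaced segmented stacks with contiguous stacks. When a Goroutine runs out of stack, the runtime allocates a block twice as large, copies the old stack into it, and adjusts the pointers that refer into the stack. Stacks that turn out to be mostly unused can later be shrunk again.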
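The sketch below shows the practical effect of growable stacks. The helper names `depth` and `deepInGoroutine` are hypothetical, chosen for illustration. A freshly started Goroutine recurses 100,000 levels deep, far beyond the ~2KB it starts with, and the runtime grows its stack as needed without any special handling in user code.

```go
package main

import "fmt"

// depth recurses n times. The local array makes every frame use real stack
// space, so the call chain needs far more than a new Goroutine's initial stack.
func depth(n int) int {
	var pad [128]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return depth(n-1) + int(pad[n%len(pad)])
}

// deepInGoroutine runs depth(n) on a fresh Goroutine. The runtime grows that
// Goroutine's stack on demand by copying it into larger blocks.
func deepInGoroutine(n int) int {
	result := make(chan int)
	go func() { result <- depth(n) }()
	return <-result
}

func main() {
	fmt.Println("calls completed:", deepInGoroutine(100_000)) // prints "calls completed: 100001"
}
```

Each call adds exactly 1, so `depth(n)` returns `n + 1`, one for every frame that was on the stack.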


Meet the G-M-P Model

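Go’s scheduler uses three main components, represented by the letters G, M, and P. Think of them like workers in a factory:

  1. G (Goroutine): a task waiting to be done. It carries its own stack and the state needed to resume it.
  2. M (Machine): an OS thread, the worker who actually executes the code.
  3. P (Processor): a logical processor, the desk a worker must sit at to run Go code. Each P has its own queue of runnable Gs, and the number of Ps is set by GOMAXPROCS.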

The Concurrency Formula: N >= M >= P

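In a busy program, the number of Goroutines (N), OS threads (M), and logical processors (P) typically satisfies the following relationship. Treat it as a rule of thumb rather than a strict invariant. GOMAXPROCS caps how many threads can execute Go code at the same time, but threads blocked in system calls do not count against that limit, and the runtime starts threads lazily.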

N (Goroutines) ≥ M (OS Threads) ≥ P (Logical Processors)
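The sketch below samples N and P from a running program. `snapshot` is a hypothetical helper. The runtime does not expose a simple live thread count, so M is left out. With 10,000 Goroutines parked on a channel, N is far larger than P, and P stays at its default value.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// snapshot parks n extra Goroutines on a channel, then samples N (live
// Goroutines) and P (GOMAXPROCS) before releasing them.
func snapshot(n int) (goroutines, procs int) {
	release := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // a parked Goroutine does not hold a thread while it waits
		}()
	}
	goroutines = runtime.NumGoroutine()
	procs = runtime.GOMAXPROCS(0) // an argument of 0 queries without changing it
	close(release)
	wg.Wait()
	return goroutines, procs
}

func main() {
	n, p := snapshot(10_000)
	fmt.Printf("N (Goroutines) = %d, P (GOMAXPROCS) = %d, NumCPU = %d\n", n, p, runtime.NumCPU())
}
```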



The Anatomy of a Context Switch: Registers & TLB Cache

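Why is switching between Goroutines so much faster than switching between OS threads? It comes down to how much state has to be saved and where the switch happens. A Goroutine switch stays in user space: the Go runtime saves a small amount of state (chiefly the stack pointer and program counter) into the Goroutine's descriptor and resumes another Goroutine, with no system call. An OS thread switch goes through the kernel, which saves and restores the thread's full register set and runs its own scheduling logic before the new thread can continue.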
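One rough way to observe this cost is to bounce a token between two Goroutines over unbuffered channels. `pingPong` is a hypothetical helper. The number it reports includes channel synchronization overhead, not only the switch itself, and it varies with hardware and GOMAXPROCS, so treat it as a ballpark figure rather than a precise measurement of a context switch.

```go
package main

import (
	"fmt"
	"time"
)

// pingPong bounces a token between two Goroutines over unbuffered channels
// `rounds` times and returns the average time per one-way handoff.
func pingPong(rounds int) time.Duration {
	ping, pong := make(chan struct{}), make(chan struct{})
	go func() {
		for range ping { // exits when ping is closed
			pong <- struct{}{}
		}
	}()
	start := time.Now()
	for i := 0; i < rounds; i++ {
		ping <- struct{}{}
		<-pong
	}
	elapsed := time.Since(start)
	close(ping)
	return elapsed / time.Duration(2*rounds)
}

func main() {
	fmt.Println("average one-way handoff:", pingPong(100_000))
}
```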


How Go Schedules Work

Each desk (P) has a Local Run Queue (LRQ) that holds up to 256 runnable tasks (Gs). When a local queue fills up, the runtime moves half of it to a shared Global Run Queue (GRQ), which any P can draw from.

When an OS thread (M) wants to work, it must sit at a desk (P) and look for a task (G) to run. It searches for tasks in this order:

  1. Check Starvation (The 61-Tick Rule): Every 61st scheduling round, P checks the Global Run Queue first, so that tasks in the global queue are not starved by busy local queues.
  2. Check Local Run Queue: M takes the next G from its P's local queue.
  3. Check Global Run Queue: If the local queue is empty, M checks the shared global queue.
  4. Check Network Poller: M checks for Goroutines whose network I/O has just become ready.
  5. Work Stealing: If there is still no work, M visits other desks (Ps), choosing victims in random order, and steals half of another P's waiting tasks.
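The lookup order above can be modeled in a few lines. This is a toy model, not runtime code, and the types `G`, `P`, and `Sched` are hypothetical stand-ins. For simplicity, the netpoller step is omitted and victims are scanned in a fixed order rather than randomly.

```go
package main

import "fmt"

// G is a toy stand-in for a Goroutine (hypothetical; not runtime.g).
type G int

// P is a toy logical processor with a local run queue and a tick counter.
type P struct {
	runq      []G
	schedtick int
}

// Sched is a toy scheduler: a global run queue plus all Ps.
type Sched struct {
	globalq []G
	ps      []*P
}

func (s *Sched) popGlobal() G {
	g := s.globalq[0]
	s.globalq = s.globalq[1:]
	return g
}

// findRunnable follows the search order described in the list above.
func (s *Sched) findRunnable(p *P) (G, string, bool) {
	p.schedtick++
	// 1. Starvation check: every 61st tick, look at the global queue first.
	if p.schedtick%61 == 0 && len(s.globalq) > 0 {
		return s.popGlobal(), "global (61-tick)", true
	}
	// 2. Local run queue.
	if len(p.runq) > 0 {
		g := p.runq[0]
		p.runq = p.runq[1:]
		return g, "local", true
	}
	// 3. Global run queue.
	if len(s.globalq) > 0 {
		return s.popGlobal(), "global", true
	}
	// 4. Netpoller: omitted in this toy model.
	// 5. Work stealing: take half (rounded up) of another P's queue.
	for _, victim := range s.ps {
		if victim == p || len(victim.runq) == 0 {
			continue
		}
		n := (len(victim.runq) + 1) / 2
		stolen := append([]G(nil), victim.runq[:n]...)
		victim.runq = victim.runq[n:]
		p.runq = append(p.runq, stolen[1:]...) // keep the rest locally
		return stolen[0], "stolen", true
	}
	return 0, "", false
}

func main() {
	p0 := &P{}
	p1 := &P{runq: []G{1, 2, 3, 4}}
	s := &Sched{globalq: []G{10}, ps: []*P{p0, p1}}
	for i := 0; i < 4; i++ {
		g, src, ok := s.findRunnable(p0)
		fmt.Println(g, src, ok)
	}
	// Output:
	// 10 global true
	// 1 stolen true
	// 2 local true
	// 3 stolen true
}
```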



Handling Blocks and Bottlenecks

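What happens when a task gets stuck, for example waiting on a network request or a file read? Go handles these I/O blocks in two primary ways. (Waiting on a lock or a channel is simpler still: the runtime parks the Goroutine, and M moves on to another G.)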

1. Network I/O (The Netpoller)

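If a Goroutine blocks on a network request, Go doesn’t block the OS thread (M). Instead, the runtime registers the socket with the netpoller (built on epoll on Linux, kqueue on macOS and the BSDs, and IOCP on Windows), parks the Goroutine, and lets M pick up another G. Once the socket is ready, the netpoller marks the Goroutine runnable again, and the scheduler finds it in step 4 of its search order.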
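The sketch below puts many Goroutines into blocking network reads at once. `echoMany` is a hypothetical helper that starts a local TCP echo server on 127.0.0.1 and 100 client Goroutines, each of which waits for its echo. Because blocked reads are parked on the netpoller, the number of threads created by the runtime typically stays far below the number of Goroutines. That thread count is read here from the `threadcreate` profile in `runtime/pprof`. The exact count varies by platform and Go version.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"runtime/pprof"
	"sync"
)

// echoMany starts a local TCP echo server and n clients, and returns how many
// clients received their own message back.
func echoMany(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	// Server: one Goroutine per connection, echoing a single line back.
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				line, err := bufio.NewReader(c).ReadString('\n')
				if err == nil {
					c.Write([]byte(line))
				}
			}(conn)
		}
	}()

	var wg sync.WaitGroup
	var mu sync.Mutex
	ok := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			conn, err := net.Dial("tcp", ln.Addr().String())
			if err != nil {
				return
			}
			defer conn.Close()
			msg := fmt.Sprintf("hello %d\n", id)
			conn.Write([]byte(msg))
			// This read parks the Goroutine on the netpoller, not the thread.
			reply, err := bufio.NewReader(conn).ReadString('\n')
			if err == nil && reply == msg {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return ok, nil
}

func main() {
	ok, err := echoMany(100)
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println("successful echoes:", ok)
	fmt.Println("threads created:", pprof.Lookup("threadcreate").Count())
}
```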

2. Blocking System Calls (Thread Handoff)

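Some operations, like reading files from a hard drive, are performed as blocking system calls that tie up the entire OS thread (M). In this case, Go performs a thread handoff:

  1. The Goroutine (G) enters the system call, and its M blocks along with it.
  2. If the call does not return quickly, the runtime's background monitor thread (sysmon) detaches P from the blocked M. It then hands P to another M, waking an idle thread or creating a new one, so the remaining Gs in P's queue keep running.
  3. When the system call returns, the original M tries to reacquire a P. If none is free, it puts G on the Global Run Queue and goes idle.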



Preemption: Sharing CPU Time Fairly

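What if a Goroutine is running a massive loop (e.g., for { }) and refuses to let other tasks run?

Go preempts it. The runtime's background monitor thread (sysmon) watches for Goroutines that have been running for more than about 10ms. Before Go 1.14, preemption was purely cooperative: the scheduler could only interrupt a Goroutine at a function call, so a tight loop without calls could monopolize its P. Since Go 1.14, the runtime also preempts asynchronously by sending a signal to the thread (SIGURG on Unix-like systems). This lets it stop even a call-free loop and give P to another Goroutine.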
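The sketch below demonstrates preemption in action. `mainRunsDespiteSpinner` is a hypothetical helper, and the code assumes Go 1.19 or later for `atomic.Bool`. It pins the program to a single P, starts a Goroutine that spins in a tight loop and never yields voluntarily, and then checks that the calling Goroutine still gets CPU time. The 50ms sleep is an arbitrary choice. The previous GOMAXPROCS value is restored on return.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// mainRunsDespiteSpinner reports whether the caller regains the only P while
// another Goroutine busy-loops on it.
func mainRunsDespiteSpinner() bool {
	prev := runtime.GOMAXPROCS(1) // a single P: the spinner and caller must share it
	defer runtime.GOMAXPROCS(prev)

	var stop, started atomic.Bool
	done := make(chan struct{})
	go func() {
		defer close(done)
		started.Store(true)
		for !stop.Load() {
			// Busy loop: no channel operations, sleeps, or explicit yields.
		}
	}()

	// While the caller sleeps, the spinner takes the only P.
	time.Sleep(50 * time.Millisecond)
	// Reaching this line means the scheduler took the P back from the spinner.
	stop.Store(true)
	<-done
	return started.Load()
}

func main() {
	fmt.Println("caller ran despite spinner:", mainRunsDespiteSpinner())
}
```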


Best Practices for Developers

Now that you know how the scheduler works, you can write more efficient code:

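  1. Use automaxprocs in Docker: Before Go 1.25, Go sets GOMAXPROCS by default to the number of logical CPUs visible to the process (runtime.NumCPU()), ignoring any container CPU quota. In container environments (like Kubernetes), a container limited to 1 or 2 CPUs can then run more busy threads than its quota allows and get throttled by the kernel. On those versions, add import _ "go.uber.org/automaxprocs" to set GOMAXPROCS from the container's CPU limit automatically. Go 1.25 and later take the cgroup CPU limit into account by default on Linux.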
  2. Observe with GODEBUG: You can print out the state of your scheduler in real-time by running your app with the schedtrace flag:
    $ GODEBUG=schedtrace=1000 ./my-app
    
    This prints a summary line to standard error every 1000ms:
    SCHED 1002ms: gomaxprocs=4 idleprocs=1 threads=6 spinningthreads=0 idlethreads=2 runqueue=1 [1 0 0 0]
    
    • runqueue=1: One task in the Global Queue.
    • [1 0 0 0]: Number of tasks waiting at each of the 4 desks (Ps).
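To read these lines programmatically, for example when post-processing a captured log, a small parser is enough. `parseSchedTrace` is a hypothetical helper that assumes the line format shown above. It records every key=value counter it sees, so extra fields printed by other Go versions are simply recorded too, and it reads the per-P queue lengths from the first bracketed group.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSchedTrace splits one schedtrace line into its key=value counters and
// the per-P local run queue lengths listed in the first [...] group.
// Anything after the first closing bracket is ignored.
func parseSchedTrace(line string) (map[string]int, []int, error) {
	lb := strings.Index(line, "[")
	if lb < 0 {
		return nil, nil, fmt.Errorf("no per-P queue list in %q", line)
	}
	rb := strings.Index(line[lb:], "]")
	if rb < 0 {
		return nil, nil, fmt.Errorf("unterminated per-P queue list in %q", line)
	}

	var localQueues []int
	for _, s := range strings.Fields(line[lb+1 : lb+rb]) {
		n, err := strconv.Atoi(s)
		if err != nil {
			return nil, nil, err
		}
		localQueues = append(localQueues, n)
	}

	counters := map[string]int{}
	for _, tok := range strings.Fields(line[:lb]) {
		key, val, ok := strings.Cut(tok, "=")
		if !ok {
			continue // skips tokens such as "SCHED" and "1002ms:"
		}
		n, err := strconv.Atoi(val)
		if err != nil {
			return nil, nil, err
		}
		counters[key] = n
	}
	return counters, localQueues, nil
}

func main() {
	line := "SCHED 1002ms: gomaxprocs=4 idleprocs=1 threads=6 spinningthreads=0 idlethreads=2 runqueue=1 [1 0 0 0]"
	counters, queues, err := parseSchedTrace(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	waiting := counters["runqueue"] // global queue
	for _, q := range queues {
		waiting += q // plus every local queue
	}
	fmt.Println("Ps:", counters["gomaxprocs"], "runnable Gs waiting:", waiting) // prints "Ps: 4 runnable Gs waiting: 2"
}
```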

Conclusion

Go’s scheduler is a masterpiece of systems engineering. By decoupling tasks (G) from OS threads (M) using logical desks (P), Go minimizes context switches and memory overhead. Understanding these internals helps you build high-performance, container-friendly applications.