Go's Scheduler: How the G-M-P Model Works

The Main Idea
What Happens When You Start a Goroutine
How a P Chooses the Next Goroutine
Local Queues, Global Queue, and Runnext
Work Stealing
When a Goroutine Waits
Network I/O
Blocking Syscalls
Preemption
A Full Example
Mental Model

The Main Idea

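The scheduler has three parts: a G is a goroutine, an M is an OS thread, and a P is a processor, the scheduling context an M needs in order to run Go code.

An M runs code on a real OS thread, but it cannot run normal Go code unless it owns a P.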

A P owns the scheduler state needed to run goroutines. The most important part is its local run queue: a small queue of runnable Gs waiting for CPU time.

GOMAXPROCS controls how many Ps exist. That means it controls how many OS threads can run Go code at the same time. The runtime may have more OS threads than GOMAXPROCS, but only threads with a P can execute Go code.

In a busy program, the shape often looks like this:

many goroutines (G)
fewer OS threads (M)
a fixed number of processors (P)

So people often write it as G >= M >= P. That is a useful mental model, not a hard rule. The runtime can create extra Ms when threads block in syscalls, and some Ms can exist without a P.

The hard rule is this: an M needs a P to run Go code.

What Happens When You Start a Goroutine

The compiler turns a go statement into a runtime call that creates a new G.

In proc.go, newproc creates that goroutine and calls runqput(pp, newg, true). The final true argument asks runqput to place the new goroutine directly in runnext. If that slot is already occupied, the previous occupant is evicted and pushed to the tail of the current P’s local run queue (or to the global queue if the local queue is full) to make room. newproc also calls wakep so an idle thread can wake up if more execution capacity is needed.

The new goroutine does not run immediately just because you wrote go.

Instead, it becomes runnable.

A runnable goroutine waits until some M with a P picks it.
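
The gap between "created" and "running" can be observed from user code. The sketch below is one way to do that, under the assumption that a single P makes the effect easiest to see; startAndObserve is a hypothetical helper. The first observation is usually false, but that is typical scheduling behavior and not a guarantee, so the code prints it without relying on it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// startAndObserve starts a goroutine and reports whether it had already run
// right after the go statement, and whether it had run after the caller
// waited for it. (Hypothetical helper for this example.)
func startAndObserve() (ranEarly, ranLater bool) {
	// Assumption: with one P, the new G has to wait until the caller
	// gives up that P.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var ran atomic.Bool
	done := make(chan struct{})

	go func() {
		ran.Store(true)
		close(done)
	}()

	ranEarly = ran.Load() // usually false: the new G is only runnable so far
	<-done                // the caller waits here, so the P can run the new G
	ranLater = ran.Load() // true: the goroutine closed done after storing
	return ranEarly, ranLater
}

func main() {
	early, later := startAndObserve()
	fmt.Println("ran right after the go statement:", early)
	fmt.Println("ran after main waited:", later)
}
```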

How a P Chooses the Next Goroutine

The runtime function schedule asks findRunnable for work. That search is the heart of the scheduler.

The full path includes runtime work and normal goroutine work. In simple terms, findRunnable does this:

Check runtime work such as safe points, trace reader work, and GC worker work.
Check the global run queue once in a while.
Wake finalizer and cleanup goroutines if needed.
Check runnext, then the local run queue for the current P.
Check the global run queue again and move a batch to the local queue.
Poll the network poller without blocking.
Steal work from other Ps.
Try idle GC or wasm callback work.
If there is still no work, release the P, recheck for missed work, then block in netpoll or stop the M.

On every 61st scheduler tick, the global queue is checked before the local queue. This avoids a bad case where goroutines on one local queue keep respawning each other, the local queue never empties, and work on the global queue waits too long.
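
The fairness check can be pictured with a toy model. The sketch below is not the runtime's code: toyP, toySched, and next are invented names, goroutines are plain strings, and only the runnext, local queue, and global queue steps are modeled. It simulates two goroutines that keep re-queuing themselves on the local queue and counts how many rounds pass before a goroutine on the global queue gets picked.

```go
package main

import "fmt"

// toyP is a toy stand-in for a P; it is not the runtime's real struct.
type toyP struct {
	tick    int      // counts scheduling rounds
	runnext string   // "" means the slot is empty
	local   []string // local run queue, oldest first
}

// toySched holds the shared global queue of this toy model.
type toySched struct {
	global []string
}

func (s *toySched) popGlobal() string {
	g := s.global[0]
	s.global = s.global[1:]
	return g
}

// next picks the next goroutine for p, or "" if nothing is runnable.
func (s *toySched) next(p *toyP) string {
	p.tick++
	// Fairness check: every 61st round, look at the global queue first.
	if p.tick%61 == 0 && len(s.global) > 0 {
		return s.popGlobal()
	}
	if p.runnext != "" {
		g := p.runnext
		p.runnext = ""
		return g
	}
	if len(p.local) > 0 {
		g := p.local[0]
		p.local = p.local[1:]
		return g
	}
	if len(s.global) > 0 {
		return s.popGlobal()
	}
	return "" // the real scheduler would go on to poll the network and steal
}

// roundsUntilGlobalRuns keeps the local queue permanently busy and returns
// the round in which the goroutine on the global queue is finally chosen.
func roundsUntilGlobalRuns() int {
	s := &toySched{global: []string{"global-g"}}
	p := &toyP{local: []string{"a", "b"}}
	for round := 1; ; round++ {
		g := s.next(p)
		if g == "global-g" {
			return round
		}
		p.local = append(p.local, g) // the goroutine re-queues itself
	}
}

func main() {
	fmt.Println("global goroutine ran on round", roundsUntilGlobalRuns())
}
```

In this toy the global goroutine runs on round 61. Without the periodic check it would never run, because the local queue never becomes empty.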

Each step returns immediately if it finds a goroutine to run. If none of them finds work, the M releases its P, sleeps or waits in netpoll, and later starts the search again.

The runtime also checks internal work such as tracing, GC workers, finalizers, cleanups, and timers. Those details matter inside the runtime, but the main user-level idea stays the same: the scheduler is always looking for a runnable G.

Local Queues, Global Queue, and Runnext

Each P has:

a local run queue with 256 slots
a runnext slot for a goroutine that should run very soon
access to the shared global run queue

When a new goroutine is created, the runtime usually tries to put it in runnext or the local queue of the current P. This keeps related work close together, which helps CPU cache locality and reduces contention on the global queue lock.

If the local queue is full, the runtime moves about half of that local queue plus the new goroutine to the global queue. This keeps one P from holding too much work.

When an M gets work from the global queue, it may take a batch and place the extra goroutines into its local queue. This reduces repeated locking on the global queue.
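
The enqueue rules described in this section can be sketched as a toy model. Everything below is illustrative: toyP, toySched, put, and fill are invented names, goroutines are plain integer IDs, and the real runtime uses a lock-free ring buffer instead of slices. The sketch only shows the runnext eviction and the overflow of a full 256-slot local queue.

```go
package main

import "fmt"

const localCap = 256 // the 256-slot local queue described above

// toyP is a toy stand-in for a P, not the runtime's struct.
type toyP struct {
	runnext int   // 0 means empty; goroutine IDs start at 1
	local   []int // oldest first, at most localCap entries
}

// toySched holds the shared global queue of this toy model.
type toySched struct {
	global []int
}

// put makes goroutine g runnable on p. With next set, g takes the runnext
// slot and the previous occupant is queued instead.
func (s *toySched) put(p *toyP, g int, next bool) {
	if next {
		g, p.runnext = p.runnext, g
		if g == 0 {
			return // the slot was empty, so there is nothing left to queue
		}
	}
	if len(p.local) < localCap {
		p.local = append(p.local, g)
		return
	}
	// The local queue is full: move its older half plus g to the global queue.
	half := len(p.local) / 2
	s.global = append(s.global, p.local[:half]...)
	s.global = append(s.global, g)
	p.local = append([]int(nil), p.local[half:]...)
}

// fill creates goroutines 1..n on a single P, each aimed at runnext.
func fill(n int) (*toyP, *toySched) {
	p, s := &toyP{}, &toySched{}
	for g := 1; g <= n; g++ {
		s.put(p, g, true)
	}
	return p, s
}

func main() {
	p, s := fill(258)
	fmt.Println(p.runnext, len(p.local), len(s.global)) // prints: 258 128 129
}
```

After 257 goroutines the local queue is exactly full. The 258th pushes the previous runnext occupant out, which triggers the overflow: 128 older goroutines plus the evicted one move to the global queue.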

Work Stealing

If a P has no local work and the global queue has no useful work, the M becomes a spinning worker and tries to steal from other Ps.

Work stealing means:

Pick another P.
Look at that P’s local queue.
Steal about half of its local queue. In the code, runqgrab takes the older half.
Return one stolen goroutine to run now and keep the rest in the stealing P’s local queue.

The runnext slot is only stolen as a last resort, after normal local queue stealing does not find work.
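
The stealing steps can be sketched with a toy model. The names toyP and steal and the takeRunnext parameter are invented for this example, and slices stand in for the runtime's ring buffer. The sketch takes the older half of the victim's queue (rounded up), returns one stolen goroutine to run now, keeps the rest on the thief, and touches runnext only when told to.

```go
package main

import "fmt"

// toyP is a toy stand-in for a P, not the runtime's struct.
type toyP struct {
	runnext string   // "" means the slot is empty
	local   []string // oldest first
}

// steal moves work from victim to thief and returns one goroutine to run
// now. It returns "" when it takes nothing. takeRunnext models the
// last-resort attempt described above.
func steal(thief, victim *toyP, takeRunnext bool) string {
	n := len(victim.local)
	n -= n / 2 // about half, rounded up
	if n == 0 {
		if takeRunnext && victim.runnext != "" {
			g := victim.runnext
			victim.runnext = ""
			return g
		}
		return ""
	}
	batch := victim.local[:n] // the older half
	victim.local = victim.local[n:]
	// Keep all but the last stolen goroutine in the thief's local queue.
	thief.local = append(thief.local, batch[:n-1]...)
	return batch[n-1]
}

func main() {
	thief := &toyP{}
	victim := &toyP{runnext: "z", local: []string{"a", "b", "c", "d", "e"}}

	got := steal(thief, victim, false)
	fmt.Println(got, thief.local, victim.local) // prints: c [a b] [d e]
}
```

The victim keeps its runnext goroutine and the newer half of its queue, so work that was just made runnable tends to stay where it was created.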

This is why goroutines spread across available CPU time without a single central queue becoming the bottleneck.

The runtime also limits how many spinning Ms exist. A spinning M uses CPU while looking for work, so too many spinning threads would waste power and make the program slower.

When a Goroutine Waits

A goroutine does not always run until it finishes. It may wait or leave normal Go execution for:

a channel send or receive
a mutex
a timer
network I/O
a syscall, which is tracked as _Gsyscall, not as a normal parked goroutine

When a goroutine waits inside the Go runtime, it is parked. Internally, gopark moves the current G from running to waiting, then the M goes back to the scheduler and looks for another runnable G.

Later, when the thing it waited for is ready, the runtime calls goready. That puts the goroutine back on a run queue so it can run again. Syscalls use a different path: entersyscall or entersyscallblock moves the goroutine to _Gsyscall, and exitsyscall brings it back.

The important point is that waiting usually blocks the goroutine, not the whole program.
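
A short sketch makes this visible from user code. It assumes nothing beyond the standard library; parkAndRelease is a hypothetical helper. Many goroutines wait on one channel while the calling goroutine keeps computing, then all of them are released at once.

```go
package main

import (
	"fmt"
	"sync"
)

// parkAndRelease starts n goroutines that all wait on one channel, does some
// work on the caller's goroutine in the meantime, then releases them.
// It returns the caller's result and how many goroutines finished.
// (Hypothetical helper for this example.)
func parkAndRelease(n int) (sum, finished int) {
	gate := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-gate // parks the goroutine while gate is still open
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}

	// The caller keeps running while the other goroutines wait.
	for i := 1; i <= 100; i++ {
		sum += i
	}

	close(gate) // every goroutine waiting on gate becomes runnable again
	wg.Wait()
	return sum, finished
}

func main() {
	fmt.Println(parkAndRelease(10000)) // prints: 5050 10000
}
```

Ten thousand waiting goroutines here do not mean ten thousand blocked OS threads. Each parked G just sits in the channel's wait list until close makes it runnable.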

The syscall path can return straight to running if exitsyscall gets a P; otherwise exitsyscallNoP makes the goroutine runnable again.

Network I/O

Network I/O is handled by the runtime netpoller.

If a goroutine waits for network data, the runtime can park that G and let the M keep using its P to run other goroutines. The OS notifies Go when the socket is ready. Then netpoll returns the ready goroutine list, and the runtime puts those goroutines back into scheduler queues.

That is why many goroutines can wait on network connections without needing one blocked OS thread per connection.
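
The sketch below shows the pattern from the user's side, assuming loopback TCP is available on the machine. waitOnConns is a hypothetical helper. It starts one goroutine per connection, lets each wait in Read, and then sends one byte to every connection.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// waitOnConns opens n loopback TCP connections, lets one goroutine per
// connection wait in Read, then sends one byte to each. It returns how many
// readers received their byte. (Hypothetical helper for this example.)
func waitOnConns(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var wg sync.WaitGroup
	var mu sync.Mutex
	received := 0

	clients := make([]net.Conn, 0, n)
	defer func() {
		for _, c := range clients {
			c.Close()
		}
	}()

	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		clients = append(clients, client)

		server, err := ln.Accept()
		if err != nil {
			return 0, err
		}

		wg.Add(1)
		go func(c net.Conn) {
			defer wg.Done()
			defer c.Close()
			buf := make([]byte, 1)
			// Read parks this goroutine until the socket has data.
			if _, err := c.Read(buf); err == nil {
				mu.Lock()
				received++
				mu.Unlock()
			}
		}(server)
	}

	for _, c := range clients {
		if _, err := c.Write([]byte{'x'}); err != nil {
			return 0, err
		}
	}
	wg.Wait()
	return received, nil
}

func main() {
	got, err := waitOnConns(100)
	fmt.Println(got, err)
}
```

Each reader is an ordinary blocking-style call in the source code. The runtime turns it into a parked goroutine plus a registration with the netpoller.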

Blocking Syscalls

Some calls can block the OS thread itself. File I/O and some cgo calls can behave this way.

When a goroutine enters a blocking syscall, the runtime must protect the P. If the blocked M kept the P, other goroutines waiting on that P could not run.

So the runtime hands the P to another M.

When the syscall returns, the old M tries to keep or get a P again. First it may still have its P. If not, it tries its old P, then an idle P. If that also fails, exitsyscallNoP changes the goroutine back to runnable, puts it on the global queue, and parks the M.

The sysmon thread also watches for long syscalls. If a P has been tied to a syscall for too long and there is work to do, sysmon can retake that P and hand it to another M.
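
The handoff can be observed with a raw blocking read. This sketch is Unix-only because it uses syscall.Pipe, and it assumes a single P so that any progress by the caller must come from the P changing hands. blockedReadDoesNotStall is a hypothetical helper. A raw syscall.Read is used on purpose: files from os.Pipe may go through the runtime poller instead of blocking the thread.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// blockedReadDoesNotStall blocks one goroutine in a raw read(2) while only
// one P exists, then unblocks it from the calling goroutine. It returns the
// number of bytes the reader got. (Hypothetical helper; Unix-only.)
func blockedReadDoesNotStall() (int, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return 0, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	type result struct {
		n   int
		err error
	}
	done := make(chan result, 1)

	go func() {
		buf := make([]byte, 1)
		// Raw blocking read: the OS thread itself waits until data arrives.
		n, err := syscall.Read(fds[0], buf)
		done <- result{n, err}
	}()

	// Sleeping parks the caller, so the single P runs the reader, which
	// then blocks inside the syscall.
	time.Sleep(50 * time.Millisecond)

	// The caller can only get here if it was given a P again while the
	// reader's thread was still stuck in read(2).
	if _, err := syscall.Write(fds[1], []byte{'x'}); err != nil {
		return 0, err
	}
	r := <-done
	return r.n, r.err
}

func main() {
	n, err := blockedReadDoesNotStall()
	fmt.Println("bytes read:", n, "error:", err)
}
```

If the blocked thread kept the only P, the caller could never write the byte and the program would hang.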

Preemption

A goroutine can also run for too long without blocking. The runtime needs a way to stop one goroutine from taking a P forever.

sysmon watches running Ps. If a P's scheduler tick has not advanced for about 10 ms, meaning the same goroutine has been running that whole time, sysmon asks that P to preempt the current goroutine.

Preemption is a request, not always an instant stop. The runtime sets preemption flags and, when supported, sends the OS thread a signal to trigger asynchronous preemption. Once the goroutine reaches a safe point, it is stopped, changed to a runnable state (_Grunnable), and placed on the global run queue. When the garbage collector asks a goroutine to stop so that its stack can be scanned, the goroutine is parked in _Gpreempted instead until the collector resumes it. Either way, another goroutine can run.

This keeps CPU-bound goroutines from starving the rest of the program.
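
The effect is easy to reproduce. The sketch below assumes a Go version and platform with asynchronous preemption (Go 1.14 or later on most platforms) and uses a single P so the spinning goroutine has no competition. spinUntilStopped is a hypothetical helper. The loop never blocks and never calls into the scheduler, so the caller can only regain the P if the runtime preempts the spinner.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinUntilStopped runs a tight loop on the only P and returns how many
// iterations it completed before the caller managed to stop it.
// (Hypothetical helper for this example.)
func spinUntilStopped() uint64 {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	result := make(chan uint64, 1)

	go func() {
		var n uint64
		// No channel operations and no blocking: nothing in this loop
		// gives up the P voluntarily.
		for !stop.Load() {
			n++
		}
		result <- n
	}()

	// Parks the caller; the spinner now occupies the only P.
	time.Sleep(20 * time.Millisecond)

	// Reaching this line requires the runtime to preempt the spinner.
	stop.Store(true)
	return <-result
}

func main() {
	fmt.Println("iterations before preemption let main stop the loop:", spinUntilStopped())
}
```

On a runtime without asynchronous preemption, a loop like this could hold the P indefinitely and the program would not finish.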

A Full Example

Imagine this program:

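func main() {
    go readFromNetwork() // waits on a socket; parks and is tracked by netpoll
    go compressFile()    // CPU-bound; may be preempted
    go writeLog()        // may enter a blocking file syscall

    select {} // block main forever so the other goroutines keep running
}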

Here is what the scheduler sees:

main creates three new Gs.
Those Gs are put on a local queue or runnext.
An M with a P picks one G and runs it.
If readFromNetwork waits on a socket, it parks and netpoll tracks the socket.
The same M can run compressFile or writeLog.
If compressFile uses CPU for too long, preemption can let another G run.
If writeLog enters a blocking syscall, its M may lose the P so another M can keep running Go code.
When network data or the syscall result is ready, that G becomes runnable again.

That is goroutine scheduling in practice: goroutines move between runnable, running, waiting, and done. Ps hold the run queues and execution rights. Ms are the OS threads that do the running.

Mental Model

Keep this model in your head:

A goroutine is work, not a thread.
A runnable goroutine waits in a queue.
An OS thread must own a P before it can run Go code.
GOMAXPROCS sets how many Ps exist, and so how many threads can run Go code at once.
Local queues make scheduling fast.
The global queue keeps work shared.
Work stealing balances busy and idle Ps.
Parking blocks a goroutine, not necessarily an OS thread.
Blocking syscalls may block an OS thread, so the runtime hands the P away.
Preemption stops long-running goroutines from taking a P forever.

The scheduler is not magic. It is a loop that keeps finding runnable goroutines and matching them with OS threads that are allowed to run Go code.

Source checked while editing: runtime/proc.go in the Go repository.