Releasing the GIL

How to go vroom vroom without generating seg faults

Programming in Python is quick: in a few hours you can set up a website, implement an agent-based model or run some analysis. Python provides a rich ecosystem of libraries and modules that extend the reach of your code. However, Python is not the fastest language out there. For computationally heavy tasks, such as running simulations, Python can be slow, very slow in fact. One of its limitations is that the reference interpreter, CPython, lets only one thread execute Python code at a time. This is a problem since it keeps Python from exploiting one of the major developments of recent years: the emergence of many (many) cores. In this post, I will show you how to release the Global Interpreter Lock (GIL) and call Python code from another language. In this way you can make use of the full power of a compiled language while still harnessing the rich ecosystem of Python libraries, at the cost of a minor performance hit.

What is the GIL? And why is it a problem?

Every programming language has its own quirks and Python is no exception. One of the most important quirks of Python is the Global Interpreter Lock (GIL). The GIL is a mutual exclusion lock (mutex) that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. This means that only one thread can run Python code at a time. Initially, the GIL was meant to solve a memory management issue. You see, Python's memory system uses reference counting. Every object keeps a count of the references that point to it: creating a reference increases the count, destroying one decreases it, and the object is freed when the count reaches zero. In a multi-threaded setting, this can lead to issues as different threads create or destroy references to the same object. For example, say the reference count is 0 and two threads try to increase it at the same time. Both read 0, both add one, and both write back 1, so the count ends up at 1 instead of 2. A count that is too high leaks memory. Worse, a count that is too low frees an object that is still in use, which corrupts memory. The GIL was introduced to prevent this from happening. However, the GIL has a major downside: it makes Python effectively single threaded. This is a problem since the number of cores in a computer has been increasing over the years, and Python is not able to take advantage of them.
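The lost update described above can be replayed by hand. The sketch below is a deterministic simulation, not real threading: the Counter class and both functions are made up for illustration and stand in for two threads that each perform a read-modify-write on the same reference count.

```python
class Counter:
    """Hypothetical stand-in for an object's reference count."""

    def __init__(self):
        self.refcount = 0


def unprotected_increments(counter):
    # Both "threads" read the count before either writes it back.
    seen_by_a = counter.refcount
    seen_by_b = counter.refcount
    counter.refcount = seen_by_a + 1  # thread A writes 1
    counter.refcount = seen_by_b + 1  # thread B overwrites it with 1
    return counter.refcount


def serialized_increments(counter):
    # With a lock (the role the GIL plays), each read-modify-write
    # completes before the next one starts.
    for _ in range(2):
        seen = counter.refcount
        counter.refcount = seen + 1
    return counter.refcount


lost = unprotected_increments(Counter())
kept = serialized_increments(Counter())
print(lost, kept)  # prints: 1 2
```

In the unprotected version one increment is lost, which is exactly the kind of miscount the GIL rules out by letting only one thread touch Python objects at a time.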

Reference counting in action

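import sys
a = []  # first reference to the list
b = a   # second reference to the same list
# typically 3: a, b, and the temporary reference held by the function argument
sys.getrefcount(a) # 3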
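To see the count move in both directions, a small sketch along the same lines can compare counts relative to a baseline instead of relying on absolute numbers, which may differ between interpreter versions. The variable names are only illustrative.

```python
import sys

a = []
baseline = sys.getrefcount(a)

holders = [a, a]  # store two more references to the same list
with_holders = sys.getrefcount(a)

holders.clear()   # dropping those references lowers the count again
after_clear = sys.getrefcount(a)

# Report the change relative to the baseline.
print(with_holders - baseline, after_clear - baseline)
```

Every one of these increments and decrements is a read-modify-write on shared state, which is why they need protection once several threads are involved.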

Releasing the GIL

To release the GIL we need to interface with the C API directly. CPython, the reference implementation of Python, is written in C. This means that we can call Python code from C and vice versa, and from any language that can call C functions. By passing the GIL between threads in a controlled way, we allow threads other than the main thread to run Python code without messing up the reference counting and thus without leaking or corrupting memory.

Releasing the GIL consists of three steps:

  1. Initialize Python in the main thread
  2. Release the GIL
  3. Acquire the GIL from a thread
    • And release it when you are done

Initializing Python in the main thread

My language of choice is Nim, but the steps are similar in other languages. In Nim we can make use of the package Nimpy, which provides a wrapper around the Python C API. To initialize Python in the main thread we can use the following code:

import nimpy, nimpy/[py_lib, py_types]
initPyLibIfNeeded()
let py = py_lib.pyLib.module

In the file nimpy/py_lib an object can be found called pyLib. Its module field is a handle to the loaded Python shared library. There are different ways to load the library and we could in principle run multiple Python interpreters in different threads. However, for now we will stick to one Python interpreter.

Now that we have the Python library loaded we need to tell Nim which functions of the C API to call.

import dynlib # exposes symAddr
# Load necessary Python C API functions
let
  PyGILState_Ensure =
    cast[proc(): PyGILState_STATE {.cdecl, gcsafe.}](py.symAddr("PyGILState_Ensure"))
  PyGILState_Release = cast[proc(state: PyGILState_STATE) {.cdecl, gcsafe.}](py.symAddr(
    "PyGILState_Release"
  ))

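The functions PyGILState_Ensure and PyGILState_Release are part of the C API that Python provides. The first makes sure the calling thread holds the GIL and returns a state value. The second takes that value and restores the thread to the state it was in before, releasing the GIL if the matching Ensure call had acquired it. Note that the GIL is held per thread: whichever thread calls PyGILState_Ensure is the one that acquires it. The py variable in the code above is the handle to the loaded Python library. The symAddr function is part of Nim's dynlib module and returns the address of the named symbol in that library, which we then cast to a proc type with the matching signature.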
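The same two C API functions can be reached from Python itself through ctypes, which makes for a quick way to see the Ensure and Release pairing without compiling anything. This is only a sketch of the calling pattern, not the Nim setup from this post: here the calling thread already holds the GIL, so the calls are nested and nothing is actually handed over.

```python
import ctypes

# ctypes.pythonapi is a handle to the running interpreter's C API.
# Attribute access looks a symbol up by name, much like symAddr does.
api = ctypes.pythonapi

# PyGILState_STATE is a C enum, so it is passed around as a C int.
api.PyGILState_Ensure.argtypes = []
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Release.argtypes = [ctypes.c_int]
api.PyGILState_Release.restype = None

state = api.PyGILState_Ensure()  # make this thread ready to use the C API
try:
    ensured = True  # C API calls would go here
finally:
    api.PyGILState_Release(state)  # always pair Ensure with Release

print(state)
```

The value returned by Ensure is opaque to the caller: its only purpose is to be handed back to the matching Release call.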

The proc types carry the gcsafe pragma. In Nim, a proc is GC-safe when it does not access global variables that hold garbage-collected memory (strings, seqs, refs or closures). Nim requires code that runs on another thread to be GC-safe, since sharing such memory between threads is not safe. The compiler cannot look inside a function pointer that is obtained at runtime, so the pragma is our promise that these C functions meet the requirement. Without it, the compiler would refuse to call them from a proc that runs on a thread.

Now that we have the functions we can start working with the GIL. We can wrap the calls in a small template that acquires the GIL, runs our code and releases the GIL when we are done.

template withPyGIL*(code: untyped) =
  let state = PyGILState_Ensure()
  try:
    code
  except Exception as e:
    echo "Caught: ", e.msg
  finally:
    PyGILState_Release(state)

The code is written as a template so that we can readily write code that talks to Python and it is clear to the programmer that we are holding the GIL. It starts by calling PyGILState_Ensure and storing the state in a variable. Then the code is executed. If an exception is raised, it is caught and printed to the console. Finally, the GIL is released by calling PyGILState_Release. This final step is extremely important as it prevents the GIL from being held indefinitely when an error occurs within one of the threads. Without the finally branch, a thread that raises an error would exit while still holding the GIL, which creates a deadlock: a condition in which all threads are waiting for an event that will never happen.
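The acquire, run, always-release pattern is not specific to Nim. As a minimal sketch, the Python context manager below does the same with an ordinary threading.Lock standing in for the GIL. The names with_lock and failing_task are hypothetical, and unlike the template above this version lets the exception propagate instead of printing it.

```python
import threading
from contextlib import contextmanager

lock = threading.Lock()  # hypothetical stand-in for the GIL


@contextmanager
def with_lock():
    lock.acquire()
    try:
        yield
    finally:
        lock.release()  # runs on both the normal and the error path


def failing_task():
    with with_lock():
        raise ValueError("boom")


try:
    failing_task()
except ValueError as exc:
    caught = str(exc)

# A second acquire succeeds only if the failing task released the lock.
released = lock.acquire(blocking=False)
if released:
    lock.release()

print(caught, released)  # prints: boom True
```

If the finally branch were removed, the last acquire would fail, which is the single-lock version of the deadlock described above.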

This template can then be called as

withPyGIL:
  # code that needs the GIL, such as creating or destroying Python objects
  discard

Example of Releasing the GIL

In my use case I wanted to create a graph on one thread and pass it to my simulation function, which runs on a different thread. The actual code is too complex to show here, but I will show you a simplified version of it.

import nimpy
import nimpy/py_lib
import dynlib

type
  PyGILState_STATE* = distinct cint # a C enum, so it has the size of a C int
  PyThreadState* = pointer

initPyLibIfNeeded()
let
  py = py_lib.pyLib.module
  nx = pyImport "networkx"

# Load necessary Python C API functions
let
  PyGILState_Ensure =
    cast[proc(): PyGILState_STATE {.cdecl, gcsafe.}](py.symAddr("PyGILState_Ensure"))
  PyGILState_Release =
    cast[proc(state: PyGILState_STATE) {.cdecl, gcsafe.}](py.symAddr("PyGILState_Release"))

template withPyGIL*(code: untyped) =
  let state = PyGILState_Ensure()
  try:
    code
  finally:
    PyGILState_Release(state)

# Example usage
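proc pyThreadSafeFunction() {.gcsafe.} =
  withPyGIL:
    # nx is a global PyObject (GC'ed memory), which a gcsafe proc may not
    # access directly; the cast is our promise that holding the GIL makes it safe
    {.cast(gcsafe).}:
      # create objects
      let g = nx.path_graph(3) # note uses globally available nx
      # modify objects
      g.nodes()[0]["example_trait"] = "example_value"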

import malebolgia # for multi-threading

when isMainModule:
  var m = createMaster() # coordinates tasks on malebolgia's thread pool

  m.awaitAll: # blocks until every spawned task has finished
    let n = 10000
    for idx in 0 ..< n: # exactly n tasks; 0..n would spawn n + 1
      m.spawn pyThreadSafeFunction()

The code above creates and modifies a graph on worker threads; in my real code that graph is then handed to the simulation function. It starts by importing the necessary modules and loading the Python library. Then the necessary functions are loaded from the C API. The withPyGIL template acquires the GIL, runs its body and releases the GIL when done. The pyThreadSafeFunction proc creates a graph while holding the GIL. It is marked as gcsafe, which Nim requires of any proc that runs on another thread. The createMaster function sets up the object that coordinates work on a pool of threads. Each call to spawn submits one call of pyThreadSafeFunction to that pool, and awaitAll waits until all spawned tasks have finished.
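For readers who do not know malebolgia, the spawn and await structure has a close analogue in Python's standard library. The sketch below mirrors only the structure: build_path_graph is a made-up stand-in for nx.path_graph that uses plain dicts, and because it is pure Python the worker threads still take turns under the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def build_path_graph(n_nodes):
    # Hypothetical stand-in for nx.path_graph: adjacency kept in a dict.
    graph = {i: {"neighbors": []} for i in range(n_nodes)}
    for i in range(n_nodes - 1):
        graph[i]["neighbors"].append(i + 1)
        graph[i + 1]["neighbors"].append(i)
    graph[0]["example_trait"] = "example_value"
    return graph


n_tasks = 100
with ThreadPoolExecutor(max_workers=4) as pool:  # pool of worker threads
    futures = [pool.submit(build_path_graph, 3) for _ in range(n_tasks)]
    # Leaving the with-block waits for every submitted task to finish.

graphs = [f.result() for f in futures]
print(len(graphs), graphs[0][1]["neighbors"])  # prints: 100 [0, 2]
```

Here submit plays the role of spawn and leaving the with-block plays the role of awaitAll. The difference with the Nim version is that only the code inside withPyGIL has to wait for the GIL, so the rest of each task can run in parallel.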

Conclusion

I hope this post sheds some light on the GIL and helps somebody else venture into the C API of Python. Python programmers are not often exposed to the memory-related issues that C and C++ programmers face. The GIL is one of the most important issues that programmers face when interfacing with Python from another language. Luckily, Python provides a very well documented C API. Working in between languages often yields issues where you are not really sure where to turn, as the in-between is not well documented or is expected knowledge when coming from a different language. For me the post above represents many frustrating hours of knowing what I wanted to achieve, but not being entirely sure how to achieve it. I hope this post helps somebody else in the same situation.

With Python 3.13 on the horizon, an experimental free-threaded build will let you run CPython with the GIL disabled. I am excited to see if this results in any performance gains.
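If you want to check which kind of interpreter you are running, a short sketch like the following should do. It relies on sys._is_gil_enabled(), which was added in Python 3.13, and assumes that any interpreter without that function runs with the GIL.

```python
import sys

# sys._is_gil_enabled() exists from Python 3.13 onwards; older versions
# always run with the GIL, so fall back to True there.
check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = check() if check is not None else True

print(gil_enabled)
```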

The GIL is a complex topic and I have only scratched the surface. I hope to write more about the GIL in the future. If you have any questions or comments, please let me know. I am always happy to help.