Releasing Python's GIL

And how to go vroom vroom without any seg faults

Understanding Python’s GIL and Memory Management

The Global Interpreter Lock (GIL) in Python has long been a subject of discussion in the world of concurrent programming. To understand why the GIL exists and how we can work around it, we first need to delve into Python’s memory management system, particularly its reference counting mechanism.

Python’s Reference Counting System

Python uses a reference counting system for memory management, complemented by a cyclic garbage collector. Each object in Python has a reference count, which is incremented when a new reference to the object is created and decremented when a reference is deleted.

Here’s a simple example to illustrate reference counting in Python:

import sys

# Create a list
a = [1, 2, 3]
print(sys.getrefcount(a) - 1)  # Subtract 1 for the temporary reference created by getrefcount()

# Create another reference to the same list
b = a
print(sys.getrefcount(a) - 1)

# Remove one reference
del b
print(sys.getrefcount(a) - 1)

This code might output:

1
2
1

The Global Interpreter Lock (GIL)

The reference counting mechanism is not thread-safe, which is one of the primary reasons for the GIL’s existence. The GIL ensures that only one thread executes Python bytecode at a time, preventing race conditions in reference counting and other internal operations.

While the GIL is crucial for maintaining the integrity of Python’s memory management, it becomes a bottleneck for CPU-bound tasks in multi-threaded environments. This limitation has led developers to seek alternative solutions for achieving true parallelism in Python applications.
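To see the bottleneck concretely before we reach for Nim, here is a minimal plain-Python sketch: the same CPU-bound function run from four threads produces correct results, but because only one thread can execute bytecode at a time, the threads take roughly as long as running the work serially.

```python
import threading

def count_down(n: int) -> int:
    # Pure-Python busy loop: CPU-bound, so the GIL serializes it
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total

N = 200_000
results = []

# Four threads, one GIL: correct answers, but no parallel speed-up
threads = [
    threading.Thread(target=lambda: results.append(count_down(N)))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # four identical sums
```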

Understanding the Python C API

The Python C API is a powerful interface that allows developers to extend and embed Python using C or C++. It provides a set of functions, macros, and structures that enable direct interaction with Python’s internals from compiled languages. This API is the foundation for creating Python extensions: modules written in C that can be imported and used like regular Python modules. It also allows for embedding Python within C/C++ applications, giving them the ability to execute Python code and manipulate Python objects.

The C API offers fine-grained control over Python’s memory management, object creation, and execution, making it possible to implement performance-critical components in C while seamlessly integrating with Python code. However, using the C API requires careful management of reference counts and an understanding of Python’s internal object model to avoid memory leaks and ensure thread safety, especially when dealing with the Global Interpreter Lock (GIL).

Bridging Python and Nim: A Solution for GIL-Free Concurrency

One powerful approach to overcoming the GIL’s limitations is to leverage the concurrency features of other languages, such as C or C++, while still utilizing Python’s extensive ecosystem. As a test case we will use Nim: a statically typed compile-to-C language that offers seamless Python interoperability along with true parallelism.

Let’s explore a Nim-based solution that allows us to release the GIL and harness concurrent execution while still interacting with Python objects when necessary.

Managing the GIL

Just Give me the Code
import nimpy, nimpy/py_lib
import dynlib

type
  PyGILState_STATE* = distinct int
  PyThreadState* = pointer

initPyLibIfNeeded()
let py = py_lib.pyLib.module

# Load the necessary Python C API functions
let
  PyGILState_Ensure =
    cast[proc(): PyGILState_STATE {.cdecl, gcsafe.}](py.symAddr("PyGILState_Ensure"))
  PyGILState_Release =
    cast[proc(state: PyGILState_STATE) {.cdecl, gcsafe.}](py.symAddr("PyGILState_Release"))
  PyEval_SaveThread =
    cast[proc(): PyThreadState {.cdecl.}](py.symAddr("PyEval_SaveThread"))
  PyEval_RestoreThread =
    cast[proc(tstate: PyThreadState) {.cdecl.}](py.symAddr("PyEval_RestoreThread"))

var mainThreadState: PyThreadState

proc initPyThread*() =
  # This should be called once at the start of your program
  mainThreadState = PyEval_SaveThread()

template withPyGIL*(code: untyped) =
  let state = PyGILState_Ensure()
  try:
    code
  finally:
    PyGILState_Release(state)

proc withoutPyGIL*(body: proc()) =
  let threadState = PyEval_SaveThread()
  try:
    body()
  finally:
    PyEval_RestoreThread(threadState)

# Example usage
proc pyThreadSafeFunction() {.gcsafe.} =
  withPyGIL:
    let nx = pyImport("networkx")
    let g = nx.path_graph(3)
    g.nodes()[0]["example_trait"] = "example_value"

# Initialize Python threading
initPyThread()

# Use in your code
import malebolgia
var m = malebolgia.createMaster()
m.awaitAll:
  for idx in (0 .. 10000):
    m.spawn:
      pyThreadSafeFunction()
pyThreadSafeFunction()

To manage the GIL we need to achieve three things:

  1. Acquire the GIL from a thread
  2. Do some work, such as creating or deleting Python objects
  3. Release the GIL
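The same three steps can be sketched from plain Python through ctypes, since CPython ships the GIL functions in its own shared library. PyGILState_Ensure is reentrant, so calling it from a thread that already holds the GIL is safe; the returned state simply records that fact (treating PyGILState_STATE as a C int is an assumption that holds in CPython, where it is a small enum):

```python
import ctypes

api = ctypes.pythonapi
# PyGILState_STATE is a small C enum, so an int is a fine stand-in
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Release.argtypes = [ctypes.c_int]

# 1. Acquire the GIL (reentrant: safe even if we already hold it)
state = api.PyGILState_Ensure()

# 2. Do some work that touches Python objects
work = [i * i for i in range(5)]

# 3. Release the GIL, restoring the previous state
api.PyGILState_Release(state)

print(work)
```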

To acquire the GIL we make use of the C API. Before we can call these functions, we must first tell our programming language of choice how to interact with them.

For Nim, some of the heavy lifting is already done by the nimpy package, which provides functions to interface with the Python interpreter. We can build on these to acquire and release the GIL by loading the necessary functions from the Python C API directly.

import nimpy, nimpy/py_lib
import dynlib

type
  PyGILState_STATE* = distinct int
  PyThreadState* = pointer

initPyLibIfNeeded()
let py = py_lib.pyLib.module

This snippet defines the GIL state as a distinct integer and the thread state as an opaque pointer. The Python interpreter is loaded and initialized, and the py variable gives us access to the Python shared library; it is defined internally in the nimpy module.

Now that we have the necessary C API functions loaded, we need to set up our thread state management:

var mainThreadState: PyThreadState

proc initPyThread*() =
  mainThreadState = PyEval_SaveThread()

Here we declare a variable mainThreadState to hold our main Python thread state. The initPyThread procedure is crucial: it releases the GIL and saves the thread state. It should be called once at the start of our program to initialize Python threading in our Nim environment.

Next, we’ll create a template to safely execute Python code with the GIL acquired:

template withPyGIL*(code: untyped) =
  let state = PyGILState_Ensure()
  try:
    code
  finally:
    PyGILState_Release(state)

This withPyGIL template is a key component of our GIL management strategy. It ensures that the GIL is acquired before executing any Python code, and then releases it afterward, even if an exception occurs. This template allows us to write Nim code that safely interacts with Python objects.
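For readers more at home in Python, the template above is the moral equivalent of a context manager wrapping the same two C API calls in a try/finally. A minimal sketch via ctypes (again assuming PyGILState_STATE fits in a C int, as it does in CPython):

```python
import ctypes
from contextlib import contextmanager

api = ctypes.pythonapi
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Release.argtypes = [ctypes.c_int]

@contextmanager
def with_py_gil():
    # Mirror of the Nim template: ensure on entry,
    # release on exit, even if the body raises
    state = api.PyGILState_Ensure()
    try:
        yield
    finally:
        api.PyGILState_Release(state)

with with_py_gil():
    g = {"node": 0, "example_trait": "example_value"}
print(g)
```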

To complement this, we also need a way to release the GIL for CPU-bound Nim code:

proc withoutPyGIL*(body: proc()) =
  let threadState = PyEval_SaveThread()
  try:
    body()
  finally:
    PyEval_RestoreThread(threadState)

The withoutPyGIL procedure allows us to release the GIL, execute some Nim code, and then reacquire the GIL. This is useful when we want to perform CPU-intensive tasks in Nim without blocking other Python threads.
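Well-behaved C extensions and builtins already do exactly this, and the effect is observable from Python itself: time.sleep releases the GIL while it waits, so two sleeping threads overlap instead of running back to back, just as Nim code inside withoutPyGIL would.

```python
import threading
import time

def blocking_call():
    # time.sleep releases the GIL while waiting, like withoutPyGIL
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=blocking_call) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# ~0.2 s rather than ~0.4 s: the waits overlapped
print(f"{elapsed:.2f}s")
```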

Now, let’s look at how we might use these constructs in practice:

proc pyThreadSafeFunction() {.gcsafe.} =
  withPyGIL:
    let nx = pyImport("networkx")
    let g = nx.path_graph(3)
    g.nodes()[0]["example_trait"] = "example_value"

This pyThreadSafeFunction demonstrates how to use our withPyGIL template. It safely imports a Python module (networkx), creates a graph, and modifies a node attribute. The gcsafe pragma indicates that this function is safe to use in a multi-threaded context.

Finally, let’s see how we can use this in a concurrent setting:

initPyThread()

import malebolgia
var m = malebolgia.createMaster()
m.awaitAll:
  for idx in (0 .. 10000):
    m.spawn:
      pyThreadSafeFunction()
pyThreadSafeFunction()

Here, we first call initPyThread to set up our Python threading environment. We then use the malebolgia library to create a thread pool. Within this pool, we spawn 10,001 instances of our pyThreadSafeFunction, demonstrating how we can perform concurrent Python operations from Nim. After the concurrent execution, we call pyThreadSafeFunction once more outside of the concurrent context.
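As a rough Python analogue of the malebolgia pattern, a thread pool fanning out many calls to a GIL-aware function looks like this (concurrent.futures standing in for spawn/awaitAll; the squaring body is just a hypothetical placeholder for the networkx work above):

```python
from concurrent.futures import ThreadPoolExecutor

def py_thread_safe_function(idx: int) -> int:
    # In the Nim version this body would run under withPyGIL;
    # in plain Python every thread implicitly holds the GIL here
    return idx * idx

# Fan out many tasks across a pool, then await them all
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(py_thread_safe_function, range(100)))

print(len(results))  # 100 completed tasks
```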

This setup allows us to efficiently manage the Python GIL in a Nim environment, enabling true concurrency while still leveraging Python’s rich ecosystem of libraries. By carefully controlling when we acquire and release the GIL, we can write high-performance concurrent code in Nim that seamlessly integrates with Python functionality.

Benefits and Considerations

By using this approach, we can write high-performance concurrent code in Nim that interacts with Python libraries when needed, while also allowing for true parallelism in CPU-bound sections. This technique offers several benefits:

  1. True Parallelism: We can perform CPU-intensive tasks in Nim without being constrained by the GIL.
  2. Python Ecosystem Access: We maintain the ability to use Python’s extensive library ecosystem when needed.
  3. Fine-grained Control: We have precise control over when to acquire and release the GIL.

However, there are also important considerations:

  1. Complexity: This approach adds complexity to the codebase and requires a good understanding of both Nim and Python internals.
  2. Safety: Care must be taken to ensure thread safety and proper GIL management to avoid race conditions or deadlocks.
  3. Performance Overhead: There’s some overhead in switching between Nim and Python contexts, which should be considered in performance-critical applications.

Finishing Thoughts

While Python’s GIL presents challenges for concurrent programming, especially in CPU-bound tasks, integrating Nim provides a powerful solution. By leveraging Nim’s concurrency features and its ability to interface with Python, we can achieve true parallelism while still benefiting from Python’s rich ecosystem.

This approach opens up new possibilities for performance optimization in projects that bridge Nim and Python, offering the best of both worlds. However, it requires careful management and thorough testing to ensure correct behavior across different execution scenarios.

The effort invested in creating this post was significantly greater than it might appear. Many Python programmers are rarely exposed to lower-level programming languages, which is unfortunate given the substantial performance gains they can offer. Python’s Global Interpreter Lock (GIL) is a major bottleneck that often goes unnoticed by the wider programming community.

While Python excels in rapid prototyping, it can hit performance limitations in computationally intensive tasks. This leads to what I call the “two-language problem”: developers find themselves needing to rewrite performance-critical parts of their code in a different, faster language. This challenge is prevalent in the industry but often overlooked.

I hope this post helps someone start their journey into the C API and takes away some of the frustration I felt when I first started looking at it.

These are exciting times for Python. With Python 3.13 on the horizon, the ground is being tested for a new era of Python programming in which the GIL can be disabled. Initial tests show that free-threading can drastically improve Python’s concurrent performance, although it currently comes at a cost in single-threaded performance. Only time will tell whether these speed bumps persist. For now we can only hope that the future of Python is bright and that the GIL becomes a thing of the past.

Casper van Elteren
Computational scientist | Data scientist | Tinkerer

I am a computational scientist interested in data analysis, visualization and software engineering.