
Profiling & Code Performance

Context

Profiling consists of dynamically analyzing a program (i.e. while it is running) to monitor various aspects of its behavior, most commonly performance. It helps identify and fix the code responsible for (performance) issues.

Two things to keep in mind:

Avoid premature optimization, i.e. performance optimization before it is clear what the program should do (see When to optimize).

A small gain in performance may not justify losing readability (see Trade-offs).

This page offers pointers to recommended profiling tools depending on language / framework.

Built-in Profilers

  • Go: the standard library comes with pprof
  • Python comes with cProfile, which is powerful but not very intuitive
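
Both can be invoked from the command line without modifying the code. A minimal sketch (the script name and Go package path are placeholders):

$ python -m cProfile -s cumtime ./my_script.py

$ go test -cpuprofile cpu.prof ./mypkg
$ go tool pprof cpu.prof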

Python

  • scalene is the most complete profiler for Python (as of 2024); a minimal invocation is shown after this list

    • can profile CPU, GPU, RAM, and multiple threads
    • ChatGPT integration to provide code improvement suggestions
    • clean web interface
  • pyinstrument is very easy to use

    • simple command line interface
    • output easy to understand (ASCII tree report)
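
As a quick illustration, scalene is typically run directly on a script from the command line (the script name is a placeholder; see the scalene documentation for the available output options):

$ pip install scalene
$ scalene ./my_script.py
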
tip

If you plan to routinely use a profiler in your Python project, it makes sense to include it as a development dependency so that it runs in your project environment, with the same interpreter and dependency versions you are using. Most Python package managers support optional dependencies or dependency groups (e.g. setuptools, poetry, uv, pdm).
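
For example, with uv or Poetry the profiler can be added to a development dependency group (exact commands depend on the tool and its version):

$ uv add --dev pyinstrument
$ poetry add --group dev pyinstrument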

R

  • profvis is the most popular profiler in R.
    • directly integrated into the RStudio UI by default
    • interactive flame graph visualization
    • line-by-line CPU and RAM profiling

Rust

See The Rust Performance Book for an up-to-date list of profilers (the ecosystem moves fast).

Usage

Most profilers can be used either on the command line to profile the entire program, or as a library to insert hooks into the code and profile specific functions or pieces of code.

Here is an example with pyinstrument:

# ./example.py
# code to be profiled
from time import sleep

def fast_hello(num: int) -> str:
    sleep(num)
    return "hello"

def slow_hello(num: int) -> str:
    sleep(num**2)
    return "hello"

if __name__ == "__main__":
    for i in range(3):
        fast_hello(i)
    slow_hello(3)

Command Line Usage

When executing the script through the profiler, we obtain a report with the timing of every function call made during execution:

$ pip install pyinstrument
$ pyinstrument ./example.py

  Recorded: 17:01:36   Samples:  7
  Duration: 12.003     CPU time: 0.003
  pyinstrument v5.0.0

Program: example.py

12.001 <module>  example.py:1
├─ 9.000 slow_hello  example.py:7
│  └─ 9.000 sleep  <built-in>
└─ 3.000 fast_hello  example.py:3
   └─ 3.000 sleep  <built-in>

To view this report with different options, run:
pyinstrument --load-prev 2024-10-23T17-01-36 [options]
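
The same profile can also be rendered in other formats, for instance as an HTML report (renderer and output-file flags as of pyinstrument 5; see pyinstrument --help for your version):

$ pyinstrument -r html -o report.html ./example.py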

Library Usage

We can also profile only a specific function to obtain a targeted report.

# example.py
from time import sleep

import pyinstrument

def fast_hello(num: int) -> str:
    sleep(num)
    return "hello"

@pyinstrument.profile()
def slow_hello(num: int) -> str:
    sleep(num**2)
    return "hello"

if __name__ == "__main__":
    for i in range(3):
        fast_hello(i)
    slow_hello(3)
$ python ./example.py

pyinstrument ........................................
.
. Function slow_hello at /tmp/./example.py:8
.
. 9.000 wrapper  pyinstrument/context_manager.py:52
. └─ 9.000 slow_hello  example.py:8
.    └─ 9.000 sleep  <built-in>
.
.....................................................
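
To profile an arbitrary block of code rather than a whole function, pyinstrument also exposes a Profiler class that can be started and stopped manually. A minimal sketch (API as of pyinstrument 4/5; check the documentation for your version):

# profile only a selected region of code
from time import sleep

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
sleep(1)  # the code of interest goes here
profiler.stop()

profiler.print()  # prints the same ASCII tree report to the terminal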

Benchmarking

Generic benchmarking tools such as hyperfine can be used to quickly compare the execution time of commands or binaries.

The tool hyperfine has many useful features for benchmarking, such as warmup rounds, aggregate statistics over multiple runs, parameter scans, and various export formats.

$ hyperfine --warmup 1 -n ripgrep 'rg test' -n grep 'grep -R test'
Benchmark 1: ripgrep
  Time (mean ± σ):       8.4 ms ±   0.6 ms    [User: 6.9 ms, System: 6.6 ms]
  Range (min … max):     7.4 ms …  10.6 ms    283 runs

Benchmark 2: grep
  Time (mean ± σ):     763.9 ms ±   5.9 ms    [User: 380.8 ms, System: 380.0 ms]
  Range (min … max):   754.4 ms … 773.2 ms    10 runs

Summary
  'ripgrep' ran
   90.76 ± 6.44 times faster than 'grep'
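
For example, a parameter scan combined with a Markdown export could look roughly like this (the benchmarked program and its --threads option are placeholders; flag names as of recent hyperfine releases):

$ hyperfine --warmup 3 --export-markdown results.md \
    --parameter-scan threads 1 8 'my_program --threads {threads}'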