Profiling & Code Performance
Context
Profiling is the dynamic analysis of a program (i.e. while it is running) to monitor various aspects of its behavior, most commonly performance. It helps identify and fix the code responsible for (performance) issues.
Two things to keep in mind:
Avoid premature optimization, i.e. performance optimization before it is clear what the program should do (see When to optimize).
A small gain in performance may not justify losing readability (see Trade-offs).
This page offers pointers to recommended profiling tools depending on language / framework.
Built-in Profilers
- Go: the standard library comes with pprof
- Python: the standard library comes with cProfile, which is powerful but not very intuitive
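For instance, cProfile can profile a whole script with python -m cProfile -s cumulative ./example.py, or be driven from Python code. A minimal sketch of the latter (the fib function is just a placeholder workload):
# profile a small workload with the standard library profiler
import cProfile

def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

if __name__ == "__main__":
    # prints call counts and times for each function, sorted by cumulative time
    cProfile.run("fib(25)", sort="cumulative")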
Recommended Profilers
Python
- scalene is the most complete profiler for Python as of 2024 (see the usage example after this list)
  - can profile CPU, GPU, RAM, and multiple threads
  - ChatGPT integration to provide code improvement suggestions
  - clean web interface
- pyinstrument is very easy to use
  - simple command line interface
  - easy-to-understand output (ASCII tree report)
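For example, scalene can be run on a script directly from the command line (assuming it has been installed, e.g. with pip; example.py is the script profiled later on this page):
$ pip install scalene
$ scalene ./example.py
Recent versions open the report in the web interface mentioned above; a --cli flag, if available in your version, keeps the output in the terminal.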
If you plan to routinely use a profiler in your Python project, it makes sense to include it as a development dependency, so that it runs in your project environment with the same interpreter and dependency versions you are using. Most Python package managers support optional dependencies or dependency groups (e.g. setuptools, poetry, uv, pdm).
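A minimal sketch with a PEP 621 pyproject.toml, using an arbitrary group name and version pin:
# pyproject.toml (excerpt)
[project.optional-dependencies]
profiling = ["pyinstrument>=5.0"]
The profiler can then be installed together with the project, e.g. pip install -e '.[profiling]'.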
R
- profvis is the most popular profiler in R
  - directly integrated into the RStudio UI by default
  - interactive flamegraph visualization
  - line-by-line CPU and RAM profiling
Rust
See The Rust Performance Book for an up-to-date list of profilers (the ecosystem moves fast).
Usage
Most profilers can be used either on the command line to profile the entire program, or as a library to insert hooks into the code and profile specific functions or pieces of code.
Here is an example with pyinstrument:
# ./example.py
# code to be profiled
from time import sleep

def fast_hello(num: int) -> str:
    sleep(num)
    return "hello"

def slow_hello(num: int) -> str:
    sleep(num**2)
    return "hello"

if __name__ == "__main__":
    for i in range(3):
        fast_hello(i)
    slow_hello(3)
Command Line Usage
When executing the script through the profiler, we obtain a report with the timing of every function call made during execution:
$ pip install pyinstrument
$ pyinstrument ./example.py
_ ._ __/__ _ _ _ _ _/_ Recorded: 17:01:36 Samples: 7
/_//_/// /_\ / //_// / //_'/ // Duration: 12.003 CPU time: 0.003
/ _/ v5.0.0
Program: example.py
12.001 <module> example.py:1
├─ 9.000 slow_hello example.py:7
│ └─ 9.000 sleep <built-in>
└─ 3.000 fast_hello example.py:3
└─ 3.000 sleep <built-in>
To view this report with different options, run:
pyinstrument --load-prev 2024-10-23T17-01-36 [options]
Library Usage
We can also profile only a specific function to obtain a more targeted report.
# example.py
import pyinstrument
from time import sleep

def fast_hello(num: int) -> str:
    sleep(num)
    return "hello"

@pyinstrument.profile()
def slow_hello(num: int) -> str:
    sleep(num**2)
    return "hello"

if __name__ == "__main__":
    for i in range(3):
        fast_hello(i)
    slow_hello(3)
$ python ./example.py
pyinstrument ........................................
.
. Function slow_hello at /tmp/./example.py:8
.
. 9.000 wrapper pyinstrument/context_manager.py:52
. └─ 9.000 slow_hello example.py:8
. └─ 9.000 sleep <built-in>
.
.....................................................
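pyinstrument also exposes a Profiler object that can be started and stopped around an arbitrary piece of code, which is convenient when a decorator does not fit. A minimal sketch (API names as documented by pyinstrument; the sleep call is just a placeholder workload):
from time import sleep
from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
# only the code between start() and stop() is recorded
sleep(1)
profiler.stop()

# print the same kind of ASCII tree report as the command line interface
print(profiler.output_text())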
Benchmarking
Some generic benchmarking tools such as hyperfine can be used to quickly compare the execution time of binaries.
The tool hyperfine has many useful features for benchmarking, such as warmup rounds, aggregate statistics over multiple runs, parameter scans, and various output formats.
$ hyperfine --warmup 1 -n ripgrep 'rg test' -n grep 'grep -R test'
Benchmark 1: ripgrep
Time (mean ± σ): 8.4 ms ± 0.6 ms [User: 6.9 ms, System: 6.6 ms]
Range (min … max): 7.4 ms … 10.6 ms 283 runs
Benchmark 2: grep
Time (mean ± σ): 763.9 ms ± 5.9 ms [User: 380.8 ms, System: 380.0 ms]
Range (min … max): 754.4 ms … 773.2 ms 10 runs
Summary
'ripgrep' ran
90.76 ± 6.44 times faster than 'grep'