Multi-Core and Distributed Programming in Python

compute time

In the age of big data we often find ourselves facing CPU-intensive data processing tasks, therefore it is useful to understand how to harness all available CPU power to tackle a particular problem. Recently we came across a Python script which was CPU-intensive, but when the analyst viewed their overall CPU usage it was only showing ~25% utilization. This was because the script was only running in a single process, and therefore only fully utilizing a single core. For those of us with a few notches on our belts, this should seem fairly obvious, but I think it is a good exercise and teaching example to talk about the different methods of multi-core/multi-node programming in Python. This isn’t meant to be an all-encompassing tutorial on multi-core and distributed programming, but it should provide an overview of the available approaches in Python.