Ndarray and Python

Python is a very popular programming language for "big data," machine learning, and numerical analysis. NumPy is the de facto matrix library for Python, and the inspiration for Rust's ndarray. This portion of the tutorial demonstrates how to efficiently convert between Python's numpy arrays and Rust's ndarray arrays.

You should familiarize yourself with how to use ndarray in Rust before working through this part of the tutorial.

Per this workaround, run binary examples with:

cargo run --no-default-features --bin name-of-example

Introduction

Python is a dynamically-typed, interpreted language with a REPL much like Matlab. Many programmers find it easier to use than compiled languages like C, C++, and Rust... or at least easier to get started with. As with Matlab, this ease of use comes with a performance tradeoff.

Python's numpy matrix algebra library is actually a Python extension module written in C that calls into the highly optimized BLAS and LAPACK libraries. This allows one to achieve quite good performance in Python for vectorized matrix operations like dot products and decompositions, since numpy hands the actual operation to BLAS/LAPACK, similar to Matlab. As with Matlab, Python runs into performance issues when composing more and more complex matrix operations together, especially when memory constraints and parallelism are involved. This is where Rust and ndarray can help.

Fortunately, the memory layout of numpy's arrays is almost identical to the memory layout of Rust's ndarray arrays. This makes it easy to implement low- or zero-cost conversions between array types using Rust's PyO3 crate for generating Python bindings and numpy crate for interfacing with numpy arrays.
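To make the shared layout idea concrete, here is a minimal sketch that uses only Rust's standard library (neither the ndarray nor the numpy crate is involved). Both numpy and ndarray describe an array as a flat buffer plus a shape and a set of strides; the helper names `row_major_strides` and `offset` are hypothetical and exist only for illustration. One difference worth noting is that numpy reports strides in bytes while ndarray reports them in elements, so this sketch uses element strides.

```rust
/// Element strides for a row-major (C-order) array of the given shape.
/// Hypothetical helper: ndarray and numpy compute this internally.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk from the second-to-last axis back to the first.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Position of `index` in the flat buffer, given element strides.
fn offset(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}

fn main() {
    // A 2 x 3 array stored as one contiguous buffer: [0, 1, 2, 3, 4, 5].
    let shape = [2, 3];
    let data: Vec<f64> = (0..6).map(|x| x as f64).collect();

    let strides = row_major_strides(&shape);
    // For f64 data, numpy would report the same strides in bytes,
    // i.e. each element stride multiplied by 8.
    println!("element strides: {:?}", strides);

    // Element (1, 2) of the 2 x 3 array.
    println!("a[1, 2] = {}", data[offset(&[1, 2], &strides)]);

    // A transpose is just the same buffer with the strides swapped,
    // so element (2, 1) of the 3 x 2 view is the same memory location.
    let t_strides = [strides[1], strides[0]];
    println!("a.T[2, 1] = {}", data[offset(&[2, 1], &t_strides)]);
}
```

Because both libraries agree on this description, a conversion can in principle hand over the pointer, shape, and strides rather than copying the elements.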

Building an Extension Module

Numpy is a Python extension module written in C to make numerical operations in Python more efficient. We can build on numpy and ndarray by writing our own Python extension in Rust to make more complex numerical operations run faster!

See the example in src/lib.rs (instructions for building included).

Calling Python from Rust

In addition to NumPy, Python has mature tools like NiBabel for working with neuroimaging data. We can take advantage of these tools by calling Python from inside our Rust programs. This will, of course, require the CPython interpreter and any Python module dependencies to be installed.

The example in src/bin/ni2npy64.rs is a fully-fledged command line utility for converting any neuroimaging file supported by nibabel to a 64-bit numpy file. It calls Python's nibabel to read the image data into an ndarray array. It uses Rust's ndarray_npy crate to save the array as a numpy file.

Memory Management

Rust and Python have very different ideas about how to manage memory. Rust frees memory immediately when a variable goes out of scope. In more complicated cases of memory use, such as heap-allocated memory behind reference-counted smart pointers like Rc, memory is freed immediately once the reference count goes to zero.
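The Rust side of this can be sketched with the standard library alone. The `Buffer` type and the `demo` function below are hypothetical names used only for illustration; the sketch records when `Drop` runs for a scoped value and for a value shared through `Rc`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for some heap-allocated resource.
/// It writes to a shared log when it is dropped.
struct Buffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

/// Returns the sequence of events in the order they happened.
fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));

    {
        let _scoped = Buffer { name: "scoped", log: Rc::clone(&log) };
        log.borrow_mut().push("end of inner scope".to_string());
    } // `_scoped` goes out of scope and is dropped right here.

    let shared = Rc::new(Buffer { name: "shared", log: Rc::clone(&log) });
    let second = Rc::clone(&shared);
    log.borrow_mut()
        .push(format!("strong count = {}", Rc::strong_count(&shared)));

    drop(shared); // Count falls to 1; the Buffer stays alive.
    log.borrow_mut().push("first handle dropped".to_string());
    drop(second); // Count reaches 0; the Buffer is dropped immediately.

    let events = log.borrow().clone();
    events
}

fn main() {
    for event in demo() {
        println!("{}", event);
    }
}
```

The log shows each `Buffer` being dropped at a fixed, predictable point: the end of its scope, or the moment the last `Rc` handle goes away.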

Python liberates the programmer from needing to think about memory management at all, but there are tradeoffs. Memory isn't always freed at a predictable time in Python: CPython reference-counts its objects, but objects caught in reference cycles are freed only when the garbage collector runs and breaks the cycles. Within a multithreaded context, Python objects can only be accessed or modified while the global interpreter lock (GIL) is held, which prevents race conditions.

See src/bin/memory.rs for concrete examples of how Python lifetimes interact with Rust's notion of memory management through PyO3's API, occasionally with unexpected results.

If you are writing a Python module extension with a custom class that holds references to Python memory, read about how to participate in garbage collection in the PyO3 guide.

Getting Help

Refer to the PyO3 guide and API documentation. Also refer to the GitHub page for Rust's numpy crate and its API documentation.