Joblib: A Practical Guide to Caching and Parallelization in Python
Joblib is a Python library that provides tools for lightweight pipelining. It is particularly useful for persisting the results of time-consuming computations, caching function outputs on disk, and parallelizing code execution. This guide covers the most common practical use cases of joblib.
import time

import joblib
import numpy as np

# Create a memory cache backed by the ".cache" directory
memory = joblib.Memory(location=".cache", verbose=0)

@memory.cache
def slow_function(x):
    """A function that takes some time to execute."""
    print("Computing slow_function...")
    time.sleep(2)  # Simulate a time-consuming computation
    return np.sum(x)

# First call - will be computed and cached
data = np.random.rand(1000)
t0 = time.time()
result1 = slow_function(data)
print(f"First call took {time.time() - t0:.3f} seconds")

# Second call - will use cached result (the function body is not run again)
t0 = time.time()
result2 = slow_function(data)
print(f"Second call took {time.time() - t0:.3f} seconds")
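It can help to see when the cache is hit and when it is not. The following is a minimal sketch, not part of the original example: the names `slow_sum`, `calls`, and `cache_dir` are hypothetical, and a temporary directory is assumed so that the `.cache` folder used above is left untouched. It counts real executions of the function body to show that joblib keys the cache on the content of the arguments, and that `Memory.clear` forces recomputation.

```python
import tempfile
import time

import numpy as np
from joblib import Memory

# Hypothetical scratch location so the sketch does not touch ".cache".
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

calls = []  # one entry is appended per real (non-cached) execution


@memory.cache
def slow_sum(x):
    """Stand-in for an expensive computation."""
    calls.append(1)
    time.sleep(0.1)
    return float(np.sum(x))


a = np.arange(10)
b = np.arange(20)

slow_sum(a)         # miss: computed and stored on disk
slow_sum(a)         # hit: loaded from disk, body not run
slow_sum(a.copy())  # hit: the key depends on array content, not identity
slow_sum(b)         # miss: different input
runs_before_clear = len(calls)

memory.clear(warn=False)  # delete everything stored under cache_dir
slow_sum(a)               # miss again, because the cache was emptied
runs_after_clear = len(calls)

print(runs_before_clear, runs_after_clear)
```

Counting executions rather than timing them keeps the check deterministic; with timings alone, a fast disk or a slow machine can blur the difference between a hit and a miss.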
Caching DataFrames with Custom Decorators
For pandas DataFrames, we can create a specialized caching decorator: