Weekly Shaarli

All links of one week on a single page.

Week 52 (December 25, 2023)

Unexpected Ways Memory Subsystem Interacts with Branch Prediction - Johnny's Software Lab

We investigate the unusual way the memory subsystem interacts with branch prediction and how this interaction shapes software performance.
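A minimal sketch of the kind of interaction the article examines: the classic data-dependent-branch microbenchmark, where sorting the input makes the branch predictable and the timing gap exposes the predictor at work. The article's own experiments go further, into how cache misses and mispredictions overlap; this example only demonstrates the baseline effect.

```cpp
// Classic demonstration: the same data-dependent branch, timed on
// shuffled input (mispredicts ~50% of the time) and on sorted input
// (almost perfectly predicted).
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

static std::uint64_t sum_above_threshold(const std::vector<int>& v) {
    std::uint64_t sum = 0;
    for (int x : v)
        if (x >= 128)          // data-dependent branch
            sum += x;
    return sum;
}

int main() {
    std::vector<int> data(1 << 24);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    for (int& x : data) x = dist(rng);

    auto time_it = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        volatile std::uint64_t s = sum_above_threshold(data);
        auto t1 = std::chrono::steady_clock::now();
        std::cout << label << ": "
                  << std::chrono::duration<double, std::milli>(t1 - t0).count()
                  << " ms (sum=" << s << ")\n";
    };

    time_it("shuffled");       // unpredictable branch
    std::sort(data.begin(), data.end());
    time_it("sorted");         // predictable branch
}
```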

Let's Branch? Unlikely... - by Henrique Bucher

Recently, on LinkedIn, I read a post about an engineer who was surprised that his new, optimized version of a parser was slower than the original. The optimization consisted of removing the branches, which, according to common street wisdom, are the root of all evil, right? His new version was slower, and a benchmark opened his eyes.
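The post does not show the engineer's parser, but the trade-off it describes can be sketched with a hypothetical pair of functions: a branchy digit counter that the predictor handles almost for free on regular input, and a "branchless" rewrite that replaces the branch with arithmetic that must always execute.

```cpp
// Two ways to count ASCII digits in a buffer. Which one wins depends
// on the data -- which is the post's point.
#include <cstddef>

// Branchy: cheap when the branch predictor guesses right,
// expensive when the digit/non-digit pattern is random.
std::size_t count_digits_branchy(const char* buf, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (buf[i] >= '0' && buf[i] <= '9')
            ++count;
    return count;
}

// "Branchless": turns the range check into a 0/1 value and always
// adds it, so there is nothing to mispredict -- but also a fixed
// arithmetic cost on every element.
std::size_t count_digits_branchless(const char* buf, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        count += static_cast<std::size_t>(
            static_cast<unsigned char>(buf[i] - '0') <= 9);
    return count;
}
```

On input that is mostly digits or mostly non-digits, the branchy version's near-perfect prediction can beat the branchless version's constant extra work; only a benchmark settles it.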

The LinkedIn DPH Framework
Breaking "DRM" in Polish trains - media.ccc.de

We've all been there: the trains you're servicing for a customer suddenly brick themselves and the manufacturer claims that's because you...

[2304.06835] Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms

We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels while performing 20–100× faster than the vectorizing map (vmap) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured, supporting event handling, automatic differentiation, and incorporation of datasets via the GPU's texture memory, allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance. We distribute the software as an open-source library: https://github.com/SciML/DiffEqGPU.jl
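As a rough illustration of the ensemble pattern the paper accelerates (not DiffEqGPU.jl's actual adaptive solvers or its kernel generation), here is a CPU-parallel C++ sketch: the same small ODE integrated once per parameter set, with every solve independent. This is the shape of work that maps one-thread-per-trajectory onto a GPU.

```cpp
// Ensemble sketch: exponential decay y' = -k*y, one fixed-step
// forward-Euler solve per parameter k. Members are independent, so
// the solves run in parallel; on a GPU each loop body would be a
// per-thread kernel. The names here are illustrative only.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <execution>
#include <iostream>
#include <vector>

struct Member { double k; double y; };  // parameter and state

int main() {
    std::vector<Member> ensemble(100000);
    for (std::size_t i = 0; i < ensemble.size(); ++i)
        ensemble[i] = {0.1 + 1e-5 * static_cast<double>(i), 1.0};

    const double dt = 1e-3;
    const int steps = 1000;  // integrate from t = 0 to t = 1

    std::for_each(std::execution::par, ensemble.begin(), ensemble.end(),
                  [&](Member& m) {
                      for (int s = 0; s < steps; ++s)
                          m.y += dt * (-m.k * m.y);  // Euler step
                  });

    std::cout << "y(1) for k=" << ensemble[0].k << ": " << ensemble[0].y
              << " (exact " << std::exp(-ensemble[0].k) << ")\n";
}
```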