Tue, 06 Mar 2018
14:00 - 14:30
Oliver Sheridan-Methven

The latest CPUs from Intel and ARM support vectorised operations, where a single set of instructions (e.g. add, multiply, bit shift, XOR) is performed in parallel on a small batch of data. This can give large performance improvements when every element of the batch undergoes the same operation, but risks a performance loss when elements need different operations (e.g. if-else conditions). I will present the work I have done so far on recovering the full performance of the hardware, and some of the challenges faced: the trade-off between ever larger parallel batches and the risk of tasks diverging, and how certain coding styles might be modified for memory-bandwidth-limited applications. Examples will be taken from finance and Monte Carlo applications, inspecting some standard maths library functions and possibly random number generation.
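As a rough illustration of the divergence issue described above (not taken from the talk itself), the sketch below writes a call option payoff, max(S - K, 0), in two ways: once with an explicit if-else per element, and once as a single select expression. The function names `call_payoff_branch` and `call_payoff_select` are hypothetical, and whether either loop is actually vectorised depends on the compiler, its flags, and the target CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: payoff of a call option, max(spot - strike, 0),
   evaluated over a batch of simulated spot prices. */

/* Branching form: each element follows one of two code paths. */
void call_payoff_branch(const double *spot, double strike,
                        double *out, size_t n)
{
    assert(spot != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (spot[i] > strike) {
            out[i] = spot[i] - strike;
        } else {
            out[i] = 0.0;
        }
    }
}

/* Select form: every element performs the same subtraction, then one
   value is chosen. Compilers can often map this pattern to a vector
   max or blend instruction, though that is not guaranteed. */
void call_payoff_select(const double *spot, double strike,
                        double *out, size_t n)
{
    assert(spot != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        double intrinsic = spot[i] - strike;
        out[i] = intrinsic > 0.0 ? intrinsic : 0.0;
    }
}
```

Both functions return the same values; the difference is only in how uniform the per-element work looks to the compiler. For a branch this small an optimising compiler may well convert the first form into the second by itself, so the distinction matters more when the two branches contain substantially different work.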
