On this page, we introduce vectorized numerical quadrature: a technique we developed for evaluating integrals numerically that, to our knowledge, has not been explored in depth elsewhere. We’ll show that this technique can achieve surprising performance gains over traditional implementations of quadrature rules.
## Simpson’s rule, vectorized
To start, let’s review the typical methods of numerical integral evaluation, also called numerical quadrature[^1]. Numerical quadrature rules are a collection of diverse methods used for approximating the value of an integral. One of the most widely-used methods of numerical quadrature is Simpson’s rule, which is actually a collection of several rules and even more variants.
Simpson’s (composite) 1/3 rule is given by:

$$\int_a^b f(x)\,\mathrm{d}x \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{N-2}) + 4f(x_{N-1}) + f(x_N)\right]$$
where $h = \frac{b-a}{N}$, $x_k = a + kh$, and $N$ subintervals are used (and for the 1/3 rule, $N$ must be an even number). We can write this in a vectorized form as follows. Let $\mathbf{I}_s$ be the vector defined by $\mathbf{I}_s = K\mathbf{f}$, where $K$ is a matrix and $\mathbf{f}$ is a vector, defined as follows:

$$K = \begin{pmatrix} 1 & & & & & \\ & 4 & & & & \\ & & 2 & & & \\ & & & \ddots & & \\ & & & & 4 & \\ & & & & & 1 \end{pmatrix}, \qquad \mathbf{f} = \begin{pmatrix} f(x_0) \\ f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{N-1}) \\ f(x_N) \end{pmatrix}$$
Thus, by expanding out $\mathbf{I}_s = K\mathbf{f}$, we have:

$$\mathbf{I}_s = \begin{pmatrix} f(x_0) \\ 4f(x_1) \\ 2f(x_2) \\ \vdots \\ 4f(x_{N-1}) \\ f(x_N) \end{pmatrix}$$
The value of the integral is then given by:

$$\int_a^b f(x)\,\mathrm{d}x \approx \alpha \sum_{k=0}^{N} (\mathbf{I}_s)_k, \qquad \alpha = \frac{h}{3}$$
Note: the factor of $\alpha$ is there to absorb any fractional coefficients as well as the step size $h$. By leaving it as a simple multiplication after the sum is computed, the $K$-matrix can be composed of purely integers, increasing computational efficiency.
Then, Simpson’s 1/3 rule is simply taking the dot product of $\mathbf{I}_s$ with a vector of all ones (which we notate as $\mathbf{1}$), or equivalently, the sum of all its elements, and then multiplying by $\alpha$:

$$\int_a^b f(x)\,\mathrm{d}x \approx \alpha\,(\mathbf{1} \cdot \mathbf{I}_s)$$
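For concreteness, take $N = 4$: then $K = \operatorname{diag}(1, 4, 2, 4, 1)$, and

$$\alpha\,(\mathbf{1} \cdot K\mathbf{f}) = \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + f(x_4)\right],$$

which is exactly the composite 1/3 rule on four subintervals.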
Doing it this way, we can write far more general code that simply takes in a stencil for a particular integration rule and constructs the matrix $K$, rather than needing to write a for-loop for every individual method, since the evaluation steps are the same (see the sketch after the list below):
- Construct the vector $\mathbf{f}$ by evaluating $f$ on every point in the integration domain
- Construct the matrix $K$ and calculate $\mathbf{I}_s$ via $\mathbf{I}_s = K\mathbf{f}$
- Compute $\alpha\,(\mathbf{1} \cdot \mathbf{I}_s)$ to get the approximated value of the integral
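For illustration, here is a minimal sketch of such a stencil-driven constructor, assuming NumPy and SciPy are available (`stencil_matrix` is a hypothetical helper name we introduce here, not a library function):

```python
import numpy as np
import scipy.sparse as sp

def stencil_matrix(pattern, n):
    # Hypothetical helper: builds the sparse diagonal weight matrix K
    # for a composite Newton-Cotes rule whose interior weights repeat
    # in `pattern`; n (the number of subintervals) must be a multiple
    # of len(pattern)
    weights = np.ones(n + 1)  # the two endpoints keep weight 1
    weights[1:-1] = np.tile(pattern, n // len(pattern))[:n - 1]
    return sp.diags(weights)

# Simpson's 1/3 rule: stencil [4, 2] with alpha = h/3
K_13 = stencil_matrix([4, 2], 10)
# Simpson's 3/8 rule: stencil [3, 3, 2] with alpha = 3h/8
K_38 = stencil_matrix([3, 3, 2], 9)
```

With a helper like this, switching quadrature rules amounts to changing the stencil and the constant $\alpha$.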
For instance, Simpson’s 3/8 rule can be written in a very similar form; indeed, one only needs to change the matrix $K$ and constant $\alpha$ to the following forms:

$$K = \begin{pmatrix} 1 & & & & & \\ & 3 & & & & \\ & & 3 & & & \\ & & & 2 & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix}, \qquad \alpha = \frac{3h}{8}$$
where the central diagonals repeat in the form $3, 3, 2$ in the region represented by the $\ddots$ symbol. The generalizability of the algorithm does not sacrifice simplicity; in fact, assuming that you’re using a language that supports vectorized operations, the algorithm takes under 10 lines(!) of pseudocode:
```
# Note: this is pseudocode but Python/Matlab-inspired
# Assuming that the matrix K
# and the function f(x) are defined already

# define integration domain & subintervals
a = 0
b = 1
N = 100
h = (b - a)/N
alpha = h/3

# construct grid and evaluate function
# (N subintervals require N + 1 grid points)
x = linspace(a, b, N + 1)
fvec = f(x)

# perform the matrix-vector multiplication
I_s = matmul(K, fvec) # for Python we can use the special syntax K @ fvec
result = alpha*sum(I_s) # integral result
```

A functional (but un-optimized) example, using Python and SciPy, is given below. It evaluates the following integral, which has no elementary antiderivative:

$$\int_2^5 \frac{\mathrm{d}x}{\sqrt{\ln x}}$$
```python
import numpy as np
import scipy.sparse as sp

N = 1000 # number of subintervals; must be even!
a = 2
b = 5

# ensure that the number of subintervals
# is an even number
assert N % 2 == 0

def integral_matrix(n):
    # it is assumed that n is checked to be an even number
    # prior to calling this function; the interior weights
    # follow the repeating pattern 4, 2, 4, 2, ..., 4
    central_diags = np.append(np.tile([4, 2], (n - 2)//2), 4)
    # the two endpoints get weight 1, giving n + 1 weights in total
    diags = np.pad(central_diags, pad_width=(1, 1),
                   mode="constant", constant_values=1)
    K = sp.diags(diags)
    return K

# function to integrate
func = lambda x: np.log(x)**(-1/2)

def vectorized_simpsons(func, bounds, samples):
    """
    func - function to integrate
    bounds - bounds of integration
    samples - number of subintervals
    """
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3 # specific to Simpson's 1/3 rule
    # samples subintervals require samples + 1 grid points
    x = np.linspace(a, b, samples + 1)
    f = func(x)
    K = integral_matrix(samples)
    result = alpha * (K @ f).sum()
    return result

print("Approximation for integral:", vectorized_simpsons(func, (a, b), N))
```

In comparison, below is a “standard” example code for Simpson’s method:
```python
def standard_simpsons(func, bounds, samples):
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3
    x = a + h
    # approximation for integral with sum,
    # dropping the h/3 factor until the end;
    # the endpoints f(a) and f(b) have weight 1
    int_sum = func(a) + func(b)
    for k in range(1, samples - 1, 2):
        # we step by two every time: odd points get
        # weight 4 and even interior points get weight 2
        int_sum += 4*func(x) + 2*func(x + h)
        x += 2*h
    # the final odd point also has weight 4
    int_sum += 4*func(b - h)
    return alpha * int_sum
```

## Performance comparison
Having given our description of the vectorized and standard Simpson’s rules, we can now compare their performance. A very basic benchmark code to calculate the execution time of each algorithm is given as follows:
```python
import time

def timeit(N, method, func=func, a=a, b=b):
    start_time = time.perf_counter()
    if method == "standard":
        _ = standard_simpsons(func, (a, b), N)
    elif method == "vectorized":
        _ = vectorized_simpsons(func, (a, b), N)
    else: # no other case possible
        return
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    return elapsed_time

# to use, just call timeit(some_number, "standard")
# or timeit(some_number, "vectorized")
```

A full version of the code is available on this GitHub gist. The question is, is vectorized numerical quadrature really worth it? First, note that both methods are $O(N)$ in computational complexity. The standard Simpson’s method is trivially $O(N)$ since it is just a pure sum; meanwhile, the vectorized Simpson’s method uses a sparse diagonal matrix (which is $O(N)$ to construct), meaning the matrix-vector multiplication reduces to just a dot product (which is also $O(N)$) and the final sum is of course also $O(N)$. However, if we take our basic code (which, again, is not optimized), we can perform some (rough) benchmarks:
| $N$ | Standard Simpson’s time (ms) | Vectorized Simpson’s time (ms) |
|---|---|---|
| 10 | <0.1 | 75.3 |
| 100 | 0.1 | 79.3 |
| 1,000 | 1.0 | 78.9 |
| 10,000 | 7.4 | 74.0 |
| 50,000 | 37.6 | 81.7 |
| **100,000** | **84.3** | **78.7** |
| 250,000 | 191.3 | 78.6 |
| 1,000,000 | 749.1 | 74.5 |
| 5,000,000 | 3876.2 | 79.9 |
Note: Bear in mind this is vanilla Python, not a high-performance language like C/C++/Rust/Fortran that you’d actually use for running ultra-high-performance code. Also, these benchmarks are definitely not done according to best practices, and the results should be taken with caution.
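For somewhat more robust timings, one could instead use the standard library’s `timeit` module to repeat each measurement and keep the best run; here is a minimal sketch, assuming the functions and variables defined earlier are in scope:

```python
import timeit as tm  # aliased to avoid clashing with our timeit() above

# repeat each measurement 5 times, 10 calls per measurement,
# and keep the best (minimum) per-call time
best_std = min(tm.repeat(lambda: standard_simpsons(func, (a, b), 100_000),
                         repeat=5, number=10)) / 10
best_vec = min(tm.repeat(lambda: vectorized_simpsons(func, (a, b), 100_000),
                         repeat=5, number=10)) / 10
print(f"standard: {best_std*1e3:.2f} ms, vectorized: {best_vec*1e3:.2f} ms")
```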
Notice how below $N \approx 100{,}000$, the standard Simpson’s method is faster, but beyond that point, the vectorized Simpson’s method pulls far ahead, reaching nearly 50 times faster by $N = 5{,}000{,}000$ (see the highlighted row in the table above, which marks the crossover). While these results should be taken with a grain of salt, it is quite likely that the vectorized method is able to take advantage of BLAS/SIMD vectorization speedups and avoid for-loops (which can drastically slow down interpreted code). For smaller $N$, it’s likely that the allocation cost of initializing the sparse matrix $K$ makes it slower, but for large $N$ the vectorization benefits take over.
## Concluding remarks
The verdict? Vectorized numerical quadrature is absolutely a viable option[^2], particularly when integrals need to be evaluated to very high precision (that is, for large $N$). Furthermore, vectorized numerical quadrature can potentially benefit from hardware acceleration by running the linear algebra on a GPU. In addition, another benefit of the vectorized method is that once constructed, the matrix $K$ can be reused at no additional cost for subsequent integrations, which is valuable when an integral needs to be rapidly re-evaluated many times over (for instance, if you are trying to compute a function that is defined by an integral, like the error function or the Gamma function; see the sketch below). Perhaps the best approach is a mixed algorithm that uses the standard Simpson’s method for small $N$ but switches to the vectorized version for large $N$.
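As a rough illustration of the reuse point above, here is a sketch (ours; `erf_approx` is a hypothetical name) that tabulates the error function by building the Simpson’s 1/3 weight matrix once and re-applying it for every evaluation point:

```python
import numpy as np
import scipy.sparse as sp

# build the Simpson's 1/3 weight matrix K a single time
N = 1000
weights = np.ones(N + 1)
weights[1:-1:2] = 4  # odd interior points
weights[2:-1:2] = 2  # even interior points
K = sp.diags(weights)

def erf_approx(t):
    # erf(t) = (2/sqrt(pi)) * integral of exp(-x^2) from 0 to t;
    # only the grid and function values change per call, K is reused
    h = t / N
    x = np.linspace(0.0, t, N + 1)
    f = np.exp(-x**2)
    return (2/np.sqrt(np.pi)) * (h/3) * (K @ f).sum()

print(erf_approx(1.0))  # ~0.8427, close to scipy.special.erf(1.0)
```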
## Footnotes

[^1]: Here, when we speak of “numerical quadrature”, we mean the numerical evaluation of integrals. The similar term “numerical integration” is sometimes also used, but it can lead to confusion, since “numerical integration” is also used to refer to solving a differential equation numerically.

[^2]: One big caveat is that this requires that you have a fast linear algebra library with support for sparse matrices. As long as you have that, though, you get all the benefits of the vectorization speed-up!