On this page, we introduce vectorized numerical quadrature: a technique we developed for evaluating integrals numerically that, to our knowledge, has not been explored in depth elsewhere. We’ll show that this technique can achieve surprising performance gains over traditional implementations of quadrature rules.
## Simpson’s rule, vectorized
To start, let’s review the typical methods of numerical integral evaluation, also called numerical quadrature[^1]. Numerical quadrature rules are a collection of diverse methods used for approximating the value of an integral. One of the most widely-used methods of numerical quadrature is Simpson’s rule, which is actually a collection of several rules and even more variants.
Simpson’s (composite) 1/3 rule is given by:

$$\int_a^b f(x)\,\mathrm{d}x \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{N-2}) + 4f(x_{N-1}) + f(x_N)\right]$$
where $h = \frac{b-a}{N}$, $x_k = a + kh$, and $N$ subintervals are used (and for the 1/3 rule, $N$ must be an even number). We can write this in a vectorized form as follows. Let $\mathbf{I}_s$ be the vector defined by $\mathbf{I}_s = K\mathbf{f}$, where $K$ is a matrix and $\mathbf{f}$ is a vector, defined as follows:

$$K = \begin{pmatrix} 1 & & & & & \\ & 4 & & & & \\ & & 2 & & & \\ & & & \ddots & & \\ & & & & 4 & \\ & & & & & 1 \end{pmatrix}, \qquad \mathbf{f} = \begin{pmatrix} f(x_0) \\ f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{N-1}) \\ f(x_N) \end{pmatrix}$$
Thus, by expanding out $\mathbf{I}_s = K\mathbf{f}$, we have:

$$\mathbf{I}_s = \begin{pmatrix} f(x_0) \\ 4f(x_1) \\ 2f(x_2) \\ \vdots \\ 4f(x_{N-1}) \\ f(x_N) \end{pmatrix}$$
The value of the integral is then given by:

$$\int_a^b f(x)\,\mathrm{d}x \approx \alpha \sum_{k=0}^{N} (\mathbf{I}_s)_k, \qquad \alpha = \frac{h}{3}$$
Note: the factor of $\alpha$ is there to absorb any fractional coefficients as well as the step size $h$. By leaving it as a simple multiplication after the sum is computed, the $K$-matrix can be composed of purely integers, increasing computational efficiency.
Then, Simpson’s 1/3 rule is simply taking the dot product of $\mathbf{I}_s$ with a vector of all ones (which we notate as $\mathbf{1}$), or equivalently, the sum of all its elements, and then multiplying by $\alpha$:

$$\int_a^b f(x)\,\mathrm{d}x \approx \alpha\,(\mathbf{1} \cdot \mathbf{I}_s)$$
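For concreteness, take $N = 4$: then $K = \operatorname{diag}(1, 4, 2, 4, 1)$, and

$$\alpha\,(\mathbf{1} \cdot K\mathbf{f}) = \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + f(x_4)\right],$$

which is exactly the composite 1/3 rule on four subintervals.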
Doing it this way, we can write far more general code that simply takes in a stencil for a particular integration rule and constructs the matrix $K$, rather than needing to write a for-loop for every individual method, since the evaluation steps are the same (see the sketch after the list below):
- Construct the vector $\mathbf{f}$ by evaluating $f$ on every point in the integration domain
- Construct the matrix $K$ and calculate $\mathbf{I}_s$ via $\mathbf{I}_s = K\mathbf{f}$
- Compute $\alpha\,(\mathbf{1} \cdot \mathbf{I}_s)$ to get the approximated value of the integral
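For illustration, here is a minimal sketch of such a stencil-driven constructor, assuming NumPy and SciPy are available (`stencil_matrix` is a hypothetical helper name we introduce here, not a library function):

```python
import numpy as np
import scipy.sparse as sp

def stencil_matrix(pattern, n):
    # Hypothetical helper: builds the sparse diagonal weight matrix K
    # for a composite Newton-Cotes rule whose interior weights repeat
    # in `pattern`; n (the number of subintervals) must be a multiple
    # of len(pattern)
    weights = np.ones(n + 1)  # the two endpoints keep weight 1
    weights[1:-1] = np.tile(pattern, n // len(pattern))[:n - 1]
    return sp.diags(weights)

# Simpson's 1/3 rule: stencil [4, 2] with alpha = h/3
K_13 = stencil_matrix([4, 2], 10)
# Simpson's 3/8 rule: stencil [3, 3, 2] with alpha = 3h/8
K_38 = stencil_matrix([3, 3, 2], 9)
```

With a helper like this, switching quadrature rules amounts to changing the stencil and the constant $\alpha$.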
For instance, Simpson’s 3/8 rule can be written in a very similar form; indeed, one only needs to change the matrix $K$ and constant $\alpha$ to the following forms:

$$K = \begin{pmatrix} 1 & & & & & \\ & 3 & & & & \\ & & 3 & & & \\ & & & 2 & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix}, \qquad \alpha = \frac{3h}{8}$$
where the central diagonals repeat in the form $3, 3, 2$ in the region represented by the $\ddots$ symbol. The generalizability of the algorithm does not sacrifice simplicity; in fact, assuming that you’re using a language that supports vectorized operations, the algorithm takes under 10 lines(!) of pseudocode:
```
# Note: this is pseudocode but Python/Matlab-inspired
# Assuming that the matrix K
# and the function f(x) are defined already

# define integration domain & subintervals
a = 0
b = 1
N = 100
h = (b - a)/N
alpha = h/3

# construct grid and evaluate function
# (N subintervals require N + 1 grid points)
x = linspace(a, b, N + 1)
fvec = f(x)

# perform the matrix-vector multiplication
I_s = matmul(K, fvec) # for Python we can use the special syntax K @ fvec
result = alpha*sum(I_s) # integral result
```

A functional (but un-optimized) example, using Python and SciPy, is given below. It evaluates the following integral, which has no elementary antiderivative:

$$\int_2^5 \frac{\mathrm{d}x}{\sqrt{\ln x}}$$
```python
import numpy as np
import scipy.sparse as sp

N = 1000 # number of subintervals; must be even!
a = 2
b = 5

# ensure that the number of subintervals
# is an even number
assert N % 2 == 0

def integral_matrix(n):
    # it is assumed that n is checked to be an even number
    # prior to calling this function; the interior weights
    # follow the repeating pattern 4, 2, 4, 2, ..., 4
    central_diags = np.append(np.tile([4, 2], (n - 2)//2), 4)
    # the two endpoints get weight 1, giving n + 1 weights in total
    diags = np.pad(central_diags, pad_width=(1, 1),
                   mode="constant", constant_values=1)
    K = sp.diags(diags)
    return K

# function to integrate
func = lambda x: np.log(x)**(-1/2)

def vectorized_simpsons(func, bounds, samples):
    """
    func - function to integrate
    bounds - bounds of integration
    samples - number of subintervals
    """
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3 # specific to Simpson's 1/3 rule
    # samples subintervals require samples + 1 grid points
    x = np.linspace(a, b, samples + 1)
    f = func(x)
    K = integral_matrix(samples)
    result = alpha * (K @ f).sum()
    return result

print("Approximation for integral:", vectorized_simpsons(func, (a, b), N))
```

In comparison, below is a “standard” example code for Simpson’s method:
```python
def standard_simpsons(func, bounds, samples):
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3
    x = a + h
    # approximation for integral with sum,
    # dropping the h/3 factor until the end;
    # the endpoints f(a) and f(b) have weight 1
    int_sum = func(a) + func(b)
    for k in range(1, samples - 1, 2):
        # we step by two every time: odd points get
        # weight 4 and even interior points get weight 2
        int_sum += 4*func(x) + 2*func(x + h)
        x += 2*h
    # the final odd point also has weight 4
    int_sum += 4*func(b - h)
    return alpha * int_sum
```

## Performance comparison
Having given our description of the vectorized and standard Simpson’s rules, we can now compare their performance. A very basic benchmark code to calculate the execution time of each algorithm is given as follows:
```python
import time

def timeit(N, method, func=func, a=a, b=b):
    start_time = time.perf_counter()
    if method == "standard":
        _ = standard_simpsons(func, (a, b), N)
    elif method == "vectorized":
        _ = vectorized_simpsons(func, (a, b), N)
    else: # no other case possible
        return
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    return elapsed_time

# to use, just call timeit(some_number, "standard")
# or timeit(some_number, "vectorized")
```

A full version of the code is available on this GitHub gist. The question is, is vectorized numerical quadrature really worth it? First, note that both methods are $O(N)$ in computational complexity. The standard Simpson’s method is trivially $O(N)$ since it is just a pure sum; meanwhile, the vectorized Simpson’s method uses a sparse diagonal matrix (which is $O(N)$ to construct), meaning the matrix-vector multiplication reduces to just a dot product (which is also $O(N)$) and the final sum is of course also $O(N)$. However, if we take our basic code (which, again, is not optimized), we can perform some (rough) benchmarks:
| $N$ | Standard Simpson’s time (ms) | Vectorized Simpson’s time (ms) |
|---|---|---|
| 10 | <0.1 | 75.3 |
| 100 | 0.1 | 79.3 |
| 1,000 | 1.0 | 78.9 |
| 10,000 | 7.4 | 74.0 |
| 50,000 | 37.6 | 81.7 |
| **100,000** | **84.3** | **78.7** |
| 250,000 | 191.3 | 78.6 |
| 1,000,000 | 749.1 | 74.5 |
| 5,000,000 | 3876.2 | 79.9 |
Note: Bear in mind this is vanilla Python, not a high-performance language like C/C++/Rust/Fortran that you’d actually use for running ultra-high-performance code. Also, these benchmarks are definitely not done according to best practices, and the results should be taken with caution.
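For somewhat more robust timings, one could instead use the standard library’s `timeit` module to repeat each measurement and keep the best run; here is a minimal sketch, assuming the functions and variables defined earlier are in scope:

```python
import timeit as tm  # aliased to avoid clashing with our timeit() above

# repeat each measurement 5 times, 10 calls per measurement,
# and keep the best (minimum) per-call time
best_std = min(tm.repeat(lambda: standard_simpsons(func, (a, b), 100_000),
                         repeat=5, number=10)) / 10
best_vec = min(tm.repeat(lambda: vectorized_simpsons(func, (a, b), 100_000),
                         repeat=5, number=10)) / 10
print(f"standard: {best_std*1e3:.2f} ms, vectorized: {best_vec*1e3:.2f} ms")
```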
Notice how below $N \approx 100{,}000$, the standard Simpson’s method is faster, but beyond that point, the vectorized Simpson’s method pulls far ahead, reaching nearly 50 times faster by $N = 5{,}000{,}000$ (see the highlighted row in the table above, which marks the crossover). While these results should be taken with a grain of salt, it is quite likely that the vectorized method is able to take advantage of BLAS/SIMD vectorization speedups and avoid for-loops (which can drastically slow down interpreted code). For smaller $N$, it’s likely that the allocation cost of initializing the sparse matrix $K$ makes it slower, but for large $N$ the vectorization benefits take over.
## Concluding remarks
The verdict? Vectorized numerical quadrature is absolutely a viable option[^2], particularly when integrals need to be evaluated to very high precision (that is, for large $N$). Furthermore, vectorized numerical quadrature can potentially benefit from hardware acceleration by running the linear algebra on a GPU. In addition, another benefit of the vectorized method is that once constructed, the matrix $K$ can be reused at no additional cost for subsequent integrations, which is valuable when an integral needs to be rapidly re-evaluated many times over (for instance, if you are trying to compute a function that is defined by an integral, like the error function or the Gamma function; see the sketch below). Perhaps the best approach is a mixed algorithm that uses the standard Simpson’s method for small $N$ but switches to the vectorized version for large $N$.
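As a rough illustration of the reuse point above, here is a sketch (ours; `erf_approx` is a hypothetical name) that tabulates the error function by building the Simpson’s 1/3 weight matrix once and re-applying it for every evaluation point:

```python
import numpy as np
import scipy.sparse as sp

# build the Simpson's 1/3 weight matrix K a single time
N = 1000
weights = np.ones(N + 1)
weights[1:-1:2] = 4  # odd interior points
weights[2:-1:2] = 2  # even interior points
K = sp.diags(weights)

def erf_approx(t):
    # erf(t) = (2/sqrt(pi)) * integral of exp(-x^2) from 0 to t;
    # only the grid and function values change per call, K is reused
    h = t / N
    x = np.linspace(0.0, t, N + 1)
    f = np.exp(-x**2)
    return (2/np.sqrt(np.pi)) * (h/3) * (K @ f).sum()

print(erf_approx(1.0))  # ~0.8427, close to scipy.special.erf(1.0)
```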
## Footnotes

[^1]: Here, when we speak of “numerical quadrature”, we mean the numerical evaluation of integrals. The similar term “numerical integration” is sometimes also used, but it can lead to confusion, since “numerical integration” is also used to refer to solving a differential equation numerically.

[^2]: One big caveat is that this requires that you have a fast linear algebra library with support for sparse matrices. As long as you have that, though, you get all the benefits of the vectorization speed-up!