<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://kennypeng.com/feed.xml" rel="self" type="application/atom+xml" /><link href="http://kennypeng.com/" rel="alternate" type="text/html" /><updated>2026-03-01T22:13:23+00:00</updated><id>http://kennypeng.com/feed.xml</id><title type="html">just for context by Kenny Peng</title><subtitle>A blog about my personal projects  and the rich backgrounds they rely on
</subtitle><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><entry><title type="html">Rebuilding ESP32-fluid-simulation: the pressure projection step of the sim task (Part 5)</title><link href="http://kennypeng.com/2024/09/26/esp32_fluid_sim_5.html" rel="alternate" type="text/html" title="Rebuilding ESP32-fluid-simulation: the pressure projection step of the sim task (Part 5)" /><published>2024-09-26T00:00:00+00:00</published><updated>2024-09-26T00:00:00+00:00</updated><id>http://kennypeng.com/2024/09/26/esp32_fluid_sim_5</id><content type="html" xml:base="http://kennypeng.com/2024/09/26/esp32_fluid_sim_5.html"><![CDATA[<p>It’s been a while since my last post, but let’s bring this series to its finale. Out of all the parts of the fluid sim, the last piece I have not explained is the pressure projection. The code alone doesn’t reveal the great deal of context that goes into its design. Here’s a hint: it’s actually a linear algebra routine involving a “sparse matrix”. It’s possible these days to implement the pressure projection without knowing all that context, thanks to articles like <a href="https://jamie-wong.com/2016/08/05/webgl-fluid-simulation/">Jamie Wong’s post</a>, <a href="https://developer.nvidia.com/gpugems/gpugems/part-vi-beyond-triangles/chapter-38-fast-fluid-dynamics-simulation-gpu">the GPU Gems chapter</a>, and Stam’s <a href="https://www.dgp.toronto.edu/~stam/reality/Research/pdf/GDC03.pdf">“Real-Time Fluid Dynamics for Games”</a>, but without that context, I found that achieving a believable fluid simulation on an ESP32 <em>was impossible</em>. I’d personally tried it before, and after learning the context? All I needed was to switch in a technically superior method. There was a reason why I dedicated airtime to this—let me explain.</p>

<p>In <a href="/2023/09/22/esp32_fluid_sim_3.html">part 3</a>, we covered the Navier-Stokes equations, and in <a href="/2024/01/20/esp32_fluid_sim_4.html">part 4</a>, we covered how the various parts of the equations were each discretized. To begin here, though, I need to cover one last thing that involves continuous fields rather than discrete arrays: the “Helmholtz-Hodge decomposition”.</p>

<p>It’s mentioned in the GPU Gems chapter, which cites Stam’s 1999 “Stable Fluids” paper for it. In the end, I found “Stable Fluids” to be the definitive place to start looking for context, and much of my understanding borrows from it. It, in turn, cites a section of the book <em>A Mathematical Introduction to Fluid Mechanics</em> by Chorin and Marsden. Building on that section, I’ll get into what the “Helmholtz-Hodge decomposition” is. I won’t get into <em>why</em> it holds, i.e. all of the book preceding that section; that’s outside the scope I want to take here, and, I’ll admit, also outside what I can comfortably do. With that said, what is it, and how do we apply it?</p>

<p>We can derive the pressure projection from the decomposition. To begin, the decomposition itself is this: given some vector field $\bold{w}$ that is defined over a bounded space, it can be taken as the following sum</p>

\[\bold{w} = \bold{v} + \nabla p\]

<p>where $\nabla p$ is the gradient of a scalar field and $\bold v$ is a vector field that has zero divergence everywhere in the space and zero flow across its boundary. In other words, every vector field can be broken down into two components, one with all of its original divergence and one with zero divergence, and the former is <em>necessarily</em> the gradient of some scalar field. In part 3, I mentioned that a linear projection $\mathbb{P}$ is applied to the velocity field in order to make it valid, and remember that “valid” means that it satisfies the incompressibility constraint by having zero divergence. A definition for that linear projection can be built upon the Helmholtz-Hodge decomposition.</p>

<p>Let $\bold w$ be the uncorrected velocity field. Since we want its zero-divergence component, we need to isolate $\bold{v}$. That gives us this expression.</p>

\[\bold{v} = \bold{w} - \nabla p\]

<p>However, before we can subtract $\nabla p$ from $\bold w$, we need to find what $p$ is in the first place! The Helmholtz-Hodge decomposition tells us that such a $p$ exists, but the problem is that it does not tell us <em>how</em> to find it. Therein lies the meat of the pressure projection. In this context, $p$ is not just some scalar field; $p$ stands for “pressure”.</p>

<p>Also in the Chorin and Marsden section, you’d find that the proof of the Helmholtz-Hodge decomposition builds upon the pressure being the solution to a particular <a href="https://en.wikipedia.org/wiki/Boundary_value_problem">“boundary value problem”</a>, one that Stam noted to involve a case of <a href="https://en.wikipedia.org/wiki/Poisson%27s_equation">“Poisson’s equation”</a>. In a “boundary value problem”, we’re given a bounded space as a domain, the governing differential equations that a function on that space (i.e. a field) must satisfy, and the values that the function (and/or values of that function’s derivative) must take on the boundary i.e. the “boundary conditions”.</p>

<figure>
<img src="/images/2024-09-26/figure1.png" alt="A bounded region, and the inside is colored blue with hatch fill and labeled with Poisson's equation whereas the boundary is labeled with another equation and covered in arrows, pointing outward and in various lengths" />
<figcaption>

The three principal parts of a boundary value problem: a bounded domain, a governing equation, and the boundary condition(s).

</figcaption>
</figure>

<p>Regarding “Poisson’s equation”, I’m not qualified to opine about it or anything, so I’ll just say that it’s a partial differential equation where one scalar field is equal to the Laplacian of another scalar field. If there’s an intuitive meaning to it, that’s currently lost on me.</p>

<p>Anyway, the boundary value problem is this: given $\bold{w}$, $p$ should satisfy the following case of Poisson’s equation</p>

\[\nabla^2 p = \nabla \cdot \bold{w}\]

<p>on the domain, whereas on the boundary its <a href="https://en.wikipedia.org/wiki/Directional_derivative">“directional derivative”</a> is specified.</p>

<p>When I say “directional derivative”, I’d like you to recall that explanation of the partial derivative I did in Part 3, where I mentioned that $\frac{\partial p}{\partial x}$ and $\frac{\partial p}{\partial y}$ were just the slopes of two of the many lines that make up the plane tangent to the field. The “directional derivative” is the slope of a potentially different line, the one running in the direction with respect to which the derivative is taken. (This definition also goes to show that the typical partials $\frac{\partial p}{\partial x}$ and $\frac{\partial p}{\partial y}$ are special cases of the directional derivative, being taken in the $+x$- and $+y$-directions.)</p>

<figure>
<img src="/images/2024-09-26/figure2.png" alt="A three-dimensional plot of x, y, and a function of x and y and a line tangent to the surface, annotated with a dashed line running from the point of intersection to the x-y plane, where lies a small arrow labelled n and point the in-plane direction in which the tangent line runs" />
</figure>

<p>It is often denoted as $\frac{\partial f}{\partial \bold{v}}$, where $\bold{v}$ is a vector pointing in the direction in which the derivative of the field $f$ is taken. Furthermore, it is defined as the dot product between the gradient $\nabla f$ and $\tilde{\bold{v}}$, where $\tilde{\bold{v}}$ is the unit vector in that direction.</p>

\[\frac{\partial f}{\partial \bold{v}} \equiv \tilde{\bold{v}} \cdot \nabla f\]
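<p>To make that definition concrete, here’s a quick numeric check of my own (not code from the project; the field and the direction are made-up examples): the slope measured along a direction matches the dot product of the gradient with the unit vector in that direction.</p>

```python
import math

# Example field f(x, y) = x^2 + 3y with analytic gradient (2x, 3).
# Both are illustrative choices, not anything from the simulation.
def f(x, y):
    return x**2 + 3*y

def grad_f(x, y):
    return (2*x, 3.0)

def directional_derivative(x, y, vx, vy, h=1e-6):
    # Slope of f along the unit vector in the direction of (vx, vy),
    # measured with a small central difference.
    n = math.hypot(vx, vy)
    ux, uy = vx / n, vy / n
    return (f(x + h*ux, y + h*uy) - f(x - h*ux, y - h*uy)) / (2*h)

x, y, vx, vy = 1.0, 2.0, 1.0, 1.0
gx, gy = grad_f(x, y)
n = math.hypot(vx, vy)
dot = gx * (vx / n) + gy * (vy / n)  # the dot-product definition
print(abs(directional_derivative(x, y, vx, vy) - dot) < 1e-4)  # True
```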

<p>In this case, the directional derivative with respect to the normal (the direction perpendicular to the boundary and pointing outwards) is equal to the component of $\bold{w}$ that is in that direction. A specification of the derivative, not the value, of $p$ like this is said to be a “Neumann” boundary condition. Formally, the boundary condition is written like this:</p>

\[\frac{\partial p}{\partial \bold{n}} = \bold{w} \cdot \bold{n} \enspace \text{on} \enspace \partial D\]

<figure>
<img src="/images/2024-09-26/figure3.png" alt="Three arrows running outward from a point on a 2D surface: one labelled w and pointed in some direction, one labelled n and pointed in the direction perpendicular to the surface, and one labelled w dot n and pointed in the same direction as n" />
</figure>

<p>So, our boundary value problem involves a case of Poisson’s equation and Neumann boundary conditions, not to mention the domain, which should be representative of the screen, i.e. a rectangle. There are many ways to solve this, but we’ll start by again turning to discretization with a grid. This time, though, it’ll be an entire finite difference method.</p>

<div class="info-panel">

  <h3 id="review-the-finite-difference-discretization">Review: The finite difference discretization</h3>

  <p>I mentioned them in passing in Part 4, but you may or may not be familiar with the “finite difference” approximation of a derivative. A “finite difference” can be taken quite literally: it’s the “difference” (i.e. subtraction) between the values of a function at points separated by a “finite” (i.e. not infinitesimal) distance. It’s what happens when you <em>don’t</em> take the limit inside the definition of the derivative.</p>

\[\frac{df}{dx} = \lim_{h \to 0} \frac{f(x) - f(x-h)}{h} \approx \frac{f(x) - f(x-\Delta x)}{\Delta x}\]

  <figure> 
<img src="/images/2024-09-26/figure4.png" alt="A plot of a function of x as a curve with a point in the center, three other points preceding it but increasing in proximity, lines going through each and the center point, and a tangent line through the center point" />
<figcaption>

A function, a center $x$, a tangent line there, and secant lines going through it and some other points of decreasing $\Delta x$. Notice how the slope gets closer to the slope of the tangent as $\Delta x$ decreases.

</figcaption>
</figure>

  <p>In general, there are many different ways to approximate the derivative using different pairs (or even larger combinations) of neighboring points. Anyway, that’s out of scope. We should focus on the common cases. The above expression shows the “backward” difference, taking a point and its preceding point. The other two common cases are the “forward” difference and the “central” difference, which come from different, yet equivalent, definitions of the derivative.</p>

  <p>The forward difference uses a point and its succeeding point,</p>

\[\frac{df}{dx} \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}\]

  <figure> 
<img src="/images/2024-09-26/figure5.png" alt="A plot of a function of x as a curve with a point in the center, three other points succeeding it but increasing in proximity, lines going through each and the center point, and a tangent line through the center point" />
<figcaption>

Notice how this also works when approaching from the other direction.

</figcaption>
</figure>

  <p>and the central difference uses the preceding point and the succeeding point.</p>

\[\frac{df}{dx} \approx \frac{f(x + \Delta x) - f(x - \Delta x)}{2 \Delta x}\]

  <figure> 
<img src="/images/2024-09-26/figure6.png" alt="A plot of a function of x as a curve with a point in the center, three points preceding it but increasing in proximity, three points succeeding it but increasing in proximity, lines each going through a preceding point and its corresponding succeeding point, and a tangent line through the center point" />
<figcaption>

Notice how this also works when drawing a line through a preceding point and a succeeding point, approaching from each side.

</figcaption>
</figure>

  <p>All of these expressions converge on the same value as $\Delta x$ shrinks, but we’ll end up using the central difference the most. That said, the forward and backward differences also get involved in today’s scheme.</p>
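  <p>As a small numeric illustration (my own, with $\sin$ as an arbitrary choice of function), all three differences approach the true derivative as $\Delta x$ shrinks, with the central difference closing in fastest:</p>

```python
import math

# Arbitrary smooth test function; the true derivative of sin is cos.
def f(x):
    return math.sin(x)

def forward(x, dx):
    return (f(x + dx) - f(x)) / dx

def backward(x, dx):
    return (f(x) - f(x - dx)) / dx

def central(x, dx):
    return (f(x + dx) - f(x - dx)) / (2 * dx)

x, exact = 1.0, math.cos(1.0)
for dx in (0.1, 0.01, 0.001):
    errors = [abs(approx(x, dx) - exact) for approx in (forward, backward, central)]
    print(dx, errors)  # the central-difference error shrinks the fastest
```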

  <p>Anyway, because we discretized the fields with grids, this approximation is useful to us. First, extending this idea to partial derivatives is quite simple! Recall that it’s the derivative of a field with respect to one component while keeping all others constant. Because we discretized space with a grid, all points have left, right, top, and bottom neighbors (all besides the ones on the boundary, but we’ll get to them at some point). So, we can use the preceding and succeeding neighbors in each direction, $\frac{\partial f}{\partial x}$ using the left and right and $\frac{\partial f}{\partial y}$ using the top and bottom.</p>

\[\frac{\partial f}{\partial x} \approx \frac{f(x + \Delta x, y) - f(x - \Delta x, y)}{2 \Delta x}\]

\[\frac{\partial f}{\partial y} \approx \frac{f(x, y + \Delta y) - f(x, y - \Delta y)}{2 \Delta y}\]

  <figure> 
<img src="/images/2024-09-26/figure7.png" alt="Amid a grid of points (colored grey), five points are connected in a structure involving a center point and its top, bottom, left, and right neighbors labeled plus delta y, minus delta y, minus delta x, and plus delta x respectively" />
<figcaption>

The <a href="https://en.wikipedia.org/wiki/Five-point_stencil">five-point stencil</a>, showing all the points involved in our finite difference discretization

</figcaption>
</figure>

  <p>Finally, recall that a field that’s discretized with a grid can be represented by an array, and so we can rewrite the above central differences using indices.</p>

\[g_x[i, j] = \frac{f[i+1, j] - f[i-1, j]}{2 \Delta x}\]

\[g_y[i, j] = \frac{f[i, j+1] - f[i, j-1]}{2 \Delta y}\]

  <p>With this, approximating the differential operators is as easy as taking their definitions and replacing each partial derivative with its finite difference counterpart. Let’s write out the ones we’ll use for the rest of this article.</p>

  <p>Discretizing the gradient and the divergence with the central differences gets the following:</p>

\[\nabla f = \begin{bmatrix} \displaystyle \frac{\partial f}{\partial x} \\[1em] \displaystyle \frac{\partial f}{\partial y} \end{bmatrix} \approx \begin{bmatrix} \displaystyle \frac{f[i+1, j] - f[i-1, j]}{2 \Delta x} \\[1em] \displaystyle \frac{f[i, j+1] - f[i, j-1]}{2 \Delta y} \end{bmatrix}\]

\[\nabla \cdot \bold{f} = \frac{\partial f_x}{\partial x} + \frac{\partial f_y}{\partial y} \approx \frac{f_x[i+1, j] - f_x[i-1, j]}{2 \Delta x} + \frac{f_y[i, j+1] - f_y[i, j-1]}{2 \Delta y}\]
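  <p>In code, these two formulas translate directly (a sketch under my own conventions, not the project’s code: plain nested lists indexed as <code class="language-plaintext highlighter-rouge">f[i][j]</code>, a square grid, and interior points only):</p>

```python
DX = 1.0  # grid spacing; square grid, so the same in x and y

def gradient(f, i, j):
    # Central-difference gradient (g_x, g_y) of a scalar field at [i, j].
    gx = (f[i+1][j] - f[i-1][j]) / (2 * DX)
    gy = (f[i][j+1] - f[i][j-1]) / (2 * DX)
    return gx, gy

def divergence(fx, fy, i, j):
    # Central-difference divergence of a vector field (fx, fy) at [i, j].
    return (fx[i+1][j] - fx[i-1][j]) / (2 * DX) \
         + (fy[i][j+1] - fy[i][j-1]) / (2 * DX)

# Example: f[i][j] = i^2 + 2j, so the true gradient at (1, 1) is (2, 2).
f = [[i*i + 2*j for j in range(3)] for i in range(3)]
print(gradient(f, 1, 1))  # (2.0, 2.0)
```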

  <p>That leaves the Laplacian. There happens to be a “but actually” to discuss here. You may recall from Part 3 that the Laplacian is the divergence of the gradient but also that it can be written as a sum of the second partial (non-mixed) derivatives.</p>

\[\nabla^2 f \equiv \nabla \cdot (\nabla f) = \frac{\partial^2 \! f}{\partial x^2} + \frac{\partial^2 \! f}{\partial y^2}\]

  <p>One possible discrete approximation—one that uses what we’ve already mentioned—is to take the discrete gradient <em>and then</em> take the discrete divergence. Instead, the typical thing to do builds upon the <em>latter</em> definition. A second derivative can itself be approximated with finite differences, and the central difference way to do it is this:</p>

\[\frac{d^2 \! f}{d x^2} \approx \frac{f(x+\Delta x) - 2f(x) + f(x - \Delta x)}{\Delta x^2}\]

  <p>After extending this approximation to the partial derivative and rewriting it in terms of array values, that latter definition becomes the following.</p>

\[\frac{\partial^2 \! f}{\partial x^2} + \frac{\partial^2 \! f}{\partial y^2} \approx \frac{f[i+1, j] - 2 f[i,j] + f[i-1, j]}{\Delta x^2} + \frac{f[i, j+1] - 2 f[i, j] + f[i, j-1]}{\Delta y^2}\]

  <p>Finally, if the grid is square—that is, $\Delta x = \Delta y$—then the $f[i,j]$ terms can be combined cleanly, and it just reduces to this:</p>

\[\frac{\partial^2 \! f}{\partial x^2} + \frac{\partial^2 \! f}{\partial y^2} \approx \frac{f[i+1,j] + f[i-1,j] + f[i, j+1] + f[i, j-1] - 4 f[i, j]}{\Delta x^2}\]

  <p>This is how the Laplacian is typically discretized, but it’s actually <em>not</em> equal to the discrete divergence of the discrete gradient. Going about it that way happens to get you the above but with $\Delta x$ and $\Delta y$ being twice as large and the neighbor’s neighbor being used (two to the left, two to the right, etc.). And, for the record, it didn’t look that good when I tried it.</p>
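  <p>The mismatch is easy to see on a checkerboard field (my own toy example, not from the project): the five-point Laplacian sees the oscillation, while the divergence-of-the-gradient version, reaching only the neighbors’ neighbors, returns zero everywhere.</p>

```python
DX = 1.0
N = 5
# A checkerboard field of alternating +1 and -1.
f = [[(-1) ** (i + j) for j in range(N)] for i in range(N)]

def lap5(f, i, j):
    # The usual five-point Laplacian.
    return (f[i+1][j] + f[i-1][j] + f[i][j+1] + f[i][j-1] - 4*f[i][j]) / DX**2

def div_of_grad(f, i, j):
    # Central-difference divergence of the central-difference gradient;
    # expanded out, it reaches the neighbors' neighbors with spacing 2*DX.
    return (f[i+2][j] - 2*f[i][j] + f[i-2][j]) / (2*DX)**2 \
         + (f[i][j+2] - 2*f[i][j] + f[i][j-2]) / (2*DX)**2

print(lap5(f, 2, 2))         # -8.0: the five-point stencil sees the wiggle
print(div_of_grad(f, 2, 2))  #  0.0: the double-wide stencil misses it
```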

  <p>Really, it goes to show that the “Stable Fluids” discretization will be simple but not very faithful to the original boundary value problem. Once before, years ago, I didn’t realize it could be flawed, and I lost many hours because of that. On another note, I reached out to Stam about it a year ago, and he mentioned that he instead used the “marker-and-cell (MAC) method” in his past professional work on Maya (now Autodesk Maya), and I later worked out that the MAC method <em>does</em> make the two equal.</p>

  <p>Anyway, that’s the finite difference discretization (or, well, a simple case of it) and how it is applied to the gradient, divergence, and Laplacian. It’s a fundamental concept to grasp because—as you will see—a discrete approximation of differential operators can turn our boundary value problem into something we can solve.</p>

</div>

<p>So, how do we use finite differences to solve the problem? At this point, the “Stable Fluids” paper goes on to use a technique that will not fly here. If you’re curious about it, Stam had originally defined a different problem that used periodic boundary conditions, i.e. wrap-around from right to left, top to bottom, etc., and that allowed him to use the fast Fourier transform (FFT). Since we’d want the fluid to collide into and generally stay within the boundary, we’re stuck with Neumann boundary conditions and no FFT. Instead, we can turn to the method found in Stam’s “Real-Time Fluid Dynamics for Games” or the GPU Gems chapter; those sources essentially line up with “Stable Fluids” up to here. But first, I’ll take the time to emphasize some of the context.</p>

<p>To begin with, we need to apply the finite difference discretization to the boundary value problem. I’ll say here that we need to treat <em>both</em> the governing equation and the boundary conditions <em>simultaneously</em>.</p>

<p>The governing equation, being just a case of Poisson’s equation, is simple enough; we just replace the Laplacian and the divergence with their discrete counterparts. In those articles, the central difference version with a square grid is used, so that’s what we’ll use here. Overall, the governing equation becomes this:</p>

\[\frac{p[i+1, j] + p[i-1, j] + p[i, j+1] + p[i, j-1] - 4 p[i, j]}{\Delta x^2} = d[i, j]\]

<p>where $d[i, j]$ is defined as the discrete divergence of $\bold{w}$</p>

\[d[i,j] = \frac{w_x[i+1, j] - w_x[i-1, j]}{2 \Delta x} + \frac{w_y[i, j+1] - w_y[i, j-1]}{2 \Delta y}\]

<p>Notice here that $d[i, j]$ is purely a function of some elements of $\bold{w}$, which is the given uncorrected velocity. That means we can treat $d[i, j]$ as a known value. Also notice that discretizing Poisson’s equation leaves us with more than one unknown. Given some $i$ and $j$, there are <em>five</em> unknown elements of $p$ and one known element of $d$, and a single equation in five unknowns is not solvable on its own.</p>

<p>Before we deal with that, the boundary conditions are the other half we need to discretize. You may have noticed a plot hole here already: there isn’t always a left, up, right, or down neighbor. $p[i-1, j]$ doesn’t exist if $i = 0$. Now, remember how we used ghost points to complete the bilinear interpolation at the boundary? I’ll show that we can discretize our Neumann condition with the same ghost points that we covered in part 4, and in doing so we’ll find out what to do here.</p>

<p>I do have to preface this with something real quick, though. As I brought up earlier, the directional derivative of $p$ with respect to the normal is equal to the component of $\bold w$ in that direction. This statement I pulled from the Chorin and Marsden book. However, in Stam’s “Stable Fluids” and the GPU Gems article, the same boundary value problem is presented but with the directional derivative stated as being equal to <em>zero</em>. It’s a clear discrepancy.</p>

<p>It comes from grid choice. Remember from Part 4 that the boundary is supposed to simulate a wall that runs <em>between</em> the last real row and the ghost row. That was accomplished by making the ghost rows and columns take the negative because that makes the value of the bilinear interpolation at the “wall” equal to zero by definition. If $\bold{w}$ at the boundary is zero, then $\bold{w} \cdot \bold{n}$ must be zero. If $\bold{w} \cdot \bold{n}$ is zero, then even when following the definition from Chorin and Marsden, $\frac{\partial p}{\partial \bold{n}}$ must be zero.</p>

<p>That out of the way, our boundary condition just manifests as setting $\frac{\partial p}{\partial x}$ or $\frac{\partial p}{\partial y}$ because the domain is rectangular. Let’s see what to do on the top side first. There, the normal direction is the $+y$-direction. That is, $\frac{\partial p}{\partial \bold{n}} = 0$ manifests as $\frac{\partial p}{\partial y} = 0$. We then replace $\frac{\partial p}{\partial y}$ with its finite difference counterpart, but there’s a twist. The forward difference, not the central difference, is used. That gets us the following.</p>

\[\frac{p[i, N] - p[i, N-1]}{\Delta y} = 0\]

<p>where $i$ is any horizontal index, $N-1$ is the vertical index of the last real row, and $N$ is the vertical index of the ghost row. The implication of this statement is obvious: the ghost row must take on values equal to that of the real row.</p>

\[p[i, N] = p[i, N-1]\]

<div class="note-panel">

  <p>Though it may sound strange that we switched from applying the central difference to the forward difference, I think we can imagine together what it would imply if we kept it: at the top boundary, the central difference must be equal to zero</p>

\[\frac{p[i, N] - p[i, N-2]}{2 \Delta y} = 0\]

  <p>and therefore the ghost row should take on the values of the <em>second-to-last</em> row.</p>

\[p[i, N] = p[i, N-2]\]

  <p>That conclusion doesn’t sound as natural.</p>

</div>

<p>The same conclusion should be expected for all sides. For the bottom side, it’s reached by applying the backward difference on $\frac{\partial p}{\partial y} = 0$. For the right side, the forward difference on $\frac{\partial p}{\partial x} = 0$, and for the left, the backward difference on $\frac{\partial p}{\partial x} = 0$. It should sound quite similar to what we did for the bilinear interpolation, though back then the ghost row had to take on the negative.</p>
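<p>As a sketch of how this looks in code (my own minimal version, not the project’s, assuming the pressure array carries one ghost row/column on each side and is indexed <code class="language-plaintext highlighter-rouge">p[i][j]</code> with <code class="language-plaintext highlighter-rouge">i</code> horizontal):</p>

```python
def apply_pressure_neumann(p):
    # Copy each edge's last real row/column into its ghost row/column,
    # so the forward/backward difference across the "wall" is zero.
    n = len(p)      # horizontal size, including the two ghost columns
    m = len(p[0])   # vertical size, including the two ghost rows
    for i in range(n):
        p[i][0] = p[i][1]        # bottom ghost row
        p[i][m-1] = p[i][m-2]    # top ghost row
    for j in range(m):
        p[0][j] = p[1][j]        # left ghost column
        p[n-1][j] = p[n-2][j]    # right ghost column
    return p

# Example: a 4x4 array whose interior 2x2 block holds 1..4.
p = [[0.0]*4 for _ in range(4)]
p[1][1], p[1][2], p[2][1], p[2][2] = 1.0, 2.0, 3.0, 4.0
apply_pressure_neumann(p)
print(p[0][1] == p[1][1] and p[3][2] == p[2][2])  # True
```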

<p>To recap, the governing equation is discretized with finite differences, and the Neumann boundary condition is handled by reintroducing the ghost row, but this time letting it essentially be the real row repeated (same going for the columns). That completes the finite difference version of our boundary value problem! Which leaves solving it, and with what else but a computer? Now, there’s that, and then there’s solving it with an ESP32 for the computer. If my goal were only to explain the former, then quoting my sources would’ve been effective enough, but here comes the necessary context.</p>

<p>For one, didn’t I say this involved “sparse matrices”? Here they are.</p>

<div class="info-panel">

  <h3 id="review-sparse-matrices">Review: Sparse matrices</h3>

  <p>You may or may not be familiar with <a href="https://en.wikipedia.org/wiki/Sparse_matrix">“sparse matrices”</a>. If you are, then you would know that they enable a great deal of optimization on a computer—to a point where even the big-O complexity is lower—without changing <em>any</em> of the linear algebra stuff. With such drastic improvement for free on the table, “sparse matrices” are an essential topic. And all of that comes from “sparse matrices” simply being matrices with a lot of zeroes.</p>

  <figure> 
<img src="/images/2024-09-26/figure8.png" alt="A black-and-white image consisting of white squares and black squares, revealing a diagonal component plus speckle" />
<figcaption>

An image created by taking the elements of a sparse, symmetric matrix (not the ones we'll soon see, but some other one) and filling the zero elements with white and the nonzero elements with black. By <a href="https://commons.wikimedia.org/wiki/File:Finite_element_sparse_matrix.png">Oleg Alexandrov via Wikimedia Commons</a>, where it was released into the public domain.

</figcaption>
</figure>

  <p>Generally speaking, zero can be effectively implemented by doing nothing or storing nothing. Conversely, actual computation and memory are focused on the non-zero elements.</p>

</div>

<p>I mentioned in the last part that it is more useful to think of fields—which are arrays when discretized—as very long vectors. Where was I going with that, exactly? Our arrays are a 2D arrangement of data, with the position of each element neatly corresponding to a 2D location. Now, we care less for this correspondence when we’re doing the linear algebra. Vectors are a <em>1D</em> arrangement of data. To rearrange the array into a vector, convention tells us to read out the data like a book, increasing in <code class="language-plaintext highlighter-rouge">i</code> from left to right, then increasing in <code class="language-plaintext highlighter-rouge">j</code> from… bottom to top, actually, to stay within the Cartesian perspective. Once again, I already discussed the difference between matrix indexing and Cartesian indexing in Part 3. And of course, the conventional order isn’t the only order, and we’ll get to another one in this article soon enough.</p>

<p>Let’s make this concrete with an example. Consider a three-by-three grid and some scalar field $x$. Then, the discretization of $x$ on that grid is a three-by-three array. Reading out its elements gets a nine-element vector. All that is depicted below.</p>

<figure> 
<img src="/images/2024-09-26/figure9.png" alt="On the left, a three-by-three grid of squares, labelled from 1 to 9 in left-to-right then bottom-to-top order and overlayed with a zig-zag arrow going through the squares in said order. On the right, a vector containing the numbers 1 to 9 in increasing order. Between them, an arrow point from left to right." />
</figure>

<p>Let’s look at points 5 and 8. They’re right next to each other in 2D space, but now they’re so far apart on the vector! Such things happen with the conventional order.</p>

<p>More importantly, with arrays rearranged as vectors, we can show that our boundary value problem, when discretized the way we’ve done it, is a classic $Ax = b$ problem! If we let $x$ be the pressure vector and $b$ be the divergence, then the $i$-th row of $A$ represents the equation for <em>some specific point with location $i,\; j$</em> (yes, unfortunately $i$ and $j$ have double meanings here, one for indexing arrays/grids and one for indexing matrices), relating the $i$-th element of the divergence vector to the $i$-th element of the pressure vector, plus an element for each neighbor. Because of the conventional order, those neighbors would be the $(i+1)$-th, $(i-1)$-th, $(i+N)$-th, and $(i-N)$-th elements, where $N$ is the horizontal length of the array.</p>
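<p>The conventional order can be captured in a tiny helper (a sketch of my own, 1-based to match the 1-to-9 labels in the figure above):</p>

```python
N = 3  # horizontal length of the grid

def flat(i, j):
    # Book order: left to right in i, then bottom to top in j (1-based).
    return j * N + i + 1

print(flat(1, 1))  # 5: the center point
print(flat(1, 2))  # 8: its top neighbor, flat index 5 + N
print(flat(2, 1))  # 6: its right neighbor, flat index 5 + 1
```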

<p>Let’s demonstrate this by determining the $A$ for our previous three-by-three example.</p>

<p>Starting from the governing equation, we can rewrite the left side so that the division by $\Delta x^2$ is multiplication by $\frac{1}{\Delta x^2}$.</p>

\[\frac{1}{\Delta x^2} \big( p[i + 1, j] + p[i - 1, j] + p[i, j + 1] + p[i, j - 1] - 4 p[i, j] \big) = d[i, j]\]

<p>This helps save some repetitive writing by defining $A = \frac{1}{\Delta x^2} B$, where $B$ just has the same coefficients as $A$ but with $\frac{1}{\Delta x^2}$ factored out.</p>

<p>Next, we recognize that, in a three-by-three grid, $N = 3$. It follows that, for some given center $p[i, j]$ in the array, which maps to $p_i$ in the vector, getting the down neighbor $p[i, j - 1]$ is to get $p_{i - 3}$.</p>

<p>Let’s see this in action. The center element $i=1,\;j=1$ is the only element with all-real neighbors, so it stands for the typical case where the boundary conditions don’t apply. Following the conventional order, it’s the 5th element of the vector, $p_{5}$, and the governing equation here can be written as this.</p>

\[\frac{1}{\Delta x^2} \big( p_6 + p_4 + p_8 + p_2 - 4 p_5 \big) = d_5\]

<p>We can then fill in the corresponding row of $A$ like this:</p>

\[\frac{1}{\Delta x^2} \begin{bmatrix} \\ \\ \\ \\ 0 &amp; 1 &amp; 0 &amp; 1 &amp; -4 &amp; 1 &amp; 0 &amp; 1 &amp; 0 \\ \\ \\ \\ \! \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \\ p_7 \\ p_8 \\ p_9 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \\ d_6 \\ d_7 \\ d_8 \\ d_9 \end{bmatrix}\]

<p>In this row, the element corresponding to $i=1,\;j=1$ itself gets a $-4$, the ones corresponding to its neighbors get a $1$, and all other elements get a zero. In general, because the discretized governing equation, being the instance of Poisson’s equation that it is, just relates the five elements of the pressure vector to an element of the divergence vector, <em>all</em> the coefficients for the other elements–the rest of the $i$-th row of $A$–must be zero. $A$ is sparse!</p>

<p>With that said, for the points along the boundary, some of those neighbors don’t exist, and as a result, their rows end up slightly different. For example, one step to the left lands us on $i = 0,\;j = 1$, which is on the left boundary, so applying our discretized Neumann boundary condition gets us this equation</p>

\[\frac{p[i+1, j] + p[i, j+1] + p[i, j-1] - 3 p[i, j]}{\Delta x^2} = d[i, j]\]

<p>where the canceling terms get us a $-3$ instead of a $-4$. Following the same treatment as before, its corresponding row in $A$ looks like this:</p>

\[\frac{1}{\Delta x^2} \begin{bmatrix} \\ \\ \\ 1 &amp; 0 &amp; 0 &amp; -3 &amp; 1 &amp; 0 &amp; 1 &amp; 0 &amp; 0 \\ 0 &amp; 1 &amp; 0 &amp; 1 &amp; -4 &amp; 1 &amp; 0 &amp; 1 &amp; 0 \\ \\ \\ \\ \! \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \\ p_7 \\ p_8 \\ p_9 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \\ d_6 \\ d_7 \\ d_8 \\ d_9 \end{bmatrix}\]

<p>One step up lands on $i = 0, \; j = 2$, which is on both the left and top boundary, and that gets us the following equation,</p>

\[\frac{p[i+1, j] + p[i, j-1] - 2 p[i, j]}{\Delta x^2} = d[i, j]\]

<p>and its corresponding row looks like this.</p>

\[\frac{1}{\Delta x^2} \begin{bmatrix} \\ \\ \\ 1 &amp; 0 &amp; 0 &amp; -3 &amp; 1 &amp; 0 &amp; 1 &amp; 0 &amp; 0 \\ 0 &amp; 1 &amp; 0 &amp; 1 &amp; -4 &amp; 1 &amp; 0 &amp; 1 &amp; 0 \\ \\ 0 &amp; 0 &amp; 0 &amp; 1 &amp; 0 &amp; 0 &amp; -2 &amp; 1 &amp; 0 \\ \\ \! \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \\ p_7 \\ p_8 \\ p_9 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \\ d_6 \\ d_7 \\ d_8 \\ d_9 \end{bmatrix}\]

<p>All the possibilities—in the three-by-three example or in <em>any</em> size of screen—are covered by these cases after adjusting for whichever neighbors are in a ghost row/column. Let’s fill in the rest of $A$.</p>

\[\frac{1}{\Delta x^2} \begin{bmatrix} -2 &amp; 1 &amp; 0 &amp; 1 \\ 1 &amp; -3 &amp; 1 &amp; 0 &amp; 1 \\ 0 &amp; 1 &amp; -2 &amp; 0 &amp; 0 &amp; 1 \\ 1 &amp; 0 &amp; 0 &amp; -3 &amp; 1 &amp; 0 &amp; 1 \\ &amp; 1 &amp; 0 &amp; 1 &amp; -4 &amp; 1 &amp; 0 &amp; 1 \\ &amp; &amp; 1 &amp; 0 &amp; 1 &amp; -3 &amp; 0 &amp; 0 &amp; 1 \\ &amp; &amp; &amp; 1 &amp; 0 &amp; 0 &amp; -2 &amp; 1 &amp; 0 \\ &amp; &amp; &amp; &amp; 1 &amp; 0 &amp; 1 &amp; -3 &amp; 1 \\ &amp; &amp; &amp; &amp; &amp; 1 &amp; 0 &amp; 1 &amp; -2 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \\ p_7 \\ p_8 \\ p_9 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \\ d_5 \\ d_6 \\ d_7 \\ d_8 \\ d_9 \end{bmatrix}\]

<p>Here, the blank elements are also zero; leaving zeroes blank is the typical notation for sparse matrices.</p>

<p>That completes this sketch of a specific $A$ and a description of how to construct $A$ for any particular grid. I should pause here for a moment and note that what we have so far is quite interesting. Up to this point, we have</p>

<ul>
  <li>sought to achieve the incompressibility constraint using the Helmholtz-Hodge decomposition,</li>
  <li>found that we need to solve the supporting boundary value problem,</li>
  <li>applied a finite difference discretization to it, and</li>
  <li>arrived at just a case of $Ax = b$.</li>
</ul>

<p>When it comes to that $Ax = b$, $b$ takes from the divergence of the uncorrected velocity field, $x$ takes from the pressure field, and $A$ takes from the governing equation and the boundary conditions. Just by solving it for $x$, we find the pressure field that solves the boundary value problem. With that, the next step would be to use the Helmholtz-Hodge decomposition to extract the divergence-free component of our velocity. Recall that this means subtracting the gradient of the pressure from it. We implement this by subtracting the <em>discrete gradient</em>.</p>

\[v_x[i, j] = w_x[i, j] - \frac{p[i+1, j] - p[i-1, j]}{2 \Delta x}\]

\[v_y[i, j] = w_y[i, j] - \frac{p[i, j+1] - p[i, j-1]}{2 \Delta y}\]

<p>And with the divergence-free velocity, the incompressibility constraint is achieved, giving realistic, fluid-like motion. It’s a real problem in a problem in a problem! <em>But we still haven’t solved it yet</em>. Remember that the Helmholtz-Hodge decomposition tells us that the pressure field exists, but not how to find it. Let’s now talk about getting that done.</p>

<p>The expression of the boundary value problem as just a case of $Ax = b$ was part of the context I needed to know. In my first (failed) attempt to get a fluid sim on an ESP32, I only knew Stam’s “Real-Time Fluid Dynamics for Games”, Wong’s post, and the GPU Gems article. I got quite far on a PC, but it was no good on an ESP32. Those sources didn’t mention the $Ax = b$ explicitly. But now, you and I can discuss how to solve the problem using “Jacobi iteration”, as well as “Gauss-Seidel iteration” and “successive over-relaxation” (SOR). In general, these are “iterative methods”: computational routines that can be said to “converge” onto the solution. With enough number-crunching, one can get arbitrarily close to the solution, though never reach it exactly.</p>

<p>First, what is “Jacobi iteration”, and how does it apply?</p>

<p>$A$ is a square matrix. That comes from the pressure and divergence vectors being constructed from the same grid of points, thus having the same number of elements. If $A$ were furthermore invertible (technically it’s not, because we only have Neumann conditions, but we’ll get to that), then perhaps we could solve the system by calculating $A^{-1} b$. However, that’s an $O(N^3)$ operation—far too expensive, considering that $N$ here is the total number of points on the grid! To get something faster, we must exploit the sparsity of $A$, and Jacobi iteration happens to let us do that.</p>

<p>So, we start by expressing $A$ as the sum of (1) a cut-out of $A$ along the diagonal, hereby called $D$, and the rest of $A$, otherwise describable as (2) an upper-triangular part that we’ll call $U$ plus (3) a lower-triangular part we’ll call $L$.</p>

\[A = D + L + U\]

\[\begin{bmatrix} a_{11} &amp; a_{12} &amp; \cdots &amp; a_{1n} \\ a_{21} &amp; a_{22} &amp; \cdots &amp; a_{2n} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ a_{n1} &amp; a_{n2} &amp; \cdots &amp; a_{nn} \end{bmatrix} = \begin{bmatrix} a_{11} \\ &amp; a_{22} \\ &amp; &amp; \ddots \\ &amp; &amp; &amp; a_{nn} \end{bmatrix} + \begin{bmatrix} 0 \\ a_{21} &amp; 0 \\ a_{31} &amp; a_{32} &amp; 0 \\ \vdots &amp; \vdots &amp; \ddots &amp; \ddots \\ a_{n1} &amp; a_{n2} &amp; \cdots &amp; a_{n(n-1)} &amp; 0 \end{bmatrix} + \begin{bmatrix} 0 &amp; a_{12} &amp; a_{13} &amp; \cdots &amp; a_{1n} \\ &amp; 0 &amp; a_{23} &amp; \cdots &amp; a_{2n} \\ &amp; &amp; 0 &amp; \ddots &amp; \vdots \\ &amp; &amp; &amp; \ddots &amp; a_{(n-1)n} \\ &amp; &amp; &amp; &amp; 0 \end{bmatrix}\]

<p>Consider $a_{ij}$ where $i = j$, or in other words $a_{ii}$. The $i$-th element of the $i$-th row—the row representing the governing equation centered at the $i$-th point—must always be the coefficient of the center. It follows that the diagonal cut-out $D$, which takes the elements $a_{ii}$, is taking the <em>coefficients of the centers</em>. That’s the $-4$’s, $-3$’s, and $-2$’s. Now consider how the conventional order is left-to-right then bottom-to-top. The lower-triangular cut-out $L$ contains all elements <em>preceding</em> the center. That is, it contains the left and bottom neighbors! The upper-triangular cut-out $U$ contains all elements <em>succeeding</em> the center, i.e. the right and top neighbors!</p>

<figure> 
<img src="/images/2024-09-26/figure10.png" alt="An image made by filling in, with different colors, the lower-triangular (labeled as bottom and left), upper-triangular (labeled top and right), and diagonal (labeled center) parts of the given matrix A, from the three-by-three example." />
<figcaption>

What each entry in the $A$ of our three-by-three example means

</figcaption>
</figure>

<p>That aside, in this decomposition, it happens that $D$ is a diagonal matrix. As a result, we can say that its inverse $D^{-1}$ is just a matrix with <em>reciprocals</em> along the diagonal.</p>

\[D^{-1} = \begin{bmatrix} a_{11}^{-1} \\ &amp; a_{22}^{-1} \\ &amp; &amp; \ddots \\ &amp; &amp; &amp; a_{nn}^{-1} \end{bmatrix}\]

<p>With that in mind, we can derive the following equation from $Ax = b$,</p>

\[\begin{align*} A x &amp; = b \\ (D + L + U) x &amp; = b \\ Dx + (L+U)x &amp; = b \\ Dx &amp; = b - (L+U) x \\ x &amp; = D^{-1} (b - (L+U) x) \end{align*}\]

<p>where $x$ appears in two places. From there, Jacobi iteration is to let the right-hand $x$ be some guess at the solution that we’ll call $x^{(k)}$ and let the left-hand $x$ be the <em>updated</em> guess $x^{(k+1)}$.</p>

\[x^{(k+1)} = D^{-1} (b - (L+U) x^{(k)})\]

<p>In a moment, we’ll show that—given a specific condition holds—$x^{(k+1)}$ is <em>always</em> a better guess than $x^{(k)}$.</p>

<p>First, let’s see how the expression $D^{-1} (b - (L+U)x)$ manifests in practice. How is it faster than inverting $A$? Where do the sparse matrices come in? We just need to compute one element of $x^{(k+1)}$ at a time.</p>

<p>We know that $D^{-1}$, like $D$, is diagonal, and multiplying a vector by a diagonal matrix happens to be equivalent to multiplying each $i$-th element with its corresponding $a_{ii}^{-1}$.</p>

\[\begin{bmatrix} a_{11}^{-1} \\ &amp; a_{22}^{-1} \\ &amp; &amp; \ddots \\ &amp; &amp; &amp; a_{nn}^{-1} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11}^{-1} x_1 \\ a_{22}^{-1} x_2 \\ \vdots \\ a_{nn}^{-1} x_n \end{bmatrix}\]

<p>So, to acquire some $i$-th element of $x^{(k+1)}$, we can at least compute $b - (L+U) x^{(k)}$ in its entirety, pick the $i$-th element of that, then multiply it by $a_{ii}^{-1}$. Next, to subtract $(L+U)x^{(k)}$ from $b$ is of course equivalent to subtracting each element of $(L+U)x^{(k)}$ from its corresponding element of $b$. That leaves $(L+U)x^{(k)}$ by itself.</p>

<p>Well, remember where we came from: $A$ is sparse because each of its rows is just a discretized instance of Poisson’s equation. Said equation only involves the center point and its neighbors. If $D$ contained all the center coefficients, then $L+U$ is just the neighbor coefficients. That is, the $i$-th element of $(L+U)x^{(k)}$ is just a sum of the four neighbor terms.</p>

\[\frac{p[i+1, j] + p[i-1, j] + p[i, j+1] + p[i, j-1]}{\Delta x^2}\]

<div class="note-panel">

  <p>It’s a sum of the four neighbor terms when there <em>are</em> four of them, anyway. So that I may keep it concise, whenever you see a $-4$ and four terms, please adjust it to a $-3$ and three terms or $-2$ and two terms at the boundary.</p>

</div>

<p>From the bottom to the top, $D^{-1}(b-(L+U)x^{(k)})$ can be computed one element at a time. It’s a composition of three operations: multiplication by $a_{ii}^{-1}$, subtraction from $b_i$, and the summing of the neighbor terms. If we put it all together, we get this:</p>

\[p^{(k+1)}[i, j] = - \frac{\Delta x^2}{4} \left( d[i, j] - \frac{p^{(k)}[i+1, j] + p^{(k)}[i-1, j] + p^{(k)}[i, j+1] + p^{(k)}[i, j-1]}{\Delta x^2} \right)\]

<p>Apart from a bit of algebra, this is identical to the derivations of Jacobi iteration in Wong’s post and the GPU Gems article. Not bad!</p>

<p>Now, why is this fast? Generally speaking, if $A$ were not sparse, then a row of $(L+U)$ wouldn’t contain just the four neighbors but rather $N$ terms, and as expected, the complexity of its dot product with $x$ is $O(N)$. Doing that for each of the $N$ rows of $A$ gives the expected complexity of a matrix-vector multiply, $O(N^2)$. Now, if a row has only four non-zero coefficients? And if we can just skip all the zeroes? The complexity of that dot product falls to $O(1)$, and the complexity of a matrix-vector multiply falls to $O(N)$—linear time! And given that it’s in the service of solving an $Ax = b$ problem, that’s <em>far</em> better than the expected $O(N^3)$! The catch: if we want to iterate as many times as it takes to achieve a fixed amount of improvement, <em>independent of the grid size</em>, the total complexity is higher. Really, true linear complexity lies in the domain of <a href="https://en.wikipedia.org/wiki/Multigrid_method">multigrid methods</a>, which is way outside what I’m familiar with and outside the scope of this article. We’ll instead iterate a fixed number of times and call it a day.</p>

<p>Here’s what the code for that would look like. Please forgive my extensive use of pointers here.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">pois_context</span> <span class="p">{</span>
    <span class="kt">float</span> <span class="o">*</span><span class="n">d</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">dx</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">static</span> <span class="kr">inline</span> <span class="kt">int</span> <span class="nf">index</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="kt">int</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">dim_x</span><span class="o">*</span><span class="n">j</span><span class="o">+</span><span class="n">i</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">float</span> <span class="nf">pois_expr_safe</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="kt">int</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">pois_context</span> <span class="o">*</span><span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">pois_context</span><span class="o">*</span><span class="p">)</span><span class="n">ctx</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">i_max</span> <span class="o">=</span> <span class="n">dim_x</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">j_max</span> <span class="o">=</span> <span class="n">dim_y</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>

    <span class="kt">float</span> <span class="n">p_sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">a_ii</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p_sum</span> <span class="o">+=</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
        <span class="o">++</span><span class="n">a_ii</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="n">i_max</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p_sum</span> <span class="o">+=</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="mi">1</span><span class="p">);</span>
        <span class="o">++</span><span class="n">a_ii</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">j</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p_sum</span> <span class="o">+=</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="n">dim_x</span><span class="p">);</span>
        <span class="o">++</span><span class="n">a_ii</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="n">j_max</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p_sum</span> <span class="o">+=</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="n">dim_x</span><span class="p">);</span>
        <span class="o">++</span><span class="n">a_ii</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">static</span> <span class="k">const</span> <span class="kt">float</span> <span class="n">neg_a_ii_inv</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="o">/</span><span class="mi">2</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="o">/</span><span class="mi">3</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="o">/</span><span class="mi">4</span><span class="p">.</span><span class="mi">0</span><span class="p">};</span>
    <span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>

    <span class="k">return</span> <span class="n">neg_a_ii_inv</span><span class="p">[</span><span class="n">a_ii</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">dx</span> <span class="o">*</span> <span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">d</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">-</span> <span class="n">p_sum</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">poisson_solve</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">div</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dx</span><span class="p">,</span>
        <span class="kt">int</span> <span class="n">iters</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">scratch</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">float</span> <span class="o">*</span><span class="n">wrt</span> <span class="o">=</span> <span class="n">p</span><span class="p">,</span> <span class="o">*</span><span class="n">rd</span> <span class="o">=</span> <span class="n">scratch</span><span class="p">,</span> <span class="o">*</span><span class="n">temp</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">ij</span><span class="p">;</span>

    <span class="k">for</span> <span class="p">(</span><span class="n">ij</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">ij</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="o">*</span><span class="n">dim_y</span><span class="p">;</span> <span class="n">ij</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="n">scratch</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">struct</span> <span class="n">pois_context</span> <span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">{.</span><span class="n">d</span> <span class="o">=</span> <span class="n">div</span><span class="p">,</span> <span class="p">.</span><span class="n">dx</span> <span class="o">=</span> <span class="n">dx</span><span class="p">};</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">k</span> <span class="o">&lt;</span> <span class="n">iters</span><span class="p">;</span> <span class="n">k</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">dim_y</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
                <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
                <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">pois_expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">pois_ctx</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
        <span class="n">temp</span> <span class="o">=</span> <span class="n">wrt</span><span class="p">;</span>
        <span class="n">wrt</span> <span class="o">=</span> <span class="n">rd</span><span class="p">;</span>
        <span class="n">rd</span> <span class="o">=</span> <span class="n">temp</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">rd</span> <span class="o">==</span> <span class="n">scratch</span><span class="p">)</span> <span class="p">{</span>  <span class="c1">// after the last swap, rd holds the newest guess</span>
        <span class="k">for</span> <span class="p">(</span><span class="n">ij</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">ij</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="o">*</span><span class="n">dim_y</span><span class="p">;</span> <span class="n">ij</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">scratch</span><span class="p">[</span><span class="n">ij</span><span class="p">];</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>However! If we want to run this on an ESP32, that is too many if statements to be going through in the main loop. Instead, if we sweep through all the non-boundary elements first, we can skip those if statements. This creates a fast path.</p>

<p>Here’s what that looks like, on top of the previous code.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">domain_iter</span><span class="p">(</span><span class="kt">float</span> <span class="p">(</span><span class="o">*</span><span class="n">expr_safe</span><span class="p">)(</span><span class="kt">float</span><span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">),</span>
        <span class="kt">float</span> <span class="p">(</span><span class="o">*</span><span class="n">expr_fast</span><span class="p">)(</span><span class="kt">float</span><span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">),</span> <span class="kt">float</span> <span class="o">*</span><span class="n">wrt</span><span class="p">,</span>
        <span class="kt">float</span> <span class="o">*</span><span class="n">rd</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">i_max</span> <span class="o">=</span> <span class="p">(</span><span class="n">dim_x</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">j_max</span> <span class="o">=</span> <span class="p">(</span><span class="n">dim_y</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>  <span class="c1">// inclusive!</span>
    <span class="kt">int</span> <span class="n">ij</span><span class="p">,</span> <span class="n">ij_alt</span><span class="p">;</span>

    <span class="c1">// Loop over the main body</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">j_max</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">i_max</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
            <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_fast</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Loop over the top and bottom boundaries (including corners)</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="n">i_max</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
        <span class="n">ij_alt</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j_max</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij_alt</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij_alt</span><span class="p">],</span> <span class="n">i</span><span class="p">,</span> <span class="n">j_max</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Loop over the left and right boundaries</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">j_max</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="mi">0</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
        <span class="n">ij_alt</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i_max</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij_alt</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij_alt</span><span class="p">],</span> <span class="n">i_max</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">float</span> <span class="nf">pois_expr_fast</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="kt">int</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">pois_context</span> <span class="o">*</span><span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">pois_context</span><span class="o">*</span><span class="p">)</span><span class="n">ctx</span><span class="p">;</span>

    <span class="kt">float</span> <span class="n">p_sum</span> <span class="o">=</span> <span class="p">((</span><span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="n">dim_x</span><span class="p">)</span> <span class="o">+</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="n">dim_x</span><span class="p">)));</span>

    <span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">25</span><span class="n">f</span> <span class="o">*</span> <span class="p">(</span><span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">dx</span> <span class="o">*</span> <span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">d</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">-</span> <span class="n">p_sum</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">poisson_solve</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">div</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dx</span><span class="p">,</span>
        <span class="kt">int</span> <span class="n">iters</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">scratch</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">float</span> <span class="o">*</span><span class="n">wrt</span> <span class="o">=</span> <span class="n">p</span><span class="p">,</span> <span class="o">*</span><span class="n">rd</span> <span class="o">=</span> <span class="n">scratch</span><span class="p">,</span> <span class="o">*</span><span class="n">temp</span><span class="p">;</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">ij</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="o">*</span><span class="n">dim_y</span><span class="p">;</span> <span class="n">ij</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="n">scratch</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">struct</span> <span class="n">pois_context</span> <span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">{.</span><span class="n">d</span> <span class="o">=</span> <span class="n">div</span><span class="p">,</span> <span class="p">.</span><span class="n">dx</span> <span class="o">=</span> <span class="n">dx</span><span class="p">};</span>

    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">k</span> <span class="o">&lt;</span> <span class="n">iters</span><span class="p">;</span> <span class="n">k</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">domain_iter</span><span class="p">(</span><span class="n">pois_expr_safe</span><span class="p">,</span> <span class="n">pois_expr_fast</span><span class="p">,</span> <span class="n">wrt</span><span class="p">,</span> <span class="n">rd</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span>
            <span class="o">&amp;</span><span class="n">pois_ctx</span><span class="p">);</span>
        <span class="n">temp</span> <span class="o">=</span> <span class="n">wrt</span><span class="p">;</span>
        <span class="n">wrt</span> <span class="o">=</span> <span class="n">rd</span><span class="p">;</span>
        <span class="n">rd</span> <span class="o">=</span> <span class="n">temp</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">wrt</span> <span class="o">==</span> <span class="n">scratch</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">ij</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="o">*</span><span class="n">dim_y</span><span class="p">;</span> <span class="n">ij</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">scratch</span><span class="p">[</span><span class="n">ij</span><span class="p">];</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

</code></pre></div></div>

<p>That out of the way, why does the Jacobi method work in the first place?</p>

<p>Take the iteration rule and subtract from it the equation it comes from. The expression reduces because the $b$ terms cancel.</p>

\[\begin{align*} &amp; &amp; x^{(k+1)} &amp; = D^{-1} (\cancel{b} - (L+U) x^{(k)} )\\ &amp; -( &amp; x &amp; = D^{-1} (\cancel{b} - (L+U) x) &amp; ) \\ \hline \\[-1em] &amp; &amp; x^{(k+1)} - x &amp; = - D^{-1} (L+U) (x^{(k)} - x) \end{align*}\]

<p>Consider what $x^{(k+1)}-x$ and $x^{(k)}-x$ mean. They’re the error of our guesses! With every iteration, this error is multiplied by $-D^{-1} (L+U)$! Let’s call this matrix $C_\text{Jac}$.</p>
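<p>To make that recursion concrete, here is a small numerical check on a hypothetical $3 \times 3$ diagonally dominant system (not the pressure matrix): one Jacobi step maps the old error to the new error exactly through $C_\text{Jac}$.</p>

```python
# A small numerical check of the error recursion, on a hypothetical 3x3
# diagonally dominant system (not the pressure matrix): one Jacobi step
# maps the old error to the new error exactly through C_Jac = -D^(-1)(L+U).
import numpy as np

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])
x_true = np.linalg.solve(A, b)

D_inv = np.diag(1.0 / np.diag(A))
LU = A - np.diag(np.diag(A))  # L + U, i.e. A with its diagonal zeroed
C_jac = -D_inv @ LU

x_k = np.zeros(3)              # initial guess
x_k1 = D_inv @ (b - LU @ x_k)  # one Jacobi iteration

# new error = C_Jac @ old error, term for term
assert np.allclose(x_k1 - x_true, C_jac @ (x_k - x_true))
```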

<p>For ease of explanation, let’s assume that $C_\text{Jac}$ can be diagonalized (for defective matrices, a similar argument can be made using generalized eigenvectors). Because it can be diagonalized, the error can be expressed as a linear combination of its eigenvectors,</p>

\[x^{(k)} - x = c_1^{(k)} v_1 + c_2^{(k)} v_2 + \dots + c_n^{(k)} v_n\]

<p>whereby each component gets multiplied by the corresponding eigenvalue in each iteration.</p>

\[\begin{align*} x^{(k+1)} - x &amp; = C_\text{Jac} (c_1^{(k)} v_1 + c_2^{(k)} v_2 + \dots + c_n^{(k)} v_n) \\ &amp; = c_1^{(k)} C_\text{Jac} v_1 + c_2^{(k)} C_\text{Jac} v_2 + \dots + c_n^{(k)} C_\text{Jac} v_n \\ &amp; = c_1^{(k)} \lambda_1 v_1 + c_2^{(k)} \lambda_2 v_2 + \dots + c_n^{(k)} \lambda_n v_n \end{align*}\]

<p>Let’s focus on a single term here—say $c_1^{(k)} \lambda_1 v_1$. On the next iteration, it gets multiplied by $C_\text{Jac}$ <em>again</em>, meaning that it becomes $c_1^{(k)} \lambda_1^2 v_1$. You can see that doing this repeatedly gets us an <em>exponential sequence</em>. The same can be said for the other terms. If any of the eigenvalues have a magnitude that is greater than one, then Jacobi iteration wouldn’t work because the error would explode. However, if all the eigenvalues have magnitudes that are less than one, then the error would converge to zero. The point: $x^{(k+1)}$ would always be a better guess than $x^{(k)}$.</p>

<p>The largest magnitude among all the eigenvalues is called the “spectral radius” of the matrix. Some of the eigenvalues having a magnitude over one is the same as the spectral radius being over one, and <em>all</em> of them having a magnitude under one is the same as the spectral radius being under one. In the latter case, the component along the eigenvector whose eigenvalue has the largest magnitude decays the slowest, but another way of looking at it is that all error decays at least as fast. For example, if the spectral radius of $C_\text{Jac}$, i.e. “$\rho(C_\text{Jac})$”, is 0.9, then 20 iterations multiply the error by 0.9 twenty times. Working out 0.9 to the 20th power, we get about 0.12, meaning the error is cut by about 88 percent. Going further to, say, 40 iterations, we get 0.015, cutting the error by about 98.5 percent. We could keep going until the result is as accurate as needed. All in all, so long as the spectral radius is under one, Jacobi iteration can be used to solve our boundary value problem, giving the pressure field that appropriately fits the Helmholtz-Hodge decomposition.</p>
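<p>The arithmetic above is quick to sketch in Python (the 0.9 radius is the hypothetical from the paragraph, not a measured value).</p>

```python
# Quick arithmetic behind the decay estimates above: with a (hypothetical)
# spectral radius of 0.9, k iterations shrink the error by at worst 0.9^k.
import math

rho = 0.9
print(rho ** 20)  # ~0.1216, error cut by about 88 percent
print(rho ** 40)  # ~0.0148, error cut by about 98.5 percent

# iterations needed to shrink the error by a chosen factor (here 1000x)
target = 1e-3
iters = math.ceil(math.log(target) / math.log(rho))
print(iters)  # 66
```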

<p>That said, how can we know that it is? What is our spectral radius, actually? Well, about that…</p>

<p>I don’t know of a proof one way or the other, but I did just give a general (though not analytic) description of $A$. If we take some specific case of $A$, there are iterative, sparse algorithms for (approximately) finding its eigenvalues and eigenvectors. In Python, the SciPy package offers one, <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigs.html"><code class="language-plaintext highlighter-rouge">scipy.sparse.linalg.eigs</code></a>. Many months back, I thought that it would be perfect for finding the spectral radius, so I wrote a script. Here it is.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">prod</span>
<span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">scipy.linalg</span>
<span class="kn">from</span> <span class="nn">scipy.sparse</span> <span class="kn">import</span> <span class="n">csr_array</span><span class="p">,</span> <span class="n">eye</span>
<span class="kn">import</span> <span class="nn">scipy.sparse.linalg</span>

<span class="n">DX</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="n">N_EIGS</span> <span class="o">=</span> <span class="mi">20</span>
<span class="n">N_ROWS</span> <span class="o">=</span> <span class="mi">60</span>
<span class="n">N_COLS</span> <span class="o">=</span> <span class="mi">80</span>


<span class="k">def</span> <span class="nf">plot_eigs</span><span class="p">(</span>
    <span class="n">arr</span><span class="p">:</span> <span class="n">csr_array</span><span class="p">,</span> <span class="n">subplots</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">[</span><span class="nb">int</span><span class="p">,</span> <span class="nb">int</span><span class="p">],</span> <span class="n">neg_first</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="bp">True</span><span class="p">,</span> <span class="n">digits</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">5</span>
<span class="p">):</span>
    <span class="n">n_eigs</span> <span class="o">=</span> <span class="n">prod</span><span class="p">(</span><span class="n">subplots</span><span class="p">)</span>

    <span class="n">eigs</span><span class="p">,</span> <span class="n">v</span> <span class="o">=</span> <span class="n">scipy</span><span class="p">.</span><span class="n">sparse</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">eigs</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="n">n_eigs</span><span class="p">,</span> <span class="n">which</span><span class="o">=</span><span class="s">"LM"</span><span class="p">)</span>

    <span class="c1"># I haven't seen non-real values, so don't show them for legibility
</span>    <span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="nb">any</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">imag</span><span class="p">(</span><span class="n">eigs</span><span class="p">))</span> <span class="ow">or</span> <span class="n">np</span><span class="p">.</span><span class="nb">any</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">imag</span><span class="p">(</span><span class="n">v</span><span class="p">)):</span>
        <span class="k">print</span><span class="p">(</span><span class="s">"warning: eigenvalues or eigenvectors have an imaginary component"</span><span class="p">)</span>
    <span class="n">eigs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">real</span><span class="p">(</span><span class="n">eigs</span><span class="p">)</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">real</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>

    <span class="c1"># round the eigenvalues for legibility, but check for ambiguity
</span>    <span class="n">eigs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">round</span><span class="p">(</span><span class="n">eigs</span><span class="p">,</span> <span class="n">digits</span><span class="p">)</span>
    <span class="n">pairwise_matches</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">count_nonzero</span><span class="p">(</span>
        <span class="n">eigs</span><span class="p">[:,</span> <span class="n">np</span><span class="p">.</span><span class="n">newaxis</span><span class="p">]</span> <span class="o">==</span> <span class="n">eigs</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">newaxis</span><span class="p">,</span> <span class="p">:],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span>
    <span class="p">)</span>
    <span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="nb">any</span><span class="p">((</span><span class="n">pairwise_matches</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">eigs</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)):</span>  <span class="c1"># ignore self-match and zero
</span>        <span class="k">print</span><span class="p">(</span><span class="s">"warning: rounding has made some eigenvalues ambiguous"</span><span class="p">)</span>

    <span class="c1"># in case of a pos-neg pair, strictly show pos or neg first
</span>    <span class="n">is_pos</span> <span class="o">=</span> <span class="n">eigs</span> <span class="o">&gt;</span> <span class="mi">0</span>
    <span class="k">if</span> <span class="n">neg_first</span><span class="p">:</span>  <span class="c1"># pos first, then the stable descending sort flips that
</span>        <span class="n">eigs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">([</span><span class="n">eigs</span><span class="p">[</span><span class="n">is_pos</span><span class="p">],</span> <span class="n">eigs</span><span class="p">[</span><span class="o">~</span><span class="n">is_pos</span><span class="p">]])</span>
        <span class="n">v</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">([</span><span class="n">v</span><span class="p">[:,</span> <span class="n">is_pos</span><span class="p">],</span> <span class="n">v</span><span class="p">[:,</span> <span class="o">~</span><span class="n">is_pos</span><span class="p">]])</span>
    <span class="k">else</span><span class="p">:</span>  <span class="c1"># neg first, then the stable descending sort flips that
</span>        <span class="n">eigs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">([</span><span class="n">eigs</span><span class="p">[</span><span class="o">~</span><span class="n">is_pos</span><span class="p">],</span> <span class="n">eigs</span><span class="p">[</span><span class="n">is_pos</span><span class="p">]])</span>
        <span class="n">v</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">([</span><span class="n">v</span><span class="p">[:,</span> <span class="o">~</span><span class="n">is_pos</span><span class="p">],</span> <span class="n">v</span><span class="p">[:,</span> <span class="n">is_pos</span><span class="p">]])</span>
    <span class="n">permute</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">eigs</span><span class="p">),</span> <span class="n">stable</span><span class="o">=</span><span class="bp">True</span><span class="p">)[::</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
    <span class="n">eigs</span> <span class="o">=</span> <span class="n">eigs</span><span class="p">[</span><span class="n">permute</span><span class="p">]</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">v</span><span class="p">[:,</span> <span class="n">permute</span><span class="p">]</span>

    <span class="n">MIN_RANGE</span> <span class="o">=</span> <span class="mf">1e-9</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_eigs</span><span class="p">):</span>
        <span class="n">eigenvector</span> <span class="o">=</span> <span class="n">v</span><span class="p">[:,</span> <span class="n">i</span><span class="p">].</span><span class="n">reshape</span><span class="p">((</span><span class="n">N_ROWS</span><span class="p">,</span> <span class="n">N_COLS</span><span class="p">))</span>
        <span class="n">v_min</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">min</span><span class="p">(</span><span class="n">eigenvector</span><span class="p">)</span>
        <span class="n">v_max</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">eigenvector</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">v_max</span> <span class="o">-</span> <span class="n">v_min</span> <span class="o">&lt;</span> <span class="n">MIN_RANGE</span><span class="p">:</span>  <span class="c1"># probably a flat plane
</span>            <span class="n">v_max</span> <span class="o">+=</span> <span class="n">MIN_RANGE</span>
            <span class="n">v_min</span> <span class="o">-=</span> <span class="n">MIN_RANGE</span>
        <span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="o">*</span><span class="n">subplots</span><span class="p">,</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">eigenvector</span><span class="p">,</span> <span class="n">vmin</span><span class="o">=</span><span class="n">v_min</span><span class="p">,</span> <span class="n">vmax</span><span class="o">=</span><span class="n">v_max</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">"coolwarm"</span><span class="p">)</span>
        <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">"off"</span><span class="p">)</span>
        <span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'$</span><span class="se">\\</span><span class="s">lambda$ = %0.*f'</span> <span class="o">%</span> <span class="p">(</span><span class="n">digits</span><span class="p">,</span> <span class="n">eigs</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span>

    <span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>


<span class="n">op</span> <span class="o">=</span> <span class="n">scipy</span><span class="p">.</span><span class="n">sparse</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">LaplacianNd</span><span class="p">((</span><span class="n">N_ROWS</span><span class="p">,</span> <span class="n">N_COLS</span><span class="p">),</span> <span class="n">boundary_conditions</span><span class="o">=</span><span class="s">"neumann"</span><span class="p">)</span>
<span class="n">arr</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="n">DX</span> <span class="o">**</span> <span class="mi">2</span><span class="p">))</span> <span class="o">*</span> <span class="n">csr_array</span><span class="p">(</span><span class="n">op</span><span class="p">.</span><span class="n">tosparse</span><span class="p">(),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">float64</span><span class="p">)</span>

<span class="c1"># Jacobi
</span><span class="n">diag_inv</span> <span class="o">=</span> <span class="n">arr</span><span class="p">.</span><span class="n">diagonal</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">**</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">lu</span> <span class="o">=</span> <span class="n">arr</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">lu</span><span class="p">.</span><span class="n">setdiag</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">iter_matrix_jacobi</span> <span class="o">=</span> <span class="n">diag_inv</span><span class="p">[:,</span> <span class="n">np</span><span class="p">.</span><span class="n">newaxis</span><span class="p">]</span> <span class="o">*</span> <span class="n">lu</span>  <span class="c1"># implements D^(-1) (L + U)
</span><span class="n">plot_eigs</span><span class="p">(</span><span class="n">iter_matrix_jacobi</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="n">neg_first</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">"eigs_jacobi.png"</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">300</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">cla</span><span class="p">()</span>

<span class="c1"># Richardson
</span><span class="n">omega</span> <span class="o">=</span> <span class="p">(</span><span class="n">DX</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="o">-</span><span class="mi">4</span>
<span class="n">iter_matrix_richardson</span> <span class="o">=</span> <span class="n">csr_array</span><span class="p">(</span><span class="n">eye</span><span class="p">(</span><span class="n">arr</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">float64</span><span class="p">)</span> <span class="o">-</span> <span class="n">omega</span> <span class="o">*</span> <span class="n">arr</span>
<span class="n">plot_eigs</span><span class="p">(</span><span class="n">iter_matrix_richardson</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="bp">False</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">"eigs_richardson.png"</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">300</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">cla</span><span class="p">()</span>
</code></pre></div></div>

<p>Given some $\Delta x$ and some grid, it calculates $A$, calculates $C_\text{Jac}$ from $A$ (and $C_\text{Richardson}$, but we’ll get to that), then finds the twenty eigenvalues with the largest magnitudes, plus the corresponding eigenvectors. The spectral radius is then just the largest of those magnitudes. It’s not useful for making a general statement about all cases of $A$, but it’s all I need for my specific case. I also went ahead and reshaped the eigenvectors back into arrays, letting us visualize whatever eigenvector getting hit with so-and-so eigenvalue as <em>the component of a field</em> getting diminished.</p>

<p>In the case of ESP32-fluid-simulation, I was limited by the ESP32’s RAM to a $60 \times 80$ grid, and I happened to let $\Delta x = 1$. Given that, the script got this:</p>

<figure> 
<img src="/images/2024-09-26/figure11.png" alt="A range of image plots, each showing the component of the error associated with an eigenvalue, sorted in order of decreasing magnitude. The image plots for positive eigenvalues appear to be checkered versions of the plots for the negative values." />
<figcaption>

Note: red is positive and blue is negative.

</figcaption>
</figure>

<p>Immediately, we can notice two things:</p>

<ol>
  <li>
    <p>The spectral radius of $C_\text{Jac}$ appears to be <em>not</em> less than one because two of the eigenvalues are $1$ and $-1$.</p>
  </li>
  <li>
    <p>There seems to be positive-negative pairs of eigenvalues, with the eigenvector associated with the positive one looking like the negative counterpart but multiplied by a checkerboard pattern.</p>
  </li>
</ol>

<p>From what I’ve gathered, both of these things are to be expected.</p>

<p>Regarding (2), following linked references on Wikipedia led me to a 1950 PhD thesis by David M. Young: an analysis of “successive over-relaxation” in the context of finite difference schemes. Setting aside what “successive over-relaxation” is for now, I worked out that $A$ happens to satisfy what Young calls “property A”. In fact, Young derived his results from “property A” with finite difference problems like this one in mind! The thesis is <a href="https://www.stat.uchicago.edu/~lekheng/courses/324/young.pdf">publicly available</a> if you want to see how Young describes it exactly, but in this context it amounts to the following: because $A$ only has points interact with their neighbors, the grid can be divided into a succession of sets of points where each set only interacts with the preceding and the succeeding set.</p>

<figure> 
<img src="/images/2024-09-26/figure12.png" alt="A three-by-five grid of squares, numbered as follows: the top left square is labeled with one, the bottom and right neighbors of that square are labeled with two, all the bottom and right neighbors are in turn labeled with three, then proceeding onward like a wave until the bottom-right corner is reached." />
<figcaption>

The numbering of a three-by-five grid that works toward satisfying property A. Notice how a point labeled "three" only has points labeled "two" and "four" for neighbors.

</figcaption>
</figure>
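<p>The numbering in the figure follows a simple rule: label the point at row $r$, column $c$ with $r + c + 1$. A quick check (the <code class="language-plaintext highlighter-rouge">label</code> function here is just for illustration) confirms that every neighbor of a point carries the preceding or succeeding label, which is the structure property A asks for.</p>

```python
# Generate the figure's "wave" numbering and verify the property A structure:
# every 4-neighbor of a point carries the preceding or succeeding label.
# (The label function is illustrative, not from the original code.)
N_ROWS, N_COLS = 3, 5

def label(r, c):
    return r + c + 1  # diagonal "wave" numbering, as in the figure

for r in range(N_ROWS):
    for c in range(N_COLS):
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= rr < N_ROWS and 0 <= cc < N_COLS:
                assert abs(label(rr, cc) - label(r, c)) == 1  # always +/- 1
```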

<p>Because property A is satisfied, many of the conclusions of that analysis follow. That includes a theorem stating that the eigenvalues of Jacobi iteration (which Young called the “Kormes method”, apparently) come in positive-negative pairs, each associated with a pair of eigenvectors where one is a copy of the other but multiplied by $1$ and $-1$ alternating. That’s exactly what we’re seeing here!</p>

<p>As you might expect, we’ll be interested in Property A and “successive over-relaxation” soon.</p>

<p>Regarding (1), I found a <a href="https://scicomp.stackexchange.com/questions/21612/poisson-equation-finite-difference-with-pure-neumann-boundary-conditions">Stack Exchange</a> post stating that an $A$ constructed from a boundary value problem with only, i.e. “purely”, Neumann boundaries must have infinitely many solutions, each separated by a constant. The reason: the whole boundary value problem, which is just the governing partial differential equations plus the boundary conditions, then only concerns the <em>derivative</em> of the pressure field. It’s like tacking “$+\;C$” onto the end of an indefinite integral’s result. That means there needs to be a place to <em>choose</em> that constant.</p>

<p>Looking at the eigenvector corresponding to $-1$: it’s very flat. As a flat component that doesn’t diminish over iterations, it looks to me like setting the initial weight of this component is equivalent to choosing the constant part of the solution.</p>

<p>In my case, I always started with a zeroed-out field as the first guess, which zeroes out that component too. But in any case, the neat part is that the choice of constant <em>doesn’t matter</em> because we ultimately subtract the <em>gradient</em> of the pressure from the uncorrected velocity. The gradient consists of partial derivatives, and the constant part doesn’t change their values. The same can be said for its checkerboard counterpart because the discrete gradient, using central differences, always happens to use points of the same color, so to speak. In other words, we’re free to take the <em>next largest</em> eigenvalue as our “spectral radius”, to abuse the term a bit. For our $60 \times 80$ grid (and a $\Delta x$ of 1), that’s 0.9996. Since this “spectral radius” is less than one, we can be sure that Jacobi iteration solves this particular boundary value problem.</p>
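<p>That claim is easy to check numerically. Here is a sketch on an arbitrary random field, assuming $\Delta x = 1$: the central-difference gradient is unchanged by adding a constant, or by adding the constant’s checkerboard counterpart.</p>

```python
# A numerical check of the claim above: adding a constant (or a checkerboard
# of constants) to the pressure field leaves its central-difference gradient
# unchanged, so the undetermined "+ C" component never reaches the velocity
# correction. The field here is random, with dx = 1 assumed.
import numpy as np

rng = np.random.default_rng(0)
p = rng.standard_normal((60, 80))

def central_gradient(f):
    # central differences over the interior points only
    ddx = (f[1:-1, 2:] - f[1:-1, :-2]) / 2.0
    ddy = (f[2:, 1:-1] - f[:-2, 1:-1]) / 2.0
    return ddx, ddy

gx, gy = central_gradient(p)

gx_shifted, gy_shifted = central_gradient(p + 123.0)  # arbitrary constant
assert np.allclose(gx, gx_shifted) and np.allclose(gy, gy_shifted)

# checkerboard: +/-123 alternating with the parity of i + j
checker = 123.0 * (-1.0) ** np.add.outer(np.arange(60), np.arange(80))
gx_c, gy_c = central_gradient(p + checker)
assert np.allclose(gx, gx_c) and np.allclose(gy, gy_c)
```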

<div class="note-panel">

  <p>One more “but actually” before we move on: I mentioned that my script can also calculate $C_\text{Richardson}$. I noticed that my sources’ code for computing “Jacobi” iteration element-wise didn’t exactly do $D^{-1} (b - (L+U) x^{(k)})$. When it came time to multiply by $a_{ii}^{-1}$, which can be either $-\frac{\Delta x^2}{4}$, $-\frac{\Delta x^2}{3}$, or $-\frac{\Delta x^2}{2}$ depending on whether the $i$-th point is at the boundary, they just always multiplied by $-\frac{\Delta x^2}{4}$. At the same time, instead of making sure not to take the top neighbor at the top boundary, the left neighbor at the left boundary, et cetera, they pulled the ghost row/column value instead. I realized later that this was technically Richardson iteration.</p>

  <p>Richardson iteration comes from a different splitting of $A$ into a diagonal matrix of <em>constants</em> and the remainder.</p>

\[A = \alpha I + (A - \alpha I)\]

  <p>With this sum, we can do the following derivation from $Ax = b$, like we did for Jacobi iteration.</p>

\[\begin{align*} Ax &amp; = b \\ (\alpha I + (A - \alpha I)) x &amp; = b \\ \alpha I x + (A - \alpha I) x &amp; = b \\ \alpha I x &amp; = b - (A - \alpha I) x \\ x &amp; = \alpha^{-1} I (b - (A - \alpha I)x) \\ &amp; = \alpha^{-1} b - \alpha^{-1} (A - \alpha I) x \\ &amp; = \alpha^{-1} b + (I - \alpha^{-1} A) x \\ &amp; \left\downarrow \text{Let }\omega = \alpha^{-1} \right. \\ &amp; = \omega b + (I - \omega A) x\end{align*}\]

  <p>We can also turn that into the following iteration rule.</p>

\[x^{(k+1)} = \omega b + (I - \omega A) x^{(k)}\]

  <p>Consider letting $\alpha = \frac{-4}{\Delta x^2}$ (equivalently, $\omega = \frac{\Delta x^2}{-4}$) so that $\alpha$ is exactly what $a_{ii}$ would be if all four neighbors were present. Then, if $\alpha I$ is subtracted from $A$, a zero remains where $a_{ii}$ once was. Now, notice in the derivation how $(A - \alpha I)$ times $-\alpha^{-1}$ is $I - \omega A$. If there was a zero on the diagonal of $(A - \alpha I)$, there’s still a zero there in $I - \omega A$. This says the pressure at the center isn’t in play, just as it isn’t in Jacobi iteration. Overall, it can be shown that Richardson iteration and Jacobi iteration happen to be identical in our problem, <em>except</em> at the boundary.</p>

  <p>There, $a_{ii}$ is $\frac{-3}{\Delta x^2}$ or $\frac{-2}{\Delta x^2}$, and $\frac{1}{4}$ or $\frac{2}{4}$ is left on the diagonal, not zero, and so the pressure at the center point gets pulled in. When this is compared to how Stam’s “Real-Time Fluid Dynamics for Games” and the GPU Gems article pull a value from the ghost row/column, which in turn pulls from the center, it can be shown that what they’re doing is actually Richardson iteration! (Stam doesn’t do Richardson iteration to a tee, but we’ll get to that.)</p>

  <p>When it comes to code, those articles proceed to use the ghost rows and columns as actual rows and columns in memory that are kept up-to-date. That lets them use the fast path on <em>every</em> point because every point does then have all four “neighbors”. This also has the same function run on all points, which is important for GPUs. But for us, that idea is incompatible with the C code I’ve shown so far. At the very least, Richardson iteration could still be achieved by updating the safe path code, and we wouldn’t need to update the fast path code because that part is the same.</p>

  <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">float</span> <span class="nf">pois_expr_safe</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="kt">int</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">pois_context</span> <span class="o">*</span><span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">pois_context</span><span class="o">*</span><span class="p">)</span><span class="n">ctx</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">i_max</span> <span class="o">=</span> <span class="n">dim_x</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">j_max</span> <span class="o">=</span> <span class="n">dim_y</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>

    <span class="kt">float</span> <span class="n">p_sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">p_sum</span> <span class="o">+=</span> <span class="p">(</span><span class="n">i</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="o">?</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">:</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
    <span class="n">p_sum</span> <span class="o">+=</span> <span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="n">i_max</span><span class="p">)</span> <span class="o">?</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="o">:</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
    <span class="n">p_sum</span> <span class="o">+=</span> <span class="p">(</span><span class="n">j</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="o">?</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="n">dim_x</span><span class="p">)</span> <span class="o">:</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>
    <span class="n">p_sum</span> <span class="o">+=</span> <span class="p">(</span><span class="n">j</span> <span class="o">&lt;</span> <span class="n">j_max</span><span class="p">)</span> <span class="o">?</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="n">dim_x</span><span class="p">)</span> <span class="o">:</span> <span class="o">*</span><span class="n">p</span><span class="p">;</span>

    <span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="n">INDEX</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
    <span class="k">return</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">25</span> <span class="o">*</span> <span class="p">(</span><span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">dx</span> <span class="o">*</span> <span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">d</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">-</span> <span class="n">p_sum</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>  </div>

  <p>To be clear, this function wouldn’t work in my current code. It relies on the ghost rows and columns becoming actual, allocated memory that is constantly updated with the values of the real rows and columns between iterations.</p>

  <p>That’s beside the point that I wanted to make, though. Since they actually used Richardson iteration, I wanted to make sure that the spectral radius of $C_\text{Richardson}$ was also under one, so I added on to the script. To do that, I just needed to know what matrix the error gets multiplied by, and we can use the same equation subtraction as before to find that it’s $(I - \omega A)$.</p>

\[\begin{align*} &amp; &amp; x^{(k+1)} &amp; = \cancel{\omega b} + (I - \omega A) x^{(k)}\\ &amp; -( &amp; x &amp; = \cancel{\omega b} + (I - \omega A) x &amp; ) \\ \hline \\[-1em] &amp; &amp; x^{(k+1)} - x &amp; = (I - \omega A) (x^{(k)} - x) \end{align*}\]

  <p>Here are its eigenvalues and eigenvectors for the $60 \times 80$ grid and $\Delta x = 1$:</p>

  <figure> 
<img src="/images/2024-09-26/figure13.png" alt="A range of image plots, each showing the component of the error associated with an eigenvalue, sorted in order of decreasing magnitude. The image plots for positive eigenvalues appear to be checkered versions of the plots for the negative values." />
</figure>

  <p>Though the eigenvectors are somewhat different, the spectral radius is within rounding error of Jacobi iteration’s in this case. That makes sense, given that $-\frac{4}{\Delta x^2}$ was the typical value on the diagonal. So, I wouldn’t be too concerned with the distinction, but I figured I should at least point it out.</p>

</div>

<p>By showing exactly how Jacobi iteration manifests into code via sparse matrices and discussing the spectral radius, we’ve covered all the context I needed to go <em>beyond</em> it. Let’s start with a motivating question: why did I need to go beyond it in the first place? The answer lies in the spectral radius we just found. With a spectral radius of $0.9996$, how many iterations would it take to cut the error by even just 90 percent? 5755 iterations! We can never hope to just use Jacobi iteration on an ESP32.</p>

<p>The first step beyond is to use “Gauss-Seidel” iteration instead. We’ll get to how to derive it, but I think it’s better to start with an interesting motivator. Classical “Gauss-Seidel” iteration is eerily similar to Jacobi iteration in implementation, despite how different it is on paper. Instead of assembling the next pressure array element-wise in a separate output buffer, what if we compute it in the same memory as the current pressure array—overwriting one element at a time? Here’s what that would look like.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">poisson_solve</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">div</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dx</span><span class="p">,</span>
        <span class="kt">int</span> <span class="n">iters</span><span class="p">,</span> <span class="kt">float</span> <span class="n">omega</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">ij</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="o">*</span><span class="n">dim_y</span><span class="p">;</span> <span class="n">ij</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">struct</span> <span class="n">pois_context</span> <span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">{.</span><span class="n">d</span> <span class="o">=</span> <span class="n">div</span><span class="p">,</span> <span class="p">.</span><span class="n">dx</span> <span class="o">=</span> <span class="n">dx</span><span class="p">};</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">k</span> <span class="o">&lt;</span> <span class="n">iters</span><span class="p">;</span> <span class="n">k</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">domain_iter</span><span class="p">(</span><span class="n">pois_expr_safe</span><span class="p">,</span> <span class="n">pois_expr_fast</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span>
            <span class="o">&amp;</span><span class="n">pois_ctx</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Simpler, no? And notice how <code class="language-plaintext highlighter-rouge">p</code> is passed in as <em>both</em> the input and the output. It sounds like taking a shortcut at the cost of correctness, but this happens to implement the classical case of “Gauss-Seidel”. By doing this, we use values from the <em>next</em> pressure array whenever we pull the bottom and left neighbors, and we converge faster for doing so. That use of freshly updated values is the essence of the method. Naturally, Stam’s method of choice in “Real-Time Fluid Dynamics for Games” was Gauss-Seidel, or—to be pedantic—a hybrid of Gauss-Seidel and Richardson iteration. Meanwhile, the GPU Gems article sticks with doing many Jacobi iterations, probably because immediately using elements that were just calculated makes GPU parallelization awkward.</p>

<p>What does matter is this: in Jacobi iteration, we could have looped through the elements in any order with no impact, but in Gauss-Seidel iteration, the order does have an impact. The code we just saw loops through the elements in the conventional left-to-right, bottom-to-top order, which pulls freshly updated values from the bottom and left neighbors. That’s the “classical” case, but there’s also “red-black Gauss-Seidel iteration”. Its only difference is the order.</p>

<p>Imagine the grid being colored in a checkerboard red-black pattern, where black points only neighbor red points and red points only neighbor black points. “Red-black” iteration is to visit all the red points first and then all the black points (or all the black points, then the red). The result is this: in the first half, none of the neighbors pulled are from the next pressure array, but in the second half, <em>all</em> of them are.</p>

\[p^{(k+1)}[i, j] = \begin{cases} \displaystyle - \frac{\Delta x^{2}}{4} \left( d[i, j] - \frac{p^{(k)}[i+1, j] + p^{(k)}[i-1, j] + p^{(k)}[i, j+1] + p^{(k)}[i, j-1]}{\Delta x^{2}} \right) &amp; \text{red } i, j \\[1em] \displaystyle - \frac{\Delta x^{2}}{4} \left( d[i, j] - \frac{p^{(k+1)}[i+1, j] + p^{(k+1)}[i-1, j] + p^{(k+1)}[i, j+1] + p^{(k+1)}[i, j-1]}{\Delta x^{2}} \right) &amp; \text{black } i, j \end{cases}\]

<p>For thinking about what the code for such a peculiar looping order would look like, here are the three key observations:</p>

<ol>
  <li>all points one step (horizontal or vertical) away are the other color,</li>
  <li>all points two steps away are the same color, and</li>
  <li>if we assume that point $0,\;0$ is black, then every point where $i+j$ is even must be black.</li>
</ol>

<figure> 
<img src="/images/2024-09-26/figure14.png" alt="A three-by-five grid of squares, numbered as follows: the top left square is labeled with one, the bottom and right neighbors of that square are labeled with two, all the bottom and right neighbors are in turn labeled with three, then proceeding onward like a wave until the bottom-right corner is reached. Furthermore, the numbers alternate in color between red and black, black being first." />
<figcaption>

The numbering from before, now colored according to red-black order. The numbering happens to equal $i+j+1$, which helps show why a point must be black when $i+j$ is even.

</figcaption>
</figure>

<p>And now, here’s what the code for that would look like.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kr">inline</span> <span class="kt">int</span> <span class="nf">point_is_red</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="kt">int</span> <span class="n">j</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="n">j</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x1</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">domain_iter_red_black</span><span class="p">(</span>
    <span class="kt">float</span> <span class="p">(</span><span class="o">*</span><span class="n">expr_safe</span><span class="p">)(</span><span class="kt">float</span><span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">),</span>
    <span class="kt">float</span> <span class="p">(</span><span class="o">*</span><span class="n">expr_fast</span><span class="p">)(</span><span class="kt">float</span><span class="o">*</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">),</span> <span class="kt">float</span> <span class="o">*</span><span class="n">wrt</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">rd</span><span class="p">,</span>
    <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">i_max</span> <span class="o">=</span> <span class="p">(</span><span class="n">dim_x</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">j_max</span> <span class="o">=</span> <span class="p">(</span><span class="n">dim_y</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>  <span class="c1">// inclusive!</span>
    <span class="kt">int</span> <span class="n">ij</span><span class="p">,</span> <span class="n">offset</span><span class="p">;</span>

    <span class="kt">int</span> <span class="n">bottom_left_is_red</span> <span class="o">=</span> <span class="n">point_is_red</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
        <span class="n">bottom_right_is_red</span> <span class="o">=</span> <span class="n">point_is_red</span><span class="p">(</span><span class="n">i_max</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span>
        <span class="n">top_left_is_red</span> <span class="o">=</span> <span class="n">point_is_red</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">j_max</span><span class="p">);</span>

    <span class="kt">int</span> <span class="n">on_red</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="nl">repeat_on_red:</span>  <span class="c1">// on arrival to this label, on_red = 1</span>

    <span class="c1">// Loop over the main body (starting from 1,1 as black or 2,1 as red)</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">on_red</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">j_max</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">,</span> <span class="n">offset</span> <span class="o">^=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="o">+</span><span class="n">offset</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">i_max</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
            <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
            <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_fast</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="c1">// Loop over the bottom (including left and right corners)</span>
    <span class="n">offset</span> <span class="o">=</span> <span class="p">(</span><span class="n">on_red</span> <span class="o">==</span> <span class="n">bottom_left_is_red</span><span class="p">)</span> <span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">offset</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="n">i_max</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Loop over the top (including left and right corners)</span>
    <span class="n">offset</span> <span class="o">=</span> <span class="p">(</span><span class="n">on_red</span> <span class="o">==</span> <span class="n">top_left_is_red</span><span class="p">)</span><span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">offset</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="n">i_max</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j_max</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">i</span><span class="p">,</span> <span class="n">j_max</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Loop over the left (starting from 0,1 or 0,2)</span>
    <span class="n">offset</span> <span class="o">=</span> <span class="p">(</span><span class="n">on_red</span> <span class="o">==</span> <span class="o">!</span><span class="n">bottom_left_is_red</span><span class="p">)</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">2</span><span class="p">;</span>  <span class="c1">// we're *adjacent to* it</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="n">offset</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">j_max</span><span class="p">;</span> <span class="n">j</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="mi">0</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Loop over the right (starting from i_max,1 or i_max,2)</span>
    <span class="n">offset</span> <span class="o">=</span> <span class="p">(</span><span class="n">on_red</span> <span class="o">==</span> <span class="o">!</span><span class="n">bottom_right_is_red</span><span class="p">)</span><span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">2</span><span class="p">;</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="n">offset</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">j_max</span><span class="p">;</span> <span class="n">j</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i_max</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">wrt</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">expr_safe</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rd</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">i_max</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">on_red</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">on_red</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
        <span class="k">goto</span> <span class="n">repeat_on_red</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The change in going with red-black over classical Gauss-Seidel isn’t critical. I just did it because it made the sim look better, though, to be frank, I’m not sure why. In fact, starting from what’s presented in Young’s analysis, it can be shown that both orderings have the same spectral radius. I’ll get to that at some point.</p>

<div class="note-panel">

  <p>I think Gauss-Seidel iteration is better thought of as an improvement that emerges from taking that shortcut in the element-wise computation, but if you’re curious, here’s how it arises when starting from the linear algebra. Recall the $A = D + L + U$ splitting, and let $S$ be the diagonal plus the lower-triangular part, $D + L$.</p>

\[\begin{align*} A &amp; = D + L + U \\ &amp; \left\downarrow \text{ Let } S = D + L \right. \\ &amp; = S + U \end{align*}\]

\[\begin{bmatrix} a_{11} &amp; a_{12} &amp; \cdots &amp; a_{1n} \\ a_{21} &amp; a_{22} &amp; \cdots &amp; a_{2n} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ a_{n1} &amp; a_{n2} &amp; \cdots &amp; a_{nn} \end{bmatrix} = \begin{bmatrix} a_{11} &amp; 0 &amp; 0 &amp; \cdots &amp; 0 \\ a_{21} &amp; a_{22} &amp; 0 &amp; \cdots &amp; 0 \\ a_{31} &amp; a_{32} &amp; a_{33} &amp; \ddots &amp; \vdots \\ \vdots &amp; \vdots &amp; \vdots &amp; \ddots &amp; 0 \\ a_{n1} &amp; a_{n2} &amp; a_{n3} &amp; \vphantom{\vdots} \cdots &amp; a_{nn} \end{bmatrix} + \begin{bmatrix} 0 &amp; a_{12} &amp; a_{13} &amp; \cdots &amp; a_{1n} \\ &amp; 0 &amp; a_{23} &amp; \cdots &amp; a_{2n} \\ &amp; &amp; 0 &amp; \ddots &amp; \vdots \\ &amp; &amp; &amp; \ddots &amp; a_{(n-1)n} \\ &amp; &amp; &amp; &amp; 0 \end{bmatrix}\]

  <p>Like in Jacobi iteration and Richardson iteration, we derive something from $Ax = b$, as follows.</p>

\[\begin{align*} Ax &amp; = b \\ (S+U) x &amp; = b \\ Sx + Ux &amp; = b \\ Sx &amp; = b - Ux \\ x &amp; = S^{-1} (b - Ux)\end{align*}\]

  <p>And then, we define an iteration rule based on that.</p>

\[x^{(k+1)} = S^{-1} (b - U x^{(k)})\]

  <p>That said, we now have to do something slightly different when going element by element. Calculating $b - U x^{(k)}$ is like calculating $b - (L + U) x^{(k)}$ from Jacobi iteration, just without the left and bottom neighbors, which were associated with $L$. But how on earth do we compute the product of $S^{-1}$ and $b - U x^{(k)}$ when $S$ isn’t diagonal anymore? We exploit the fact that $S$ is sparse and lower-triangular. Yes, even though $S$ includes the diagonal part $D$, it is still lower-triangular. By definition, a “lower-triangular” matrix can have a non-zero diagonal, whereas a “strictly lower-triangular” one cannot.</p>

  <p>There are two questions at hand here. First, I stated that just modifying the Jacobi iteration procedure to overwrite old elements in the given array gives Gauss-Seidel iteration. This implies that Gauss-Seidel iteration also computes the elements of $x^{(k+1)}$ one by one. What kind of linear algebra construct <em>also</em> lends to such an implementation? Second, we don’t get to know what $S^{-1}$ is for free anymore because it’s not diagonal. What should we do about it?</p>

  <p>The answer? $S$ being lower-triangular lets us kill two birds with one stone by using “forward substitution”.</p>

  <p>Forward substitution is analogous to the latter half of Gaussian elimination, specifically the back-substitution phase that follows finding the row-echelon form. First, recognize that calculating the product of $S^{-1}$ and $(b - U x^{(k)})$ is equivalent to solving the equation $Sz = (b - U x^{(k)})$ for $z$. Solving this equation sidesteps needing to know $S^{-1}$. Forward substitution also happens to yield the elements of $z$ one by one. To show it, let’s first recast the problem into another $Ax = b$, where $A$ is $S$, $b$ is $(b - U x^{(k)})$, and $x$ is $z$.</p>

  <p>Then, while recognizing that $S$ is lower-triangular, we should write out each individual equation.</p>

\[\begin{align*} &amp; a_{11} x_1 &amp; &amp; = b_1 \\ &amp; a_{21} x_1 + a_{22} x_2 &amp; &amp; = b_2 \\ &amp; &amp; &amp; \enspace \vdots \\ &amp; a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn} x_n &amp; &amp; = b_n \end{align*}\]

  <p>The first equation has an obvious solution, $x_1 = \frac{b_1}{a_{11}}$. Then, we can substitute the value of $x_1$ into all the following equations. That makes the second equation solvable, yielding $x_2 = \frac{1}{a_{22}} (b_2 - a_{21} x_1)$. Substituting this result downward makes the third equation solvable, and so on.</p>

  <p>After looking at how forward substitution yields the elements of $x$, consider now how it <em>pulls</em> elements of $b$. The first equation uses $b_1$, the second uses $b_2$, and so on. So, if $b$ was in fact the value of some expression, can we not <em>interleave</em> the computation of elements of $x$ and of $b$? We can calculate $b_1$ then $x_1$, $b_2$ then $x_2$, and so on. Now, remember that “$b$” here is $b - U x^{(k)}$ and “$x$” here is $z$ i.e. $S^{-1} (b - U x^{(k)})$. Computing some element of “$b_i$” involves the right and top neighbors of point $i$ because $U$ is upper-triangular, and computing “$x_i$” involves the forward-substituted elements of $x$ from the bottom and left neighbors (since $S$ is lower-triangular).</p>

  <p>Now, what if I told you that forward-substitution could be implemented <em>without</em> any real, tangible substituting all the way down? It’s always interesting when something is invoked in the definition of a procedure but not explicitly realized!</p>

  <p>First, let’s state the obvious: once $x^{(k+1)}$ is computed, $x^{(k)}$ is discarded. Now, we should chart the course that some given element $x^{(k)}_i$ takes to its ultimate fate. Unlike in Jacobi iteration, <em>there is no</em> $D x^{(k)}$ <em>term</em>, i.e. no $x^{(k)}$ term for the bottom and left neighbors. Instead, $x^{(k+1)}$ terms for those neighbors show up as part of the forward-substitution. Anyway, there is only a $U x^{(k)}$ term, and the result is this: once some point $i$ is no longer the top-neighbor or right-neighbor of any other point $j$ that hasn’t had its $x^{(k+1)}_j$ calculated yet, $x^{(k)}_i$ <em>will never be used again</em>. In other words, point $i$ is ready to be discarded after its left-neighbor and its bottom-neighbor are reached. Well, given the left-to-right, bottom-to-top order, reaching point $i$ implies that those neighbors were already reached. From a memory-saving perspective, can we not have it so that $x^{(k+1)}_i$ is where $x^{(k)}_i$ used to be?</p>

  <p>Second, one might imagine forward substitution as literally constructing a series of equations with all the coefficients physically arranged like a triangle, but the essence of it is just to find one element of the solution at a time–yielding $x^{(k+1)}_i$ for $i$ from $1$ to $N$–by using the previously-found elements to find the next. In an actual procedure, those elements can be stored in any way. In our case, each step $i$ produces $x^{(k+1)}_{i}$ at exactly the moment $x^{(k)}_i$ is no longer necessary. To overwrite with each step is to progressively finish consuming elements of the input and put elements of the solution in their place. To top it all off, doing so makes the relevant previously-found elements very easy to find–just a step left or a step down away.</p>

  <p>Here we have a procedure that loops through the points, overwriting as it goes and pulling the just-overwritten left and bottom neighbors. Et voilà, we’ve arrived at the same Gauss-Seidel procedure as before!</p>

  <p>Finally, the difference between this and red-black Gauss-Seidel is this: by looping through the red points then through the black points, we’ve <em>permuted</em> the elements of $x$ (from the original $Ax = b$) into a new order that follows this looping, an order <em>distinct</em> from the conventional left-to-right, bottom-to-top mapping from 2D position to 1D index. That’s best demonstrated with our good old three-by-three example.</p>

  <figure> 
<img src="/images/2024-09-26/figure15.png" alt="Two images made by filling in, with different colors, the lower-triangular (labeled as bottom and left), upper-triangular (labeled top and right), and diagonal (labeled center) parts of the given matrix A. On the left, one created from the three-by-three example. On the right, one created from the equivalent matrix on an input that was permuted in black-then-red order." />
<figcaption>

The color representation of $A$ from before, and a color representation of the equivalent matrix for an $x$ that was permuted in black-then-red order. (Fun fact: the three components can now be isolated by partitioning the matrix into a block matrix.)

</figcaption>
</figure>

  <p>Feel free to notice here how all the red points and black points are consecutive now!</p>

  <p>Formally, this can be written as a matrix similarity between $A$ and what we’ll call $A_\text{red-black}$, with a permutation matrix $P$ as the change-of-basis.</p>

\[P^{-1} A_\text{red-black} P = A\]

  <p>Then, $Ax = b$ can be written as $P^{-1} A_\text{red-black} P x = b$. This expresses how the elements of $x$ are permuted, put through $A_\text{red-black}$, then permuted back into conventional order. In practice, the red-black code doesn’t actually shuffle the memory. Rather, the changed looping does the work, like how the Gauss-Seidel code doesn’t actually do any substitutions to implement forward substitution.</p>

  <p>That said, though this matrix is definitionally similar, its $D + L + U$ splitting takes on a starkly different meaning. No longer does $L$ mean the left and bottom neighbors and $U$ mean the top and right neighbors. What <em>does</em> still stand is the fact that $L$ stands for the preceding and $U$ stands for the succeeding. Consider how $x$ is all-red and then all-black. Constructing a Gauss-Seidel procedure with this order gives the all-red-then-all-black looping. And in this procedure, $L$ comes to mean the red points and $U$ comes to mean the black points.</p>

</div>

<p>Last but not least, the next step beyond Gauss-Seidel iteration is “successive over-relaxation”, or “SOR”. The intuition of it goes like this: if stepping from $p^{(k)}[i, j]$ to $p^{(k+1)}[i, j]$ is in the right direction, what if we went <em>further</em> in that direction? It’s a linear extrapolation from $p^{(k)}[i, j]$ to $p^{(k+1)}[i, j]$ and onward. That is,</p>

\[p_\text{SOR}^{(k+1)}[i, j] = \omega p_\text{G-S}^{(k+1)}[i, j] + (1-\omega) p^{(k)}[i, j]\]

<p>where $p_\text{G-S}^{(k+1)}[i, j]$ is the result from Gauss-Seidel iteration and $\omega$ is ordinarily a parameter between $0$ and $1$ that slides us between $p_\text{G-S}^{(k+1)}[i, j]$ and $p^{(k)}[i, j]$. It quite literally is the same expression as linear <em>interpolation</em> (yes, that lerp). But here, we’re free to push $\omega$ beyond $1$. Doing so can get us closer to the solution faster—much faster. Let’s look at its spectral radius.</p>

<p>So far, we have had nothing concrete to say about the spectral radius of Gauss-Seidel iteration—not what we can expect it to be, much less whether it is better than Jacobi iteration. The same can be said for SOR. To preface a bit first, you may have already realized that Gauss-Seidel can also be thought of as the $\omega = 1$ case of SOR. This means that an analysis of SOR will also cover Gauss-Seidel. That out of the way, Young found that, for matrices satisfying Property A, the spectral radius of SOR is a function of $\omega$ and the spectral radius of Jacobi iteration! Denoting $\rho(C_\text{Jac})$ as $\mu$ here, it goes like this:</p>

\[\rho(C_\omega) = \begin{cases} \frac{1}{4} \left( \omega \mu + \sqrt{\omega^2\mu^2 - 4(\omega-1)} \right)^2 &amp; 0 \leq \omega \leq \omega_\text{opt} \\ \omega - 1 &amp; \omega_\text{opt} \leq \omega \leq 2 \end{cases}\]

<p>where $\omega_\text{opt}$ is given by the following expression</p>

\[1 + \left( \frac{\mu}{1+\sqrt{1-\mu^2}} \right)^2\]

<p>and happens to be the value of $\omega$ that minimizes $\rho(C_\omega)$. The following plot makes things clearer.</p>

<figure> 
<img src="/images/2024-09-26/figure16.svg" alt="Plots of the spectral radius of SOR iteration over the omega parameter, each one for a different spectral radius of Jacobi iteration. For any particular Jacobi radius, the SOR radius decreases at accelerating speed until it reaches some minimum point, and from there it abruptly switches to rising linearly toward an endpoint of one (omega there being equal to two). The plots also show that, if the Jacobi radius is lower, the corresponding SOR radius is lower." />
<figcaption>

Plots of a couple $\rho(C_\omega)$ curves, varying in $\mu$. It's continuous at $\omega_\text{opt}$, even though the definition is piecewise. By <a href="https://commons.wikimedia.org/wiki/File:Spectral_Radius.svg">HerrHartmuth via Wikimedia</a> and modified by me (adjusted the text). Hereby released under <a href="https://creativecommons.org/licenses/by-sa/4.0/deed.en">Creative Commons Attribution Share-Alike 4.0</a>.

</figcaption>
</figure>

<p>With this, we can check out what the radius of Gauss-Seidel iteration is. Plugging $1$ into the formula, we get that</p>

\[\begin{align*}\rho(C_1) &amp; = \frac{1}{4} \left( \mu + \sqrt{\mu^2} \right)^2 \\ &amp; = \frac{1}{4} (2\mu)^2 \\ &amp; = \mu^2 \end{align*}\]

<p>meaning that one Gauss-Seidel step is worth two Jacobi steps.</p>

<p>Now, let’s see how many Jacobi steps one SOR step is worth. Given our $60 \times 80$ grid and $\Delta x = 1$, we found earlier that our spectral radius was $0.9996$. Letting $\mu = 0.9996$, we get that $\omega_\text{opt} = 1.945$. Passing this into the formula for $\rho(C_\omega)$, we get that $\rho(C_{1.945}) = 0.945$. The number of steps is the exponent that takes $0.9996$ to $0.945$, or in other words $\log_{0.9996}(0.945) = 141.4$. One step of SOR is worth <em>over a hundred</em> Jacobi steps. Granted, at this point, I’d bet that we’ve pushed the ideas of SOR to their breaking point. Here, I suspect that the <a href="https://en.wikipedia.org/wiki/Conjugate_gradient_method">“conjugate gradient”</a> method makes more sense. In fact, in that response I got from Stam, he mentioned that it was what he used to find the pressure, along with using a MAC grid. Though the grids aren’t the same, it sounds like it could work, but that must be a side project for another day. This has been a long post, and this has been a long series.</p>

<div class="note-panel">

  <p>This series has had more than a few tangents, and this one will be the last. I had seen that the red-black order’s output looked much better than the conventional order’s, so why did I assert that their spectral radii are the same?</p>

  <p>First, actually, <em>the order matters</em> in Young’s derivation of SOR’s spectral radius. It only holds if the order is what he called a “consistent order”. After going through what makes a “consistent order”, he went on to show four cases. Red-black is indeed one of them, but the conventional order is <em>also</em> one. To list them all, there is</p>

  <ul>
    <li>$\sigma_1$, a left-to-right, top-to-bottom order (equivalent to left-to-right, bottom-to-top),</li>
    <li>$\sigma_2$, a wavefront order,</li>
    <li>$\sigma_3$, the red-black order, and</li>
    <li>$\sigma_4$, a zigzag order.</li>
  </ul>

  <p>You can see those orderings below. Points are in the order given by their numbers, though points with the same number can be evaluated in any order.</p>

  <figure> 
<img src="/images/2024-09-26/figure17.png" alt="Four grids of three-by-five, numbered in different orders. In the upper left, the numbering increases in left-to-right, top-to-bottom order. In the upper-right, the numbering starts at the top left then propagates down and to the right. In the bottom-left, the numbering is one or two in a checkerboard pattern i.e. red-black order. In the bottom-right, the numbering zigzags from left to right while alternating between a row and its preceding row such that one can stack on top of another." />
<figcaption>

The four orderings Young described, all "consistent".

</figcaption>
</figure>

  <p>Because order doesn’t matter in Jacobi iteration, all orderings lead to the same $\rho(C_\text{Jacobi})$. Now, for a consistent order, $\rho(C_\omega)$ is a function of $\rho(C_\text{Jacobi})$. If the latter is the same, then so should be the former.</p>

  <p>Yet they look different when I actually run the two. Perhaps they have different eigenvectors? I don’t know why that is, really, but I went ahead and picked red-black SOR because there was no arguing against the fact that it looked better.</p>

</div>

<p>Here’s the culminating piece, a red-black SOR code, the thing that made ESP32-fluid-simulation possible.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">pois_context</span> <span class="p">{</span>
    <span class="kt">float</span> <span class="o">*</span><span class="n">d</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">dx</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">omega</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">static</span> <span class="kt">float</span> <span class="nf">pois_sor_safe</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="kt">int</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">pois_context</span> <span class="o">*</span><span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">pois_context</span><span class="o">*</span><span class="p">)</span><span class="n">ctx</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">omega</span> <span class="o">=</span> <span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">omega</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">p_gs</span> <span class="o">=</span> <span class="n">pois_expr_safe</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
    <span class="k">return</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">omega</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="o">+</span> <span class="n">omega</span><span class="o">*</span><span class="n">p_gs</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">float</span> <span class="nf">pois_sor_fast</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="kt">int</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span>
        <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">struct</span> <span class="n">pois_context</span> <span class="o">*</span><span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">pois_context</span><span class="o">*</span><span class="p">)</span><span class="n">ctx</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">omega</span> <span class="o">=</span> <span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">omega</span><span class="p">;</span>

    <span class="kt">float</span> <span class="n">p_sum</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">-</span><span class="n">dim_x</span><span class="p">)</span> <span class="o">+</span> <span class="o">*</span><span class="p">(</span><span class="n">p</span><span class="o">+</span><span class="n">dim_x</span><span class="p">));</span>

    <span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">p_gs</span> <span class="o">=</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">25</span><span class="n">f</span> <span class="o">*</span> <span class="p">(</span><span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">dx</span> <span class="o">*</span> <span class="n">pois_ctx</span><span class="o">-&gt;</span><span class="n">d</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">-</span> <span class="n">p_sum</span><span class="p">);</span>

    <span class="k">return</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">omega</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="o">+</span> <span class="n">omega</span><span class="o">*</span><span class="n">p_gs</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">poisson_solve</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">div</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dx</span><span class="p">,</span>
        <span class="kt">int</span> <span class="n">iters</span><span class="p">,</span> <span class="kt">float</span> <span class="n">omega</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">ij</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="o">*</span><span class="n">dim_y</span><span class="p">;</span> <span class="n">ij</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">struct</span> <span class="n">pois_context</span> <span class="n">pois_ctx</span> <span class="o">=</span> <span class="p">{.</span><span class="n">d</span> <span class="o">=</span> <span class="n">div</span><span class="p">,</span> <span class="p">.</span><span class="n">dx</span> <span class="o">=</span> <span class="n">dx</span><span class="p">,</span> <span class="p">.</span><span class="n">omega</span> <span class="o">=</span> <span class="n">omega</span><span class="p">};</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">k</span> <span class="o">&lt;</span> <span class="n">iters</span><span class="p">;</span> <span class="n">k</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">domain_iter_red_black</span><span class="p">(</span><span class="n">pois_sor_safe</span><span class="p">,</span> <span class="n">pois_sor_fast</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span>
            <span class="o">&amp;</span><span class="n">pois_ctx</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>In summary, to compute a divergence-free velocity field, we applied the Helmholtz-Hodge decomposition, but the decomposition itself gave us a boundary value problem to solve. Through discretization, we turned that problem into a massive $Ax = b$ problem, but one where $A$ was sparse. We used that sparsity to our advantage, choosing to approximately solve it with an iterative method that lets us compute the next guess at $x$ one element at a time—skipping the zeroes along the way. The first one we saw was Jacobi iteration. I shared how I got its spectral radius. Then, we saw Gauss-Seidel iteration and SOR. Finally, we learned that, for the class of matrices satisfying Property A, including our $A$, Gauss-Seidel was faster and SOR was dramatically faster. All this context was necessary for figuring out how to run a fluid sim on an ESP32.</p>

<p>With that, the actual fluid sim mechanism in ESP32-fluid-simulation is entirely covered, and that caps off the tour through its use of FreeRTOS, the Cheap Yellow Display (CYD), and the Navier-Stokes equations. This was quite the epic undertaking for me, building the sim more than one year ago then going to write on and off about it since. I’d do it again, considering that—in trying to explain it—I ended up formalizing my own understanding of my own project and updated it accordingly along the way. And now that understanding is here on the internet.</p>

<p>That said, this project is far from the last one I’ll ever make then write about. Look out for the next series, to be announced some day. In any case and for the last time, you can see <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a> on GitHub!</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><summary type="html"><![CDATA[It’s been a while since my last post, but let’s bring this series to its finale. Out of all the parts of the fluid sim, the last piece I have not explained is the pressure projection. The code alone doesn’t reveal the great deal of context that goes into its design. Here’s a hint: it’s actually a linear algebra routine involving a “sparse matrix”. Though it’s possible these days to implement the pressure projection without needing to know all that context, thanks to articles like Jamie Wong’s post, the GPU Gems chapter, and Stam’s “Real-Time Fluid Dynamics for Games”, achieving a believable fluid simulation on an ESP32 would have been impossible. I’ve personally tried it before, and after knowing? All I needed was to switch in a technically superior method. 
There was a reason why I dedicated airtime to this—let me explain.]]></summary></entry><entry><title type="html">Rebuilding ESP32-fluid-simulation: the advection and force steps of the sim task (Part 4)</title><link href="http://kennypeng.com/2024/01/20/esp32_fluid_sim_4.html" rel="alternate" type="text/html" title="Rebuilding ESP32-fluid-simulation: the advection and force steps of the sim task (Part 4)" /><published>2024-01-20T00:00:00+00:00</published><updated>2024-01-20T00:00:00+00:00</updated><id>http://kennypeng.com/2024/01/20/esp32_fluid_sim_4</id><content type="html" xml:base="http://kennypeng.com/2024/01/20/esp32_fluid_sim_4.html"><![CDATA[<p>If you’ve read <a href="/2023/07/30/esp32_fluid_sim_2.html">Part 2</a> and <a href="/2023/09/22/esp32_fluid_sim_3.html">Part 3</a> already, then you’re as equipped to read this part as I can make you. You would have already heard me mention that we should be passing in touch inputs, consisting of locations and velocities. You also would have already heard that we’re getting out color arrays. Some mechanism should be turning the former into the latter, and it should be broadly inspired by the physics, which is written out as partial differential equations. This post and the next post—the final ones—are about that mechanism. To be precise, this post covers everything but the pressure step, and the next will give that step its own airtime.</p>

<p>With that said, if I miss anything, the references I used might be helpful. That’s the <a href="https://developer.nvidia.com/gpugems/gpugems/part-vi-beyond-triangles/chapter-38-fast-fluid-dynamics-simulation-gpu">GPU Gems chapter</a> and <a href="https://jamie-wong.com/2016/08/05/webgl-fluid-simulation/">Jamie Wong’s blog post</a>, but there’s also Stam’s <a href="https://damassets.autodesk.net/content/dam/autodesk/www/autodesk-reasearch/Publications/pdf/realtime-fluid-dynamics-for.pdf">“Real-time Fluid Dynamics for Games”</a> and <a href="https://dl.acm.org/doi/pdf/10.1145/311535.311548">“Stable Fluids”</a>.</p>

<p>Now, to tell you what I’m going to tell you, a high-level overview is this:</p>

<ol>
  <li>apply “semi-Lagrangian advection”, an implementation of the advection operator (for “advection”, see Part 3), to the velocities,</li>
  <li>apply the user’s input to the velocities,</li>
  <li>apply “divergence-free projection” to velocities in order to correct it (i.e. the pressure step briefed in Part 3 and to be explained in the next part) and finally,</li>
  <li>apply semi-Lagrangian advection to the density array with the updated velocities.</li>
</ol>

<p>The process has four parts, and each part corresponds to a part of the physics. Let’s recall the partial differential equations that we ended up with in Part 3, that is:</p>

\[\frac{\partial \rho}{\partial t} = - (\bold v \cdot \nabla) \rho\]

\[\frac{\partial \bold v}{\partial t} = - (\bold v \cdot \nabla) \bold v - \frac{1}{\rho} \nabla p + \bold f\]

\[\nabla \cdot \bold v = 0\]

<p>Setting aside the incompressibility constraint for now—that’s the third equation $\nabla \cdot \bold v = 0$—the equations can be split into four terms. That’s one term for each part of the process. To list them in the order of their corresponding steps, there’s the advection of the velocity $-(\bold v \cdot \nabla) \bold v$, the applied force $\bold f$, the pressure $- \frac{1}{\rho} \nabla p$, and the advection of the density $-(\bold v \cdot \nabla) \rho$.</p>

<p>Before we get into each term and its corresponding part of the process, there’s a key piece of context to keep in mind. We’re faced with the definitions of $\frac{\partial \rho}{\partial t}$ and $\frac{\partial \bold v}{\partial t}$ here, and they have solutions which are density and velocity fields that evolve over time. That’s not computable. Computers can’t operate on fields—the functions of continuous space that they are—much less operate on ones that continuously vary over time. Instead, time and space need to be “discretized”.</p>

<p>Let’s tackle the discretization of time first. Continuous time can be approximated by a <em>sequence</em> of closely-spaced points in time. Usually, those points in time are regularly spaced apart by a timestep $\Delta t$. In other words, we use the sequence $\{\; 0,\; \Delta t,\; 2 \Delta t,\; 3 \Delta t,\; \dots \;\}$. The result should be that a field at some time $t$ in the sequence can be approximately expressed in terms of the field at the <em>previous</em> time $t - \Delta t$. That is, we should be able to calculate an <em>update</em> to the fields. You may see how this is useful for running simulations. This general idea is called “numerical integration”, the simplest case being <a href="https://en.wikipedia.org/wiki/Euler_method">Euler’s method</a>—yes, that Euler’s method, if you still remember it. In other cases, methods like <a href="https://en.wikipedia.org/wiki/Backward_Euler_method">implicit Euler</a>, <a href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods">Runge-Kutta</a>, and <a href="https://en.wikipedia.org/wiki/Leapfrog_integration">leapfrog integration</a> offer better accuracy and/or stability, but that’s out-of-scope.</p>

<p>Now, let’s tackle the discretization of space. Continuous space can be approximated by a mesh of points, each point being associated with the value of the field there. In the simplest case, that mesh is a regular grid. Also remember that fields are functions of location. Combining these two things, we get the incredibly convenient conclusion that discretized fields can be expressed as an <em>array of values</em>. For each index into the array $(i, j)$, there is a corresponding point on the grid $(x_i, y_j)$, and for every point, there is a value of the field at that point. If we wanted to initialize an array from some known field, we could just evaluate the field on each point of the grid and then assign the value to the associated cell of the array.</p>

<figure>
<img src="/images/2024-01-20/figure1.png" alt="On the left, a surface plot of x squared plus y squared, and on the right a grid filled with numbers, namely the values of x squared plus y squared at integer values of x and y" />
<figcaption>

Using discretization with a grid, an array of the field's values can stand for the field itself. Grids are defined by their grid lengths $\Delta x$ and $\Delta y$. In this example, $\Delta x = \Delta y = 1$ is a special case where the field is evaluated at integer values of $x$ and $y$.

</figcaption>
</figure>

<div class="note-panel">

  <p>Side note: it’s a fair question to ask here why $(i, j)$ doesn’t correspond to—say—$(x_j, y_i)$ instead. Why does $i$ select the horizontal component and not the vertical one? This continues from my tangent about indexing in Part 2. The answer is that you <em>could</em> go about it that way and then derive a different but entirely consistent discretization. In fact, I originally had it that way. However, I switched out of that to keep all the expressions looking like how they do in the literature. So, in short, the reason is just convention.</p>

  <p>Second side note: this is not to say that the array is a <em>matrix</em>. The array is only two-dimensional because the space is two-dimensional. If the space were three-dimensional, then so would the array be. And forget about arrays if the mesh isn’t a grid! So, most matrix operations wouldn’t mean anything either. It’d be more correct to think of discretized fields as very long vectors, but we’re encroaching on a next-post matter now.</p>

</div>
<p><!-- div class="note-panel" --></p>

<p>A grid discretization is what Stam went with, and for that reason, it’s what’s used here.</p>

<p>A key result of discretizing space is that the differential operators can be approximated by differences (i.e. subtraction) between the values of the field at a point and its neighbors. In particular, using a grid can make the partial derivatives into something incredibly simple: $\frac{\partial}{\partial x}$ into the value of the right neighbor minus the value of the left neighbor (divided by $2 \Delta x$) and $\frac{\partial}{\partial y}$ into the top minus the bottom (divided by $2 \Delta y$). But that’s getting into “finite difference methods”, and the pressure step is one such method. We’ll get to that in the next post. For now, it’s enough to say that, in general, discretizing with a grid is the simple choice for making computers operate on fields.</p>

<p>With that said, we’ll soon see that “semi-Lagrangian advection” does something unique with the grid.</p>

<p>To sum up this “just for context” moment: to compute an approximate solution to the presented partial differential equations, we need two levels of discretization. First, we discretize time, turning the equations into a scheme of updating the density and velocity fields repeatedly. Then, we discretize space to make each update computable. And so, time is replaced with a sequence, and space is replaced with a grid. All this is because computers cannot handle functions of continuous time or of continuous space, let alone functions of both like an evolving field. Now, all this is quite abstract, and that’s because each part invokes the discretization of time and space <em>slightly differently</em>; we’ll go into the details of each.</p>

<p>With all that said, given our definitions of $\frac{\partial \rho}{\partial t}$ and $\frac{\partial \bold v}{\partial t}$, this generally means that the density/concentration field (which I’m currently just calling the density field out of expediency) and the velocity field become just density and velocity <em>arrays</em>, and we should be able to calculate their updates. In this situation, we can update the arrays in accordance with the partial differential equations by going <em>term by term</em>, hence why each step of the overall process corresponds to a single term. (Though, I’m not sure if the implicit assumption that the terms act independently of each other is just an expedient approximation or our math-given right. Anyway…) Let’s go over the four parts, step by step.</p>

<p>The first step is the “semi-Lagrangian advection” of the velocities, implementing the $-(\bold v \cdot \nabla) \bold v$ term. A key highlight here: Stam’s treatment of the advection term is <em>not</em> a finite difference method, yet it still uses discretization with a grid! I’d also like to highlight a bit of how Stam arrived at this method. All that information is largely documented between “Stable Fluids” and “Real-Time Fluid Dynamics for Games”.</p>

<p>In “Stable Fluids”, there is a formal analysis that involves a technique called the “method of characteristics”. It’s a whole proof, but a sketch of it is this: at every point $\bold x$ (that’s the coordinate vector $\bold x$, as covered in Part 2), there is a particle, and that particle arrived there from somewhere. Let $\bold{p}(\bold x, t)$ be its path, such that the current location is defined by the equality $\bold{p}(\bold x, t_0) = \bold x$, where $t_0$ is the current time. Then, $\bold{p}(\bold x, t_0 - \Delta t)$ is where the particle was at the previous time.</p>

<figure>
<img src="/images/2024-01-20/figure2.png" alt="In orange, a velocity field as a vector plot. In blue, a path of a particle that follows the velocity field. In black, a point on the path that represents the position of the particle at time t_0. In grey, a point on the path that represents where the particle was previously at time t_0 minus Delta t." />
<figcaption>

Given some velocity field, the path of a particle and its locations at time $t_0$ and $t_0 - \Delta t$

</figcaption>
</figure>

<p>The particle must have carried its properties along the way, and one of them is said to be momentum, or in other words, velocity. Therefore, an advection update looks like the assignment of the field value at the previous location, $\bold{p}(\bold x, t_0 - \Delta t)$, to the field at the current location, $\bold x$. This is the result of Stam’s analysis, and it can be written as the following:</p>

\[\bold{v}_\text{advect}(\bold x) = \bold{v}(\bold{p}(\bold x, t_0 - \Delta t))\]

<p>There is a unique time discretization here, but we’re not done yet! It’s still not computable because it’s missing a discretization of space. Of course, Stam presented one in “Stable Fluids” too. The calculation of $\bold{v}_\text{advect}$ can be done at just the points on the grid, and for each point, a “Runge-Kutta back-tracing” on the velocity field can be used to find $\bold{p}(\bold x, t_0 - \Delta t)$.</p>

<p>I won’t get into how that works, and in a moment I won’t have to. We’re one further approximation away from the method that appears in “Real-time Fluid Dynamics for Games” (and also the GPU Gems article and Wong’s post). Quite simply, if finding the path from $\bold{p}(\bold x, t_0 - \Delta t)$ to $\bold x$ can be called a “nonlinear” backtracing, then here it’s replaced with a <em>linear</em> backtracing. The path is approximated with a straight line through $\bold x$ along the velocity there:</p>

\[\bold{v}_\text{advect}(\bold x) = \bold{v}(\bold x - \bold{v}(\bold x) \Delta t)\]

<p>or, in other words, $\bold x - \bold{v}(\bold x) \Delta t$ replaces $\bold{p}(\bold x, t_0 - \Delta t)$.</p>

<figure>
<img src="/images/2024-01-20/figure6.png" alt="In orange, a velocity field as a vector plot. In blue, a straight line that approximates a path of a particle that follows the vector field and extends in the direction of the velocity. In black, a point on the path that represents the position of the particle at time t_0, specifically a particle at a point that coincides on the grid of arrows i.e. the vector plot. In grey, an approximation of the point where the particle was previously at time t_0 - Delta t." />
</figure>

<p>This expression is usually shown on its own, but it’s really three parts: a “method of characteristics” analysis that comprises a time discretization, a space discretization using a grid, and a further approximation using a linear backtracing.</p>

<p>Anyway, the point found by the backtracing almost certainly doesn’t coincide with a point on the grid, so Stam dealt with this by <a href="https://en.wikipedia.org/wiki/Bilinear_interpolation#Application_in_image_processing">“bilinearly interpolating”</a> between the four closest velocity values.</p>

<figure>
<img src="/images/2024-01-20/figure3.png" alt="Four arrows on the corners of a square on the grid, each pointed in different directions. Dashed lines connect the top two arrows and the bottom two arrows. Points dot halfway on the dashed lines. On the top point, an arrow points in the direction of the top two arrows' average. On the bottom point, an arrow points the direction of the bottom two arrows' average. Another dashed line connects the points on the dashed line. Less than half-way on the dashed line, there is a point and an arrow that points in the weighted average of the arrows on the dashed lines, with more weight given to the top arrow." />
</figure>

<p>For more information on that, see the above link to Wikipedia. It’s got a better explanation of bilinear interpolation than any I could make—diagrams included. With that said, bilinear interpolation also amounts to very little code.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">using</span> <span class="n">TPromoted</span> <span class="o">=</span> <span class="k">decltype</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">*</span> <span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">());</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="n">TPromoted</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">lerp</span><span class="p">(</span><span class="kt">float</span> <span class="n">di</span><span class="p">,</span> <span class="n">T</span> <span class="n">p1</span><span class="p">,</span> <span class="n">T</span> <span class="n">p2</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">p1</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">di</span><span class="p">)</span> <span class="o">+</span> <span class="n">p2</span> <span class="o">*</span> <span class="n">di</span><span class="p">;</span>
<span class="p">}</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="n">TPromoted</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">billinear_interpolate</span><span class="p">(</span><span class="kt">float</span> <span class="n">di</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dj</span><span class="p">,</span> <span class="n">T</span> <span class="n">p11</span><span class="p">,</span> <span class="n">T</span> <span class="n">p12</span><span class="p">,</span>
                                          <span class="n">T</span> <span class="n">p21</span><span class="p">,</span> <span class="n">T</span> <span class="n">p22</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="n">lerp</span><span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="n">lerp</span><span class="p">(</span><span class="n">dj</span><span class="p">,</span> <span class="n">p11</span><span class="p">,</span> <span class="n">p12</span><span class="p">),</span> <span class="n">lerp</span><span class="p">(</span><span class="n">dj</span><span class="p">,</span> <span class="n">p21</span><span class="p">,</span> <span class="n">p22</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>For this project, which is in C++, that code uses templates. That lets it handle multiple types, which it will see because—remember—there are <em>two</em> advection terms! There’s one for velocity <em>and</em> one for density. Much of the work presented for velocity will be reused, via templates. And if you’re wondering about the <code class="language-plaintext highlighter-rouge">using TPromoted</code> expression: the simplest way to write linear interpolation is with floating-point, and once we’re in floating point, we may as well stay in floating point, only converting back at the very end. This includes floating-point vector classes, which is why <code class="language-plaintext highlighter-rouge">decltype</code> is used instead of just a <code class="language-plaintext highlighter-rouge">float</code> type.</p>

<p>With that in mind, it’s simply a matter of computing the source point and taking the bilinear interpolation at it. Here’s a sample of code from the project that does this.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="n">T</span> <span class="nf">sample</span><span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">float</span> <span class="n">i</span><span class="p">,</span> <span class="kt">float</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">no_slip</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">bool</span> <span class="n">x_under</span> <span class="o">=</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">bool</span> <span class="n">x_over</span> <span class="o">=</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="n">dim_x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
    <span class="kt">bool</span> <span class="n">y_under</span> <span class="o">=</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">;</span>
    <span class="kt">bool</span> <span class="n">y_over</span> <span class="o">=</span> <span class="n">j</span> <span class="o">&gt;=</span> <span class="n">dim_y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>

    <span class="kt">bool</span> <span class="n">x_oob</span> <span class="o">=</span> <span class="n">x_under</span> <span class="o">||</span> <span class="n">x_over</span><span class="p">;</span>
    <span class="kt">bool</span> <span class="n">y_oob</span> <span class="o">=</span> <span class="n">y_under</span> <span class="o">||</span> <span class="n">y_over</span><span class="p">;</span>

    <span class="kt">float</span> <span class="n">i_floor</span> <span class="o">=</span> <span class="n">floorf</span><span class="p">(</span><span class="n">i</span><span class="p">),</span> <span class="n">j_floor</span> <span class="o">=</span> <span class="n">floorf</span><span class="p">(</span><span class="n">j</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">di</span> <span class="o">=</span> <span class="n">i</span> <span class="o">-</span> <span class="n">i_floor</span><span class="p">,</span> <span class="n">dj</span> <span class="o">=</span> <span class="n">j</span> <span class="o">-</span> <span class="n">j_floor</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">ij</span><span class="p">;</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">x_oob</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">y_oob</span><span class="p">)</span> <span class="p">{</span>   <span class="c1">// typical case: not near the boundary</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i_floor</span><span class="p">,</span> <span class="n">j_floor</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">billinear_interpolate</span><span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="n">dj</span><span class="p">,</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span> <span class="o">+</span> <span class="n">dim_x</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span> <span class="o">+</span> <span class="mi">1</span><span class="p">],</span>
                                     <span class="n">p</span><span class="p">[</span><span class="n">ij</span> <span class="o">+</span> <span class="n">dim_x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]);</span>
    <span class="p">}</span>

    <span class="c1">// OMITTED</span>
<span class="p">}</span>

<span class="k">template</span> <span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="p">,</span> <span class="k">class</span> <span class="nc">U</span><span class="p">&gt;</span>
<span class="kt">void</span> <span class="n">advect</span><span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="n">next_p</span><span class="p">,</span> <span class="n">T</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="n">Vector2</span><span class="o">&lt;</span><span class="n">U</span><span class="o">&gt;</span> <span class="o">*</span><span class="n">vel</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dt</span><span class="p">,</span>
            <span class="kt">bool</span> <span class="n">no_slip</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">dim_x</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o">&lt;</span> <span class="n">dim_y</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
            <span class="n">Vector2</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span> <span class="n">source</span> <span class="o">=</span> <span class="n">Vector2</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="o">-</span> <span class="n">vel</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">*</span> <span class="n">dt</span><span class="p">;</span>
            <span class="n">next_p</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">sample</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">source</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">source</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">,</span> <span class="n">no_slip</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Here, the <code class="language-plaintext highlighter-rouge">advect</code> function loops over every point on the grid, calculating the source for each and assigning the result of the <code class="language-plaintext highlighter-rouge">sample</code> function. The <code class="language-plaintext highlighter-rouge">sample</code> function in turn calls the <code class="language-plaintext highlighter-rouge">billinear_interpolate</code> function given earlier.</p>

<p>But I’ve omitted the rest of the <code class="language-plaintext highlighter-rouge">sample</code> function! Notice that it returns early if the source <em>is not</em> near the boundary. That creates a fast path, and that means the omitted code was for handling when it <em>is</em> near the boundary.</p>

<p>So, what have I hidden? I’ll answer with another question. What if the backtracing sends us to the boundary of the domain, or even beyond it?</p>

<p>It’s a fair question, and an important one, because what we should do here depends directly on what the boundary is <em>physically</em>. In our case, the boundary is a solid wall. Here, I turn to the GPU Gems article, where it’s written that the “no-slip” condition hence applies, which just means the velocity there must be zero. Why the change in references again? The “Stable Fluids” paper assumed a different physical condition, “periodic” boundaries that essentially imply a tile-able cell or a “toroidal”, i.e. donut-shaped, domain, while the “Real-time Fluid Dynamics for Games” paper chose a somewhat looser condition that was still based on solid walls. Of the two, the no-slip condition is easier to build upon. For starters, a no-slip condition is more easily implemented inside a bilinear interpolation scheme—inside the <code class="language-plaintext highlighter-rouge">sample</code> function, that is.</p>

<p>For now, let’s focus on the bottom row. Below it, we can construct a ghost row that always takes the <em>negative</em> of the bottom row’s values. Any linear interpolation at the halfway point between the bottom row and the ghost row must then be equal to zero. That is, the <em>halfway line</em> between them achieves the no-slip condition, thereby simulating the solid wall. From there, if the backtracing gives a position that is beyond the halfway line, it should just be clamped to it. Finally, this approach with the ghost row extends to all sides of the domain.</p>

<figure>
<img src="/images/2024-01-20/figure4.png" alt="In the background, the bottom half of the image is hashed out, signifying a solid wall. Four arrows are on the corners of a square on the grid, the top two pointed in different directions but the bottom two pointing in the opposite direction of the top two. Dashed lines connect the top two arrows and the bottom two arrows. Points dot halfway on the dashed lines. On the top point, an arrow points in the direction of the top two arrows' average. On the bottom point, an arrow points in the direction of the bottom twos' average, and notably this is also in the opposite direction of the arrow on the top point. Another dashed line connects the points on the dashed lines. A point dots halfway on this new line. On this new point, a smaller point dots on top of it, signifying that the average of the two averages, which are opposites of each other, is zero." />
<figcaption>

The ghost row exists <i>inside</i> the wall, and the value of the bilinear interpolation on the wall's surface must be zero

</figcaption>
</figure>

<p>We also need to define the value of the ghost corner formed by a ghost row and ghost column. I didn’t see a rigorous treatment of them in my references, and it seems the corners might not matter much in practice. Still, the “no-slip” condition has a nice internal consistency that just gives us this definition. At the intersection of the halfway lines, the velocity there must also be zero. From this, we can form <em>an equation involving the value of the ghost corner</em>, and its solution is that the ghost corner should take on the value of the real corner—<em>not</em> its negative! Rather, it can be thought of as the ghost row taking the negative of the value at the end of the ghost column, which is itself a negative, and so making a double negative.</p>

<figure>
<img src="/images/2024-01-20/figure5.png" alt="In the background, the left half and bottom half of the image is hashed out, signifying intersecting solid walls. Four arrows are on the corners of a square on the grid. The upper right arrow is pointed in some direction, the upper left and bottom right is pointed in the opposite direction, and the bottom left arrow is pointed in the same direction. Dashed lines connect the top two arrows and the bottom two arrows. Points dot halfway on the dashed lines. On each of the two points is a smaller point, signifying that the average of the top two arrows and the average of the bottom two arrows is both zero. Another dashed line connects the points on the dashed lines. A point dots halfway on this new line. On this point, a smaller point dots on top of it, signifying that the average of the two averages, which are themselves zero, is zero." />
</figure>
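<p>To spell that equation out (my own quick check, consistent with the figure): with the real corner value $v$, its two ghost neighbors $-v$, and the unknown ghost corner $g$, the bilinear interpolation at the intersection of the halfway lines is the equal-weight average of the four values. Setting it to zero gives</p>

\[\frac{1}{4} \left( v + (-v) + (-v) + g \right) = 0 \quad \Longrightarrow \quad g = v\]

<p>which is exactly the double negative described above.</p>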

<p>That said, the project doesn’t actually handle the boundary with ghost rows/columns, the way “Real-time Fluid Dynamics for Games” and the GPU Gems article do. It used to, but now it uses something different but equivalent. To understand the difference, it is important to recognize that the negative values are just constructs on the way to defining the halfway line, where the no-slip condition is enforced and beyond which no source point is supposed to exist.</p>

<p>Let’s now draw the bilinear interpolation as a 3D surface. The construction of the halfway line ends up looking like this.</p>

<figure>
<img src="/images/2024-01-20/figure7.png" alt="diagram of surface created by bilinear interpolation at a domain corner, showing a no-slip condition being implicitly enforced by ghost points and negative velocity values" />
<figcaption>

Dashed lines are obscured or purely virtual constructions.

</figcaption>
</figure>

<p>The ghost rows and columns take on negative values, and so the surface does indeed flip in sign as it crosses the halfway line. As mentioned before, though, source points are never to cross that line, and they’re to be clamped to it if they do. As a result, the <em>samples</em> of that surface will <em>never</em> flip in sign. In that sense, the negative values are just virtual—part of the construction but having no impact of their own.</p>

<p>It follows that, so long as the halfway line is constructed another way, the ghost rows and columns can be done away with entirely. The project does this by computing an “overshoot factor” that reduces the velocity to zero as the “overshoot” approaches 0.5, i.e. the halfway line. If it exceeds 0.5, then it too can be clamped so that the surface samples still never flip sign.</p>

<p>As a surface, it looks like this.</p>

<figure>
<img src="/images/2024-01-20/figure8.png" alt="diagram of surface created by bilinear interpolation and overshoot functions at a domain corner, showing a no-slip condition being enforced by the overshoot functions" />
<figcaption>

Again, dashed lines are obscured or purely virtual constructions.

</figcaption>
</figure>

<p>And here’s its code—the rest of the <code class="language-plaintext highlighter-rouge">sample</code> function.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span><span class="p">&gt;</span>
<span class="k">static</span> <span class="n">T</span> <span class="nf">sample</span><span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="n">p</span><span class="p">,</span> <span class="kt">float</span> <span class="n">i</span><span class="p">,</span> <span class="kt">float</span> <span class="n">j</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">dim_y</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">no_slip</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// OMITTED</span>

    <span class="c1">// interpolate along the boundary</span>
    <span class="n">T</span> <span class="n">p_edge</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">x_oob</span> <span class="o">&amp;&amp;</span> <span class="n">y_oob</span><span class="p">)</span> <span class="p">{</span>  <span class="c1">// on a corner</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">((</span><span class="n">x_under</span><span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="n">dim_x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="n">y_under</span><span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="n">dim_y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">p_edge</span> <span class="o">=</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">];</span>
    <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">x_oob</span><span class="p">)</span> <span class="p">{</span>  <span class="c1">// on left or right boundary</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">((</span><span class="n">x_under</span><span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="n">dim_x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="n">j_floor</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">p_edge</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">dj</span><span class="p">,</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span> <span class="o">+</span> <span class="n">dim_x</span><span class="p">]);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>  <span class="c1">// y_oob, on bottom or top boundary</span>
        <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">i_floor</span><span class="p">,</span> <span class="p">(</span><span class="n">y_under</span><span class="o">?</span> <span class="mi">0</span> <span class="o">:</span> <span class="n">dim_y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">),</span> <span class="n">dim_x</span><span class="p">);</span>
        <span class="n">p_edge</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="n">ij</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]);</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">no_slip</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">p_edge</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// apply discount to implement no-slip, with zero at the boundary and beyond</span>
    <span class="kt">float</span> <span class="n">overshoot_factor</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">x_oob</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">float</span> <span class="n">overshoot_x</span> <span class="o">=</span> <span class="n">x_under</span> <span class="o">?</span> <span class="o">-</span><span class="n">i</span> <span class="o">:</span> <span class="n">i</span> <span class="o">-</span> <span class="p">(</span><span class="n">dim_x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
        <span class="n">overshoot_factor</span> <span class="o">*=</span> <span class="n">overshoot_x</span> <span class="o">&lt;</span> <span class="mf">0.5</span> <span class="o">?</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">overshoot_x</span><span class="p">)</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">y_oob</span><span class="p">)</span> <span class="p">{</span> 
        <span class="kt">float</span> <span class="n">overshoot_y</span> <span class="o">=</span> <span class="n">y_under</span> <span class="o">?</span> <span class="o">-</span><span class="n">j</span> <span class="o">:</span> <span class="n">j</span> <span class="o">-</span> <span class="p">(</span><span class="n">dim_y</span> <span class="o">-</span> <span class="mi">1</span><span class="p">);</span>
        <span class="n">overshoot_factor</span> <span class="o">*=</span> <span class="n">overshoot_y</span> <span class="o">&lt;</span> <span class="mf">0.5</span> <span class="o">?</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">overshoot_y</span><span class="p">)</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">overshoot_factor</span> <span class="o">*</span> <span class="n">p_edge</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>It’s worth noticing here that applying the overshoot factor is skipped if <code class="language-plaintext highlighter-rouge">no_slip</code> is false. That is the case for advecting things besides velocity, where reducing the value to zero at the boundary just isn’t physical. For example, <code class="language-plaintext highlighter-rouge">false</code> is passed for the density.</p>

<p>Altogether, we had a “method of characteristics” update that was discretized with a grid, linear backtracing, and bilinear interpolation, with boundaries being handled so that a no-slip condition is enforced. It’s enforced by constructing halfway lines where the velocity is made to be zero, be that by creating ghost rows and columns with negative values or by reducing the value by an overshoot factor. All this implements advection, one of the components of a fluid simulation, as reflected by its own term in the differential equation.</p>

<p>An important comment about all of this: according to Stam, the “method of characteristics” update, before discretization, is “unconditionally stable”: since $\bold{v}_\text{advect}$ always <em>is</em> some value in $\bold v$, no value in $\bold{v}_\text{advect}$ can be larger than the largest value in $\bold v$. From there, his discretization with bilinear interpolation preserved the stability because $\bold{v}_\text{advect}$ is always <em>between</em> some values in $\bold v$ (or zero, if the boundary is involved).</p>

<p>In the past, I had written fluid simulations that didn’t have unconditional stability, and they all blew up unless I took small timesteps. Getting to take large timesteps here—not needing to do loop after loop just to cover a span of time equal to the blink of an eye—is critical to running this sim on an ESP32.</p>
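<p>To make that concrete, here’s a minimal, hypothetical sketch of the semi-Lagrangian update in Python (the project itself is C++): backtrace each grid point, then bilinearly interpolate the old field there. Boundary handling is reduced to clamping for brevity, so the ghost cells and the overshoot factor are omitted.</p>

```python
# Hypothetical sketch of semi-Lagrangian advection: backtrace by -v*dt,
# then bilinearly interpolate the old field at the backtraced position.
# Boundaries are simply clamped here; the real no-slip handling is omitted.

def lerp(t, a, b):
    return a + t * (b - a)

def advect(field, vel, dt):
    """field: 2D list of floats; vel: 2D list of (vx, vy) tuples."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            vx, vy = vel[j][i]
            # linear backtrace, clamped to the grid
            x = min(max(i - vx * dt, 0.0), nx - 1.0)
            y = min(max(j - vy * dt, 0.0), ny - 1.0)
            i0, j0 = int(x), int(y)
            i1, j1 = min(i0 + 1, nx - 1), min(j0 + 1, ny - 1)
            di, dj = x - i0, y - j0
            # bilinear interpolation: always *between* old values
            bottom = lerp(di, field[j0][i0], field[j0][i1])
            top = lerp(di, field[j1][i0], field[j1][i1])
            out[j][i] = lerp(dj, bottom, top)
    return out
```

<p>Because every output value is an interpolation of old values, the largest value in the field can never grow, no matter how big the timestep is. That is the discrete version of the stability argument above.</p>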

<p>And now, here’s some more general conclusions.</p>

<ol>
  <li>In “Real-time Fluid Dynamics for Games”, Stam goes on to state that “the idea of tracing back and interpolating” is a kind of “semi-Lagrangian method”. Doing linear backtracing, as opposed to something else like the aforementioned Runge-Kutta method, isn’t quintessential to that classification. It remains a useful approximation, though.</li>
  <li>The key feature of this method is the unconditional stability that comes from the interpolation not exceeding the original values, and that’s a useful constraint to carry forward. For example, if the simulation blows up, something wasn’t done correctly.</li>
  <li>Generally speaking, this advection method isn’t the be-all and end-all of advection methods, and the field of fluid simulation is much larger than that. It escapes me, though; go look to other sources for those.</li>
</ol>

<p>Moving on from the semi-Lagrangian advection of velocity (and density), the second step is to apply the user’s input to the velocity array. This corresponds to the $\bold f$ term, the external forces term. This isn’t something Stam had set in stone, since what makes up the external forces really depends on the physical situation being simulated. In our case, we want someone swirling their arm in the water, and so external forces must be derived from the touch data. That’s the touch data we had the touch task generate in Part 2, and here’s where it comes into play.</p>

<p>Recall that a touch input consists of a position and a velocity. Let $\bold{x}_i$ and $\bold{v}_i$ be the position and velocity of the $i$-th input in the queue. Naturally, we should want to influence the velocities around $\bold{x}_i$ in the direction of $\bold{v}_i$. Under this general guidance, I <em>could</em> have gone about it in the way that was done in the GPU Gems article. That was to add a “Gaussian splat” to the velocity array, and that “splat” was formally expressed as something like this</p>

\[\bold{f}_i \, \Delta t \, e^{-\left\Vert \bold{x} - \bold{x}_i \right\Vert^2 / r^2}\]

<p>where $\bold{f}_i$ is a vector with some reasonably predetermined magnitude but with a direction equal to that of $\bold{v}_i$. From the multiplication $\bold{f}_i \Delta t$, you may notice that the time discretization in play is just Euler’s method and that the space discretization in play is to just evaluate it at the points of the grid. Across all the inputs in the queue, the update would have been</p>

\[\bold{v}_\text{force}(\bold{x}) = \bold{v}(\bold{x}) + \sum_{i = 0}^N \bold{f}_i \, \Delta t \, e^{-\left\Vert \bold{x} - \bold{x}_i \right\Vert^2 / r^2}\]

<p>where $N$ is the number of items in the queue. I had two issues with it. First, I specifically wanted to capture how you can’t push the fluid immediately around your arm faster than the speed of your arm (though the offshoot vortices are free to spin faster). This was especially important when someone was moving the stylus very gently. Second, evaluating the splat at every single point would’ve been expensive. My crude solution to this was to just set $\bold{v}(\bold{x}_i)$ to be <em>equal</em> to $\bold{v}_i$. In code, that turns out to merely be the following</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">drag</span> <span class="n">msg</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">xQueueReceive</span><span class="p">(</span><span class="n">drag_queue</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">msg</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="n">pdTRUE</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">ij</span> <span class="o">=</span> <span class="n">index</span><span class="p">(</span><span class="n">msg</span><span class="p">.</span><span class="n">coords</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">msg</span><span class="p">.</span><span class="n">coords</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">N_ROWS</span><span class="p">);</span>
    <span class="n">Vector2</span><span class="o">&lt;</span><span class="kt">float</span><span class="o">&gt;</span> <span class="n">swapped</span><span class="p">(</span><span class="n">msg</span><span class="p">.</span><span class="n">velocity</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">msg</span><span class="p">.</span><span class="n">velocity</span><span class="p">.</span><span class="n">x</span><span class="p">);</span>
    <span class="n">velocity_field</span><span class="p">[</span><span class="n">ij</span><span class="p">]</span> <span class="o">=</span> <span class="n">swapped</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>where the array indices follow the apparent “axes swap” covered in Part 2 (see the relevant section there if you’re confused by it). Formally, I can write this code as</p>

\[\bold{v}_\text{force}(\bold{x}_i) = \bold{v}_i\]

\[\bold{v}_\text{force}(\bold{x}) = \bold{v}(\bold{x}) \text{ for } \bold{x} \not= \bold{x}_i \text{ for any } i\]
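<p>For comparison, here’s a hypothetical Python sketch of the Gaussian splat approach described above (the project itself is C++, and the constants here are made up). Written with a decaying exponential, every grid point gets a contribution from every input, which is exactly the per-point cost I wanted to avoid.</p>

```python
import math

# Hypothetical sketch of the GPU Gems-style "Gaussian splat": each grid
# point receives f_i * dt, scaled by a Gaussian falloff around the input's
# position. Visiting every point for every input is the expensive part.

def apply_splats(vel, inputs, dt, r):
    """vel: 2D list of [vx, vy]; inputs: list of ((xi, yi), (fx, fy))."""
    ny, nx = len(vel), len(vel[0])
    for j in range(ny):
        for i in range(nx):
            for (xi, yi), (fx, fy) in inputs:
                w = math.exp(-((i - xi) ** 2 + (j - yi) ** 2) / r ** 2)
                vel[j][i][0] += fx * dt * w
                vel[j][i][1] += fy * dt * w
    return vel
```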

<p>The third step is the pressure step, corresponding to the $- \frac{1}{\rho} \nabla p$ term. Out of all the terms in the definition of $\frac{\partial \bold v}{\partial t}$, it must be calculated <em>last</em>, capping off the velocity update before we can proceed to the density update. I already discussed this in Part 2, but in short, the pressure <em>does not represent a real process here</em>. Rather, it is a correction term that eliminates divergence in the velocity field. This ensures the incompressibility constraint, $\nabla \cdot \bold v = 0$. (Technically, the specific formulation that Stam presents doesn’t eliminate it entirely, but it does eliminate most of it. We can get into that in the next part.) Since it’s not a real process, no time discretization is in play. Rather, the updated velocity field is straight-up not valid until the correction is applied.</p>

<p>It would be more correct to state that Stam’s fluid simulation follows the modified definition that he presents in “Stable Fluids”, that is</p>

\[\frac{\partial \bold v}{\partial t} = \mathbb{P} \big( - (\bold v \cdot \nabla) \bold v + \nu \nabla^2 \bold v + \bold f \big)\]

<p>where $\mathbb{P}$ is a linear projection onto the space of velocity fields with zero divergence. This definition clearly shows that $\mathbb{P}$ must be calculated last, though it hides the fact that calculating it does involve a gradient. Anyway, applying the reductions that we’ve been running with so far, that would just be</p>

\[\frac{\partial \bold v}{\partial t} = \mathbb{P} \big( - (\bold v \cdot \nabla) \bold v + \bold f \big)\]

<p>where we’ve again set $\nu$ to zero.</p>

<p>This happens to be the pressure projection that is shown in the GPU Gems chapter. That said, to keep the notation simple, I won’t continue to use it. And on the matter of actually calculating it, there’s so much to say in the next part. I’ll provide the code then as well.</p>

<p>That just leaves the fourth and final step, the semi-Lagrangian advection of the density, corresponding to the term $-(\bold v \cdot \nabla) \rho$. Well, that’s the only term in the definition of $\frac{\partial \rho}{\partial t}$, and we already covered it when we covered the velocity advection! There are no more obstacles here. The only thing I’d mention is that extending the fluid sim to full color is quite trivial. Instead of advecting a single density field, we can advect <em>three</em> independent density fields—one for red dye, one for blue dye, and one for green dye—or equivalently a <em>vector field</em> of densities. The project happens to go with the latter approach.</p>

<p>That fills most of the outline, implementing every part of the reduced Navier-Stokes equations except for the pressure step. That’s the applied force and the semi-Lagrangian advection of the velocity and density. There, we paid special attention to the derivation and the no-slip boundary condition, since that comes from the physical situation being simulated. We also went a bit into the general idea of discretizing time (i.e. numerical integration) and discretizing space. That’s everything I know about those steps that I think could help their implementation. In the <a href="/2024/09/26/esp32_fluid_sim_5.html">next and final post</a>, we’ll go over what, exactly, the pressure step is, including the relevant linear algebra. Stay tuned!</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><summary type="html"><![CDATA[If you’ve read Part 2 and Part 3 already, then you’re as equipped to read this part as I can make you. You would have already heard me mention that we should be passing in touch inputs, consisting of locations and velocities. You also would have already heard that we’re getting out color arrays. Some mechanism should be turning the former into the latter, and it should be broadly inspired by the physics, which is written out as partial differential equations. This post and the next post—the final ones—are about that mechanism. 
To be precise, this post covers everything but the pressure step, and the next will give that step its own airtime.]]></summary></entry><entry><title type="html">revRSS: The basic infrastructure behind finding reverse split press releases and trading on them</title><link href="http://kennypeng.com/2023/11/04/revrss_infrastructure.html" rel="alternate" type="text/html" title="revRSS: The basic infrastructure behind finding reverse split press releases and trading on them" /><published>2023-11-04T00:00:00+00:00</published><updated>2023-11-04T00:00:00+00:00</updated><id>http://kennypeng.com/2023/11/04/revrss_infrastructure</id><content type="html" xml:base="http://kennypeng.com/2023/11/04/revrss_infrastructure.html"><![CDATA[<p><em>Note: Though this article mentions the idea of trading on reverse splits, the idea is given not for any compensation and not as personal financial advice for the reader’s specific financial situation.</em></p>

<p>A couple of years ago, I used to be subscribed to a mailing list called “Reverse Split Arbitrage”, and I remember being surprised that the trading tips that landed in my mailbox did make me a bit of money. The central idea of it was based on a kind of stock market technicality.</p>

<p>When a company executes a “reverse split”, it takes every X shares and merges them into a single share, thereby raising the price because the value of the company is divided among fewer shares. “X” is a number that comes from the announced ratio “1-for-X”. (This language is similar to the “X-for-1” ratio of stock splits, or in other words <em>forward</em> splits, though in that case the value is divided among <em>more</em> shares to <em>lower</em> the price.) Reverse splits typically happen because the price has fallen under $1, the minimum price set by the NYSE and Nasdaq to stay listed.</p>

<p>But here’s the big-money question: what if an investor has fewer than X shares left over? Under the given ratio, that would have to become a so-called “fractional share”. Companies typically take one of four approaches to this fraction:</p>

<ol>
  <li>pay cash for this fraction,</li>
  <li>round it to zero or one, whichever is nearer,</li>
  <li>round it down to zero unconditionally, or, most commonly,</li>
  <li>round it up to one unconditionally.</li>
</ol>

<p>Which option the company takes can almost always be found in the press release or SEC filing that is published shortly before the reverse split happens. These emails I had gotten from Reverse Split Arbitrage would alert me to these reverse splits that would round up, but after some time, I wasn’t getting them anymore.</p>

<p>Still, it turns out that plenty of reverse splits are still happening, and many of them are still rounding up. I wanted to get back into trading on them, but I didn’t have the mailing list to help me any more. I had to rig up something myself, of course! This was also something I wanted to share with others—for zero compensation especially. For now, I’m doing a soft launch of this at <a href="https://www.revrss.com">www.revrss.com</a>, and it’s in a limited form that focuses only on press releases (not SEC filings) and requires the reader to read them themselves. The intention is to make it more public after overcoming these limitations, but I’ve been able to use it myself just fine.</p>

<p>With that said, even getting this far required quite a bit of infrastructure! If I were to describe what I’m doing in one phrase, referring to the technologies involved just by their name, it would be “a WebSub-enabled RSS news aggregator, served via nginx over Cloudflare Tunnels”.</p>

<figure>
<img src="/images/2023-11-04/figure1.png" alt="diagram showing infrastructure of revRSS project as of Nov 4th, 2023, consisting of a primary server interacting with newswires and using Cloudflare Tunnels as its public face, while at the same time a user can be notified by their online RSS reader via a WebSub broker. primary server circled in red to show that it is within my home network" />
<figcaption>

The infrastructure of revRSS as of Nov 4th, 2023, with the primary server being in my home network. It's worth noting here that, if a powerful enough server was rented from a cloud provider, the primary server, the WebSub broker, and Cloudflare Tunnels could be replaced by that single server.

</figcaption>
</figure>

<p>And now, I’ll say it again in longform.</p>

<p>Press releases about reverse splits (and whether they’ll round up) happen to be distributed by one of four newswires, <a href="https://www.businesswire.com">Business Wire</a>, <a href="https://www.prnewswire.com">PR Newswire</a>, <a href="https://www.accesswire.com">ACCESSWIRE</a>, and <a href="https://www.globenewswire.com">Globe Newswire</a>, though they may also be sent out via smaller newswires like <a href="https://www.newsfilecorp.com">Newsfile Corp</a>, <a href="https://www.einpresswire.com">EIN Presswire</a>, or <a href="https://www.dowjones.com/professional/newswires/">Dow Jones Newswire</a>. The first four are used dramatically more than the latter three.</p>

<p>Though newswires usually forward their news directly to their journalist clients, they also share it directly with the public via one channel or another. If a newswire made their news available in <a href="https://en.wikipedia.org/wiki/RSS">RSS</a>, the standard format for distributing news from machine to machine, then I wrote a program to interpret it. (In the case of PR Newswire, I managed to talk to someone there about getting an RSS feed!) If they instead made it available via their website, then I had to resort to “web scraping”, in other words parsing the HTML code meant for web browsers.</p>

<p>Naturally, if a newswire offered RSS, I went for it over going to their website. In either case, though, I could use a Python library for parsing XML and HTML data, called <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">Beautiful Soup</a>, to do the heavy lifting. In general, both XML and HTML organize data into “tags”—“tags” being containers of many blocks of text, <em>sub</em>-tags, or even both at the same time. In the case of RSS, which is a kind of XML, the way a news article and its associated metadata is encoded with these tags is exactly defined in the <a href="https://www.rssboard.org/rss-specification/">RSS specification</a>, and so the specification was a good reference in instructing Beautiful Soup to find the tags associated with said article. In the case of HTML though, an article ends up being encoded in ways varying from website to website, and so I ended up needing to pick through each website by hand to find the tags to give to Beautiful Soup.</p>
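<p>The project leans on Beautiful Soup for this, but to show how little is involved, here’s a hypothetical sketch that pulls the same per-item fields out of an RSS document using only Python’s standard library (tag names per the RSS specification):</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: extract the title, link, and published date from
# each <item> of an RSS 2.0 document, per the RSS specification. The
# project itself uses Beautiful Soup, but it looks for the same tags.

def parse_rss(xml_text):
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),
        })
    return articles
```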

<p>Still, with effort, I could have myself a list of articles from all of the relevant newswires—each article with its title, link, published date, and excerpt. I just needed to filter it for press releases about reverse splits and then sort by latest. With that said, a dire wish of mine is to achieve a better filter here. Because the language that declares a reverse split with round-ups varies, identifying them without making many false positives or false negatives would require good natural language processing. For now, I’m leaning toward more false positives, selecting only for reverse splits but not whether they’ll round up. That can be done with a simple keyword search. With this (admittedly faultily) filtered list, then sorted by latest, I could even begin to report something to the public.</p>
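<p>As a sketch, the simple keyword search plus sort might look like this in Python (hypothetical names; the project’s actual filter terms may differ):</p>

```python
from datetime import datetime

# Hypothetical sketch of the keyword filter: keep articles whose title or
# excerpt mentions a reverse split, then sort by latest. It deliberately
# over-selects (it can't tell whether fractional shares round up), which
# matches the "more false positives" tradeoff described above.

KEYWORDS = ("reverse split", "reverse stock split")

def filter_and_sort(articles):
    """articles: list of dicts with 'title', 'excerpt', and 'published'
    (a datetime). Returns matching articles, latest first."""
    hits = [
        a for a in articles
        if any(k in (a["title"] + " " + a["excerpt"]).lower() for k in KEYWORDS)
    ]
    return sorted(hits, key=lambda a: a["published"], reverse=True)
```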

<p>My choice for how I did this was RSS again, not a full website nor a mailing list. With that said, serving RSS is not like the latter and much like the former. To be exact here, the relationship is identical to serving a <a href="https://en.wikipedia.org/wiki/Static_web_page">“static website”</a>, or in other words a website built on a set of fixed assets, including HTML, CSS, images, or even Javascript but <em>not</em> including responses of a database. As mentioned here, RSS is just a format, and so an RSS service is just a <em>single file</em>, served as if it were a logo on some corporate website. Consequently, I could construct this file using Beautiful Soup and then serve it using a configuration of the <a href="https://en.wikipedia.org/wiki/Nginx">nginx</a> program, which was designed for such static assets.</p>
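<p>To illustrate that the feed really is just one file, here’s a hypothetical stdlib-only Python sketch that builds a minimal RSS 2.0 document for nginx to serve as a static asset (the project constructs its feed with Beautiful Soup instead, and the channel metadata here is assumed):</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: build the single RSS file that nginx then serves
# like any other static asset. Channel metadata below is assumed.

def build_feed(articles):
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "revRSS"  # assumed feed title
    ET.SubElement(channel, "link").text = "https://www.revrss.com"
    ET.SubElement(channel, "description").text = "Reverse split press releases"
    for a in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = a["title"]
        ET.SubElement(item, "link").text = a["link"]
        ET.SubElement(item, "pubDate").text = a["pubDate"]
    return ET.tostring(rss, encoding="unicode")
```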

<p>Speaking of static websites, I configured nginx to also serve the revRSS website (just a for-your-information site) which was a static website. For that, “static site generator” programs like <a href="https://jekyllrb.com">Jekyll</a> can autogenerate all the assets of a static website from plaintext files and configuration files (which can come from publicly available templates like <a href="https://beautifuljekyll.com">Beautiful Jekyll</a>). I think detailing how using Jekyll went for me is outside the scope of this article, but I mention this because I want to highlight how serving the site and serving the reverse splits feed are completely equivalent. In fact, the <em>same</em> nginx configuration serves both.</p>

<p>Anyway, a key disadvantage of RSS compared to mailing lists is that notifications are impossible because there is no list of subscribers to contact. This wouldn’t be a problem if—say—one made a habit of checking the feed every morning, but I don’t think that should be necessary. So, since I wanted to serve RSS but also deliver notifications, what was I to do? The answer was to use another program on the side that follows the <a href="https://en.wikipedia.org/wiki/WebSub">WebSub (formerly PubSubHubbub) protocol</a>. This other program maintains the list, and some apps like <a href="https://en.wikipedia.org/wiki/NewsBlur">NewsBlur</a> are <a href="https://blog.newsblur.com/2012/04/02/building-real-time-feed-updates-for-newsblur/">capable of joining that list</a>. It could be run on the same server that runs nginx, but I used a public “broker”. In particular, I used the one run by Google at <a href="https://pubsubhubbub.appspot.com/">pubsubhubbub.appspot.com</a>.</p>
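<p>On the publisher side, WebSub amounts to one small notification: after the feed file changes, POST a form-encoded ping to the broker naming the updated feed, and the broker pushes the news to subscribers. Here’s a hypothetical Python sketch that only builds that request; the feed URL is assumed, and the actual send is left out:</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical sketch of a WebSub publish ping: tell the broker (hub)
# that the topic URL has new content. The actual send is omitted here.

HUB = "https://pubsubhubbub.appspot.com/"
FEED_URL = "https://www.revrss.com/feed.xml"  # assumed topic URL

def build_publish_ping(hub=HUB, topic=FEED_URL):
    body = urlencode({"hub.mode": "publish", "hub.url": topic}).encode()
    return Request(hub, data=body, method="POST")

# urllib.request.urlopen(build_publish_ping()) would send the ping
```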

<p>Finally, I wanted to host everything on a powerful server at home, but my internet provider doesn’t allow me to open the standard ports for HTTP and HTTPS, 80 and 443. By “opening” ports, I mean accepting incoming connections there. Though opening other ports and manually punching in the port numbers may technically work for me, that wouldn’t work for the public. One solution for this I’ve done before is a <a href="https://unix.stackexchange.com/questions/46235/how-does-reverse-ssh-tunneling-work">reverse SSH tunnel</a>, a type of SSH connection that one server makes to another server in order for the latter to act as a face of the former, accepting connections at its <em>own</em> ports <em>for</em> the former. In this scenario, a connection would be <em>issued</em> by my server (not <em>accepted</em>) and from there traffic is forwarded back, and this would get around my internet provider’s restriction. To do this, the other server could just be rented from a cloud provider like Google Cloud—possibly while staying within the limits of their free tier.</p>

<p>However, I went for something similar using Cloudflare Tunnels instead. The tradeoff: I don’t have to manage two servers, but I lose control of the other end to Cloudflare. With that said, I planned to proxy my traffic through them anyway because I wanted to use their content delivery network to serve the heaviest parts of the revRSS site for me, including fonts and images. To me, their Tunnels feature was icing on the cake.</p>

<p>So, that’s how I’m getting and trading on the latest press releases about potential reverse split round-ups as they happen. With this infrastructure, it’s also how—technically—you can too. It’s a basic infrastructure that really needs to become more complex before I can simply count on it, and yet it already invokes a wide range of concepts. From file formats to servers to tunnels, each has a different role in transporting the news of a reverse split from the company to my phone.</p>

<p>I could end up adding more to this pipeline, and if I write a piece on it, you can click to it here.</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><category term="XML/HTML" /><category term="RSS" /><category term="WebSub" /><summary type="html"><![CDATA[Note: Though this article mentions the idea of trading on reverse splits, the idea is given not for any compensation and not as personal financial advice for the reader’s specific financial situation.]]></summary></entry><entry><title type="html">Rebuilding ESP32-fluid-simulation: an outline of the sim task (Part 3)</title><link href="http://kennypeng.com/2023/09/22/esp32_fluid_sim_3.html" rel="alternate" type="text/html" title="Rebuilding ESP32-fluid-simulation: an outline of the sim task (Part 3)" /><published>2023-09-22T00:00:00+00:00</published><updated>2023-09-22T00:00:00+00:00</updated><id>http://kennypeng.com/2023/09/22/esp32_fluid_sim_3</id><content type="html" xml:base="http://kennypeng.com/2023/09/22/esp32_fluid_sim_3.html"><![CDATA[<p>Okay, I wondered if I should have led this series with the physics, but I think saving it for last was the right call. As I was writing about <a href="/2023/07/21/esp32_fluid_sim_1.html">the FreeRTOS tasks involved and their communication</a> and the <a href="/2023/07/30/esp32_fluid_sim_2.html">touch and render tasks specifically</a>, I started to think about how I could write about this with the detail and approachability it deserves.</p>

<p>To start, I’ll be honest: I’m not presenting anything groundbreaking here. In 1999, Jos Stam introduced a simple and fast form of fluid simulation in his conference paper called “Stable Fluids”, and in 2003, he published a straightforward version of it in “Real-Time Fluid Dynamics for Games”. Many people have written guides to “fluid simulation” that have been specifically based on these two papers since. Two key examples to me: <a href="https://developer.nvidia.com/gpugems/gpugems/part-vi-beyond-triangles/chapter-38-fast-fluid-dynamics-simulation-gpu">a chapter of NVIDIA’s <em>GPU Gems</em></a> and a <a href="https://jamie-wong.com/2016/08/05/webgl-fluid-simulation/">blog post by Jamie Wong</a>. To be pedantic, I find now that the current field of fluid simulation is much, <em>much</em> larger than what any of these references imply. Still, these were the guides I followed when I first wrote ESP32-fluid-simulation. In both was Stam’s technique, and between everything I just linked to, you could probably write your own implementation of it eventually.</p>

<figure>
<div style="max-height: 400px; display: block; margin: auto; aspect-ratio: 4/3;"><iframe height="100%" width="100%" src="https://www.youtube-nocookie.com/embed/t-erFRTMIWA" title="Jos Stam&#39;s 1999 Interactive Fluid Dynamics Demo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<figcaption>

A tape about Stam's technique from circa 1999, available on Youtube since 2011!

</figcaption>
</figure>

<p>That said, between then and when I rewrote it recently, I picked up some background knowledge that proved incredibly useful. I’m not saying here that I became an expert on fluid sims—I can’t advise you on designing a new technique from scratch. Rather, if I had known it back then, I wouldn’t have made nearly as many wrong turns. It turns out that implementing Stam’s technique gets easier when you understand the <em>whats</em> of the operations he would have you write, if not the whys.</p>

<div class="info-panel">

  <h4 id="review-vector-fields-and-scalar-fields">Review: Vector fields and scalar fields</h4>

  <p>If you recall any vector calculus, then <a href="https://en.wikipedia.org/wiki/Vector_field">“vector fields”</a> and <a href="https://en.wikipedia.org/wiki/Scalar_field">“scalar fields”</a> may be an obvious concept to you already, but if not, we can start with the fact that they’re a part of the foundation of fluid dynamics. For now, I’ll review what they are. However, I highly recommend picking up a total understanding of vector calculus somewhere else before looking at any fluid sim techniques besides Stam’s. In fact, perhaps fluid simulations make just the right concrete example to keep in mind while learning!</p>

  <p>Anyway, let’s sketch out what vector fields and scalar fields are, and hopefully, the picture is filled in as you keep reading this article. The ordinary idea of a mathematical function is a thing that outputs a number when given a number input. Vector fields and scalar fields are functions—though of different kinds.</p>

  <p>Consider a flat, two-dimensional space, and then consider a function that outputs a number when given a <em>location in this space</em> as the input. Furthermore, this location can be expressed as a pair of numbers if we used a coordinate system (three if we worked in three dimensions, and we could, but we won’t here). A concrete example of this would be a function of a location that gives the temperature there—the location being written as the latitude and longitude on the map. It’s 48 degrees Fahrenheit in Arkhangelsk and 84 degrees in Singapore. Considering that Arkhangelsk can be found at 64.5°N, 40.5°E and Singapore at 1.2°N, 103.8°E, we can define a temperature function that gives $T(64.5, 40.5) = 48$ and $T(1.2, 103.8) = 84$.  We can call it a temperature field, but more generally, it’s a “scalar field”. It’s a scalar-valued function of the location, possibly written as $f(x, y)$.</p>

  <figure>
<img src="/images/2023-09-22/figure1.png" alt="weather forecast graphic, showing temperature across the United States" />
<figcaption>

A weather forecast graphic, showing temperature across the United States. This can be thought of as a temperature field. Source: <a href="https://graphical.weather.gov/sectors/conus.php">NOAA</a>

</figcaption>
</figure>

  <p>Now on the other hand, a “vector-valued function” is any function that outputs a vector, and a vector-valued function of a location is a “vector field”! A concrete example of this would be a wind velocity field. For any location, such a field would give how fast the wind there blows and the direction in which it goes, and it would be given as the magnitude and direction of a single vector. Like for scalar fields, we could possibly write them as $\bold{f}(x, y)$, the boldface font meaning that we have a vector output.</p>

  <figure>
<img src="/images/2023-09-22/figure2.png" alt="weather forecast graphic, showing wind speed and direction in the Southeastern US and in particular of Tropical Storm Ophelia" />
<figcaption>

A weather forecast graphic, showing wind speed and direction in the Southeastern US during Tropical Storm Ophelia, using color for magnitude and arrows for direction. Vector fields are typically shown using arrows of varying lengths. Source: <a href="https://graphical.weather.gov/sectors/conus.php">NOAA</a>

</figcaption>
</figure>

  <p>That said, though they are functions of location, they are rarely written like one. Rather, the dependence on location is assumed, and then $f(x, y)$ and $\bold{f}(x, y)$ are just written as $f$ and $\bold{f}$ instead. Another thing to keep in mind: coordinates are just a pair of numbers, but we can also think of them as a single coordinate <em>vector</em>. Though we may never actually draw that arrow, the interchangeability is relevant. For example, I briefly talked in the previous post about the similarity between a velocity vector and a <em>change</em> in the coordinate vector over a finite period of time.</p>

</div>
<p><!-- div class="info-panel" --></p>

<p>First, it would be helpful to picture what we want to simulate. The input and output are the <em>touch</em> and <em>screen</em> of a touchscreen, and the user dragging around the stylus on it should stir around the fluid on display. The physical scenario this should match is if someone stuck their arm into a bed of dyed water and then stirred it around. In such a scenario, the color would be determined by the concentration of the dye, but the dye itself moves! To capture this physical behavior with a computer simulation, we can start by describing it with a mathematical model.</p>

<p>In Stam’s “Real-time Fluid Dynamics for Games”, he wanted to capture smoke rising from a cigarette and being blown around by air currents. To do so, he ascribed a velocity field (a vector field) and a smoke density field (a scalar field) to the air. But that was it for his model: everything else about it he threw out. In the same way, we can reduce the bed of water to just a velocity field and a dye concentration field.</p>

<figure>
<img src="/images/2023-09-22/figure3.gif" alt="" />
</figure>

<p>Now, what was the relationship between these two fields? Stam wrote that the density field undergoes <a href="https://en.wikipedia.org/wiki/Advection">“advection”</a> by the velocity field. That’s the process of the fluid carrying things around (smoke particles, dye, or anything in general), and it happens everywhere. He also wrote that it undergoes <a href="https://en.wikipedia.org/wiki/Diffusion_equation">“diffusion”</a>, which is the spontaneous spreading of a substance from areas of higher density to areas of lower density, <em>without</em> it being carried by the velocity. He provided an “advection-diffusion” equation that captures both, and it’s a “partial differential equation”.</p>

<!-- TODO: add animations of convection and diffusion, one for each and then one jointly, using the sim -->

<div class="info-panel">

  <h4 id="review-partial-derivatives-and-the-differential-operators">Review: Partial derivatives and the differential operators</h4>

  <p>Just like how we can take the derivative of your ordinary function, we can take a differential operator of a field. However, these differential operators don’t just mean the slope of a tangent line, but rather they each represent a different way the field changes over a change in location. The critical ones to understand here are the “divergence” and the “gradient”, but the “Laplacian” is also worth touching on. (A formal vector calculus course would also cover the “curl”, the identities, and the associated theorems.)</p>

  <p>First of all, differential operators are constructed from the “partial derivatives”. These are the derivatives you already know, but we strictly take them with respect to <em>one</em> of the components while holding the others constant. The reason? Formally, your ordinary derivative is the limit of the change in your ordinary function $f(x)$ over the change in the input $x$ as that change in the input approaches zero.</p>

\[\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x}\]

  <p>However, in the case of fields, by doing this to only <em>one</em> of the components of the location coordinate, the partial derivative just formally means the change in the field $f(x, y)$ over the change in <em>that component</em>. Keeping the other components constant is naturally a part of measuring this change. In two dimensions, fields can have a partial derivative with respect to $x$ or one with respect to $y$. Then, $y$ or $x$ respectively is held constant.</p>

\[\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}\]

\[\frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}\]

  <p>A good example would actually be to perform a derivation. Given the function $f(x, y) = x^2 + 2xy + y^2$ as a field, let’s find the partial derivative with respect to $x$.</p>

\[\begin{align*} \frac{\partial}{\partial x}(x^2 + 2xy + y^2) &amp; = 2x + 2y + 0 \\ &amp; \boxed{ = 2x + 2y } \end{align*}\]

  <p>Notice that—because $y$ is taken as a constant—$y^2$ drops out and $2xy$ is treated as an $x$-term with a coefficient of $2y$. And finally, to expand on this a bit with a geometric picture, we know that the derivative is the slope of the tangent line, but to be exact, it’s the line tangent to the curve of $f(x)$ at the point $(x, f(x))$. The partial derivative is still the slope of <em>a</em> line that <em>is tangent</em> to the surface of the field at the point $(x, y, f(x, y))$, but it is also strictly running in the $x$-direction for $\partial/\partial x$ or in the $y$-direction for $\partial/\partial y$. Technically, infinitely many lines satisfy the conditions of being tangent to the surface at that point, and these lines form a tangent plane, but we only concern ourselves with the two.</p>

  <figure>
<img src="/images/2023-09-22/figure4.png" alt="Diagram of the two lines tangent to the field with slopes equal to the partial derivatives" />
<figcaption>

The surface plot of another scalar field $f(x, y) = x^2 + y^2$, which is like the plot of the curve of your ordinary function, along with the two lines tangent to it that have slopes equal to the partial derivatives.

</figcaption>
</figure>

  <p>That aside, taking a partial derivative with respect to some single component is not as useful as taking <em>every</em> partial derivative with respect to <em>each</em> component. This set is written like a vector of sorts (though a vector it is not) called the “del operator”. For two dimensions, that is</p>

\[\nabla \equiv \begin{bmatrix} \displaystyle \frac{\partial}{\partial x} \\[1em] \displaystyle \frac{\partial}{\partial y} \end{bmatrix}\]

  <p>The constructions out of this set that we call the differential operators can absolutely be written without using the del operator, but you’d usually see that they are.</p>

  <p>The <a href="https://en.wikipedia.org/wiki/Gradient">“gradient”</a> is the simplest construction: line up each and every partial derivative of a scalar field into a vector. Keeping in mind here that the partial derivative of a field (like $x^2+2xy+y^2$) is actually yet another function of the location (like $2x + 2y$), a vector composed of these will itself vary by the location. The gradient of a scalar field is a vector field! We’ll get to exactly how the gradient applies to our fluid sim later, but one useful fact to picture here is that the gradient always points in the direction of steepest ascent in the scalar field. Walking in the direction of the gradient of the temperature field, for example, would warm you up the fastest!</p>

\[\nabla f = \begin{bmatrix} \displaystyle \frac{\partial f}{\partial x} \\[1em] \displaystyle \frac{\partial f}{\partial y} \end{bmatrix}\]

  <p>Using the del operator, it looks kind of like scalar multiplication from the right.</p>

  <figure>
<img src="/images/2023-09-22/figure5.png" alt="Surface plot of a scalar field and the plot of its gradient" />
<figcaption>

In orange, the surface plot of a scalar field. Beneath it and in blue, the plot of the gradient, showing that it points in the direction of steepest ascent. Source: <a href="https://commons.wikimedia.org/wiki/File:3d-gradient-cos.svg">MartinThoma via Wikimedia Commons</a>, <a href="https://creativecommons.org/publicdomain/zero/1.0/">CC0 1.0</a>

</figcaption>
</figure>

  <p>And remember, the gradient is just one shockingly meaningful operator that we can construct from the partial derivatives, which were just slopes of tangent lines! The <a href="https://en.wikipedia.org/wiki/Divergence">“divergence”</a> is a slightly more complicated construction: if we write out a vector field using its components</p>

\[\bold{f}(x, y) = \begin{bmatrix} f_x(x, y) \\ f_y(x, y) \end{bmatrix}\]

  <p>then we can take the partial derivative of each component with respect to its <em>associated component</em> of the coordinates (that’s $f_x$ to $\partial/\partial x$ and $f_y$ to $\partial/\partial y$) and then add them up. We should be able to recognize here that the divergence of a vector field is a scalar field. And what is the meaning of this scalar field? For now, it can be imagined as the degree to which the vectors surrounding an input location are pointing away from it, though Gauss’s theorem expresses this more formally (a bit out-of-scope for now).</p>

\[\nabla \cdot \bold{f} = \frac{\partial f_x}{\partial x} + \frac{\partial f_y}{\partial y}\]

  <p>Using the del operator, it looks kind of like a dot product.</p>

  <figure>
<img src="/images/2023-09-22/figure6.png" alt="Three diagrams, the left showing outward-pointing arrows, the middle showing inward-pointing arrows, and the right showing a balance between the two." />
<figcaption>

Three diagrams: the left showing positive divergence with predominantly outward-facing arrows, the middle showing negative divergence with predominantly inward-facing arrows, and the right showing zero divergence with a balance between the two. But again, Gauss's theorem gives the exact picture.

</figcaption>
</figure>

  <p>Finally, the <a href="https://en.wikipedia.org/wiki/Laplace_operator">“Laplacian”</a> is actually the divergence of the gradient of a scalar field, and this ultimately means that it’s also a scalar field! It is also the sum of the second-order partial derivatives (besides the mixed ones, but that’s totally out-of-scope).</p>

\[\nabla^2 f \equiv \nabla \cdot (\nabla f) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\]

  <p>Using the del operator, some take the liberty of expressing this composition as a single $\nabla^2$ operator.</p>

  <p>There is also the extension of the Laplacian onto vector fields, but it really is just the Laplacian on each component.</p>

\[\nabla^2 \bold{f} = \begin{bmatrix} \nabla^2 f_x \\ \nabla^2 f_y \end{bmatrix}\]

  <p>The gradient, divergence, and Laplacian are all the differential operators that are relevant here, and hopefully these will become more concrete to you as we use them from here on to describe Stam’s fluid sim technique. However, I’d again recommend formally learning vector calculus if you’d like to look at other techniques.</p>

</div>
<p><!-- div class="info-panel" --></p>

<p>A “partial differential equation” is kind of like a system of linear equations in this context. Here, they still relate known and unknown variables, and they still have a solution which is the value of the unknowns. However, these “values” are entire fields, not just numbers! Given this, partial differential equations also involve the differential operators of these field-valued variables.</p>

<p>The advection-diffusion equation that Stam provides is a simple example of one: advection and diffusion are <em>independent terms</em>, and their sum is exactly how the density field evolves over time. It is</p>

\[\frac{\partial \rho}{\partial t} = -(\bold{v} \cdot \nabla) \rho + \kappa \nabla^2 \rho + S\]

<p>where $\rho$ is the density field and $\bold{v}$ is the velocity field. $-(\bold{v} \cdot \nabla) \rho$ is the advection term, and $\kappa \nabla^2 \rho$ is the diffusion term—$\kappa$ being a constant for us to control the strength of the diffusion. $S$ is just a term that lets us add density (of smoke, or concentration of dye in our case) to the scene. Notice how this equation is a definition of $\partial \rho / \partial t$. It’s the partial derivative of the density field with respect to time, and it means that $\rho$ is a variable whose value is a function of location and time. However, it’s more useful for us to think of it equivalently as a field that evolves over time. An evolving density field is exactly what we want to show on the screen!</p>

<p>You may also notice that $(\bold{v} \cdot \nabla)$ is clearly some kind of construction from the partial derivatives because it uses the del operator $\nabla$. That is the “advection” operator. I’ve only seen it in fluid dynamics papers and yet still don’t totally understand it. Still, we’ll see how Stam treats it, but that’ll have to be in the next post.</p>

<p>All said though, where is the room in this model for the user’s input? Is the velocity field just a thing we get to set? (Right now, we have two unknowns, $\rho$ and $\bold{v}$, but one equation!) No, it’s more complicated than that: the way water and air move continues to change even after we stop stirring it. That leads to the missing piece to stirring digital water: we need a physical way to define $\partial \bold{v} / \partial t$ (a.k.a. the acceleration!) just like how $\partial \rho / \partial t$ was defined. That missing piece is the famous “Navier-Stokes equations”.</p>

<p>The <a href="https://en.wikipedia.org/wiki/Navier%E2%80%93Stokes_equations">“Navier-Stokes equations”</a> are also partial differential equations. A definition of Navier-Stokes can be found in any fluid dynamics article, but the one Stam provided in “Stable Fluids” is the most directly relevant one.</p>

\[\frac{\partial \bold{v}}{\partial t} = -(\bold{v} \cdot \nabla) \bold{v} - \frac{1}{\rho} \nabla p + \nu \nabla^2 \bold{v} + \bold{f}\]

\[\nabla \cdot \bold{v} = 0\]

<p>The first one is a definition of $\partial \bold{v} / \partial t$. Here, $-(\bold{v} \cdot \nabla) \bold{v}$ and $\nu \nabla^2 \bold{v}$ represent advection and diffusion again, though these are also known as “convection” and acceleration due to “viscosity”, respectively. That is to say, the velocity is carried around and diffused just like how the smoke density was. The only difference is that the constant $\nu$ here is the <a href="https://en.wikipedia.org/wiki/Viscosity">“kinematic viscosity”</a>, and it’s higher for fluids like honey and lower for fluids like water. That aside, $\bold{f}$ represents the acceleration due to external forces, and there is the place in our mathematical model where the user input would go!</p>

<p>On the other hand, $-\frac{1}{\rho} \nabla p$ is an interesting term—it’s <em>not</em> independent here. Let me try to explain. Typically, like in gases, it is an acceleration due to a difference in pressure, and the negative of the gradient represents the tendency for fluids to flow from regions of high pressure to regions of low pressure. (Since the gradient points in the direction of steepest ascent, then going in the opposite direction gives the steepest <em>descent</em>.) The pressure differences are in turn driven by things like temperature. But that’s not what we’re talking about today!</p>

<p>As Stam had put it, “[t]he pressure and the velocity fields which appear in the Navier-Stokes equations are in fact related”. Ultimately, $-\frac{1}{\rho} \nabla p$ is used like a correction term to guarantee that the second equation holds. Whereas the first equation reads as a sum of processes that make up the acceleration, the second equation, $\nabla \cdot \bold{v} = 0$, reads like this: the divergence of the velocity field (which is a scalar field, remember!) is equal to zero <em>everywhere</em>. Even as a fluid evolves, this is a constraint that it must satisfy throughout, and it’s termed the “incompressibility constraint”.</p>

<p>The incompressibility constraint is said to be critical for it to look like water. Unfortunately, knowing <em>what</em> it is turns out to not be the same as knowing <em>why</em> it is. That’s beyond what I can comfortably explain, and there’s enough to explain in regards to <em>how</em> the pressure term is used to satisfy it. There’s quite a lot to say on that front, actually, so it’ll be another matter to cover in the next posts.</p>

<p>That aside, I’m going to adjust the equations while we’re here. Because the overall project was about simulating dye in water on an ESP32 and not smoke in air on a GPU, I didn’t use the entire equation. This can be thought of as an exercise in finding what part of the physics can be ignored while still looking sorta-physical, I suppose. I really have to wash my hands of any assertions I’m making at this moment, for I am no expert in this field. With that said, I can confirm that deleting the diffusion term by letting $\kappa = 0$ doesn’t look so egregious. We also don’t have to add more dye, so we can delete $S$ while we’re at it. That leaves just the advection term.</p>

\[\frac{\partial \rho}{\partial t} = -(\bold{v} \cdot \nabla) \rho\]

<p>Furthermore, I also got away with letting $\nu = 0$, deleting that term and reducing the Navier-Stokes equations to the following.</p>

\[\frac{\partial \bold{v}}{\partial t} = -(\bold{v} \cdot \nabla) \bold{v} - \frac{1}{\rho} \nabla p + \bold{f}\]

\[\nabla \cdot \bold{v} = 0\]

<p>So ends this post. With the governing equations (advection-diffusion and Navier-Stokes), we’ve laid out the fundamental outline of Stam’s technique. We’ve also reviewed the relevant vector calculus, though no more than that. Though I didn’t have all the authority I needed to get the whys, we should be equipped to understand the whats. In the last parts, we’ll fill in the outline to get an end-to-end fluid simulation. If you’re still here before the <a href="/2024/01/20/esp32_fluid_sim_4.html">next post</a> comes out though, there’s always the <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation source code</a> on GitHub.</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><category term="vector calculus" /><category term="partial differential equations" /><category term="the Navier-Stokes equations" /><summary type="html"><![CDATA[Okay, I wondered if I should have led this series with the physics, but I think saving it for last was the right call. As I was writing about the FreeRTOS tasks involved and their communication and the touch and render tasks specifically, I started to think about how I could write about this with the detail and approachability it deserves.]]></summary></entry><entry><title type="html">Rebuilding ESP32-fluid-simulation: the touch and render tasks (Part 2)</title><link href="http://kennypeng.com/2023/07/30/esp32_fluid_sim_2.html" rel="alternate" type="text/html" title="Rebuilding ESP32-fluid-simulation: the touch and render tasks (Part 2)" /><published>2023-07-30T00:00:00+00:00</published><updated>2023-07-30T00:00:00+00:00</updated><id>http://kennypeng.com/2023/07/30/esp32_fluid_sim_2</id><content type="html" xml:base="http://kennypeng.com/2023/07/30/esp32_fluid_sim_2.html"><![CDATA[<p>So, how exactly did my rebuild of <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a> do the touch and render tasks? 
This post is the second in a series of posts about it, and the first was a task-level overview of the whole project. But while it’s nice and all to know the general parts of the project and how they communicate in a high-level sense, the meat of it is the implementation, and I’m here to serve it. The next parts are dedicated to the sim physics, but we’ll talk here about the input and output: the <em>touch</em> and <em>screen</em> of a touchscreen.</p>

<p>The implementation starts from the hardware, naturally. As I established, I went with the ESP32-2432S032 development board that I heard about on <a href="https://www.youtube.com/c/brianlough">Brian Lough’s</a> Discord channel, where it was dubbed the “Cheap Yellow Display” (CYD). That choice guided the libraries that I was going to build on, and that defined the coding problems I had to solve.</p>

<figure>
<img src="/images/2023-07-30/figure1.jpeg" alt="image of the 'Cheap Yellow Display'" />
<figcaption>

The ESP32-2432S032, a.k.a. the Cheap Yellow Display

</figcaption>
</figure>

<p>Materially, the only component of it that I used was the touchscreen, and it used an ILI9341 LCD driver and an XPT2046 resistive touchscreen controller. In some demonstrative examples, Lough used the <a href="https://github.com/Bodmer/TFT_eSPI">TFT_eSPI</a> library to interact with the former chip and the <a href="https://github.com/PaulStoffregen/XPT2046_Touchscreen">XPT2046_Touchscreen</a> library for the latter chip, and these examples included what pins and configuration to associate with each. None of this setup I messed with.</p>

<p>We can cover the touch task first. To begin with, I already had a general idea for what it should do: a user had to be able to drag their stylus across the screen and then see water stirring as if they had stuck their arm into it and whirled it around in reality. With that in mind, what should we want to capture, exactly?</p>

<p>The objective can be split into three parts. First, we should obviously check if the user is touching the screen in the first place! Second, assuming that the user is touching the screen, we should obviously get <em>where</em> they touched the screen. Finally, if we keep track of the previous touch location, we can use it later to estimate how fast they were dragging the stylus across the screen—assuming they were, that is. We’ll get to that in a bit.</p>

<p>To deal with the first two matters, reading the <a href="https://github.com/PaulStoffregen/XPT2046_Touchscreen#reading-touch-info">documentation for the XPT2046_Touchscreen library</a> takes us most of the way. A call to the <code class="language-plaintext highlighter-rouge">.touched()</code> method tells us whether the user touched the screen. Assuming this returns true, getting the where is just a call to the <code class="language-plaintext highlighter-rouge">.getPoint()</code> method. It returns an object that contains the coordinates of the touch—coordinates that we’ll need to further process.</p>

<p>First, we should quickly note that the XPT2046 always assumes that the screen is 4096x4096, regardless of what the dimensions actually are. That can just be fixed by rescaling. To be exact, the <code class="language-plaintext highlighter-rouge">getPoint()</code> method returns a <code class="language-plaintext highlighter-rouge">TS_Point</code> struct with members <code class="language-plaintext highlighter-rouge">.x</code>, <code class="language-plaintext highlighter-rouge">.y</code>, and <code class="language-plaintext highlighter-rouge">.z</code>. Ignoring <code class="language-plaintext highlighter-rouge">.z</code>, we first multiply <code class="language-plaintext highlighter-rouge">.x</code> by the screen width and <code class="language-plaintext highlighter-rouge">.y</code> by the height. (In fact, I multiplied them by a fourth of that because I had to run the sim at sixteenth-resolution, but that’s beside the point.) Then, we divide <code class="language-plaintext highlighter-rouge">.x</code> and <code class="language-plaintext highlighter-rouge">.y</code> by 4096. Rescaling in this way, multiplying before dividing, preserves the most precision.</p>

<p>With that said, you’re free to ask here: <em>why</em> should <code class="language-plaintext highlighter-rouge">.x</code> be multiplied by the width, and <code class="language-plaintext highlighter-rouge">.y</code> by the height? That would imply that <code class="language-plaintext highlighter-rouge">.x</code> is a horizontal component and <code class="language-plaintext highlighter-rouge">.y</code> is a vertical component, right? That’s correct, but a surprising complication comes from the fact that we’re feeding a fluid sim.</p>

<p>The second thing we need to do is recognize that the XPT2046_Touchscreen library is written to yield coordinates in the coordinate system established by the <a href="https://learn.adafruit.com/adafruit-gfx-graphics-library/overview">Adafruit_GFX</a> library. It’s a somewhat niche convention that has tripped me up multiple times despite how simple it is, so I’ll cover it here.</p>

<p>The Adafruit_GFX library has set conventions that are now widespread across the Arduino ecosystem. Even up to function signatures (name, input types, output types, etc.), the way to interact with adhering display libraries <em>doesn’t change</em> from library to library—save a couple of lines or so. For example, my transition of this project from an RGB LED matrix to the CYD was <em>trivial</em>, yet there couldn’t be more of a gap between their technologies. This is because the libraries I used for them, <a href="https://github.com/adafruit/Adafruit_Protomatter">Adafruit_Protomatter</a> and TFT_eSPI respectively, adhered to the conventions.</p>

<p>One of these conventions is their coordinate system. When I say coordinate system, “Cartesian” might be the word that pops into your mind, but the Cartesian coordinate system was <em>not</em> what Adafruit_GFX used, even though they do refer to pixels by “(x, y)” coordinates. In the ordinary Cartesian system, the positive-x direction is rightwards, and the positive-y direction is upwards. They had them be rightwards and <em>downwards</em> respectively.</p>

<figure>
<img src="/images/2023-07-30/figure2.png" alt="diagram showing Adafruit_GFX coordinates" />
</figure>

<p>This should be compared to the way a 2D array is indexed in C. Given the array <code class="language-plaintext highlighter-rouge">float A[N][M]</code>, <code class="language-plaintext highlighter-rouge">A[i][j]</code> refers to the element <code class="language-plaintext highlighter-rouge">i</code> rows down and <code class="language-plaintext highlighter-rouge">j</code> columns to the right. This notation is just a fact of C, but to keep things clear in a moment, I’ll refer to it as “matrix indexing”.</p>

<figure>
<img src="/images/2023-07-30/figure3.png" alt="diagram showing matrix indexing" />
<figcaption>

Note: <code>i</code> and <code>j</code> are represented in this diagram as "i, j"

</figcaption>
</figure>

<p>In a way, we can think of <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> as coordinates. In fact, if we equate <code class="language-plaintext highlighter-rouge">i</code> to “y” (the downward-pointing one used by Adafruit_GFX, which I’m writing here in quotes) and <code class="language-plaintext highlighter-rouge">j</code> to “x”, then I’d argue that matrix indexing and the Adafruit_GFX coordinates are <em>wholly equivalent</em>—as long as we adhere to this rename. Well, we don’t end up sticking with it, actually.</p>

<p>We’ll cover this in more depth in the next posts, but it turns out that the type of fluid simulation I’m using is constructed on a Cartesian grid which <em>doesn’t</em> use matrix indexing. Points on the grid are referred to by their Cartesian coordinates (x, y), exactly as you’d expect. It’s also starkly different from the Adafruit_GFX coordinates “(x, y)”. (In this article, I’ll write (x, y) when I mean the Cartesian coordinates and “(x, y)” when I mean the Adafruit_GFX coordinates.) At the same time, <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> can be used to refer to the point on the grid at the <code class="language-plaintext highlighter-rouge">i</code>-th column from the left and the <code class="language-plaintext highlighter-rouge">j</code>-th row from the bottom. I’ll refer to it as “Cartesian indexing”.</p>

<figure>
<img src="/images/2023-07-30/figure4.png" alt="diagram showing Cartesian indexing" />
<figcaption>

Note: <code>i</code> and <code>j</code> are represented in this diagram as "i, j"

</figcaption>
</figure>

<p>Increasing <code class="language-plaintext highlighter-rouge">i</code> moves you rightward, and increasing <code class="language-plaintext highlighter-rouge">j</code> moves you upward. In other words, <code class="language-plaintext highlighter-rouge">i</code> specifies the x-coordinate, and <code class="language-plaintext highlighter-rouge">j</code> specifies the y-coordinate. This correspondence between Cartesian coordinates and Cartesian indexing is <em>flipped</em>, roughly, from the correspondence between Adafruit_GFX coordinates and matrix indexing. It’s not an exact reversal because positive-y means up while positive-“y” means down.</p>

<figure>
<img src="/images/2023-07-30/figure5.png" alt="diagram showing the axes of matrix and Cartesian indexing" />
<figcaption>

The axes of matrix and Cartesian indexing

</figcaption>
</figure>

<p>What does this mean for us? What’s the consequence? We need to change coordinate systems, i.e. transform the touch inputs. Fortunately, I’ve found a cheap trick for this. If you look at the <code class="language-plaintext highlighter-rouge">i</code>’s and <code class="language-plaintext highlighter-rouge">j</code>’s in the above diagram (and set aside the conflicting x’s and y’s), you may suspect that the transform we need is a rotation. I did try this, and it did work. That said, the trick is to know that the physics doesn’t change if we run the simulation <em>on a space that is itself rotated</em>.</p>

<figure>
<img src="/images/2023-07-30/figure6.png" alt="diagram showing the trick, running the sim in that is space rotated relative to the screen, demonstrating that Cartesian indexing and matrix indexing on the same space gives points in that space two different coordinates, whereas the trick forces the points in both indexing schemes to have the same coordinates" />
<figcaption>

With the trick, the screen and sim no longer operate on the same space, but corresponding points have the same coordinates/indices. Without the trick, points in the shared space have different coordinates/indices for the sim and screen.

</figcaption>
</figure>

<p>Going about it this way, the <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> of a pixel on the screen, using matrix indexing, and the <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> of a point in the simulation space, using Cartesian indexing but also being rotated relative to the screen, are identical. With this trick, the transform is to do nothing! (If we speak in x, “x”, y, and “y”, instead, then that’s a swap of the axes, but it’s more like swapping labels.)</p>

<p>It also happens here that the arrays used for sim operations are now the same shape as the arrays used for screen operations. This falls out of flipping the correspondences we mentioned before.</p>

<p>Combining the physical rotation of the sim space with the scaling that also accounts for the sim being sixteenth-resolution, we have now taken a touch from the XPT2046 format to the sim space.</p>
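<p>To make this concrete, here’s a host-testable sketch of that mapping. The calibration bounds below are hypothetical (the XPT2046 reports 12-bit readings, but the usable window has to be measured for each panel), and the function names are mine, not the project’s:</p>

```cpp
#include <cassert>

// Hypothetical calibration bounds for the XPT2046's 12-bit readings;
// the real values must be measured for each individual panel.
constexpr int RAW_MIN = 200, RAW_MAX = 3900;
constexpr int SCREEN_ROWS = 320, SCREEN_COLS = 240; // e.g. a 240x320 TFT
constexpr int SCALE = 4;                            // sim is quarter-res per axis

// Map one raw axis reading onto [0, n_pixels).
int raw_to_pixel(int raw, int n_pixels) {
    if (raw < RAW_MIN) raw = RAW_MIN;
    if (raw > RAW_MAX) raw = RAW_MAX;
    return (raw - RAW_MIN) * (n_pixels - 1) / (RAW_MAX - RAW_MIN);
}

// Thanks to the rotation trick, no axis swap is needed: the touch
// coordinates map straight onto the sim's (i, j) after scaling down.
void raw_to_sim(int raw_i, int raw_j, int *sim_i, int *sim_j) {
    *sim_i = raw_to_pixel(raw_i, SCREEN_ROWS) / SCALE;
    *sim_j = raw_to_pixel(raw_j, SCREEN_COLS) / SCALE;
}
```

<p>Note that the only “transform” left is the scaling, which is the whole point of the trick.</p>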

<p>That leaves the third part to capture: an estimate of the velocity. There is nothing built in for this, so I had to tease one out. The idea I exploited is that, as the stylus is dragged across the screen, it must have traveled from where we last observed a touch to where we see a touch now. This is a displacement that we can divide by the time elapsed to get an <em>approximation</em> of the velocity. In vector notation, we can write this as the expression</p>

\[\tilde {\bold v} = \frac{\Delta \bold x}{\Delta t}\]

<p>where $\Delta t$ is the time elapsed and $\Delta \bold x$ is a vector composed of the change in the $x$-coordinate and the change in the $y$-coordinate. (We can use either coordinate system’s definition of x and y for this, trick or not.) This approximation gets less accurate as $\Delta t$ increases, but I settled for 20 ms without too much thought. I just enforced this period with a delay.</p>

<figure>
<img src="/images/2023-07-30/figure7.png" alt="diagram showing how we approximate velocity using the previous displacement" />
<figcaption>

Approximating the current velocity with the previous displacement

</figcaption>
</figure>

<p>That said, the caveat is that this idea doesn’t define what to do when the user <em>starts</em> to drag the stylus, where there is no previous touch. Strictly speaking, we can save ourselves from going into undefined territory if we code in this logic: if the user was touching the screen before <em>and</em> is still touching the screen now, then we can calculate a velocity, and in all other cases (not now but yes before, not now and not before, and yes now but not before) we cannot.</p>

<p>Finally, if we had detected a touch and calculated a velocity, then the touch task succeeded in generating a valid input, and this can be put in the queue to be served to the sim!</p>
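<p>As a sketch (the names and structure here are mine, not the project’s actual code), the touch-validity logic and the velocity estimate fit together in one function:</p>

```cpp
#include <cassert>
#include <cmath>

struct Touch { float x, y; };
struct TouchInput { float x, y, vx, vy; };

// Previous-sample memory: whether a touch was seen last period, and where.
static bool was_touched = false;
static Touch prev;

// Called once per sampling period (dt, e.g. 0.02 s). Returns true only
// when both the previous and the current samples saw a touch, which is
// the only case where a velocity can be estimated.
bool make_input(bool touched_now, Touch now, float dt, TouchInput *out) {
    bool valid = was_touched && touched_now;
    if (valid) {
        out->x = now.x;
        out->y = now.y;
        out->vx = (now.x - prev.x) / dt; // displacement / time elapsed
        out->vy = (now.y - prev.y) / dt;
    }
    was_touched = touched_now;
    if (touched_now) prev = now;
    return valid;
}
```

<p>The first sample of a drag updates the memory but returns invalid, exactly as the “yes now but not before” case demands.</p>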

<p>That leaves the render task, which uses the TFT_eSPI library. We’ll cover the sim itself in a future part, but for now, know that the fluid simulation puts out individual arrays for red, green, and blue that together represent the color. Let’s say that I had full-resolution arrays instead of sixteenth-resolution ones. Then, we’ve already set the sim up such that we need not do anything to change coordinate systems. Every pixel on the screen is some <code class="language-plaintext highlighter-rouge">i</code> rows down and some <code class="language-plaintext highlighter-rouge">j</code> columns to the right, and its RGB values can be found at (i, j) in the respective arrays. The approach would be to go pixel by pixel, indexing into the arrays with the pixel’s singular <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code>, encoding the RGB values into 16-bit color, and then sending it out. It would have been as simple as that.</p>
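<p>For these TFT controllers, the 16-bit color in question is the RGB565 format, and the encoding itself is just bit packing:</p>

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit R, G, B channels into the 16-bit "565" format: the top
// 5 bits of red, the top 6 bits of green, and the top 5 bits of blue.
uint16_t encode_rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```
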

<p>Now, let’s reintroduce the fact that we only have sixteenth-resolution arrays.</p>

<p>Because this now means that the arrays correspond to a screen <em>smaller</em> than the one we have, we have a classic upscaling problem. There are sophisticated ways to go about it, but I went with the cheapest one: each element in the array gets to be a 4x4 square on the screen. From what I could tell, it was all I could afford. Because it meant that the 4x4 square was of a single color, I could reuse the encoding work sixteen times! Really though, if I had more computing power, I suspect that this would’ve been an excellent situation for those sophisticated methods to tackle.</p>
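<p>This cheapest option goes by the name nearest-neighbor upscaling. A minimal host-side sketch of it (assuming the source elements are already encoded as 16-bit colors, so each one is reused sixteen times):</p>

```cpp
#include <cassert>
#include <cstdint>

// Nearest-neighbor upscale: every element of the low-res source becomes
// a 4x4 square of identical pixels in the destination, so the encoding
// work per source element is done once and reused sixteen times.
constexpr int SCALE = 4;

void upscale_4x4(const uint16_t *src, int rows, int cols, uint16_t *dst) {
    for (int i = 0; i < rows * SCALE; i++)
        for (int j = 0; j < cols * SCALE; j++)
            dst[i * cols * SCALE + j] = src[(i / SCALE) * cols + (j / SCALE)];
}
```
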

<div class="note-panel">

  <p>Hello from the future! It turned out that I could afford more than that. The project now performs an upscaling that is based on a particularly efficient approach to <a href="https://en.wikipedia.org/wiki/Bilinear_interpolation">“bilinear interpolation”</a>. This is something that I want to write more about in the future, but until then, here’s a <a href="https://github.com/colonelwatch/ESP32-fluid-simulation/blob/master/ESP32-fluid-simulation/ESP32-fluid-simulation.ino#L121">link</a> to the particular code that does this in the project.</p>

</div>

<p>This choice of upscaling alone might offer a fast enough render of the fluid colors, especially if we batch the 16 pixels that make up the square into a single call of <code class="language-plaintext highlighter-rouge">fillRect()</code>. That’s one of the functions that was established by Adafruit_GFX. However, I found that I needed an even faster render, so I turned to some features that were unique to TFT_eSPI: “Sprites” and the “direct memory access” (DMA) mode.</p>

<p>Now, googling for “direct memory access” is bound to yield what it is and exactly how to implement it, but to use the DMA mode offered by TFT_eSPI, we only need to know the general idea. That is, a peripheral like the display bus can be configured to read a range of memory <em>without the CPU handling it</em>. For us, this means we would be able to construct the next batch of pixels <em>while</em> the last one is being transferred out. However, to do this effectively, we’ll need to batch together more than just 16 pixels.</p>

<p>That’s where “Sprites” come in. Yes, you might think of pixel-art animation when I say “sprites”, but here, it’s a convenient wrapper around some memory. Presenting itself as a tiny virtual screen, called the “canvas,” it offers the same functions that we can expect from a library following Adafruit_GFX. As long as we remember to use coordinates that place each square in this canvas (and <em>not</em> the whole screen!), we can load it up with many squares through repeated <code class="language-plaintext highlighter-rouge">fillRect()</code> calls, but under the hood, no transferring takes place yet. Once the sprite is packed with squares, only then do we initiate a transaction with a single call to <code class="language-plaintext highlighter-rouge">pushImageDMA()</code>, this function invoking the DMA mode. From there, we can start packing a new batch of squares at the same time.</p>

<figure>
<img src="/images/2023-07-30/figure8.png" alt="" />
<figcaption>

<code>fillRect()</code> is called with (x_local, y_local) as where the square starts, <code>pushImageDMA</code> is called later with (x_canvas, y_canvas) as where the sprite starts, and meanwhile the previous sprite is still transferring

</figcaption>
</figure>

<p>The caveat: if we pack squares into the <em>same</em> memory that we’re transferring out with DMA, then we’d end up overwriting squares before they ever reach the display. Therefore, we’d want two sprites—one for reading and one for writing—and then we’d flip which gets read from and which gets written to. This flip would happen after initiating the transaction but before we start packing new squares. Finally, the terminology for this is “double buffering”, more specifically that’s the <a href="https://en.wikipedia.org/wiki/Multiple_buffering#Page_flipping">“page flipping”</a> approach to it, but the purpose of it here is more than just preventing screen tearing.</p>

<figure>
<img src="/images/2023-07-30/figure9.png" alt="" />
<figcaption>

Classical page flipping is two buffers and two pointers: no data is copied between the buffers, but the pointers get swapped.

</figcaption>
</figure>
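<p>Page flipping in miniature looks like this (a host-side sketch, standing in for the two TFT_eSPI sprites):</p>

```cpp
#include <cassert>
#include <cstdint>

// Two buffers, two pointers: the writer packs pixels into `back` while
// `front` is (conceptually) being DMA'd out. The flip swaps the
// pointers, never the data.
constexpr int N = 16;
static uint16_t buf_a[N], buf_b[N];
static uint16_t *front = buf_a, *back = buf_b;

void flip() {
    uint16_t *tmp = front;
    front = back;
    back = tmp;
}
```

<p>The flip happens after initiating the transaction but before packing new squares, just as described above.</p>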

<p>That covers the touch and render tasks, altogether describing how I used the touchscreen module. But between the hardware and my code are the TFT_eSPI and XPT2046_Touchscreen libraries, and they set in stone the features and conventions I got to work with. In particular, I had to lay out the exact relationship between Cartesian indices and the “(x, y)” coordinates that have become ubiquitous across Arduino libraries, in large part because of the Adafruit_GFX library. With the rotation trick, we eliminated the transform between them. With that in mind, using XPT2046_Touchscreen was just a matter of scaling and maintaining a bit of memory. On the other hand, I turned to the DMA mode and “Sprites” that are uniquely offered by the TFT_eSPI library just to keep pace. Those features also kept within the Adafruit_GFX box, so just a bit of extra care (double buffering, that is) was needed.</p>

<p>With this post and the last post done, there’s one last task to cover: the <a href="/2023/09/22/esp32_fluid_sim_3.html">next post</a> is an overview of the physics before we get into the implementation. Stay tuned! But if you’re here before that post comes out, there’s always the code itself <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">on GitHub</a>.</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><summary type="html"><![CDATA[So, how exactly did my rebuild of ESP32-fluid-simulation do the touch and render tasks? This post is the second in a series of posts about it, and the first was a task-level overview of the whole project. But while it’s nice and all to know the general parts of the project and how they communicate in a high-level sense, the meat of it is the implementation, and I’m here to serve it. The next parts are dedicated to the sim physics, but we’ll talk here about the input and output: the touch and screen of a touchscreen.]]></summary></entry><entry><title type="html">Rebuilding ESP32-fluid-simulation: overview of tasks and intertask communication in FreeRTOS (Part 1)</title><link href="http://kennypeng.com/2023/07/21/esp32_fluid_sim_1.html" rel="alternate" type="text/html" title="Rebuilding ESP32-fluid-simulation: overview of tasks and intertask communication in FreeRTOS (Part 1)" /><published>2023-07-21T00:00:00+00:00</published><updated>2023-07-21T00:00:00+00:00</updated><id>http://kennypeng.com/2023/07/21/esp32_fluid_sim_1</id><content type="html" xml:base="http://kennypeng.com/2023/07/21/esp32_fluid_sim_1.html"><![CDATA[<p>I graduated from college a couple of months ago, and ever since I’ve been interested in revisiting the things I put out while I was still learning. In particular, I obsessed over how I could make it appear more accessible <em>and</em> more professional. 
To that end, I decided that I needed to tie my works closer to established research and—if not that—fundamental concepts that are easy to look up. I had been trying that with my blog posts, but this new post is about <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a>, namely one of my old projects about fluid simulation on an ESP32.</p>

<p>Coincidentally, I was lurking on <a href="https://www.youtube.com/c/brianlough">Brian Lough’s</a> Discord channel when I learned of a cool new development board, packing an ESP32 and a touchscreen. Retailing for just about $14 when you count shipping, it was far more accessible than the RGB LED matrix I was using back then. It seemed like a perfect platform to target my new edition of this old fluid sim, and I could even add touch input while I was at it.</p>

<figure>
<img src="/images/2023-07-21/figure1.jpeg" alt="demo of ESP32-fluid-simulation, showing the colors of the screen being stirred by touch" />
</figure>

<p>So, how did this project get built again using established research and otherwise stuff you can look up? I’m trying to be thorough here, so this will actually be the first out of three posts. Where we start and where I started is at the highest level: the breakup of a single loop that does everything into many loops that are smaller, share time on a processor, and communicate with each other. (This is also the perfect chance to show what this project does at a high level.) After this post, we can get to the input, rendering, and simulation itself.</p>

<p>What allows a processor to split its time and facilitate this communication is a <a href="https://en.wikipedia.org/wiki/Real-time_operating_system">“real-time operating system” (RTOS)</a>. I don’t have the expertise to summarize everything that an RTOS is, but I can safely say that two things (not exclusive) that an RTOS can do, split processor time and facilitate communication, are things an “operating system” (OS) can do generally. Why this disclaimer? My knowledge about these features mainly comes from a lesson in parallel programming on Linux that I took in school. This and “concurrent programming” on an RTOS have some overlapping concepts, but they’re not the same. In fact, the difference led me to a real trip-up as I was rewriting this project, and I can detail how this happened along the way.</p>

<p>The part of operating systems—generally—that allows a processor to split time is the scheduler. Let’s lay out the characteristics of the scheduler that gets used in the ESP32. The ESP-IDF comes with its own distribution of the open-source <a href="https://en.wikipedia.org/wiki/FreeRTOS">FreeRTOS</a>, this version being called <a href="https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html">“ESP-IDF FreeRTOS”</a>, and it can <a href="https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html#preemption">be</a> <a href="https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html#time-slicing">shown</a> that it roughly matches the default configuration. That configuration is <a href="https://www.freertos.org/single-core-amp-smp-rtos-scheduling.html">“preemptive” scheduling with “round-robin time-slicing” for “equal priority tasks”</a>. What do all those keywords mean? “Preemptive” means that splitting processor time is achieved by having higher-priority loops (“tasks” in FreeRTOS terminology) interrupt lower-priority loops. With few exceptions, higher-priority tasks <em>always</em> interrupt lower-priority tasks. These tasks do what they do and then stop interrupting, though they <em>themselves</em> can be interrupted by even higher-priority tasks. The below diagram shows one example of how this happens.</p>

<figure>
<img src="/images/2023-07-21/figure2.png" alt="example of task preemption" />
<figcaption>

The highest-priority available task is the one that will be run

</figcaption>
</figure>

<p>“Round-robin time-slicing” for “equal priority tasks” just means that tasks take turns in the case of a tie.</p>

<figure>
<img src="/images/2023-07-21/figure3.png" alt="example of round-robin time-slicing" />
<figcaption>

Two tasks that are equal in priority do not interrupt each other, but a scheduler with round-robin time-slicing will still split time between them

</figcaption>
</figure>

<p>When the scheduling works, all tasks can appear to be running at the same time!</p>

<figure>
<img src="/images/2023-07-21/figure4.png" alt="the ideal" />
<figcaption>

The ideal

</figcaption>
</figure>

<p>Still, this scheduler behavior isn’t the same as in Linux. On one hand, a high-priority task is guaranteed to run on time, barring even higher-priority tasks and said exceptions (keyword “priority inversion”). On the other, if a high-priority task runs <em>forever</em>, then a lower-priority task <em>never</em> runs. That’s been termed “starvation”. This is what I accidentally caused, but to describe how I got there, we need to lay out the actual tasks that make up the project along with that other feature of an RTOS I mentioned: facilitating communication between tasks.</p>

<p>Originally, ESP32-fluid-simulation was written like any other Arduino project. It used the <code class="language-plaintext highlighter-rouge">setup()</code> and <code class="language-plaintext highlighter-rouge">loop()</code> functions for code that ran once and code that ran forever, respectively. Putting aside the code in <code class="language-plaintext highlighter-rouge">setup()</code>,  the <code class="language-plaintext highlighter-rouge">loop()</code> function had five general blocks: (1) calculate new internal velocities, (2) add user input to the new velocities, (3) correct the new velocities, (4) calculate new fluid colors using the corrected velocities, and finally (5) render the new colors. For context, we capture everything we want to model about the fluid using just the internal velocities and color, but we’ll get to that in a later post. Altogether, this sequence can be visualized with a simple flowchart, showing the whole big loop.</p>

<figure>
<img src="/images/2023-07-21/figure5.png" alt="flowchart of original design, showing blocks in sequence" />
</figure>

<p>However, the Arduino core for ESP32 was written on top of ESP-IDF, and we’ve already established that ESP-IDF uses FreeRTOS. As a result, all FreeRTOS functions can be called in Arduino code (not even a header <code class="language-plaintext highlighter-rouge">#include</code> is needed!). So, I immediately broke out the five blocks into three tasks: a touch task, a simulation task, and a render task. In each task, the input of a block might be the output of another block that sits in another task, and we’ll get the data across… somehow. We’ll get to that. With this in mind, we can at least update the flowchart to show three concurrent tasks and the data dependencies between them.</p>

<figure>
<img src="/images/2023-07-21/figure6.png" alt="preliminary flowchart of new design, showing three concurrent sequences of blocks and data dependencies between them, in blue" />
</figure>

<p>The missing thing here is the facilitation of communication, which I left <em>exclusively</em> to FreeRTOS. To be more precise, FreeRTOS offers a couple of “synchronization primitives” that can be used to guarantee that “race conditions” never happen. Ignore synchronization primitives in your concurrent applications at your own peril, for a “race condition” means that the result depends on whatever way the scheduler executes your tasks. In other words, you can’t depend on the result at all! For example, the classic bank account example shows how a badly coded ATM network can vanish your money, thanks to a race condition.</p>

<figure>
<img src="/images/2023-07-21/figure7.png" alt="example of how race conditions can obliterate your bank balance" />
</figure>

<p>I can’t cover every synchronization primitive, but the two I need to cover are the “binary semaphore” and the “mutex”. I’ll also cover the “queue”, an all-in-one package for safe communication that FreeRTOS offers. (You can see the <a href="https://www.freertos.org/features.html">FreeRTOS documentation</a> for the rest, but the <a href="https://www.digikey.com/en/maker/projects/what-is-a-realtime-operating-system-rtos/28d8087f53844decafa5000d89608016">guide to FreeRTOS offered by Digikey</a> is also useful.) As we cover these in the context of my three tasks, we’ll also be able to go over my trip-up.</p>

<p>A <a href="https://en.wikipedia.org/wiki/Lock_(computer_science)">“mutex”</a> is the canonical solution to our bank account race condition. A task must “take” the mutex, read and write the balance (in general, any shared memory), and finally “give” back the mutex. Because no interrupting task can take a mutex that is already taken, the race condition is prevented! This guarantee is called “mutual exclusion”. Furthermore, while the interrupting task cannot take the mutex, it is forced to wait until it can, and in that time the scheduler is free to run lower-priority tasks. When the interrupting task runs into this, it’s in a “blocked” state.</p>

<figure>
<img src="/images/2023-07-21/figure8.png" alt="example of how a locked mutex causes a thread to be blocked" />
</figure>
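<p>A host-side analogue of the fix, with <code class="language-plaintext highlighter-rouge">std::mutex</code> playing the role of the FreeRTOS mutex (“take” = lock, “give” = unlock); the FreeRTOS version would use its own mutex API instead:</p>

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// The bank-account example, fixed: each deposit is a read-modify-write
// on shared memory, made atomic by holding the mutex around it.
static int balance = 0;
static std::mutex balance_mutex;

void deposit(int amount) {
    std::lock_guard<std::mutex> lock(balance_mutex); // "take"
    balance += amount;                               // read-modify-write
}                                                    // "give" on scope exit
```

<p>Without the lock, two concurrent depositors could read the same old balance and one deposit would vanish.</p>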

<p>A <a href="https://en.wikipedia.org/wiki/Semaphore_(programming)">“binary semaphore”</a> has a different canonical purpose. Quite simply, one task is blocked until another task says it can go ahead, and this go-ahead flag is then reset after that. Because the other task gives the go-ahead, it can also complete any operations it needs to complete before then. This guarantee is called “precedence”.</p>

<figure>
<img src="/images/2023-07-21/figure9.png" alt="example of how a semaphore that has not been incremented causes a thread to be blocked" />
<figcaption>

See <a href="https://stackoverflow.com/questions/29606162/what-is-the-original-meaning-of-p-and-v-operations-in-a-context-of-a-semaphore">the StackOverflow question</a> for what "P" and "V" stand for, but they pretty much mean the semaphore operations this figure implies

</figcaption>
</figure>

<p>Finally, I’ll only be vague here because the <a href="https://www.freertos.org/Embedded-RTOS-Queues.html">FreeRTOS documentation on “queues”</a> is clear enough already: besides the classic synchronization primitives, an all-in-one package for communication between tasks, called a “queue”, is also offered. Tasks can just send to the queue and receive from the queue—all without triggering race conditions. Further, if a task is sending to a full queue or receiving from an empty one, it is blocked. They’re quite convenient in that sense!</p>

<p>All said, when I say that a task is “blocked”, that’s because we’re using the “blocking” mode. FreeRTOS also offers a “non-blocking” mode that instead lets the task do something else, and this non-blocking mode also offers the same guarantees. In all cases except one, I used the blocking mode.</p>

<p>Moving on, how do these apply to our three tasks? Between the touch task and the simulation task, I just needed the touch task to pass along valid touches to the simulation task. For that, I defaulted to a queue, and I used the non-blocking mode here to make the simulation task receive everything in the queue but move on after that. I left the touch task to send into the queue in the blocking mode. Between the simulation task and the render task, however, the semantics of a queue didn’t make much sense. After all, would I really “send” a set of large arrays (representing fluid color) between tasks? Instead, I allocated a single set of arrays and managed to make the two tasks share the set without race conditions. The race conditions I was anticipating: the simulation task starts updating the fluid colors while the render task is still reading them, or the render task starts reading while the simulation task is still writing.</p>
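<p>The drain-then-move-on pattern can be sketched on the host with a mutex-guarded queue; the real code uses the FreeRTOS queue API with a zero timeout for the non-blocking receive, but the shape of the loop is the same:</p>

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// A thread-safe queue with a non-blocking receive, standing in for the
// FreeRTOS queue. The receiver drains everything currently queued and
// then moves on instead of waiting.
template <typename T>
class Queue {
    std::deque<T> q;
    std::mutex m;
public:
    void send(const T &item) {
        std::lock_guard<std::mutex> l(m);
        q.push_back(item);
    }
    bool try_receive(T *out) { // non-blocking: false if empty
        std::lock_guard<std::mutex> l(m);
        if (q.empty()) return false;
        *out = q.front();
        q.pop_front();
        return true;
    }
};
```
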

<p>At first, I thought that I only needed a mutex. If I wasn’t using an RTOS, this technically would’ve worked, but therein lay my problem. I needed semaphores instead. Why I couldn’t do without semaphores has to do with the preemptive scheduling built into FreeRTOS. Because the render task happened to have a higher priority than the simulation task, it would take the mutex, give it back, and then <em>immediately take it back again</em>.</p>

<figure>
<img src="/images/2023-07-21/figure10.png" alt="figure of the sim task never getting unblocked because the render thread is not stopped" />
</figure>

<p>Nothing stopped the render task from running forever, and so the simulation task was starved. If the scheduler were more like the Linux scheduler or if the tasks were of equal priority, then the simulation task technically would’ve gotten to take the mutex eventually. But I’m glad that I wasn’t technically correct because that forced me to acknowledge the semaphore-based solution to the race condition. This solution also worked on FreeRTOS and didn’t involve the processor wasting time on a task that spun between taking and giving back the mutex endlessly. Using binary semaphores, I got this: a write is always preceded by a completed read, and a read is always preceded by a completed write. In the following diagram, the former is represented by semaphore “R”, and the latter is represented by semaphore “W”.</p>

<figure>
<img src="/images/2023-07-21/figure11.png" alt="figure of the sim and render tasks running concurrently, each task being blocked by a semaphore that the other task eventually raises" />
</figure>

<p>Each semaphore prevented one of the race conditions, but they also blocked the tasks from spinning.</p>
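<p>The two-semaphore handshake can be sketched in portable C++ (the binary semaphore below is a host-side stand-in for the FreeRTOS one, and the task bodies are reduced to a single shared integer playing the role of the color arrays):</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// A minimal binary semaphore: take() blocks until give() has been called.
class BinarySemaphore {
    std::mutex m;
    std::condition_variable cv;
    bool flag;
public:
    explicit BinarySemaphore(bool initial) : flag(initial) {}
    void give() {
        { std::lock_guard<std::mutex> l(m); flag = true; }
        cv.notify_one();
    }
    void take() {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [&] { return flag; });
        flag = false;
    }
};

// "W" signals a completed write, "R" a completed read. Starting R at 1
// lets the sim perform the very first write without waiting.
static BinarySemaphore sem_w(false), sem_r(true);
static int shared_frame = 0;            // stands in for the color arrays
static std::vector<int> rendered;

void sim_task(int n_frames) {
    for (int f = 1; f <= n_frames; f++) {
        sem_r.take();     // wait until the last read completed
        shared_frame = f; // write the new "colors"
        sem_w.give();
    }
}

void render_task(int n_frames) {
    for (int f = 0; f < n_frames; f++) {
        sem_w.take();                     // wait for a completed write
        rendered.push_back(shared_frame); // read the "colors"
        sem_r.give();
    }
}
```

<p>No matter which task the scheduler favors, neither can overtake the other: each frame is written exactly once and read exactly once.</p>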

<p>Now with the queue and these binary semaphores in mind, that completes how I broke apart a single Arduino <code class="language-plaintext highlighter-rouge">loop()</code> into smaller tasks that safely pass data to each other. To visualize it in its entirety, we can update the flowchart with this communication.</p>

<figure>
<img src="/images/2023-07-21/figure12.png" alt="flowchart of new design, showing three concurrent sequences of blocks and communication between them, in blue" />
</figure>

<p>To explain the symbols a bit, the pipeline symbol stands for the queue, and the document symbol stands for the shared fluid colors. The dashed arrows represent communication between the tasks, pointing from where it’s initiated to where it’s awaited. (As we’ve established, they literally do wait for it!)</p>

<p>All said, while this post and flowchart emphasized the concurrent programming with safe communication that FreeRTOS offers, it also happens to serve as a high-level overview of this reimagining of my old project—and from a task-focused perspective at that! This nicely sets the stage for explaining what each task does in the next posts. Stay tuned to read about the touch and render tasks in the <a href="/2023/07/30/esp32_fluid_sim_2.html">next post</a>!</p>

<p>If you’re already here before that post comes out though, there’s always the code itself at the <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a> repo on GitHub.</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><category term="real-time operating systems (RTOS)" /><category term="concurrent programming" /><summary type="html"><![CDATA[I graduated from college a couple of months ago, and ever since I’ve been interested in revisiting the things I put out while I was still learning. In particular, I obsessed over how I could make it appear more accessible and more professional. To that end, I decided that I needed to tie my works closer to established research and—if not that—fundamental concepts that are easy to look up. I had been trying that with my blog posts, but this new post is about ESP32-fluid-simulation, namely one of my old projects about fluid simulation on an ESP32.]]></summary></entry><entry><title type="html">Recoloring backgrounds to align with the Solarized base palette again (plus color, light mode support, and a demo!)</title><link href="http://kennypeng.com/2023/06/02/solarized_background_2.html" rel="alternate" type="text/html" title="Recoloring backgrounds to align with the Solarized base palette again (plus color, light mode support, and a demo!)" /><published>2023-06-02T00:00:00+00:00</published><updated>2023-06-02T00:00:00+00:00</updated><id>http://kennypeng.com/2023/06/02/solarized_background_2</id><content type="html" xml:base="http://kennypeng.com/2023/06/02/solarized_background_2.html"><![CDATA[<p>A couple of months back, I wrote <a href="/2022/11/06/solarized_background.html">“Recoloring backgrounds to align with the Solarized Dark base palette”</a>, and when I wrote that I wasn’t expecting to do a second part. 
At the time, because I had just encountered the <a href="https://ethanschoonover.com/solarized/">Solarized</a> palette, I didn’t even begin to fathom how you could add colors to the backgrounds. Still, even then I could imagine what it would look like, and shortly after I wrote that article I started to go down what seemed like the right path. I found myself making a 3D scatter plot of the entire Solarized palette as <a href="https://en.wikipedia.org/wiki/CIELAB_color_space">CIELAB</a> values, and it looked to me like a spinning top in the middle of falling over.</p>

<figure>
<img src="/images/2023-06-02/figure1.png" alt="Solarized palette as points in CIELAB space" />
</figure>

<p>So, I thought that all I might need to do was transform the colors of an image into points in CIELAB space, tip them over just the same, and then transform them back into RGB color. However, I didn’t come around to trying that idea until now. After a great deal of experimentation, I’ve found a particular style of “solarizing” images that generally works for any image: start by following the monochrome scheme that aligns with the Solarized base palette, then allow some subtle tinting with the other colors.</p>

<figure>
<img src="/images/2023-06-02/figure2.png" alt="Solarized Carina cliffs with color" />
</figure>

<p>You can try it for yourself using a demo I put on <a href="https://huggingface.co/spaces/colonelwatch/background-solarizer">HuggingFace</a>.</p>

<p>Ultimately, it didn’t just involve tipping over a top. The general outline for achieving the effect is this:</p>

<ol>
  <li>Transform the colors of the image into points in CIELAB space,</li>
  <li>reduce their saturation/”chroma” component,</li>
  <li>remap their lightness component,</li>
  <li>rotate and shift them (still in CIELAB space), and finally</li>
  <li>transform them back into RGB color.</li>
</ol>

<p>It’s worth noticing here that all the work was done in CIELAB space. It is the coordinate space in which the Solarized palette was canonically defined, but it’s also a space with a very convenient property. That is: the lightness of a color is an independent component. Out of the components of a point in CIELAB space, $L$, $a$, and $b$, lightness is just $L$. Given some—say—purple, you can get the same purple but brighter or darker by varying just $L$, and you leave the $a$ and $b$ components alone. If we worked in RGB instead, we would have to vary the red, green, and blue components together.</p>

<p>The $a$ and $b$ components together form a plane of all possible mixtures of the primary colors, and a specific $a$ and $b$ mean a specific mixture. Going in the $+a$ direction gets a redder mixture. The $-a$ direction gets a greener mixture. $+b$ gets a yellower one, and $-b$ a bluer one. That said, in this case, we should think about the $a-b$ plane in polar coordinates. In polar, the angle is called the “hue” (the very same hue that you’d pick from a color wheel), and the magnitude is called the saturation or “chroma”.</p>
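<p>This polar view is what makes the desaturation step trivial. A small sketch (in C++ rather than the Python I actually used, and with names of my own choosing): scaling $a$ and $b$ together shrinks the chroma while leaving the hue and the lightness $L$ untouched.</p>

```cpp
#include <cassert>
#include <cmath>

struct Lab { double L, a, b; };

// The a-b plane in polar form: chroma is the magnitude, hue the angle.
double chroma(const Lab &c) { return std::hypot(c.a, c.b); }
double hue(const Lab &c) { return std::atan2(c.b, c.a); } // radians

// Scale a and b together: chroma shrinks, hue and lightness are unchanged.
Lab desaturate(Lab c, double factor) {
    c.a *= factor;
    c.b *= factor;
    return c;
}
```
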

<p>The $L$, $a$, and $b$ components all have meanings that make each step of the process into simple operations. On top of that, <code class="language-plaintext highlighter-rouge">scikit-image</code> gives us convenient functions that step <a href="https://scikit-image.org/docs/stable/api/skimage.color.html#skimage.color.rgb2lab">in</a> and <a href="https://scikit-image.org/docs/stable/api/skimage.color.html#skimage.color.lab2rgb">out</a> of CIELAB space, called <code class="language-plaintext highlighter-rouge">rgb2lab</code> and <code class="language-plaintext highlighter-rouge">lab2rgb</code> respectively. That’s the advantage of working in CIELAB space. With that in mind, what are we trying to do in each step? We’ll want to cover this backward, starting with the shift and rotate—the meat of the method!</p>

<p>In my previous post, I chose to throw out color, and then I mapped the grayscale values onto the line going through the Solarized base palette in CIELAB space.</p>

<figure>
<img src="/images/2023-06-02/figure3.png" alt="Solarized palette as points in CIELAB space with line" />
</figure>

<p>However, all grayscale values can be thought of as the line where $a=0$ and $b=0$, or in other words the $L$-axis, and “throwing out color” can simply be thought of as a linear projection of all values onto it. Because we can think of the Solarized base palette as a line and all grayscale values as another line, a similar (but not the same) way to do what I did before is to do the projection then apply an “affine” function. “Affine” functions take the general form</p>

\[y = Ax+b\]

<p>and they differ from linear functions (<em>their</em> general form being $y=Ax$) only by a translation, expressed as the additional term $b$. Using an affine function makes sense here because the canonical center of CIELAB space is $(50, 0, 0)$, not the origin. (For that matter, the center of the Solarized base palette isn’t the origin either.)</p>

<p>With an affine function on the table, the natural next step would be to solve for $A$ and $b$, perhaps with a linear algebra package. The trouble is that, while the Solarized base palette could possibly serve as $y$, we have <em>nothing</em> to serve as $x$. Before anyone mentions it, the Solarized website does show the colors it replaces for the xterm program, but a different set of colors from a different program can be replaced by the Solarized palette just the same. If we took the xterm colors as $x$, then we could just as arbitrarily take the colors of Google Chrome or Visual Studio Code. That is to say again: we have no solid choice for $x$, so we’re forced to give up on using data to determine $A$ and $b$.</p>

<p>Instead, let’s give $A$ and $b$ some value, but we’ll guide our choice with some intuition. We’ll start with this: since we already know the center of CIELAB space and the center of the Solarized base palette, we can rewrite the affine transform as</p>

\[y - y_0 = A (x - x_0)\]

<p>where we should notice that we implicitly set $b$ to $y_0 - A x_0$. This intuitively defines $b$ as whatever brings the center of $Ax$ from $A x_0$ to $y_0$.</p>

<p>That leaves defining $A$. Given that we’re passing in $x-x_0$ and getting out $y-y_0$, subtraction of the centers $x_0$ and $y_0$ means we’re actually passing in a line through the origin and getting out a different line through the origin. The natural operation that should come to mind here is rotation.</p>

<p>One definition of a rotation matrix is parameterized by <a href="https://en.wikipedia.org/wiki/Rotation_matrix#General_rotations">yaw, pitch, and roll</a></p>

\[\begin{align*} A &amp; = \underbrace{ \begin{bmatrix} \cos\alpha &amp; -\sin\alpha &amp; 0 \\ \sin\alpha &amp; \cos\alpha &amp; 0 \\ 0 &amp; 0 &amp; 1 \end{bmatrix} }_\text{yaw} \underbrace{ \begin{bmatrix} \cos\beta &amp; 0 &amp; \sin\beta \\ 0 &amp; 1 &amp; 0 \\ -\sin\beta &amp; 0 &amp; \cos\beta \end{bmatrix} }_\text{pitch} \underbrace{ \begin{bmatrix} 1 &amp; 0 &amp; 0 \\ 0 &amp; \cos\gamma &amp; -\sin\gamma \\ 0 &amp; \sin\gamma &amp; \cos\gamma \end{bmatrix} }_\text{roll} \\ &amp; = \begin{bmatrix} \cos\alpha \cos\beta &amp; \cos\alpha \sin\beta \sin\gamma - \sin\alpha \cos\gamma &amp; \cos\alpha \sin\beta \cos\gamma + \sin\alpha \sin\gamma \\ \sin\alpha \cos\beta &amp; \sin\alpha \sin\beta \sin\gamma + \cos\alpha \cos\gamma &amp; \sin\alpha \sin\beta \cos\gamma - \cos\alpha \sin\gamma \\ -\sin\beta &amp; \cos\beta \sin\gamma &amp; \cos\beta \cos\gamma \end{bmatrix} \end{align*}\]

<p>where $\alpha$, $\beta$, and $\gamma$ are the yaw, pitch, and roll angles respectively.</p>

<p>In my previous post, I found that the principal component of the Solarized base palette line was $(0.9510, 0.1456, 0.2726)$. For the $L$-axis, we can just take $(1, 0, 0)$ as the unit vector that spans it. Since these two vectors are unit-length, we can say that the rotation matrix is such that</p>

\[\begin{bmatrix} 0.9510 \\ 0.1456 \\ 0.2726 \end{bmatrix} = A \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\]

<p>Solving for $\alpha$, $\beta$, and $\gamma$ yields</p>

\[\begin{bmatrix} 0.9510 \\ 0.1456 \\ 0.2726 \end{bmatrix} = \begin{bmatrix} \cos\alpha \cos\beta \\ \sin\alpha \cos\beta \\ -\sin\beta \end{bmatrix}\]

\[\begin{align*} \alpha &amp; = 0.152 \\ \beta &amp; = -0.276 \\ \gamma &amp; \text{ is free} \end{align*}\]

<p>where we happen to find that roll about the $L$-axis, or in other words hue rotation, doesn’t matter! Let’s just let $\gamma = 0$.</p>
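
<p>As a sanity check on that solve (taking the principal component from my previous post as given), we can recover the yaw and pitch numerically and confirm that the resulting rotation carries the unit vector of the $L$-axis onto the palette line. This is just an illustrative sketch:</p>

```python
import numpy as np

# principal component of the Solarized base palette line, from the previous post
v = np.array([0.9510, 0.1456, 0.2726])
v /= np.linalg.norm(v)  # make sure it's exactly unit-length

# the first column of the yaw-pitch-roll matrix is (cos(a)cos(b), sin(a)cos(b), -sin(b))
beta = -np.arcsin(v[2])
alpha = np.arctan2(v[1], v[0])
gamma = 0.0  # roll about the L-axis (hue rotation) is free, so pick zero

def rotation(alpha, beta, gamma):
    """Yaw-pitch-roll rotation matrix, composed as yaw @ pitch @ roll."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    yaw = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    pitch = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    roll = np.array([[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]])
    return yaw @ pitch @ roll

A = rotation(alpha, beta, gamma)
print(np.round([alpha, beta], 3))     # the angles solved for above
print(A @ np.array([1.0, 0.0, 0.0]))  # recovers the principal component
```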

<p>We’ve now fully defined the shift and rotate, that being an affine transform. Therefore, we could now get something like my old post while working entirely in CIELAB space. Instead, remember that we could throw out colors by projecting onto the $L$-axis? To get colors, we just <em>don’t do that</em> and then proceed with the shift and rotate anyway! Let’s visualize what we’ve done so far with the help of this diagram.</p>

<figure>
<img src="/images/2023-06-02/figure4.png" alt="Shift and rotate breakdown" />
</figure>

<p>Now, what about the preprocessing steps?</p>

<p>Let’s look at the lightness remap first. Solarized is a low-contrast palette that offers a light mode and a dark mode. If we flip to the <a href="https://ethanschoonover.com/solarized/#usage-development">development section</a> of the Solarized documentation, we find that it does so by assigning an upper and lower subset (not mutually exclusive) of the base palette to each respectively.</p>

<p>Given one mode or the other, a fair expectation is that colors <em>exclusive to the alternate mode</em> are never encountered; otherwise, the theme would not be low-contrast! For the same reason, we shouldn’t expect colors outside both palettes either. Therefore, we need to restrict the range in which we expect points going through the rotate and shift to land, and that target range is a segment of the line going through the base palette along with the neighborhood around that segment.</p>

<figure>
<img src="/images/2023-06-02/figure5.png" alt="Shift and rotate breakdown" />
</figure>

<p>Taking the dark mode first, the target range is the segment between <code class="language-plaintext highlighter-rouge">base03</code> and <code class="language-plaintext highlighter-rouge">base1</code>—excluding the brightest <code class="language-plaintext highlighter-rouge">base2</code> and <code class="language-plaintext highlighter-rouge">base3</code>—and the neighborhood around it. We can invert the rotate and shift to find what values on the $L$-axis they correspond to. That’s how we find that the condition for achieving the target range is $8.1397 &lt; L &lt; 59.4372$. Therefore, if we remap the points of the input such that their lightness components fall into that range, we’re golden. The remap is</p>

\[L_\text{new} = \frac{59.4372-8.1397}{100-0} L + 8.1397\]

<p>where $100$ and $0$ are the maximum and minimum possible lightness. On top of that, we don’t need to touch the $a$ and $b$ components. However, this remap may as well be the definition of destroying contrast, and breaking out of the target range a bit may be worth it. Taking $8.1397 &lt; L &lt; 59.4372$ as just a guideline, we can bounce between setting a new remap and generating a histogram until the distribution of lightnesses mostly falls in that range. I’ve provided an interface for that tweaking on HuggingFace, and we can go through an example in a moment.</p>

<p>Taking the light mode, the target range is between <code class="language-plaintext highlighter-rouge">base01</code> and <code class="language-plaintext highlighter-rouge">base3</code>, ignoring <code class="language-plaintext highlighter-rouge">base03</code> and <code class="language-plaintext highlighter-rouge">base02</code>, and this corresponds to a target lightness of $38.7621 &lt; L &lt; 93.8699$. The rest of the process is the same.</p>

<p>Finally, what about reducing the chroma? We do that to enforce the style, and that called for subtle tinting. As I mentioned before, when we rewrite the $a$-$b$ coordinates as polar coordinates, the chroma is the magnitude and the hue is the angle. So, cutting the chroma by some factor means cutting the magnitude of the $a$-$b$ coordinate. Of course, cutting the $a$ component and the $b$ component each by the same factor is equivalent. If we let the factor by which we cut the chroma be $\mu$, then</p>

\[a_\text{new} = \mu a \qquad b_\text{new} = \mu b\]

<p>where I’ve found that $\mu = 0.25$ is a factor I like.</p>
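
<p>Both preprocessing steps are elementwise, so they amount to a few lines of <code class="language-plaintext highlighter-rouge">numpy</code>. Here’s a minimal sketch, assuming an array already converted to CIELAB and using the dark-mode numbers from above (the function name is just for illustration):</p>

```python
import numpy as np

L_MIN, L_MAX = 8.1397, 59.4372  # target lightness range for Solarized Dark
MU = 0.25                       # chroma-cut factor

def preprocess(lab, l_min=L_MIN, l_max=L_MAX, mu=MU):
    """Remap lightness into [l_min, l_max], then cut chroma by a factor of mu.

    `lab` is an (..., 3) array of CIELAB values with lightness in [0, 100].
    """
    out = np.asarray(lab, dtype=float).copy()
    out[..., 0] = (l_max - l_min) / (100 - 0) * out[..., 0] + l_min
    out[..., 1:] *= mu  # scaling a and b by mu scales the chroma by mu
    return out

# black maps to the bottom of the target range, white to the top
print(preprocess(np.array([[0.0, 40.0, -20.0], [100.0, 0.0, 0.0]])))
```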

<p>So, that’s the entire process for “solarizing” a background image. Let’s step through it in order with an example to review. We can input the Carina Cliffs into the <a href="https://huggingface.co/spaces/colonelwatch/background-solarizer">HuggingFace demo</a>.</p>

<figure>
<img src="/images/2023-06-02/figure6.png" alt="Demo preprocessing" />
</figure>

<p>Here, we see that I had set the actual lightness range to $10 &lt; L &lt; 70$. After I clicked the preprocess button to perform the chroma cut and lightness remapping, we also see that the lightness histogram is acceptably in the target range for Solarized Dark. Finally, I clicked the transform button to perform the shift and rotate, yielding me the new background.</p>

<figure>
<img src="/images/2023-06-02/figure7.png" alt="Demo transform" />
</figure>

<p>In the absence of data to base this process on, we were still successful in finding a way to align backgrounds to the Solarized base palette while also adding a bit of color to it. To do so, we chose sensible and geometric operations in CIELAB space, and we satisfied some constraints by inverting those operations to find the conditions they impose. Though this method works generally, I’ll add that there are places where change might be interesting, perhaps in defining a new style or in reshaping the distribution of lightnesses. But in any case, though what I did wasn’t exactly tipping over the spinning top, I can have the wonderful colors of the Carina Cliffs back now!</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><category term="color spaces" /><category term="affine transformations" /><category term="rotation matrices" /><summary type="html"><![CDATA[A couple of months back, I wrote “Recoloring backgrounds to align with the Solarized Dark base palette”, and when I wrote that I wasn’t expecting to do a second part. At the time, because I had just encountered the Solarized palette, I didn’t even begin to fathom how you could add colors to the backgrounds. Still, even then I could imagine what it would look like, and shortly after I wrote that article I started to go down what seemed like the right path. 
I found myself making a 3D scatter plot of the entire Solarized palette as CIELAB values, and it looked to me like a spinning top in the middle of falling over.]]></summary></entry><entry><title type="html">Detecting motion in RPLIDAR data using optical flow</title><link href="http://kennypeng.com/2023/05/26/rplidar_motion.html" rel="alternate" type="text/html" title="Detecting motion in RPLIDAR data using optical flow" /><published>2023-05-26T00:00:00+00:00</published><updated>2023-05-26T00:00:00+00:00</updated><id>http://kennypeng.com/2023/05/26/rplidar_motion</id><content type="html" xml:base="http://kennypeng.com/2023/05/26/rplidar_motion.html"><![CDATA[<p>Over a week, I happened to hack together an interesting procedure that ended up being an important part of the senior capstone project I was contributing to. The objective of this procedure: if it moves…</p>

<figure>
<img src="/images/2023-05-26/figure1.gif" alt="tracking of three moving people in a room anim" />
<figcaption>

Context: three people moving in a room

</figcaption>
</figure>

<p>…detect it! The sensor involved here is the RPLIDAR, a low-cost “laser range scanner” that yields distances from itself at all angles. The principle behind the procedure is <a href="https://en.wikipedia.org/wiki/Optical_flow">“optical flow”</a>, a whole class of techniques for inferring the velocity of an object in a video by looking from frame to frame. The specific technique I used is a classic called the “Lucas-Kanade method”. It turned out that the same reasoning that constructs it (and optical flow more generally) also works with the data taken from the RPLIDAR.</p>

<p>That said, there has to be a fair bit of preprocessing on that data beforehand. I think the preprocessing itself makes for an interesting introduction to some background concepts though, so I’ll cover it too. To visualize each step of the whole procedure, we’ll use the example data below. It’s the same data I originally used to devise the procedure, and it had been collected for me by someone else.</p>

<figure>
<img src="/images/2023-05-26/figure2.gif" alt="raw samples anim" />
</figure>

<p>First, the RPLIDAR yields an irregular sampling of the room around it for a variety of reasons—from protocol overhead to measurement failure. Some may call this kind of data “unstructured”. On the other hand, with video essentially being a grid of dynamically updating pixels, optical flow expects regularly-sampled data. One easy-to-see solution to this is an “interpolation”. The general idea behind “interpolation” is to construct a continuous function that goes through discrete samples, unstructured or not, then collect new, regularly-sampled data from the function.</p>

<p>At the time, I chose to use <a href="https://en.wikipedia.org/wiki/Radial_basis_function_interpolation">“radial basis function” (RBF) interpolation</a>. However, that ended up being a poor choice because something about the data forced me to accept a very relaxed form of it. What do I mean here? The result of an interpolation is not necessarily smooth. The simplest kind of interpolation, <a href="https://en.wikipedia.org/wiki/Linear_interpolation">linear interpolation</a> or “lerp”, is just connecting the samples with straight lines.</p>

<figure>
<img src="/images/2023-05-26/figure3.png" alt="linear interpolation" />
</figure>
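
<p>For one-dimensional, periodic data like ours, this kind of lerp-based resampling is a one-liner with <code class="language-plaintext highlighter-rouge">numpy</code>, whose <code class="language-plaintext highlighter-rouge">np.interp</code> even handles the wrap-around at 360 degrees through its <code class="language-plaintext highlighter-rouge">period</code> argument. A toy sketch with made-up samples:</p>

```python
import numpy as np

# irregular, unordered angle/distance samples, like a burst from the sensor
angles = np.array([350.0, 10.0, 123.4, 200.0, 271.5, 45.0])
distances = np.array([1.2, 1.3, 2.0, 1.7, 0.9, 1.5])

# resample onto a regular 1-degree grid; period=360 handles the wrap-around
grid = np.arange(360.0)
resampled = np.interp(grid, angles, distances, period=360.0)

print(resampled[10], resampled[45])  # grid points that hit a sample exactly
```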

<p>Linear interpolations can be extremely jagged for some data. RBF interpolation promises a degree of smoothness on the other hand, but it can also simply fail—to put it shortly. Explaining exactly how it fails seems a bit beyond the scope here, but suffice it to say that it failed here. The result of that failure was the relaxed form, and it amounted to a kind of curve-fitting. Though it still yielded a smooth, continuous function, it no longer went through the points. Well, curve-fitting is another solution to this problem, anyway. We can collect regularly-sampled data from it too.</p>

<figure>
<img src="/images/2023-05-26/figure4.png" alt="curve-fitting" />
</figure>

<p>Here, let’s use a proper curve-fitting procedure in the first place! A good one is the Python <code class="language-plaintext highlighter-rouge">make_smoothing_spline</code> function <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.make_smoothing_spline.html">offered by SciPy</a>. This routine has some peculiarities, so I’ll leave here an <code class="language-plaintext highlighter-rouge">Interpolator</code> class that has a working use of it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">scipy.interpolate</span> <span class="kn">import</span> <span class="n">make_smoothing_spline</span>

<span class="k">class</span> <span class="nc">Interpolator</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">memory_size</span><span class="o">=</span><span class="mi">512</span><span class="p">,</span> <span class="n">lam</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">memory</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">memory_size</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="n">lam</span>

    <span class="k">def</span> <span class="nf">update</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">samples</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">memory</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">,</span> <span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="n">samples</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">[</span><span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="n">samples</span><span class="p">):]</span> <span class="o">=</span> <span class="n">samples</span>

    <span class="k">def</span> <span class="nf">take</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="c1"># get samples in ascending order of angle
</span>        <span class="n">angles</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
        <span class="n">argsort_angles</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">angles</span><span class="p">)</span>
        <span class="n">angles</span> <span class="o">=</span> <span class="n">angles</span><span class="p">[</span><span class="n">argsort_angles</span><span class="p">]</span>
        <span class="n">distances</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">][</span><span class="n">argsort_angles</span><span class="p">]</span>

        <span class="c1"># remove duplicate angles
</span>        <span class="n">angles_dedup</span> <span class="o">=</span> <span class="p">[</span><span class="n">angles</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
        <span class="n">distances_dedup</span> <span class="o">=</span> <span class="p">[</span><span class="n">distances</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">angles</span><span class="p">)):</span>
            <span class="k">if</span> <span class="n">angles</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="n">angles</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
                <span class="n">angles_dedup</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">angles</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
                <span class="n">distances_dedup</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">distances</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>

        <span class="c1"># the above was because make_smoothing_spline requires angle[i] &gt; angle[i-1]
</span>        <span class="n">interp_func</span> <span class="o">=</span> <span class="n">make_smoothing_spline</span><span class="p">(</span><span class="n">angles_dedup</span><span class="p">,</span> <span class="n">distances_dedup</span><span class="p">,</span> <span class="n">lam</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">lam</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">interp_func</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>

<p>Notice that the samples are stored in a buffer before they get interpolated. The person who looked at the data before me noticed that the Python RPLIDAR driver that was used, <code class="language-plaintext highlighter-rouge">rplidar</code>, only gave bursts of samples that didn’t contain a full rotation. Therefore, I needed to hold on to at least part of the previous burst. The output of this particular code when inputting our example data is this</p>

<figure>
<img src="/images/2023-05-26/figure5.gif" alt="interpolated anim" />
</figure>

<p>However, it’s still noisy. It jitters a little from frame to frame, and I’ve seen this kind of noise become a problem before. (For the record, this noise was even worse when I used linear interpolation.)</p>

<div class="info-panel">

  <h4 id="review-removing-noise-using-low-pass-filters">Review: Removing noise using low-pass filters</h4>

  <p><a href="https://en.wikipedia.org/wiki/Low-pass_filter">“Low-pass filters”</a> and <a href="https://en.wikipedia.org/wiki/Filter_(signal_processing)">“filters”</a> in general have a wide variety of uses, but you may or may not be familiar with a major function of “low-pass filters”: removing noise. But to answer why this works, we have to ask ourselves a more basic question: what is noise? In the broadest sense, it’s the part of a signal that we don’t want. In a specific case, we have to <em>decide what we don’t want</em>, deeming that as noise, before we remove it.</p>

  <p>Though I don’t trade stocks, a stock’s price is a great example. When people say to “buy the dip”, they recognize that prices have short-term trends (“the dip”) and long-term trends (a company’s continuing—presumably, anyway—track record of making money and thereby increasing shareholder value). Yet both of these behaviors make up the price. A company’s stock price might fall due to a random string of sells while the company itself makes money over the period at the same rate. If we were long-term traders, then the short-term trends wouldn’t matter to us—they’d be noise, and in this case “high-frequency” noise. We would want to remove them before making our decisions, and that’s where “low-pass filters” would apply. I’m not going to define them more formally, but suffice it to say that moving averages and exponential moving averages happen to fall into this category.</p>

  <figure>
<img src="/images/2023-05-26/figure6.png" alt="moving average" />
<figcaption>

SMA and EMA technical indicators are low-pass filters. By Alex Kofman via Wikimedia and used under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0 license</a>

</figcaption>
</figure>

  <p>Coincidentally, if we happened to be short-term traders, then the opposite would be true! Long-term trends would be noise, and there are “high-pass filters” for that.</p>

</div>
<p><!-- div class="info-panel" --></p>

<p>To deal with the noise in the interpolated data, we’ll want to use a low-pass filter. At the time, my choice of a particular one was just a guess: feel free to Google “second-order Butterworth digital filter” or “IIR filters” if you want. Here, just a moving average of the last four frames also suffices.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">class</span> <span class="nc">MaFilter</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_channels</span><span class="o">=</span><span class="mi">360</span><span class="p">,</span> <span class="n">n_samples</span><span class="o">=</span><span class="mi">4</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n_channels</span><span class="p">,</span> <span class="n">n_samples</span><span class="p">))</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">n_samples</span> <span class="o">=</span> <span class="n">n_samples</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
    
    <span class="k">def</span> <span class="nf">filter</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x_t</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">samples</span><span class="p">[:,</span> <span class="bp">self</span><span class="p">.</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">x_t</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">%</span><span class="bp">self</span><span class="p">.</span><span class="n">n_samples</span>
        <span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">samples</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>

<p>Applying this code to our example data yields this</p>

<figure>
<img src="/images/2023-05-26/figure7.gif" alt="moving average anim" />
</figure>

<p>This data is finally a good base to extract motion out of! Now, optical flow has a rich history involving many, <em>many</em> specific end-to-end techniques. <a href="http://www.cs.toronto.edu/~fleet/research/Papers/flowChapter05.pdf">“Optical Flow Estimation” by Fleet and Weiss</a> and <a href="https://moodle2.units.it/pluginfile.php/256938/mod_resource/content/1/1994Barron.pdf">“Performance of optical flow techniques” by Barron, Fleet, and Beauchemin</a> look to me like very comprehensive descriptions of the older ones. However, since those texts were about applying optical flow to video, let’s work out the same reasoning on our RPLIDAR data. We can let $r(\theta, t)$ be the distance from the RPLIDAR at angle $\theta$ and time $t$. (It’s worth noting that a single frame here is one-dimensional, whereas a frame of video is two-dimensional.) Motion can be expressed as the equality</p>

\[r(\theta, t) = r(\theta+\Delta\theta, t+\Delta t)\]

<p>or, in other words, the translation of distances by $\Delta \theta$ over a timespan of duration $\Delta t$. The next step is the “linearization” of this equality: a Taylor series centered at $r(\theta, t)$ replaces the right-hand side, but then we truncate away all terms of second order and higher. The approximation we get is</p>

\[r(\theta, t) \approx r(\theta, t) + \frac{\partial r}{\partial \theta} \Delta \theta + \frac{\partial r}{\partial t} \Delta t\]

<p>Considering that $\Delta \theta / \Delta t$ is essentially velocity, we can isolate this as the ratio of partial derivatives</p>

\[\frac{\Delta \theta}{\Delta t} \approx - \frac{\partial r / \partial t}{\partial r / \partial \theta}\]

<p>This here is the point of divergence from the basic optical flow analysis on two-dimensional frames of video. In the two-dimensional case, the velocity has two components, and we wouldn’t have found an expression for both from a single equation. In general, that’s an underdetermined linear system, also called the “aperture problem” in optical flow texts. Here, the one-dimensional frame means velocity (with a single component) that we <em>can</em> just solve for.</p>

<p>To turn this into a procedure, the partial derivatives can be approximated by the finite differences</p>

\[\frac{\partial r}{\partial t} \approx \frac{r(\theta, t) - r(\theta, t-\Delta t)}{\Delta t}\]

\[\frac{\partial r}{\partial \theta} \approx \frac{r(\theta+\Delta \theta, t) - r(\theta-\Delta \theta, t)}{2 \Delta \theta}\]

<p>where $t-\Delta t$ means the previous frame, $\theta+\Delta \theta$ means to the next angle in the grid, and $\theta-\Delta \theta$ the previous. $\Delta \theta$ comes from the spacing of the grid, and $\Delta t$ can be measured using Python’s <code class="language-plaintext highlighter-rouge">time.time()</code>. Altogether, we have now completely specified one possible velocity estimation procedure. In practice, it gave me a few problems that weren’t just noise.</p>
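
<p>Before getting to those problems, the direct estimator itself can be sketched in a few lines of <code class="language-plaintext highlighter-rouge">numpy</code>. The data below is fabricated (a sinusoidal pattern rotating at a known rate), not the RPLIDAR data, so we can check the estimate against the true velocity:</p>

```python
import numpy as np

N = 360
dtheta = 2 * np.pi / N
theta = np.arange(N) * dtheta

# fabricated frames: a sinusoidal pattern rotating at omega radians per second
omega, dt = 0.5, 0.1
r_prev = np.sin(theta)               # frame at time t - dt
r_curr = np.sin(theta - omega * dt)  # frame at time t

# finite-difference approximations of the partial derivatives
dr_dt = (r_curr - r_prev) / dt
dr_dtheta = (np.roll(r_curr, -1) - np.roll(r_curr, 1)) / (2 * dtheta)

velocity = -dr_dt / dr_dtheta

# away from the flat spots (where dr/dtheta is near zero), this recovers omega
print(velocity[0], velocity[180])
```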

<figure>
<img src="/images/2023-05-26/figure8.gif" alt="direct estimation anim" />
</figure>

<p>To be clear, this is the absolute value of the raw velocities times ten. You can see here a couple of issues:</p>

<ul>
  <li>Small flash-points in the velocity estimation that were consistent enough to beat the low-pass filter</li>
  <li>A hole in the velocity estimate at the center of the moving object</li>
</ul>

<p>One particular thing I tried that seemingly dealt with both problems is the “Lucas-Kanade method”. Originally, it was devised as the solution to the underdetermined linear system conundrum. On the assumption that neighboring pixels share the same motion, the equations constructed from these pixels were incorporated, and this turned an underdetermined system into an overdetermined one with a least-squares solution. Doesn’t the same assumption apply here?</p>

<p>The modified construction is as follows. We can represent the partial derivatives at some $\theta$ and $t$ as $\partial r / \partial \theta \mid_{(\theta, t)}$ and $\partial r / \partial t \mid_{(\theta, t)}$. For some specific $\theta_i$, let’s also consider the 16 angles to its right and the 16 to its left, altogether $\theta_{i-16}, \theta_{i-15}, \dots, \theta_{i+16}$. The partial derivatives (approximated by finite differences) can be taken at these angles and formed into the vectors</p>

\[R_\theta(\theta, t) = \begin{bmatrix} \partial r / \partial \theta \mid_{(\theta_{i-16}, t)} \\ \partial r / \partial \theta \mid_{(\theta_{i-15}, t)} \\ \vdots \\ \partial r / \partial \theta \mid_{(\theta_{i+16}, t)} \end{bmatrix}\]

\[R_t(\theta, t) = \begin{bmatrix} \partial r / \partial t \mid_{(\theta_{i-16}, t)} \\ \partial r / \partial t \mid_{(\theta_{i-15}, t)} \\ \vdots \\ \partial r / \partial t \mid_{(\theta_{i+16}, t)} \end{bmatrix}\]

<p>What do we do with these vectors? We can start again with the linearization</p>

\[r(\theta, t) \approx r(\theta, t) + \frac{\partial r}{\partial \theta} \Delta \theta + \frac{\partial r}{\partial t} \Delta t\]

<p>and manipulate it into the “equation”</p>

\[0 \approx \frac{\partial r}{\partial \theta} \frac{\Delta \theta}{\Delta t} + \frac{\partial r}{\partial t}\]

<p>which we can extend with our vectors under the shared motion assumption</p>

\[0 \approx R_\theta \frac{\Delta \theta}{\Delta t} + R_t\]

<p>where $R_\theta$ and $R_t$ are just shorthand here for $R_\theta(\theta, t)$ and $R_t(\theta, t)$. Though this vector equation usually doesn’t have an exact solution, it takes the classic form of “minimize $\left\Vert Ax-b \right\Vert$”, here with $A = R_\theta$, $x = \Delta\theta/\Delta t$, and $b = -R_t$. The least-squares solution is $x = (A^\intercal A)^{-1} A^\intercal b$, or in our case</p>

\[\frac{\Delta \theta}{\Delta t} \approx -(R_\theta^\intercal R_\theta)^{-1} R_\theta^\intercal R_t\]

<p>It is convenient here that $R_\theta$ and $R_t$ are vectors. We can see that $R_\theta^\intercal R_\theta$ is just the square magnitude $\left\Vert R_\theta \right\Vert^2$ and $R_\theta^\intercal R_t$ is just the dot product $R_\theta \cdot R_t$. So, we can reduce the velocity estimator to</p>

\[\frac{\Delta \theta}{\Delta t} \approx -\frac{R_\theta \cdot R_t}{\left\Vert R_\theta \right\Vert^2}\]
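
<p>Since this reduces to a windowed dot product, a hypothetical <code class="language-plaintext highlighter-rouge">numpy</code> sketch of the estimator is short. Reusing the idea of fabricated sinusoid frames with a known rotation rate, and the same window of 16 angles to each side:</p>

```python
import numpy as np

N = 360
dtheta = 2 * np.pi / N
theta = np.arange(N) * dtheta

# fabricated frames: a sinusoidal pattern rotating at omega radians per second
omega, dt = 0.5, 0.1
r_prev = np.sin(theta)
r_curr = np.sin(theta - omega * dt)

dr_dt = (r_curr - r_prev) / dt
dr_dtheta = (np.roll(r_curr, -1) - np.roll(r_curr, 1)) / (2 * dtheta)

def lucas_kanade(dr_dtheta, dr_dt, i, half_window=16):
    """Least-squares velocity at angle index i over a window of +/-16 angles."""
    idx = np.arange(i - half_window, i + half_window + 1) % len(dr_dtheta)
    R_theta, R_t = dr_dtheta[idx], dr_dt[idx]
    return -np.dot(R_theta, R_t) / np.dot(R_theta, R_theta)

# unlike the direct estimator, this stays stable near the flat spots
print(lucas_kanade(dr_dtheta, dr_dt, 0), lucas_kanade(dr_dtheta, dr_dt, 90))
```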

<p>We can apply this estimator to our example data with the following code, which gives the result below.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">VelocityEstimator</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">window_size</span><span class="o">=</span><span class="mi">16</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">h_prev</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">360</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">window_size</span> <span class="o">=</span> <span class="n">window_size</span>

    <span class="k">def</span> <span class="nf">estimate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">dt</span><span class="p">):</span>
        <span class="n">dtheta</span> <span class="o">=</span> <span class="mi">2</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">pi</span><span class="o">/</span><span class="mi">360</span>
        
        <span class="n">dh_dt</span> <span class="o">=</span> <span class="p">(</span><span class="n">h</span><span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">h_prev</span><span class="p">)</span><span class="o">/</span><span class="n">dt</span>
        <span class="n">dh_dtheta</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span><span class="o">/</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">dtheta</span><span class="p">)</span>

        <span class="n">dh_dtheta_neighbors</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">360</span><span class="p">,</span> <span class="mi">2</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
        <span class="n">dh_dt_neighbors</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">360</span><span class="p">,</span> <span class="mi">2</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
            <span class="n">shift</span> <span class="o">=</span> <span class="n">j</span><span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span>
            <span class="n">dh_dtheta_neighbors</span><span class="p">[:,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">dh_dtheta</span><span class="p">,</span> <span class="n">shift</span><span class="p">)</span>
            <span class="n">dh_dt_neighbors</span><span class="p">[:,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">dh_dt</span><span class="p">,</span> <span class="n">shift</span><span class="p">)</span>
        
        <span class="c1"># calculates all estimated velocities as many dot products
</span>        <span class="n">elementwise_product</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">dh_dtheta_neighbors</span><span class="p">,</span> <span class="n">dh_dt_neighbors</span><span class="p">)</span>
        <span class="n">v_est</span> <span class="o">=</span> <span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">elementwise_product</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">dh_dtheta_neighbors</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        
        <span class="bp">self</span><span class="p">.</span><span class="n">h_prev</span> <span class="o">=</span> <span class="n">h</span>
        <span class="k">return</span> <span class="n">v_est</span>
</code></pre></div></div>

<figure>
<img src="/images/2023-05-26/figure9.gif" alt="lucas-kanade anim" />
</figure>

<p>Compared to the results of the other procedure, the hole mostly disappears and the flash-points are suppressed. This signal appears clean enough that a simple threshold detector (possibly with hysteresis) may be all you need to find all the directions of motion.</p>
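<p>As a quick sanity check (a sketch of my own, not part of the original procedure), we can hand the same windowed least-squares estimate a synthetic bump of ranges rotating at a known angular velocity and confirm that it recovers that velocity. This mirrors the <code class="language-plaintext highlighter-rouge">VelocityEstimator</code> above, just repackaged as a standalone function:</p>

```python
import numpy as np

def estimate_velocity(h_prev, h, dt, window_size=16):
    """Windowed least-squares estimate -(R_theta . R_t)/||R_theta||^2."""
    n = len(h)
    dtheta = 2*np.pi/n
    dh_dt = (h - h_prev)/dt
    dh_dtheta = (np.roll(h, -1) - np.roll(h, 1))/(2*dtheta)
    num, den = np.zeros(n), np.zeros(n)
    for shift in range(-window_size, window_size + 1):
        num += np.roll(dh_dtheta, shift)*np.roll(dh_dt, shift)
        den += np.roll(dh_dtheta, shift)**2
    return -num/(den + 1e-12)  # epsilon avoids 0/0 far from any motion

# a Gaussian bump of ranges rotating at 1 rad/s, sampled at 360 angles
theta = np.linspace(0, 2*np.pi, 360, endpoint=False)
omega, dt = 1.0, 0.05
bump = lambda t: np.exp(-np.angle(np.exp(1j*(theta - omega*t)))**2/0.1)

v = estimate_velocity(bump(0.0), bump(dt), dt)
print(v[np.argmax(bump(dt))])  # close to the true 1.0 rad/s
```

<p>The small bias away from exactly 1.0 comes from the finite differences themselves, which is consistent with how the real estimator behaves on the lidar data.</p>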

<p>So, that’s the process I used to detect motion using the RPLIDAR. It’s made of a lot of random concepts—perhaps because it was hacked together over a week. So, it might serve more as a demonstration of how these concepts get applied than a whole, proven procedure. I’m sure that there are more effective, simple, or rigorous ways to solve the same problem. Still, this outline hopefully was an interesting read that inspires you to dive deeper into any of the backgrounds it invokes.</p>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><category term="interpolation" /><category term="low-pass filters" /><category term="finite differences" /><category term="optical flow" /><category term="the Lucas-Kanade method" /><summary type="html"><![CDATA[Over a week, I happened to hack together an interesting procedure that ended up being an important part of the senior capstone project I was contributing to. The objective of this procedure: if it moves…]]></summary></entry><entry><title type="html">Recoloring backgrounds to align with the Solarized Dark base palette</title><link href="http://kennypeng.com/2022/11/06/solarized_background.html" rel="alternate" type="text/html" title="Recoloring backgrounds to align with the Solarized Dark base palette" /><published>2022-11-06T00:00:00+00:00</published><updated>2022-11-06T00:00:00+00:00</updated><id>http://kennypeng.com/2022/11/06/solarized_background</id><content type="html" xml:base="http://kennypeng.com/2022/11/06/solarized_background.html"><![CDATA[<p>I know that I’m not the only person who made the “Carina Cliffs” into their desktop background on the day those first JWST shots were released. I had the idea shortly after I saw them, and it’s stayed on my desktop through the months since. However, I also switched from Windows to Pop!_OS to Arch Linux along the way, and sooner or later I wanted to theme my system. 
I eventually settled on <a href="https://ethanschoonover.com/solarized/">Solarized Dark</a> as my palette of choice, but then I had a problem. Solarized Dark focused on muted hues of blue as its base palette, but that clashed with the vibrant, orange splashes of my new favorite background.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="nn">skimage</span> <span class="kn">import</span> <span class="n">io</span><span class="p">,</span> <span class="n">color</span>

<span class="n">image</span> <span class="o">=</span> <span class="n">io</span><span class="p">.</span><span class="n">imread</span><span class="p">(</span><span class="s">'carina.png'</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">[::</span><span class="mi">8</span><span class="p">,</span> <span class="p">::</span><span class="mi">8</span><span class="p">]</span> <span class="c1"># downsample the image to 1/64 size for this blog post
</span><span class="n">io</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/images/2022-11-06/figure1.jpeg" alt="original carina imshow" /></p>

<p>The ordinary idea would have been to switch to a background that aligned better, but–no–I wanted to keep my “Carina Cliffs”. So, I needed to recolor it. There were a couple of ways I could have gone about it, like composing the shot from scratch. The original infrared data <em>was</em> out there, but I was no color scientist.</p>

<p>Instead, my plan started with converting the image to grayscale (though throwing out the color hurt somewhat).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="n">color</span><span class="p">.</span><span class="n">rgb2gray</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>

<span class="n">io</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/images/2022-11-06/figure2.jpeg" alt="grayscale carina imshow" /></p>

<p>Next, I wanted to map grayscale values to colors along a “curve” going through the base palette of Solarized Dark. But what was this “curve”?</p>

<p>The Solarized Dark palette originally defined its colors as carefully placed points in the CIELAB space. Unlike RGB, the CIELAB space moves away from raw pixel intensities to coordinates modeled on human vision, where equal distances correspond to roughly equal perceived differences. Consequently, moving along any straight path in this space should look like a natural transition of colors. This is what I wanted to take advantage of by drawing a “curve”.</p>

<p>That said, though I knew Solarized Dark was careful about its color coordinates, I didn’t know <em>exactly</em> what it did. At worst, I thought that I might need to draw a Bezier curve, but it turned out to be much simpler.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># palette[:, 0] is L, palette[:, 1] is A, palette[:, 2] is B
</span><span class="n">palette</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
    <span class="p">[</span> <span class="mi">15</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">],</span> <span class="c1"># Base03
</span>    <span class="p">[</span> <span class="mi">20</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">],</span> <span class="c1"># Base02
</span>    <span class="p">[</span> <span class="mi">45</span><span class="p">,</span>  <span class="o">-</span><span class="mi">7</span><span class="p">,</span>  <span class="o">-</span><span class="mi">7</span><span class="p">],</span> <span class="c1"># Base01
</span>    <span class="p">[</span> <span class="mi">50</span><span class="p">,</span>  <span class="o">-</span><span class="mi">7</span><span class="p">,</span>  <span class="o">-</span><span class="mi">7</span><span class="p">],</span> <span class="c1"># Base00
</span>    <span class="p">[</span> <span class="mi">60</span><span class="p">,</span>  <span class="o">-</span><span class="mi">6</span><span class="p">,</span>  <span class="o">-</span><span class="mi">3</span><span class="p">],</span> <span class="c1"># Base0
</span>    <span class="p">[</span> <span class="mi">65</span><span class="p">,</span>  <span class="o">-</span><span class="mi">5</span><span class="p">,</span>  <span class="o">-</span><span class="mi">2</span><span class="p">],</span> <span class="c1"># Base1
</span>    <span class="p">[</span> <span class="mi">92</span><span class="p">,</span>   <span class="mi">0</span><span class="p">,</span>  <span class="mi">10</span><span class="p">],</span> <span class="c1"># Base2
</span>    <span class="p">[</span> <span class="mi">97</span><span class="p">,</span>   <span class="mi">0</span><span class="p">,</span>  <span class="mi">10</span><span class="p">],</span> <span class="c1"># Base3
</span><span class="p">])</span>

<span class="n">mean</span> <span class="o">=</span> <span class="n">palette</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'mean:'</span><span class="p">,</span> <span class="n">mean</span><span class="p">)</span>

<span class="n">U</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">V</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">svd</span><span class="p">(</span><span class="n">palette</span><span class="o">-</span><span class="n">mean</span><span class="p">)</span>
<span class="n">principal_component</span> <span class="o">=</span> <span class="n">V</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">'principal_component:'</span><span class="p">,</span> <span class="n">principal_component</span><span class="p">)</span>

<span class="n">line_pts</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">outer</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">42</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="n">principal_component</span><span class="p">)</span><span class="o">+</span><span class="n">mean</span>

<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">axes</span><span class="p">(</span><span class="n">projection</span><span class="o">=</span><span class="s">'3d'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter3D</span><span class="p">(</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">],</span> <span class="n">c</span><span class="o">=</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="n">line_pts</span><span class="p">.</span><span class="n">T</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mean: [55.5   -6.125 -2.875]
principal_component: [0.95104299 0.14562397 0.27260023]
</code></pre></div></div>

<p><img src="/images/2022-11-06/figure3.png" alt="principal component analysis of Solarized Dark base palette" /></p>

<p>In fact, the entire base palette was placed approximately along a straight line! The “curve” I wanted could just be this line. In my searches, I found one approach to getting it: <a href="https://stackoverflow.com/questions/2298390/fitting-a-line-in-3d">finding the “principal component” using the “SVD”</a>. That method gave some parameters of the line that I needed.</p>

<p>That was:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">mean</code>: a reference point on the line</li>
  <li><code class="language-plaintext highlighter-rouge">principal_component</code>: a unit vector in the direction of the line</li>
</ol>
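<p>To put a number on “approximately along a straight line”, we can also check how far each palette point sits from the fitted line. This is a side check of my own, not part of the original recoloring flow:</p>

```python
import numpy as np

# the Solarized Dark base palette in CIELAB, as listed above
palette = np.array([
    [15, -12, -12], [20, -12, -12], [45, -7, -7], [50, -7, -7],
    [60, -6, -3], [65, -5, -2], [92, 0, 10], [97, 0, 10],
], dtype=float)

mean = palette.mean(axis=0)
_, _, V = np.linalg.svd(palette - mean)
direction = V[0]  # unit vector along the fitted line

# residual = component of each centered point perpendicular to the line
centered = palette - mean
parallel = np.outer(centered @ direction, direction)
residuals = np.linalg.norm(centered - parallel, axis=1)
print(residuals.max())  # only a couple of units in L*a*b* space
```

<p>The worst offender sits only a couple of CIELAB units off the line, which is why collapsing the whole base palette onto it looks so faithful.</p>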

<p>There was just one last thing I needed: the endpoints. This was something I just eyeballed.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">t_start</span> <span class="o">=</span> <span class="o">-</span><span class="mi">42</span> <span class="c1"># approx where base03 is
</span><span class="n">t_end</span> <span class="o">=</span> <span class="mi">11</span> <span class="c1"># approx where base1 is
</span>
<span class="k">print</span><span class="p">(</span><span class="s">'t_start:'</span><span class="p">,</span> <span class="n">t_start</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'t_end:'</span><span class="p">,</span> <span class="n">t_end</span><span class="p">)</span>

<span class="c1"># copied from previous cell
</span><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">axes</span><span class="p">(</span><span class="n">projection</span><span class="o">=</span><span class="s">'3d'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter3D</span><span class="p">(</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">],</span> <span class="n">c</span><span class="o">=</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="n">line_pts</span><span class="p">.</span><span class="n">T</span><span class="p">)</span>

<span class="c1"># plot the endpoints of the line
</span><span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">principal_component</span><span class="o">*</span><span class="n">t_start</span><span class="o">+</span><span class="n">mean</span><span class="p">).</span><span class="n">T</span><span class="p">,</span> <span class="s">'x'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">principal_component</span><span class="o">*</span><span class="n">t_end</span><span class="o">+</span><span class="n">mean</span><span class="p">).</span><span class="n">T</span><span class="p">,</span> <span class="s">'x'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>t_start: -42
t_end: 11
</code></pre></div></div>

<p><img src="/images/2022-11-06/figure4.png" alt="PCA of Solarized Dark base palette with endpoints" /></p>

<p>These endpoints were represented as the final parameters:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">t_start</code>: zero brightness will be mapped to <code class="language-plaintext highlighter-rouge">principal_component*t_start+mean</code></li>
  <li><code class="language-plaintext highlighter-rouge">t_end</code>: max brightness will be mapped to <code class="language-plaintext highlighter-rouge">principal_component*t_end+mean</code></li>
</ol>

<p>And with this line fully defined, I could hop to it from grayscale as I planned.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">orig_shape</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">flatten</span><span class="p">()</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="o">*</span><span class="p">(</span><span class="n">t_end</span><span class="o">-</span><span class="n">t_start</span><span class="p">)</span><span class="o">+</span><span class="n">t_start</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">outer</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">principal_component</span><span class="p">)</span><span class="o">+</span><span class="n">mean</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">*</span><span class="n">orig_shape</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">color</span><span class="p">.</span><span class="n">lab2rgb</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>

<span class="n">io</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/images/2022-11-06/figure5.jpeg" alt="recolored carina imshow" /></p>

<p>And so, I had my new “Carina Cliffs”, recolored to align with my new theme! I’m sure that this isn’t the only method, but it was the first one that I tried and liked.</p>

<figure>
<img src="/images/2022-11-06/figure6.jpeg" alt="themed laptop with recolored carina cliffs" />
</figure>

<p>If anyone else wants to recolor their backgrounds in this way, it turns out to be quite the churn. For an 8K background like the “Carina Cliffs”, I hit a couple of OOM-kills along the way on my 8GB machine, but I’ve since optimized the process into this quick and small script.</p>

<script src="https://gist.github.com/cfa1816d06067aceda1f191f8a86ba7d.js"> </script>]]></content><author><name>Kenny Peng</name><email>kenny@kennypeng.com</email></author><category term="color spaces" /><category term="principal component analysis" /><summary type="html"><![CDATA[I know that I’m not the only person who made the “Carina Cliffs” into their desktop background on the day those first JWST shots were released. I had the idea shortly after I saw them, and it’s stayed on my desktop through the months since. However, I also switched from Windows to Pop!_OS to Arch Linux along the way, and sooner or later I wanted to theme my system. I eventually settled on Solarized Dark as my palette of choice, but then I had a problem. Solarized Dark focused on muted hues of blue as its base palette, but that clashed with the vibrant, orange splashes of my new favorite background.]]></summary></entry><entry><title type="html">Investigating the math of waveshapers: Chebyshev polynomials</title><link href="http://kennypeng.com/2022/06/18/chebyshev_harmonics.html" rel="alternate" type="text/html" title="Investigating the math of waveshapers: Chebyshev polynomials" /><published>2022-06-18T00:00:00+00:00</published><updated>2022-06-18T00:00:00+00:00</updated><id>http://kennypeng.com/2022/06/18/chebyshev_harmonics</id><content type="html" xml:base="http://kennypeng.com/2022/06/18/chebyshev_harmonics.html"><![CDATA[<p>Over a year ago, I wrote <a href="/2020/11/23/teensy_harmonic_distortion.html">“Adding harmonic distortions with Arduino Teensy”</a>. In that post, I happened upon a way to apply any arbitrary profile of harmonics using a Teensy-based waveshaper (except that waveshapers categorically can’t vary the phase of each harmonic). However, when I wrote that, I totally missed out on the established literature on the topic!
Even in 1979, there was <a href="https://www.jstor.org/stable/3680281">“A Tutorial on Non-Linear Distortion or Waveshaping Synthesis”</a>, and I ultimately had taken a very convoluted path only to arrive at the same place!</p>

<p>That 1979 tutorial adapts to the Teensy waveshaper quite naturally, and compared to the method I showed before, the adapted method is far more concise and far easier to implement. However, to do the adaptation, we have to know one thing: what is a “Chebyshev polynomial”?</p>

<p><a href="https://en.wikipedia.org/wiki/Chebyshev_polynomials">Chebyshev polynomials</a> can be used in a rigorous approach to building waveshapers according to some desired profile of harmonics. In this context, their claim to fame is that they’re polynomials that can twist a $\cos x$ wave into its $n$-th harmonic, or in other words</p>

\[T_n(\cos x) = \cos(nx)\]

<p>You already know one if you can recall the double-angle formula, $\cos(2x) = 2\cos^2 x - 1$. Now, imagine un-substituting $\cos x$ from the right-hand side, and you’ll get the Chebyshev polynomial $T_2(x) = 2x^2 - 1$. Then, imagine a double-<em>double-</em>angle formula, $\cos(4x) = 2\cos^2(2x)-1$, and expand that to $8\cos^4 x - 8\cos^2 x + 1$. Unsubstituting $\cos x$ from that gets the Chebyshev polynomial $T_4(x) = 8x^4 - 8x^2 + 1$.</p>

<figure>
<img src="/images/2022-06-18/figure2.png" alt="T_4(x) and cos(4x) plots" />
<figcaption>

Okay, my only reason for bringing up $T_4(x)$ was this elegant-looking plot, though it's not as elegant for other $n$. That aside!

</figcaption>
</figure>

<p>Now, algebraically manipulating these angle identities into polynomials is a nice hat trick, but there is a simpler way to think of all the Chebyshev polynomials. In the first section of <em>Chebyshev Polynomials</em> by Mason and Handscomb (the first book that appeared on Google Scholar, don’t @ me), you can find the claim that algebraic manipulations of De Moivre’s theorem are—technically—all that you need to find a Chebyshev polynomial $T_n(x)$ for arbitrary $n$. But in that same section, you can find an easy recurrence that connects them all:</p>

\[T_n(x) = 2x T_{n-1}(x) - T_{n-2}(x)\]

<p>where $T_0(x) = 1$ and $T_1(x) = x$ to start. For example, we can use this recurrence to get from $T_2(x)$ to $T_4(x)$ by way of $T_3(x)$</p>

\[\begin{align*} T_3(x) &amp; = 2x T_2(x) - T_1(x) \\ &amp; = 2x (2x^2-1)-x \\ &amp; = 4x^3 - 3x \end{align*}\]

\[\begin{align*} T_4(x) &amp; = 2x T_3(x) - T_2(x) \\ &amp; = 2x (4x^3-3x)-(2x^2-1) \\ &amp; = 8x^4 - 6x^2 - 2x^2 + 1 \\ &amp; = 8x^4-8x^2+1 \end{align*}\]

<p>where you can notice here that $T_3(x)$ corresponds with the triple-angle formula!</p>
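<p>The recurrence is also easy to mechanize. As a quick sketch of my own (assuming numpy, with coefficients stored lowest degree first), we can generate any $T_n(x)$ and spot-check the defining identity $T_n(\cos x) = \cos(nx)$ numerically:</p>

```python
import numpy as np

def chebyshev(n):
    """Coefficients of T_n, lowest degree first, via the recurrence
    T_n(x) = 2x*T_{n-1}(x) - T_{n-2}(x)."""
    t_prev, t = np.array([1.0]), np.array([0.0, 1.0])  # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_next = 2*np.concatenate(([0.0], t))  # multiplying by 2x shifts every degree up
        t_next[:len(t_prev)] -= t_prev
        t_prev, t = t, t_next
    return t

print(chebyshev(4))  # the coefficients of 8x^4 - 8x^2 + 1

# spot-check the defining identity T_n(cos x) = cos(nx)
x = 0.7
assert abs(np.polyval(chebyshev(5)[::-1], np.cos(x)) - np.cos(5*x)) < 1e-12
```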

<p>Hopefully, that’s enough about Chebyshev polynomials for us to start understanding how to use them here. Assume that $\cos x$ is our input signal (we’ll see how this assumption breaks down later). By the definition of the Chebyshev polynomials, $\cos x$ happens to be equal to $T_1(\cos x)$, so we can use $T_1(x) = x$ as a kind of stand-in for $\cos x$. In the same way, we can represent the $n$-th harmonic as the polynomial $T_n(x)$. Therefore, a linear combination of $\cos x$ and its harmonics can be represented as a linear combination of the Chebyshev polynomials, and that is another polynomial in itself!</p>

<p>In other words, if we let $\alpha_n$ be the amplitude ratio between the $n$-th harmonic and the fundamental (for $n \geq 2$, since $n = 1$ is the fundamental itself), then this polynomial can be written as</p>

\[f(x) = T_1(x) + \sum_{n=2}^\infty \alpha_n T_n(x)\]

<p>In fact, this is only a few minor tweaks away from being what we throw into the lookup table of a Teensy waveshaper. Everything can be written in only four steps!</p>

<div class="info-panel">

  <h4 id="how-to-generate-a-waveshaper-lookup-table-in-four-steps">How to generate a waveshaper lookup table in four steps!</h4>

  <ol>
    <li>
      <p>Decide what amplitude ratios $\alpha_n$ each $n$-th harmonic should have with the fundamental frequency</p>
    </li>
    <li>
      <p>Build a preliminary function $f_0(x)$ as the linear combination of the Chebyshev polynomials</p>

\[f_0(x) = T_1(x) + \sum_{n=2}^\infty \alpha_n T_n(x)\]

      <p>where the first Chebyshev polynomials are</p>

\[\begin{align*} T_0(x) &amp; = 1 \\ T_1(x) &amp; = x \\ T_2(x) &amp; = 2x^2-1 \\ T_3(x) &amp; = 4x^3-3x \\ T_4(x) &amp; = 8x^4-8x^2+1 \end{align*}\]

      <p>and the rest can be derived by the recurrence relation</p>

\[T_{n+1}(x) = 2 x T_n(x)-T_{n-1}(x)\]
    </li>
    <li>
      <p>Shift $f_0(x)$ so that it maps zero to zero (for preventing constant DC) by evaluating $f_0(x)$ at $x=0$ then subtracting that</p>

\[f_1(x) = f_0(x)-f_0(0)\]
    </li>
    <li>
      <p>Normalize $f_1(x)$ by finding the maximum absolute value for $-1 &lt; x &lt; 1$ (try plotting $f_1(x)$) then dividing by that</p>

\[f_2(x) = \frac{f_1(x)}{f_{\text{1,maxabs}}}\]
    </li>
  </ol>

  <p>The above function, $f_2(x)$, is your final function. Evaluate it at as many points within $-1 &lt; x &lt; 1$ as can fit in your waveshaper’s LUT! If the input sine wave swings exactly within $-1 &lt; x &lt; 1$, then the ratios $\alpha_n$ will be realized. Otherwise, different and smaller ratios will occur.</p>

  <details>
    <summary>Using this method, I can perfectly replicate my old post!</summary>

    <ol>
      <li>
        <p>In that old post, I chose to give the second harmonic a weight of $0.2$ and no weight to the higher ones, so $\alpha_2 = 0.2$ and $\alpha_n = 0$ for $n &gt; 2$.</p>
      </li>
      <li>
        <p>The sum reduces to a single Chebyshev polynomial term, so the preliminary function is</p>

\[f_0(x) = x + 0.2 (2x^2-1)\]
      </li>
      <li>
        <p>We can calculate that $f_0(0)=-0.2$, so our new function must be</p>

\[f_1(x) = x+0.2 (2x^2-1)+0.2\]
      </li>
      <li>
        <p>Plotting $f_1(x)$ reveals that it achieves a maximum absolute value of 1.4 at $x=1$, so our final function must be</p>

\[f_2(x) = \frac{x+0.2 \cdot (2x^2-1)+0.2}{1.4}\]
      </li>
    </ol>

    <p>That function simplifies to $\frac{2}{7}x^2+\frac{5}{7}x$.</p>

  </details>

  <figure>
<img src="/images/2022-06-18/figure1.png" alt="New and old plots" />
</figure>

</div>
<p><!-- div class="info-panel" --></p>
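<p>The whole four-step recipe fits in a few lines of Python. This is a sketch, not the code I actually ran back then: the function names are mine, and I’m assuming a table length of $2^k+1$ points (like 257), which is the shape I recall the Teensy waveshaper wanting. It finishes by checking the worked example above, where the result simplifies to $\frac{2}{7}x^2+\frac{5}{7}x$:</p>

```python
import math

def cheb(n, x):
    # T_{n+1}(x) = 2x*T_n(x) - T_{n-1}(x), with T_0(x) = 1 and T_1(x) = x
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

def waveshaper_table(alphas, length=257):
    """Steps 1-4: build f0 as T_1 plus the weighted harmonics, subtract
    f0(0) to kill the DC offset, then normalize by the maximum absolute
    value over -1 <= x <= 1. `alphas` maps each harmonic number n >= 2
    to its ratio alpha_n with the fundamental."""
    def f0(x):
        return x + sum(a * cheb(n, x) for n, a in alphas.items())
    dc = f0(0.0)                                    # step 3
    xs = [-1.0 + 2.0 * i / (length - 1) for i in range(length)]
    f1 = [f0(x) - dc for x in xs]
    peak = max(abs(y) for y in f1)                  # step 4
    return [y / peak for y in f1]

# Worked example: alpha_2 = 0.2 and nothing else, which should
# simplify to (2/7)x^2 + (5/7)x.
table = waveshaper_table({2: 0.2})
assert abs(table[-1] - 1.0) < 1e-9     # f2(1) = 1
assert abs(table[128]) < 1e-9          # f2(0) = 0 (midpoint of 257 points)
assert abs(table[0] + 3.0 / 7.0) < 1e-9  # f2(-1) = -3/7
```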

<p>We’ve essentially reached parity with my last blog post, but one question remains: what happened to all the phase shifts I had done? As it turns out, if I had used the $\cos x$ wave and not the $\sin x$ wave as my basis, I could have avoided them altogether. While Chebyshev polynomials do what’s written on their tin when passed $\cos x$ as the input, you can show that they don’t do the same for $\sin x$ waves:</p>

\[\begin{align*} T_n(\sin x) &amp; = T_n \big(\cos(x - \frac{\pi}{2}) \big) \\ &amp; = \cos\big(n(x-\frac{\pi}{2})\big) \\ &amp; = \cos(nx - n\frac{\pi}{2}) \\ &amp; = \sin(nx-n\frac{\pi}{2}+\frac{\pi}{2}) \\ &amp; = \sin\big(nx-(n-1)\frac{\pi}{2}\big)\end{align*}\]

<p>And hence came the phase shifts.</p>
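<p>You can verify that derivation numerically, too. Here’s a small check (again just an illustrative sketch, with the same hypothetical <code>cheb</code> helper) that $T_n(\sin x)$ comes back as a harmonic with exactly the $(n-1)\frac{\pi}{2}$ phase shift derived above:</p>

```python
import math

def cheb(n, x):
    # T_{n+1}(x) = 2x*T_n(x) - T_{n-1}(x), with T_0(x) = 1 and T_1(x) = x
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

# T_n(sin x) = sin(nx - (n-1)*pi/2): still the n-th harmonic, but each
# one arrives with its own phase shift.
x = 1.1
for n in range(1, 6):
    lhs = cheb(n, math.sin(x))
    rhs = math.sin(n * x - (n - 1) * math.pi / 2)
    assert abs(lhs - rhs) < 1e-12
```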

<p>Finally, let’s address the assumption that we made from the start: that our input was a $\cos x$ wave. We’ve seen now that even trying $\sin x$ waves instead already breaks the result. That is, only when we give one specific sinusoid, $\cos x$, will we get all the harmonics back with no phase shifts. Another way to break it is to give it a wave with some varying amplitude $a(t) \leq 1$ (e.g. under an ADSR envelope) or even an arbitrary input. In that case, I don’t know where the impacts end. At the very least, I can address one of them: constant DC shifts.</p>
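<p>To put one concrete number on the amplitude case: expanding just the second-harmonic term gives $T_2(a \cos x) = 2a^2\cos^2 x - 1 = a^2 \cos 2x + (a^2 - 1)$, so the harmonic’s amplitude shrinks to $a^2$ and a DC term $a^2 - 1$ appears. A tiny sanity check of that expansion:</p>

```python
import math

# For a sinusoid of amplitude a <= 1, the second-harmonic term expands as
#   T_2(a*cos x) = a^2 * cos(2x) + (a^2 - 1):
# the harmonic shrinks quadratically with a, and a DC offset creeps in.
a = 0.5
for i in range(8):
    x = i * math.pi / 4
    lhs = 2.0 * (a * math.cos(x)) ** 2 - 1.0         # T_2(a*cos x)
    rhs = a * a * math.cos(2.0 * x) + (a * a - 1.0)
    assert abs(lhs - rhs) < 1e-12
```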

<p>For $a(t) = 0$, a waveshaper will see nothing but zero, and it may decide to map that to something nonzero. This is because Chebyshev polynomials weren’t defined with that in mind either. For example, $T_2(0)=-1$. If my headphones saw -1 volt at DC, they’d blow. In my old post, I had only seen that happen when I added even harmonics, and I had found that adding a constant equal to $\alpha_n$ or $-\alpha_n$ would correct it. Ultimately though, the easiest way to correct this effect is to just evaluate the waveshaper function at $x=0$, then subtract that value. That’s step 3.</p>