There are a couple of places where I’ve been told I lose people when describing information equilibrium, so I am going to create a series of reference posts that act as explainers for those things at a high school math level. The kid is currently in high school math, so I have a test subject.
Slopes and derivatives
We’ll start with the slope of a line. The equation for a line in “slope-intercept form” — i.e. one that lets you read off the slope (how steep it is) and the intercept (where it intersects with the y-axis) — is

$$ y = m x + b $$
where m is the slope and b is the intercept (i.e. y = b if x = 0). The formula for calculating the average slope between two points (x₁, y₁) and (x₂, y₂) (or the actual slope if it is a straight line) is

$$ \text{slope} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\Delta y}{\Delta x} $$
This is rise (change in y, the rise along the y-axis) over run (change in x, the run along the x-axis). I added the Greek capital letter delta (Δ) way of writing this — you’ll hear people refer to the difference between things as the “delta”, which is just the equivalent of the Latin letter D in Greek (D for Difference). It’s often used in the sciences when you are referencing a change. For those better with pictures, here’s a graphical representation:

[figure: rise (Δy) over run (Δx) between two points on a line]
If we plug our line formula into the slope formula we get:

$$ \frac{y_2 - y_1}{x_2 - x_1} = \frac{(m x_2 + b) - (m x_1 + b)}{x_2 - x_1} = \frac{m (x_2 - x_1)}{x_2 - x_1} = m $$
That is to say, the slope is m when the equation for a line is written in slope-intercept form. It works.
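As a quick numerical sanity check, here is a minimal Python sketch (the particular line and sample points are my own illustrative choices, not from the post): rise over run between any two points on a line gives back m.

```python
# Rise over run between two points on the line y = 2x + 1.
def slope(x1, y1, x2, y2):
    return (y2 - y1) / (x2 - x1)

m, b = 2.0, 1.0
line = lambda x: m * x + b

# Any two distinct points on the line give the same answer: the slope m.
print(slope(1.0, line(1.0), 4.0, line(4.0)))  # 2.0
```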
What do you do if it’s not a line, though? Say, a cubic function where y = a x³? Plugging it into the slope formula and factoring the numerator gives:

$$ \frac{y_2 - y_1}{x_2 - x_1} = \frac{a x_2^3 - a x_1^3}{x_2 - x_1} = \frac{a (x_2 - x_1) (x_2^2 + x_1 x_2 + x_1^2)}{x_2 - x_1} = a (x_2^2 + x_1 x_2 + x_1^2) $$
The average slope now depends on where exactly your two points are. Not as clean as for a line, but it is a cubic, which rises faster and faster as you go to larger and larger x, so it makes sense for the slope to depend on x.
However, if you move your points really close together you can find something interesting. Let’s say x₁ = x − δx/2 and x₂ = x + δx/2 where δx is really small. That’s a lowercase Greek letter delta that’s frequently used to mean a tiny quantity (or a tiny difference). We have x₂ − x₁ = δx, and using this in the slope formula above:

$$ a \left( x_2^2 + x_1 x_2 + x_1^2 \right) = a \left( 3 x^2 + \frac{(\delta x)^2}{4} \right) = 3 a x^2 + \frac{a (\delta x)^2}{4} $$
We can “drop” the δx² term as being very small (actually, take the limit as δx → 0). The instantaneous slope at a point along your cubic function y = a x³ is just 3 a x². I chose a cubic there because it actually matters that δx is small. If you do it this way for a parabola, you still get the right answer (for y = a x² it is just 2 a x, notice a pattern?) but it doesn’t matter that δx is small — it completely cancels out.
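To see this numerically, here is a short Python sketch (the values are illustrative) of the centered-difference slope from the derivation above; the leftover error is exactly the a δx²/4 term, so it shrinks as δx does.

```python
# Centered-difference slope for y = a*x**3 at a point x.
# From the derivation, slope = 3*a*x**2 + a*dx**2/4, so the error
# relative to the exact answer 3*a*x**2 shrinks like dx**2.
a, x = 2.0, 1.5
exact = 3 * a * x**2  # 13.5

for dx in (1.0, 0.1, 0.01):
    x1, x2 = x - dx / 2, x + dx / 2
    slope = (a * x2**3 - a * x1**3) / (x2 - x1)
    print(dx, slope, slope - exact)  # the difference is a*dx**2/4
```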
Note that if you take that limit early on in the derivation you get a divide-by-zero issue because

$$ \frac{y_2 - y_1}{x_2 - x_1} \to \frac{y(x) - y(x)}{x - x} = \frac{0}{0} $$
However, you might have noticed that in the derivation for the cubic function, the numerator had an x₂ − x₁ = δx term on the outside. This tells us that for our tiny run, the related tiny rise δy was equal to δy = f(x) δx where that function f(x) turned out to be f(x) = 3 a x². That δx in the numerator cancelled when we divided δy by δx, leaving us with just f(x). Or

$$ \frac{\delta y}{\delta x} = f(x) $$
So that if y = a x³ then δy/δx = 3 a x². If we turn those lowercase Greek deltas into ordinary Latin d’s we have Leibniz’s notation for what is called a derivative:

$$ \frac{dy}{dx} = 3 a x^2 $$
Physicists often do weird things with mathematical notation, so we’ll write the derivative as an “operator” (d/dx) — a kind of mathematical procedure applied to a function:

$$ \left( \frac{d}{dx} \right) a x^3 = 3 a x^2 $$
You can drop the parentheses around the d/dx since it’s usually pretty clear from the context. That’s a derivative — it’s a function that gives the slope of another function at every point along it.
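If you’d rather let a computer do the symbol-pushing, sympy implements this operator directly (a minimal sketch, assuming you have sympy installed):

```python
import sympy as sp

x, a = sp.symbols('x a')

# Apply the operator d/dx to a*x**3: the result is the slope function.
print(sp.diff(a * x**3, x))  # 3*a*x**2
```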
So if you can apply a derivative to a function, can you undo it? Yes, and it’s called an integral, shown with a big swoopy 18th-century letter S plus a dx. It’s traditionally written with the S on the front and the dx on the end:

$$ \int 3 a x^2 \, dx = a x^3 + \text{constant} $$
That “constant” is called the constant of integration, and it’s there because applying the derivative formula to a constant function f(x) = c gives you zero: y₂ − y₁ = c − c = 0. Again, physicists do weird things with mathematical notation, and in “operator form” you’d see

$$ \left( \frac{d}{dx} \right)^{-1} 3 a x^2 = a x^3 + c $$
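And sympy can undo it, too (a minimal sketch; note that sympy leaves off the constant of integration, so you have to remember it yourself):

```python
import sympy as sp

x, a = sp.symbols('x a')

# Integrating the slope function recovers the original cubic
# (up to the constant of integration, which sympy omits).
print(sp.integrate(3 * a * x**2, x))  # a*x**3
```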
So you can build a kind of algebra of functions where you apply some operations, undo them in various orders, and create equations with unknowns. The whole field of applying derivatives to functions, saying they satisfy some equations, and solving those equations is called …
Differential equations
So let’s say y is a function of x (we have to be more explicit now that we’re going to treat y as an unknown) and we ask what function y(x) has derivative 3 a x², i.e. solve:

$$ \frac{dy}{dx} = 3 a x^2 $$
for y(x). Well, we know what it is from earlier in this post, and we un-did the derivative by integrating. This is what is called a directly integrable differential equation:

$$ y(x) = \int 3 a x^2 \, dx = a x^3 + c $$
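As a check, sympy’s differential equation solver gives the same answer, constant of integration included (a minimal sketch):

```python
import sympy as sp

x, a = sp.symbols('x a')
y = sp.Function('y')

# The directly integrable equation dy/dx = 3*a*x**2.
eq = sp.Eq(y(x).diff(x), 3 * a * x**2)
print(sp.dsolve(eq, y(x)))  # Eq(y(x), C1 + a*x**3)
```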
Now I can make this slightly more complicated. Note that 3 a x² = 3 (a x³)/x = 3 y/x. That means I can write our differential equation like this:

$$ \frac{dy}{dx} = 3 \, \frac{y}{x} $$
We’re almost to the main differential equation in information equilibrium. It’s also among the first kinds of differential equations to be solved in the late 1600s [pdf] (also wikipedia). It’s no longer a directly integrable equation, but it can be solved with some calculus tricks like the chain rule. The derivative of the natural log function “ln” or “logₑ” for log base e — which in physics is almost always just written log — is¹:

$$ \frac{d}{dx} \log x = \frac{1}{x} $$
If we have the log of a function, then we use the chain rule:

$$ \frac{d}{dx} \log f(x) = \frac{1}{f(x)} \, \frac{df}{dx} $$
Using these two facts, we can go back to our original differential equation and re-arrange it:

$$ \frac{dy}{dx} = 3 \, \frac{y}{x} $$

$$ \frac{1}{y} \, \frac{dy}{dx} = \frac{3}{x} $$

$$ \frac{d}{dx} \log y = \frac{d}{dx} \log x^3 $$

$$ \log y = \log x^3 + c_1 $$

$$ y = e^{c_1} x^3 = c_2 x^3 $$
where c₂ is just some constant that was relabeled from exp(c₁).
Two things to note: 1) when we said 3 a x² = 3 y/x, we actually destroyed the information about a, so the resulting differential equation cannot bring it back (the best it can do is the arbitrary constant c₂), and 2) if our original function had been a x³ + c, then we would have needed an additional term ~ c/x in order to get the same constant of integration as the directly integrable version — when we said 3 a x² = 3 y/x, we said it’s not there.
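You can see point 1) directly if you hand the rewritten equation to sympy (a minimal sketch): the solution only has an arbitrary constant C1 where a used to be.

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')

# The rewritten equation dy/dx = 3*y/x no longer contains a, so the
# solution can only produce an arbitrary overall constant C1.
eq = sp.Eq(y(x).diff(x), 3 * y(x) / x)
print(sp.dsolve(eq, y(x)))  # Eq(y(x), C1*x**3)
```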
However, it’s this more complicated solution that is directly relevant to the main differential equation used in information equilibrium:

$$ \frac{dA}{dB} = k \, \frac{A}{B} $$
where A is the information source (abstract “demand”), B is the information destination (abstract “supply”), and k is the information transfer index. I sometimes write this in the form A ⇄ B to represent information flowing between the process variables to maintain equilibrium. In some more typical physicist’s abuse of notation, we can solve this equation:

$$ \frac{dA}{A} = k \, \frac{dB}{B} $$

$$ \int \frac{dA}{A} = k \int \frac{dB}{B} $$

$$ \log \frac{A}{A_0} = k \log \frac{B}{B_0} $$

$$ A = A_0 \left( \frac{B}{B_0} \right)^k $$
where the constants of integration A₀ and B₀ appear to make the units come out right. That k is playing the role of the 3 in our original example — I snuck it out of the integral this time because you’re allowed to do that: the integral of a constant times X is equal to that constant times the integral of X.
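And as a numerical check, here is a short scipy sketch (the values of k, A₀, and B₀ are just illustrative) that integrates the differential equation directly and compares it with the closed-form solution:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate dA/dB = k*A/B starting from (B0, A0) and compare with
# the closed-form solution A = A0*(B/B0)**k. Values are illustrative.
k, A0, B0 = 1.5, 2.0, 1.0

sol = solve_ivp(lambda B, A: k * A / B, t_span=(B0, 5.0), y0=[A0],
                dense_output=True, rtol=1e-10)

B = np.linspace(B0, 5.0, 5)
print(sol.sol(B)[0])       # numerically integrated A(B)
print(A0 * (B / B0)**k)    # closed-form A, which should match
```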
The only differential equation you need to know?
Well, at least for information equilibrium. There are some additional ways to mess with this equation — additional variables (partial derivatives with d → ∂), holding things constant, or assuming a functional form for A and B. These lead to Cobb-Douglas functions, supply and demand curves, and dynamic information equilibrium models (DIEMs), respectively. They are all described in my paper, which should be a bit more accessible after getting through this reference post. There’s going to be a reference post on partition functions in the future (link [here] when available) — one that will help with the occasional appearance of angle brackets ⟨A⟩, which are basically expectation values with respect to maximum entropy distributions, for which partition functions are a useful tool.
¹ You can get an idea of how to obtain this from the slope formula above, using the fact that log(a) − log(b) = log(a/b).