Differential equations: how does separation of variables really work?

One of the simplest forms of differential equation to solve is that in which the variables can be separated on either side of the equality. Such equations occur frequently in science and engineering, and are usually the first problems of this type to be taught to students.

Some of the operations that textbooks and classes teach to students, as part of the process of solving separable-variable equations, appear to be mathematically highly dubious. Students are typically taught to treat the differential operator dy/dx as if it were a simple ratio of two quantities, whose individual terms can be multiplied and divided independently. However, dy/dx is not a ratio, and it is not clear why these procedures work. Worse, it is unclear whether it is always safe to treat the differential operator in such a cavalier fashion, or whether it is only safe for particular classes of problem.

This situation irritated me when I was first taught calculus nearly forty years ago, and it continues to irritate me to this day.

In this article I will attempt to describe how we can deal with separation of variables without playing fast and loose with differentials. In doing so, I hope to provide some insight into what is going on behind the scenes when we follow the method that is usually taught. The arguments in this article apply also to other procedures in calculus that appear to rely on treating a differential operator as a ratio, such as integration by u-substitution.

I'll start by describing the usual textbook procedure for solving a simple differential equation with separable variables, and then explain what's wrong with it, and how it can be put right.

I want to point out right from the start that this article is really about the philosophy of mathematics. You can follow the textbook method and get the right answer without understanding any of this stuff.

The textbook approach

Consider the following simple differential equation in x and y:

$$ \frac {dy}{dx} = \frac {x^2}{1 - y^2} $$

Our job is to find a relationship between x and y that does not contain a derivative term, that is, to solve for y in terms of x.

The usual textbook approach to solving an equation of this type is to separate the variables -- putting all the x terms on one side, and the y terms on the other. The x terms have to include dx, and the y terms dy. We can multiply both sides by dx and by 1-y², and this gives us: $$ (1 - y^2) dy = x^2 dx $$

But what does it mean to 'multiply by dx'? dx is not a quantity -- it is a component of the operator d/dx[f(x)]. While it is true that the 'd' terms do, in some senses, represent small values ('infinitesimals', in the Leibniz formulation), the operator itself is a limit expression (maybe; see the closing remarks for discussion) -- it denotes the limiting value of the ratio of a small change in y to a small change in x, as the change in x approaches zero. The components of dy/dx have little significance on their own, and they certainly don't form an arithmetical ratio.

Leaving that problem aside for now, the textbook procedure now typically calls for us to write an integration sign in front of each side (whatever that signifies), to create a pair of integrals: $$ \int (1-y^2) dy = \int x^2 dx $$

We now have something that is mathematically well-formed, but we've got to it by a peculiar process. Like the differential operator, the indefinite integral operator

∫ dx

is something that is meaningful in a particular form; the

∫

doesn't have much meaning on its own.

Be that as it may, integrating both sides with respect to their independent variables gives:

$$ y - \frac{y^3}{3} = \frac{x^3}{3} + C $$

With a bit of tidying up (multiplying through by 3, and absorbing the factor of 3 into the arbitrary constant C):

$$ 3 y - y ^3 = x^3 + C $$

This is a cubic equation in y, and not easy to juggle into a straightforward relationship of the form y=f(x). Still, we have a solution (or, rather, a family of solutions) to the original differential equation; the calculus is done, and the rest is algebra.
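Since the implicit solution can't easily be rearranged into y=f(x), one way to gain some confidence in it is a numerical check. The sketch below is only an illustration, not part of the textbook method: it integrates the original differential equation with the classical fourth-order Runge-Kutta scheme and watches whether the quantity 3y - y³ - x³ stays constant along the computed curve. The initial condition y(0) = 0 (so C = 0), the range 0 to 1, the step count and the function names are all assumptions made for the example.

```python
# Numerically integrate dy/dx = x^2 / (1 - y^2) with classical RK4 and check
# that 3y - y^3 - x^3 (the constant C in the implicit solution) does not drift.

def rhs(x, y):
    """Right-hand side of the differential equation."""
    return x**2 / (1.0 - y**2)

def rk4_path(x0, y0, x_end, steps):
    """Return a list of (x, y) points from x0 to x_end using classical RK4."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    points = [(x, y)]
    for i in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h * k1 / 2)
        k3 = rhs(x + h / 2, y + h * k2 / 2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x = x0 + (i + 1) * h
        points.append((x, y))
    return points

def invariant(x, y):
    """The quantity the implicit solution says is constant along a solution."""
    return 3 * y - y**3 - x**3

# Assumed initial condition y(0) = 0, so C = 0. On this range y stays well
# below 1, so the denominator 1 - y^2 never vanishes.
path = rk4_path(0.0, 0.0, 1.0, 1000)
max_drift = max(abs(invariant(x, y) - invariant(0.0, 0.0)) for x, y in path)
print(max_drift)
```

If the implicit solution is right, the printed drift should be tiny compared with the values of x and y themselves. A check like this says nothing about why the method works, of course; it only confirms that the answer it produced is consistent with the equation we started from.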

So what's the problem?

Most textbooks and classes gloss over the fact that dy/dx is not a ratio of two quantities, and treat the individual terms as though they can be multiplied, divided and canceled at will. Some authors do at least point out the problem, but usually to say "dy/dx is not a ratio, but it can be helpful on occasions to treat it as if it were." It's never very clear what these occasions are. Sometimes you'll come across statements such as "This procedure is justified by the chain rule" or "This procedure is justified by the Fundamental Theorem of Calculus." Maybe that's true, but how is it justified? You'll sometimes see expressions in dx or dy explained away with hand-waving expressions like "formal form" or "infinitesimal form". But what do those terms really mean? With these questions left unanswered, students have no way to figure out whether the approach being explained is generally applicable, or works only for a limited class of problems.

Another attempt

Let's see if we can't improve the formulation of the separation of variables procedure, and avoid some of these ugly mathematical kludges. Starting with the original differential equation: $$ \frac {dy}{dx} = \frac {x^2}{1 - y^2} $$

We can rearrange to give:

$$ (1 - y^2) \frac{dy}{dx} = x^2 $$ Doing this keeps dy/dx intact, for now.

Now let's take the indefinite integral of each side with respect to x. This is a legitimate thing to do, as we're using the same independent variable in the two integrals, even though the LHS is a function of y as well as x:

$$ \int (1 - y^2) \frac{dy}{dx} dx = \int x^2 dx $$

From here, we could "cancel out" the dx terms to leave:

$$ \int (1 - y^2) dy = \int x^2 dx $$

and this is the same separated form we arrived at earlier.

But what does it mean to "cancel out" the dx terms? Is doing this any better than manipulating dy and dx independently? Not really -- all we've done is postpone the application of a kludge -- we're still going to have to treat dy/dx as a ratio -- we're just doing it a bit later. The dx in the integration operator is not the same quantity as the dx in the differentiation operator, if it is even a quantity at all.

Given the lack of mathematical rigour, it can be somewhat surprising that these procedures do actually work -- in this particular kind of problem, at least, manipulating dy and dx independently does allow the correct solution to be reached.

But how?

"Cancelling dx" as a shortcut to applying the chain rule

To figure out what's going on here, we need to think about what an indefinite integral actually means. In essence, when we look to evaluate

$$ \int (1 - y^2) \frac{dy}{dx} dx $$

what we're really asking is "What can be differentiated with respect to x, to give $$ (1 - y^2) \frac{dy}{dx} $$?"

It's an odd question, on the face of it, because we aren't particularly used to seeing the results of a differentiation having differential terms in them. There's certainly no problem finding something that will differentiate, with respect to y, to $$ 1 - y^2 $$ Straightforward integration gives us $$ y - \frac{y^3}{3} + C $$ for any constant C. Knowing this, it is possible to find an expression intuitively that will differentiate to $$ (1 - y^2) \frac{dy}{dx} $$ Such an expression is $$ y - \frac{y^3}{3} $$ where y is a function of x, and we differentiate in terms of x, not y. To perform this differentiation we need the chain rule:

$$ \frac{du}{dx} = \frac{du}{dy} \frac{dy}{dx} $$ If we take $$ u = y - \frac{y^3}{3} $$ where u is a function of x, then its derivative with respect to x is:

$$ \frac{d}{dx}[y - \frac{y^3}{3}] = \frac{d}{dy}[y-\frac{y^3}{3}] . \frac{dy}{dx} = (1 - y^2) \frac{dy}{dx} $$

In other words, the LHS of

$$ \int (1-y^2) \frac{dy}{dx} dx = \int x^2 dx $$

can be replaced to give:

$$ y - \frac{y^3}{3} = \int x^2 dx $$ This is exactly where we got to by "cancelling out" the dx terms and performing the integration of the LHS with respect to y, as explained above. We could then proceed by straightforwardly integrating the RHS.
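The chain-rule step used here can be sanity-checked numerically. The following is a rough sketch rather than a proof: it picks an arbitrary sample function y(x) (here sin x, purely my own choice for illustration), differentiates u = y - y³/3 with respect to x using a central difference, and compares the result with (1 - y²) multiplied by dy/dx. The function names, the step size and the sample points are all assumptions of the example.

```python
import math

def y(x):
    """An arbitrary sample function y(x), chosen only for illustration."""
    return math.sin(x)

def dy_dx(x):
    """Exact derivative of the sample y(x)."""
    return math.cos(x)

def u(x):
    """u = y - y^3/3, regarded as a function of x through y(x)."""
    return y(x) - y(x)**3 / 3

def central_difference(func, x, h=1e-5):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

def chain_rule_gap(x):
    """Difference between du/dx measured directly and (1 - y^2) * dy/dx."""
    return abs(central_difference(u, x) - (1 - y(x)**2) * dy_dx(x))

# Compare the two sides at a handful of arbitrary sample points.
worst_gap = max(chain_rule_gap(x) for x in (-2.0, -0.5, 0.0, 0.7, 1.3, 3.0))
print(worst_gap)
```

The two sides should agree to within the accuracy of the finite-difference approximation. Swapping in a different differentiable y(x), along with its derivative, ought to give the same agreement, which is the point: the result doesn't depend on which function of x y happens to be.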

The above reasoning shows that the "cancelling out" of the dx terms was justified, in the sense that doing it led to the same answer that could be obtained by intuitively finding an expression that could be differentiated to give the integrand in question. In practice, however, we need something more general -- something that justifies the operation used, and can be applied routinely in problems of this sort. For that, we need to show with reasonable rigour that

$$ \int f(y) \frac{dy}{dx} dx = \int f(y) dy $$

That is, we need to show that "cancelling the dx terms" is mathematically valid.

To do that, we'll start with the chain rule again:

$$ \frac{du}{dx} = \frac{du}{dy} \frac{dy}{dx} $$ Then let $$ u = \int f(y) dy $$

Substituting into the chain rule: $$ \frac{d}{dx} ( \int f(y) dy ) = \frac{d}{dy} ( \int f(y) dy ) . \frac {dy}{dx} $$

The first term on the RHS is simply a differentiation of an integration with respect to the same variable, y, and so the equation reduces to:

$$ \frac{d}{dx} ( \int f(y) dy ) = f(y).\frac{dy}{dx} $$

Now if we integrate both sides with respect to x, we get $$ \int [ \frac{d}{dx} ( \int f(y) dy ) ] dx = \int f(y).\frac{dy}{dx} dx $$

As the LHS is just the integral of a derivative with respect to the same variable, the equation reduces to: $$ \int f(y) dy = \int f(y).\frac{dy}{dx} dx $$

which is what we set out to prove.
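For anybody who prefers to see numbers, the identity can also be tested on a concrete case. The sketch below uses definite integrals, which is a step beyond the indefinite integrals in the argument above: I am assuming the usual definite form of the result, in which integrating f(y) dy/dx over x from a to b corresponds to integrating f(y) over y from y(a) to y(b). The particular f, y(x), limits and the use of Simpson's rule are all illustrative choices of mine.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Illustrative choices, not taken from the article.
def f(y):
    return 1.0 / (1.0 + y**2)

def y_of_x(x):
    return x**2

def dy_dx(x):
    return 2.0 * x

a, b = 0.0, 1.5

# Left side: integrate f(y(x)) * dy/dx with respect to x over [a, b].
lhs = simpson(lambda x: f(y_of_x(x)) * dy_dx(x), a, b)

# Right side: integrate f(y) with respect to y over [y(a), y(b)].
rhs = simpson(f, y_of_x(a), y_of_x(b))

# For this f the y-integral is known in closed form (arctan), which gives an
# independent value to compare both numerical results against.
exact = math.atan(y_of_x(b)) - math.atan(y_of_x(a))

print(lhs, rhs, exact)
```

The two numerical integrals are computed over different variables and different ranges, with different integrands, yet they should agree with each other and with the closed-form value to many decimal places. That agreement is what "cancelling the dx terms" amounts to in practice.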

So where does that all get us?

In summary, when we split up dy/dx into separate terms and manipulate those terms separately, what we're really doing is integrating both sides of the equation with respect to the same variable, and then applying the chain rule to remove the differential term from the integrand. Treating dx and dy as elements of a ratio is sloppy, but it is methodologically sound to do so in cases where the chain rule would be capable of yielding the integrands in question. And, of course, it's a lot more convenient than applying the chain rule explicitly every time.

Closing remarks

The meaning of dy/dx is itself problematic. Traditionally it has been seen as an operator, derived by considering infinitesimal changes in one variable with respect to another. However, this formulation is not without its problems -- most prominently that there is no 'infinitesimally small' real number. The set of real numbers is, by definition, infinitely sub-dividable into smaller sets. To some extent, problems understanding what 'infinitesimal' meant in the context of real numbers led to derivatives being treated as limiting expressions, rather than ratios of infinitesimals. However, there are other ways to understand a derivative, including Abraham Robinson's rigorous definition of infinitesimals in terms of hyperreal fields. Some of these formulations might allow dy and dx to be treated independently in some circumstances.

Nevertheless, it seems to me that explaining the methodology of separation of variables and integration by substitution as a short-cut to the application of the chain rule is more likely to be understood by students than a highly technical discussion of the meaning of a differential.