The Physics
Opus in profectus

Kinetic-Molecular Theory




The postulates of kinetic molecular theory (also known as molecular kinetic theory) start sensibly with…


Which can be expanded to…


Then we need another postulate to explain what molecules are…

and how they behave…

If you think it sounds like I've been teaching in Brooklyn too long, think again. You could walk up to Sir Isaac Newton in 1665 when he was avoiding the plague (literally) and, after a proper introduction, tell him these postulates in almost these exact words and he would understand you. To be sure, you'd probably have to replace the 18th century French word "molecule" with the Latin word molecula (little mass) or the classical Greek word ἄτομος (atomos, uncuttable). He would understand you, but he probably wouldn't believe you… at first.

Despite being an old concept (originating nearly 25 centuries ago in Greece), atoms weren't really given serious consideration until the end of the 19th century. It would take some convincing that matter was reducible to little parts, parts that themselves couldn't be reduced to anything simpler. Even today, trying to convince someone that atoms exist is a tough sell. We're just so used to the terminology that we don't even think about it.

A postulate is a statement that is assumed to be true for the purposes of logical reasoning. Every rhetorical argument starts by stating certain things as given. There are many reasons to believe that atoms exist from chemical observations (discussed elsewhere in this book), but we don't need them for this discussion. We'll just assume they exist and see where the arguments lead us.

Once again, the postulates of kinetic molecular theory, in shortened form…

and now for the first time, in expanded form…


Oh, Mr. Baldessari, do you live in the complex?

No, my dear, I live in the simple.

Unnamed college student asking the (apparently not so) famous American artist John Baldessari if he lived in student housing, no date, but probably some time in the 1970s

This is a derivation of a famous formula first published by the English physicist and brewer James Joule (1818–1889) in 1843. It's a simplification of reality, but it works. Even after you fancy it up with proper statistics, you still get the same simple answer. Reality may be complex, but physics is simple. This is a common theme in physics and other sciences.

Animated molecule in a box

Imagine a box of dimensions x, y, z with a single gas molecule of mass m in it. (Matter be molecules.) So much space to occupy! (Molecules be small.) Imagine the molecule is traveling parallel to the edge of the box labeled x toward the face of the box labeled yz with a speed v. (Molecules be moving.) Bang! It hits the wall and bounces back with no loss of energy or momentum. (Molecules be elastic.) On each bounce, the wall transfers this much momentum to the molecule…

$$\Delta p = m\,\Delta v = m[(+v) - (-v)] = 2mv$$

It takes this much time for the molecule to return to the wall…

$$\Delta t = \frac{\Delta s}{v} = \frac{2x}{v}$$

From Newton's second law of motion (or its logical equivalent the impulse-momentum theorem), the force applied by the wall is equal to the time rate of change of the molecule's momentum. This turns out to be twice the kinetic energy of the molecule divided by the width of the box.

$$F = \frac{\Delta p}{\Delta t} = \frac{2mv}{2x/v} = \frac{mv^2}{x} = \frac{2K}{x}$$
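Here is a small numerical check of that last step, written as a sketch rather than anything definitive. The mass, speed, and box width are illustrative assumptions (roughly one nitrogen molecule in a 10 cm box), and the function names are made up for this example. It computes the force as momentum change per bounce divided by round-trip time and compares it with twice the kinetic energy divided by the box width.

```python
# Sketch: force on the wall from one molecule bouncing along x.
# All numeric values below are illustrative assumptions, not from the text.

def wall_force(m, v, x):
    """Average force [N] from one molecule of mass m [kg] and speed v [m/s]
    bouncing back and forth across a box of width x [m]."""
    delta_p = 2 * m * v       # momentum change on each bounce
    delta_t = 2 * x / v       # time to cross the box and come back
    return delta_p / delta_t  # Newton's second law: F = (delta p)/(delta t)

def kinetic_energy(m, v):
    """Kinetic energy [J] of one molecule."""
    return 0.5 * m * v**2

m = 4.65e-26  # kg, roughly one N2 molecule (assumed)
v = 500.0     # m/s (assumed)
x = 0.10      # m (assumed)

F = wall_force(m, v, x)
K = kinetic_energy(m, v)
print(F, 2 * K / x)  # the two numbers should agree
```

The force from a single molecule is absurdly small, which is why it takes an enormous number of them to produce a pressure anyone would notice.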

Let's give the molecule some friends so that the box is filled with N molecules. Assume that one-third of them are traveling parallel to the x axis, one-third parallel to the y-axis, and one-third parallel to the z-axis. (This is an application of the equipartition of energy theorem to be discussed later.) Then the pressure on any one face (the force divided by the area of the face) would be two-thirds of the total kinetic energy of all the molecules divided by the volume of the box.

$$P = \frac{\tfrac{1}{3}N(F)}{A} = \frac{\tfrac{1}{3}N(2K/x)}{yz} = \frac{2}{3}\,\frac{NK}{xyz} = \frac{2}{3}\,\frac{NK}{V}$$

Let's rearrange this a bit.

$$PV = \tfrac{2}{3}NK$$

Use the ideal gas law (the statistical version that uses Boltzmann constant, k).

PV = NkT

Substitute the right side of this old equation into the left side of our new equation.

$$NkT = \tfrac{2}{3}NK$$

The number of molecules, N, cancels out. Solve for kinetic energy, K, and Bob's your uncle — we're done.

$$K = \tfrac{3}{2}kT$$
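To put numbers on this result, here is a minimal sketch. The temperature, number of molecules, and volume are assumptions chosen only for illustration, and the function names are hypothetical. It evaluates the kinetic energy per molecule as three-halves kT, feeds that into the two-thirds NK/V pressure from the box derivation, and compares the answer with the ideal gas law.

```python
# Sketch: kinetic energy per molecule and the pressure it implies.
# T, N, and V are illustrative assumptions.

k = 1.380649e-23  # Boltzmann constant, J/K

def mean_kinetic_energy(T):
    """Kinetic energy per molecule, K = (3/2) k T, in joules."""
    return 1.5 * k * T

def pressure_from_kinetic_energy(N, K, V):
    """Pressure from the box derivation, P = (2/3) N K / V, in pascals."""
    return (2.0 / 3.0) * N * K / V

T = 300.0   # K (assumed)
N = 2.5e22  # molecules (assumed)
V = 1.0e-3  # m^3, one litre (assumed)

K = mean_kinetic_energy(T)
P_kinetic = pressure_from_kinetic_energy(N, K, V)
P_ideal = N * k * T / V  # ideal gas law, PV = NkT

print(f"K = {K:.3e} J")
print(f"P (kinetic) = {P_kinetic:.1f} Pa, P (ideal gas law) = {P_ideal:.1f} Pa")
```

The two pressures agree by construction, which is the whole point of the substitution.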

Well Bob's not my uncle and we're not quite done. We made some pretty extreme assumptions beyond the four allowed postulates. The big one being that all the molecules were moving with exactly the same speed in one of only three directions. In a real macroscopic box of microscopic molecules, the molecules would be whizzing in all directions, smashing into each other countless times per second, and traveling with all sorts of different velocities. Turns out, none of this matters. In the middle of the 19th century (roughly 1859–1866) the Scottish physicist James Clerk Maxwell (1831–1879) worked out most of the statistical details. The Austrian physicist Ludwig Boltzmann extended Maxwell's work in 1871. The final theory is known as Maxwell-Boltzmann statistics and it is beyond the scope of this book (i.e., I don't understand it well enough to write about it with authority).

Maxwell-Boltzmann statistics describe the behavior of large numbers of non-interacting particles: particles that interact only briefly when they collide (that is, they don't stick together to form a solid, or react chemically in any way, or do anything else to upset their boring behavior). These particles have momentums and energies that are spread out over a range of values. (The proper term is distributed.) The molecules of a gas don't have a single-valued, all-identical kinetic energy $K$; they have a time-averaged, typical kinetic energy $\langle K \rangle$. The symbols on either side of the $K$ are called angle brackets and are used whenever a quantity is averaged over time for a large collection of entities (called an ensemble).

For an ideal gas, the time-averaged kinetic energy of the molecules is directly proportional to its temperature.

$$\langle K \rangle = \tfrac{3}{2}kT$$

Some people think that this equation is the definition of temperature. (We knocked that off in an earlier chapter of this book.) They're wrong. It's much, much more than a mere definition.

This equation comes from a mathematical proof that starts with (what were at the time) unproven assumptions about the fundamental nature of matter (the postulates of KMT), applies Newton's laws of motion and elementary kinematics to unseen and unseeable entities (molecules), and then seemingly accidentally combines them with the laws that were derived from the easily observable properties of gases. Since Newton's laws and the gas laws were well established in the 19th century as true (for lack of a better word), the postulates must also be true. If you believe (for lack of a better word) that Newton's laws and the gas laws are true, then molecules are true as well. This equation is proof that molecules exist.


Kinetic molecular theory is a mixture of classical mechanics and statistics. The numbers involved are so large, however, that the basic statistics most people learned are nearly useless. Take the concept of an average. Everyone is taught this as a "sum and divide" process. For example…

According to the 2010 US Census, the average American family consisted of 3.14 people (π people!). This was determined by asking the self-identified head of every household to complete a survey. Two questions were used to determine the answer:

  1. Who lives with you?
  2. What is their relationship to you?

Every time the head of the household checked off at least one relative, they added 1 to the running total for the "number of family households" and 2 to the running total for the "number of family members". Every additional relative checked off after that added another 1 to the "number of family members". The average family size was then computed to be…

$$\frac{\text{number of family members}}{\text{number of family households}} = \text{average family size}$$

$$\frac{243\,280\,168}{77\,538\,296} = 3.14$$
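The "sum and divide" recipe is short enough to write out. This sketch uses only the two totals quoted above; the variable names are my own.

```python
# Sketch: the "sum and divide" average applied to the census totals quoted above.

family_members = 243_280_168
family_households = 77_538_296

average_family_size = family_members / family_households
print(round(average_family_size, 2))
```

This works because every household could, in principle, be counted. The next example is one where that is hopeless.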

Let's try this with the air in a party balloon. There are something on the order of 10²³ molecules in this container. Identifying each molecule is out of the question and trying to measure anything about them is a census for the insane.

  1. Are you a molecule?
  2. What are you wearing?

The number 10²³ is so large it might as well be infinity. Although molecules are discrete entities, we're going to use the statistics of continuous variables to describe their behavior.
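For a sense of where an estimate like ten to the twenty-third power comes from, here is a rough sketch using the ideal gas law, PV = NkT, solved for N. The pressure, volume, and temperature of the balloon are my assumptions (atmospheric pressure, about five litres, room temperature), not values from the text.

```python
# Sketch: order-of-magnitude count of molecules in a party balloon.
# The balloon's pressure, volume, and temperature are assumed values.

k = 1.380649e-23  # Boltzmann constant, J/K

def number_of_molecules(P, V, T):
    """Solve the ideal gas law PV = NkT for N."""
    return P * V / (k * T)

P = 101_325.0  # Pa, about one atmosphere (assumed)
V = 5.0e-3     # m^3, about five litres (assumed)
T = 293.0      # K, about room temperature (assumed)

N = number_of_molecules(P, V, T)
print(f"N is about {N:.1e} molecules")
```

Change the assumed volume by a factor of two either way and the order of magnitude stays put.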

Let's play a game of chance. Pick a number, any number… between zero and one. If I pick your number, you give me a buck. If I don't, I give you nothing. It's a game I can't win and you will never lose (probably). There are an infinite number of real numbers between zero and one (and most of those have an infinite number of digits). This makes the census for the insane look sensible in comparison. If you said…


and I said…


I still wouldn't win (but then, my answer isn't a real number in any known number system).

To really play a game like this, we'd have to agree on a spread. If the difference between my number and your number is less than or equal to a tenth, you pay me. As long as the other rule applies (If I don't win, I give you nothing), you should not play this game. You will eventually lose and, if you play long enough, I will eventually win all of your money.

Determining the probability that my number equals your number plus or minus some spread is easy since no real number is any more likely than any other real number in this game. The direction of motion of the molecules in a stationary, closed container is similarly distributed. Since the container isn't moving, the gas inside isn't moving, but the molecules that make up the gas are moving. This implies that the molecules are moving in every possible direction with equal likelihood. Since they don't "agree" on a direction, the gas as a whole doesn't move. (The word "agree" is written in quotes since molecules don't have minds and can't possibly have intentions.)
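The claim that equally likely directions add up to a gas that goes nowhere can be tried out with a toy model. This is a sketch under loud assumptions: every molecule gets the same speed (which is not true of a real gas), the sample is tiny compared to a real container, and the function names are invented for this example. Directions are drawn uniformly over the sphere by normalizing three Gaussian random numbers.

```python
# Sketch: molecules moving in random directions produce (almost) no bulk motion.
# Assumes every molecule has the same speed; sample size and seed are arbitrary.
import math
import random

def random_unit_vector(rng):
    """Direction chosen with equal likelihood over the whole sphere."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r > 0.0:
            return x / r, y / r, z / r

def bulk_velocity(n, speed, seed=1):
    """Mean velocity vector of n molecules that all move at `speed`."""
    rng = random.Random(seed)
    sx = sy = sz = 0.0
    for _ in range(n):
        ux, uy, uz = random_unit_vector(rng)
        sx += speed * ux
        sy += speed * uy
        sz += speed * uz
    return sx / n, sy / n, sz / n

speed = 500.0  # m/s (assumed)
vx, vy, vz = bulk_velocity(20_000, speed)
drift = math.sqrt(vx * vx + vy * vy + vz * vz)

print(f"speed of each molecule: {speed} m/s")
print(f"speed of the gas as a whole: {drift:.2f} m/s")
```

The leftover drift shrinks as the number of molecules grows, so for a real container it is zero for all practical purposes.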

This simple uniform distribution of probabilities doesn't apply to any of the other properties of the molecules, however. For things like the speed of the molecules, we need to introduce a new statistical concept — the probability distribution.

A probability distribution is a type of continuous function. It doesn't tell you what the probabilities are by itself, but (as the name implies) it does tell you how the probabilities are distributed if you do a little bit of extra work. The integral of a probability distribution over some range tells you the probability that a value within that range will occur. The integral across all possible values is 1, since a value of some sort will exist.

The best known probability distribution is most certainly the normal distribution. It was discovered in 1809 by the German mathematician Carl Friedrich Gauss (1777–1855), which is why it is also known as the gaussian distribution. It looks vaguely like a bell (rounded on the top, flaring smoothly outward on the edges), which is why it is also known as the bell curve. It is used widely in many scientific disciplines. It agrees with our notion that when a large number of measurements are made, there will be many values near the middle and few on the edges. (This is known as the law of large numbers.)

The normal distribution can be written in many ways. The starting point is the standard normal distribution.

$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$


p(x) =  the value of the probability distribution
x =  any real number
π =  a familiar transcendental number, 3.141 592 653…
e =  a less familiar transcendental number, 2.718 281 828…

The area under a segment of this curve is the probability that a number will occur within the range of values. The area under the entire curve is 1 since some value must exist. (x = 0 is still a value.)
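The "area under the curve" statements can be checked numerically. The sketch below integrates the standard normal distribution, p(x) = exp(−x²/2)/√(2π), with Simpson's rule. The integration limits of ±8 stand in for ±∞ (an assumption that works because the tails are vanishingly small out there), and the helper names are mine.

```python
# Sketch: areas under the standard normal distribution by Simpson's rule.
import math

def standard_normal(x):
    """Standard normal distribution, mean 0 and standard deviation 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, lo, hi, n=10_000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

whole_curve = integrate(standard_normal, -8.0, 8.0)  # stand-in for -inf to +inf
within_one = integrate(standard_normal, -1.0, 1.0)   # within 1 of the center

print(f"area under the whole curve: {whole_curve:.6f}")
print(f"area between -1 and +1:     {within_one:.6f}")
```

The second number is the probability that a value lands within one unit of the center.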

Standard normal distribution

The standard normal distribution has a mean, median, and mode of 0 (three different measures of central tendency) as well as a standard deviation and variance of 1 (two related measures of statistical dispersion). In less formal language, the standard normal distribution is centered around 0 and has a spread of 1. A normal distribution can be made with any mean (μ, the Greek letter mu) and any standard deviation (σ, the Greek letter sigma), but that discussion is best left to another time and place.

molecular speeds

The probability distribution for the molecules in an ideal gas is called the Maxwell-Boltzmann distribution. It can be written to describe the kinetic energies or the components of the momentum but it is most frequently used to describe the speeds of the molecules in an ideal gas…

$$p(v) = \frac{4v^2}{\sqrt{\pi}} \left(\frac{m}{2kT}\right)^{3/2} e^{-mv^2/2kT}$$


p(v) =  value of the probability distribution [s/m]
v =  speed of the molecules in an ideal gas [m/s]
m =  mass of one molecule [kg]
T =  absolute temperature of the gas [K]
k =  Boltzmann constant, 1.380 649 × 10⁻²³ J/K
π =  a familiar transcendental number, 3.141 592 653…
e =  a less familiar transcendental number, 2.718 281 828…

The area under any segment of this distribution is equal to the probability of finding a molecule in an ideal gas with a speed within the range of the segment. The area under the curve from 0 to +∞ is 1, since every molecule we find will have some speed. (v = 0 m/s is still a speed.)
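Here is a sketch that tests both claims for one particular gas. It writes the speed distribution as p(v) = (4/√π) b^(3/2) v² exp(−bv²) with b = m/2kT, where m is the mass of one molecule. The choice of nitrogen at 300 K, the upper limit of 5000 m/s standing in for +∞, and the 400 to 500 m/s window are all assumptions made for illustration.

```python
# Sketch: areas under the Maxwell-Boltzmann speed distribution.
# Gas, temperature, and integration limits are illustrative assumptions.
import math

k = 1.380649e-23       # Boltzmann constant, J/K
u = 1.66053906660e-27  # atomic mass constant, kg

def maxwell_boltzmann(v, m, T):
    """Speed distribution p(v) in s/m for molecules of mass m [kg] at T [K]."""
    b = m / (2 * k * T)
    return 4 / math.sqrt(math.pi) * b**1.5 * v**2 * math.exp(-b * v**2)

def integrate(f, lo, hi, n=10_000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

m = 28.0134 * u  # kg, one N2 molecule (assumed gas)
T = 300.0        # K (assumed)

def p(v):
    return maxwell_boltzmann(v, m, T)

total = integrate(p, 0.0, 5000.0)     # 5000 m/s stands in for +infinity
fraction = integrate(p, 400.0, 500.0) # molecules between 400 and 500 m/s

print(f"area under the whole curve: {total:.6f}")
print(f"fraction between 400 and 500 m/s: {fraction:.3f}")
```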

Maxwell-Boltzmann distribution

The Maxwell-Boltzmann distribution is different from the normal distribution in that it cannot have any values below zero. Negative speeds just don't make any sense. A velocity vector with one or more negative components still has a positive magnitude. This skews the distribution and makes it asymmetric. It also makes the measures of central tendency more interesting. They aren't all the same.

Physicists use the term most probable speed to describe what mathematicians would call the mode. On a function like this, the most probable value is the one that gives you the highest point on the curve. That occurs where the derivative of the function is zero and the second derivative is negative. The Maxwell-Boltzmann distribution is easier to write out if we let…

$$a = \frac{4}{\sqrt{\pi}}, \qquad b = \frac{m}{2kT}$$

$$p(v) = ab^{3/2}\, v^2 e^{-bv^2}$$

Which has the derivative

$$\frac{dp}{dv} = ab^{3/2}\left(2v e^{-bv^2} - 2bv^3 e^{-bv^2}\right)$$

$$\frac{dp}{dv} = 2ab^{3/2}\, v e^{-bv^2}\left(1 - bv^2\right)$$

The whole equation is equal to zero when any one of these parts is equal to zero.

$$v = 0$$

$$e^{-bv^2} = 0$$

$$1 - bv^2 = 0$$

The first equation is already solved, v = 0. The second is true in the limit when v = +∞. Both of these are local minima, however, which is not what we're after. (I see no reason to prove this mathematically. Just look at the curve.) The last equation is the one we care about.

$$1 - bv^2 = 0$$

$$v = \pm\sqrt{\frac{1}{b}}$$

Undoing the substitution (and ignoring the negative root, since speed is positive) gives us the formula for the most probable speed for the molecules in a gas (the mode of the distribution).

$$v_p = \sqrt{\frac{2kT}{m}}$$
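If "just look at the curve" isn't convincing, a brute-force search is the next best thing. This sketch evaluates the speed distribution, p(v) = (4/√π) b^(3/2) v² exp(−bv²) with b = m/2kT, on a fine grid of speeds and picks the highest point, then compares it with √(2kT/m). The gas (nitrogen), the temperature (300 K), and the grid spacing (0.1 m/s up to 3000 m/s) are assumptions for illustration.

```python
# Sketch: locating the peak of the speed distribution by brute force.
# Gas, temperature, and grid are illustrative assumptions.
import math

k = 1.380649e-23       # Boltzmann constant, J/K
u = 1.66053906660e-27  # atomic mass constant, kg

def maxwell_boltzmann(v, m, T):
    """Speed distribution p(v) for molecules of mass m [kg] at T [K]."""
    b = m / (2 * k * T)
    return 4 / math.sqrt(math.pi) * b**1.5 * v**2 * math.exp(-b * v**2)

def most_probable_speed(m, T):
    """Closed-form mode of the distribution, sqrt(2kT/m)."""
    return math.sqrt(2 * k * T / m)

m = 28.0134 * u  # kg, one N2 molecule (assumed gas)
T = 300.0        # K (assumed)

speeds = [i * 0.1 for i in range(1, 30_001)]  # 0.1 m/s to 3000 m/s
v_peak = max(speeds, key=lambda v: maxwell_boltzmann(v, m, T))

print(f"peak found on the grid: {v_peak:.1f} m/s")
print(f"sqrt(2kT/m):            {most_probable_speed(m, T):.1f} m/s")
```

The grid search can only be as precise as its spacing, so agreement to within a step or so is the most it can promise.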

Now let's find the mean speed (also known as the average speed or the expected speed). For a probability distribution, the mean is the value multiplied by the probability distribution integrated over the entire range of possible values. I'm going to use the angle bracket notation, $\langle v \rangle$, but you could also write the average by overlining the variable, $\bar{v}$, or adding a descriptive subscript, $v_{\text{ave}}$.

$$\langle v \rangle = \int_0^{\infty} v\, p(v)\, dv$$

For the Maxwell-Boltzmann distribution with the substitution

$$a = \frac{4}{\sqrt{\pi}}, \qquad b = \frac{m}{2kT}$$

we get

$$\langle v \rangle = \int_0^{\infty} ab^{3/2}\, v^3 e^{-bv^2}\, dv$$

I might be able to do this myself, but I prefer using an online integrator. Here's what it said the answer is.

$$\langle v \rangle = \left[-\tfrac{1}{2}\, ab^{-1/2}\, e^{-bv^2}\left(1 + bv^2\right)\right]_0^{\infty}$$


This simplifies quite nicely to

$$\langle v \rangle = \frac{a}{2\sqrt{b}}$$

Again we undo the substitution and simplify. This is the equation for the mean speed of the molecules in a gas.

$$\langle v \rangle = \sqrt{\frac{8kT}{\pi m}}$$

Which can also be written like this

$$\langle v \rangle = \frac{2}{\sqrt{\pi}} \sqrt{\frac{2kT}{m}}$$

That makes it look like the equation for the most probable speed with a multiplier in front.

$$\langle v \rangle = \frac{2}{\sqrt{\pi}}\, v_p$$
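The chain of substitutions is easy to lose track of, so here is a sketch that walks through it with numbers. It evaluates the antiderivative −½ a b^(−½) exp(−bv²)(1 + bv²) at the two limits (it vanishes as v goes to infinity), with a = 4/√π and b = m/2kT, and compares the result with √(8kT/πm) and with 2/√π times the most probable speed. Nitrogen at 300 K is an assumption, and the function names are invented for this example.

```python
# Sketch: checking the mean speed formula against the substituted antiderivative.
# Gas and temperature are illustrative assumptions.
import math

k = 1.380649e-23       # Boltzmann constant, J/K
u = 1.66053906660e-27  # atomic mass constant, kg

def mean_speed_from_substitution(m, T):
    """Evaluate the antiderivative between v = 0 and v = infinity."""
    a = 4 / math.sqrt(math.pi)
    b = m / (2 * k * T)

    def antiderivative(v):
        return -0.5 * a * b**-0.5 * math.exp(-b * v * v) * (1 + b * v * v)

    upper = 0.0  # the exponential kills the antiderivative as v -> infinity
    return upper - antiderivative(0.0)

def mean_speed(m, T):
    """Closed form, sqrt(8kT / (pi m))."""
    return math.sqrt(8 * k * T / (math.pi * m))

def most_probable_speed(m, T):
    """Closed form, sqrt(2kT / m)."""
    return math.sqrt(2 * k * T / m)

m = 28.0134 * u  # kg, one N2 molecule (assumed gas)
T = 300.0        # K (assumed)

print(mean_speed_from_substitution(m, T), mean_speed(m, T))
print(mean_speed(m, T) / most_probable_speed(m, T), 2 / math.sqrt(math.pi))
```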

Isn't the Maxwell-Boltzmann distribution nice? Two of the most common measures of central tendency (the mode and the mean) have a simple relationship. What more could we hope for? Thank you for listening to this presentation.

Well, before you go, there is one more thing (slight pause). Today, I'd like to introduce you to the most popular measure of central tendency in kinetic-molecular theory. It's three things. The first is a root (hold for applause), the second is a mean (longer hold for applause), and the third is a square (hold a bit as the audience anticipates the announcement). So, three things: a root, a mean, and a square. Are you getting it? (The audience laughs with delight as they realize the outcome.) These are not three separate operations. This is one operation. (The applause grows thunderous.) And we are calling it rms. (The audience sits in stunned silence as the beauty of the new operation is revealed to them in detail.) Excuse me for a moment while I change out of this black turtleneck and faded blue jeans and put on a dinner jacket and top hat (my usual attire).


The root mean square is what its name says. It's the root of the mean of a bunch of values that were squared. It's another measure of central tendency and it is popular in physics for ensemble calculations where the mean of a value is zero (like the velocities of the molecules in a gas) but the mean of the magnitudes is not zero (like the speeds of the molecules in a gas). For a probability distribution, the root mean square is the square root of the integral over the range of possible values of the value squared times the probability distribution. That makes more sense in symbols than it does in words.

$$v_{\text{rms}} = \left[\int_0^{\infty} v^2\, p(v)\, dv\right]^{1/2}$$


For the Maxwell-Boltzmann distribution with the substitution

$$a = \frac{4}{\sqrt{\pi}}, \qquad b = \frac{m}{2kT}$$

We get

$$v_{\text{rms}} = \left[\int_0^{\infty} ab^{3/2}\, v^4 e^{-bv^2}\, dv\right]^{1/2}$$


I am led to believe that this integral can be solved, but I would not know where to begin. Even when I pass the heavy lifting off to an online integrator I have a hard time understanding the results. After the indefinite integral has been evaluated over the limits, you get this.

$$v_{\text{rms}} = \left[\frac{3\sqrt{\pi}}{8}\,\frac{a}{b}\right]^{1/2}$$

Unsubstitute the temporary symbols a and b.

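$$v_{\text{rms}} = \left[\frac{3\sqrt{\pi}}{8}\,\frac{4}{\sqrt{\pi}}\,\frac{2kT}{m}\right]^{1/2}$$

$$v_{\text{rms}} = \left[\frac{3}{2}\,\frac{2kT}{m}\right]^{1/2}$$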

The root mean square speed of the molecules in a gas can be written like this (which is what I consider the final form).

$$v_{\text{rms}} = \sqrt{\frac{3kT}{m}}$$

Or it can be written like this (in a way that makes it easy to compare it to the most probable speed).

$$v_{\text{rms}} = \sqrt{\frac{3}{2}} \sqrt{\frac{2kT}{m}}$$

So let's do that. Let's compare the rms speed to the most probable speed. (Note that the root symbol is only supposed to encompass the multiplier 3/2 and not the variable $v_p$.)

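$$v_{\text{rms}} = \sqrt{\frac{3}{2}}\, v_p$$

Central measures of molecular speeds

$$
\begin{array}{ccc}
\text{most probable speed} & \text{mean speed} & \text{root mean square speed} \\
v_p = \sqrt{\dfrac{2kT}{m}} & \langle v \rangle = \sqrt{\dfrac{8kT}{\pi m}} & v_{\text{rms}} = \sqrt{\dfrac{3kT}{m}} \\
v_p = 1\, v_p & \langle v \rangle = \dfrac{2}{\sqrt{\pi}}\, v_p & v_{\text{rms}} = \sqrt{\dfrac{3}{2}}\, v_p
\end{array}
$$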
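For anyone who distrusts online integrators as much as I distrust my own calculus, here is a different kind of check: a small Monte Carlo sketch. It relies on a fact not derived in this section, so treat it as an assumption here: in an ideal gas, each component of a molecule's velocity is normally distributed about zero with variance kT/m. The gas (nitrogen), temperature (300 K), sample size, and seed are also arbitrary choices of mine.

```python
# Sketch: mean and rms speeds from randomly sampled molecular velocities.
# Assumes each velocity component is Gaussian with variance kT/m.
import math
import random

k = 1.380649e-23       # Boltzmann constant, J/K
u = 1.66053906660e-27  # atomic mass constant, kg

def sample_speeds(n, m, T, seed=0):
    """Speeds [m/s] of n molecules with Gaussian velocity components."""
    rng = random.Random(seed)
    sigma = math.sqrt(k * T / m)
    speeds = []
    for _ in range(n):
        vx, vy, vz = (rng.gauss(0.0, sigma) for _ in range(3))
        speeds.append(math.sqrt(vx * vx + vy * vy + vz * vz))
    return speeds

def mean(values):
    return sum(values) / len(values)

def rms(values):
    """Root of the mean of the squares."""
    return math.sqrt(sum(v * v for v in values) / len(values))

m = 28.0134 * u  # kg, one N2 molecule (assumed gas)
T = 300.0        # K (assumed)

speeds = sample_speeds(50_000, m, T)
print(f"sampled mean speed: {mean(speeds):.0f} m/s, "
      f"formula: {math.sqrt(8 * k * T / (math.pi * m)):.0f} m/s")
print(f"sampled rms speed:  {rms(speeds):.0f} m/s, "
      f"formula: {math.sqrt(3 * k * T / m):.0f} m/s")
```

The sampled values wobble a little from run to run (change the seed to see it), but they settle on the formulas as the sample grows.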

Distribution of molecular speeds for N₂ at 3 different temperatures

Distribution of molecular speeds for O₂, H₂, and H₂O at the same temperature
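The three central measures are easy to tabulate for the gases named in the caption above. This sketch uses √(2kT/m) for the most probable speed, √(8kT/πm) for the mean speed, and √(3kT/m) for the root mean square speed. The temperature of 300 K and the rounded molecular masses are my assumptions, since the caption doesn't give them.

```python
# Sketch: most probable, mean, and rms speeds for three gases at one temperature.
# Temperature and molecular masses (in atomic mass units) are assumed values.
import math

k = 1.380649e-23       # Boltzmann constant, J/K
u = 1.66053906660e-27  # atomic mass constant, kg

MOLECULAR_MASSES = {"O2": 31.998, "H2": 2.016, "H2O": 18.015}  # in u, rounded

def central_speeds(mass_u, T):
    """Return (most probable, mean, rms) speeds in m/s."""
    m = mass_u * u
    v_p = math.sqrt(2 * k * T / m)
    v_mean = math.sqrt(8 * k * T / (math.pi * m))
    v_rms = math.sqrt(3 * k * T / m)
    return v_p, v_mean, v_rms

T = 300.0  # K (assumed)

results = {name: central_speeds(mass, T) for name, mass in MOLECULAR_MASSES.items()}
for name, (v_p, v_mean, v_rms) in results.items():
    print(f"{name:>4}: v_p = {v_p:6.0f}  <v> = {v_mean:6.0f}  v_rms = {v_rms:6.0f}  (m/s)")
```

Lighter molecules come out faster at the same temperature, and for every gas the three measures fall in the same order: most probable, then mean, then rms.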

Macroscopic, Microscopic

six modes of freedom

  1. heave
  2. sway
  3. thrust
  4. roll
  5. pitch
  6. yaw

degrees of freedom

Photons created at the Sun's center travel a distance of 2 × 10¹⁰ times the Sun's radius before emerging. The trip takes something like 30,000 years.