The Quake III Algorithm That Defies Math, Explained
Quake III Arena is one of the most influential games in FPS history, not just for being one of the greatest early multiplayer-focused shooters, but also for the ingenious algorithm buried in its source code. This code accomplished something that seemed mathematically impossible, allowed the game to run faster and more smoothly, and has been a subject of investigation for more than 15 years. But what exactly was the purpose of this algorithm, and how did it work?
The Quake III Algorithm Itself
Below is the Quake III algorithm exactly as it appears in the source code, complete with some pretty funny comments left in by the developers:

	float Q_rsqrt( float number )
	{
		long i;
		float x2, y;
		const float threehalfs = 1.5F;

		x2 = number * 0.5F;
		y  = number;
		i  = * ( long * ) &y;                       // evil floating point bit level hacking
		i  = 0x5f3759df - ( i >> 1 );               // what the f***?
		y  = * ( float * ) &i;
		y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
	//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

		return y;
	}

Don't worry if you don't understand C, as each step will be explained later in further detail. The ultimate goal of this algorithm is to compute the inverse square root of a number, i.e. 1/√x. Why this matters will be explained shortly, but for those who do understand C, try your best to follow along with the code and make sense of it yourself.
If you’re a programmer, then that second comment in the code should sum up your thoughts quite nicely. What on Earth is going on here, and what does this have to do with 1/√x?
The Importance of the Inverse Square Root
When dealing with physics, namely reflections, it’s important that vector lines in the game are normalized to lengths of one. The vector lines we’re talking about here are the ones that are normal to the surfaces of polygons. If these vectors aren’t normalized, then projectiles or lighting that reflect off the surface will not do so in ways that make sense physically.
In order to normalize a vector to a length of one, you divide each of its components by its length, but what is its length? Well, if you think back to your high school algebra class, you may remember the Pythagorean theorem, a² + b² = c². This is used to calculate the length, c, of the hypotenuse of a right-angle triangle, but it extends to the length of a vector in three dimensions: a² + b² + c² = d². This calculation is done first to find d², where d is the length of the vector, so normalizing means dividing by √(d²). By letting d² = x, we are thus looking to find 1/√x, which we multiply our vector's components by to normalize it.
Why is This Hard to Compute?
Truth be told, it's not hard to compute the inverse square root of a number at all, but it is slow. Division and square roots are both computationally heavy, due to the nature of how they are performed. A simple division, say n/m, is really asking "how many times can I remove m from n before reaching zero?". The computer needs to check after every iteration whether it has passed zero, whereas with something like multiplication, n*m, it can simply add m to itself n times without any checks in between. Square roots are even worse, as they are typically computed with Newton's method. To summarize without getting too technical, this involves taking an initial guess close to the expected answer (often from a look-up table) and iterating a calculus-based refinement that converges quadratically toward the positive root of x² = r. Don't stress if that went over your head; it'll come back up later in detail.
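To make the cost concrete, here is a toy C model of division as repeated subtraction (real hardware uses smarter digit-by-digit division, but it is still iterative, with a check on every step, unlike a single addition):

```c
/* Toy model of integer division n/m as repeated subtraction: keep
   removing m from n, checking after every pass whether we have gone
   below zero. Each iteration costs a compare AND a subtract. */
int slow_divide(int n, int m) {
    int quotient = 0;
    while (n >= m) {  /* the per-iteration check that makes division slow */
        n -= m;
        quotient++;
    }
    return quotient;
}
```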
The easiest operations for computers to handle are addition and subtraction. They are both single-step processes with no checks or iterations needed, so in theory, if the inverse square root of a number could somehow be calculated using just these easy processes (no division or square rooting allowed), then all of our problems would be solved. In fact, this is exactly what the Quake III algorithm is doing to approximate the value of the inverse square root at more than triple the speed, with an error less than a single percentage point.
The Code Broken Down
Let’s start from the very beginning of the algorithm:
	float Q_rsqrt( float number )
	{
		long i;
		float x2, y;
		const float threehalfs = 1.5F;
The top line, "float Q_rsqrt( float number )", defines the function to take a single number, named "number", as an input. This is the number whose inverse square root will be taken. Within the function, it declares a variable, i, as a long, which here is a 32-bit integer. It then declares two 32-bit decimal numbers (floats), x2 and y. Finally, it defines a constant float named "threehalfs", equal to 1.5.
All of these defined variables have their values represented in binary. Binary is best explained with an example, so for those who don’t know how to represent integers in binary, consider this:
11010 = 1*(2⁴) + 1*(2³) + 0*(2²) + 1*(2¹) + 0*(2⁰) = 16 + 8 + 0 + 2 + 0 = 26
Each number (bit) represents a power of two, with an exponent one greater than the bit to its right. Every one is “on”, while every zero is “off”. All of the “on” bits are then added together to get your final value. The beauty of this is that any positive integer can be written as a sum of powers of two. But what about the floats, x2 and y? They aren’t integers, so how do we handle those numbers?
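The decoding rule above can be written in a few lines of C (the helper name is mine, not from Quake III):

```c
/* Decode a string of '0'/'1' characters into the unsigned integer it
   represents: each step shifts the running value left (multiplying by
   two) and adds the next bit. */
unsigned from_binary(const char *bits) {
    unsigned value = 0;
    for (; *bits != '\0'; bits++)
        value = (value << 1) | (unsigned)(*bits == '1');
    return value;
}
```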
The notation used to represent decimals is called IEEE 754, which has the following rules:
- The first bit represents the sign of the number. If it’s zero, it’s positive. If it’s one, it’s negative. In the case of an inverse square root, the number has to be positive, so it’s always going to be zero.
- The next eight bits represent an exponent, which we will call E. E is an integer, and the eight bits encode a number using the same binary addition previously explained. With eight bits, numbers between zero and 255 can be represented. We want to allow for negative exponents though, so this eight-bit number is shifted down by 127, making the actual range of values -127 to 128. This value will be used as a power of two.
- The final 23 bits are called the mantissa, represented by M. Think of these 23 bits as yet another integer, which can take on any value between zero and 2²³ – 1. Instead, however, we divide this 23-bit number by 2²³, so the mantissa takes on values from zero up to (just under) one. We actually need the number to run from one to two, so a constant one is added. The final value we get from these 23 bits is multiplied by our power of two.
- The decimal number being represented is thus (1 + M/2²³)·2^(E – 127).
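These rules can be verified with a short C sketch that unpacks a float's bits and rebuilds its value (memcpy performs the bit reinterpretation; the function name is illustrative, and it only handles normal, non-denormal floats):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Unpack sign, exponent E, and mantissa M from a float's 32 bits, then
   rebuild the value as (1 + M/2^23) * 2^(E - 127). */
float rebuild(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);       /* reinterpret the 32 bits safely */
    uint32_t sign = bits >> 31;           /* bit 31: sign                   */
    uint32_t E    = (bits >> 23) & 0xFFu; /* bits 23-30: biased exponent    */
    uint32_t M    = bits & 0x7FFFFFu;     /* bits 0-22: mantissa            */
    float value = (1.0f + (float)M / 8388608.0f)  /* 8388608 = 2^23 */
                * ldexpf(1.0f, (int)E - 127);     /* exact 2^(E-127) */
    return sign ? -value : value;
}
```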
x2 and y are both floats of this form, as is threehalfs. Now, imagine we take this 32-bit float and take the base-two logarithm of it. Don’t worry about why yet; it’ll make sense. Using some laws of logarithms:
log2((1 + M/2²³)·2^(E – 127)) = log2(1 + M/2²³) + log2(2^(E – 127)) = log2(1 + M/2²³) + E – 127
The trick to break this down further involves an approximation: log2(1 + x) is approximately equal to x for small values of x. This means that log2(1 + M/2²³) + E – 127 ≈ M/2²³ + E – 127. This is a good approximation, but it could be better. If you graph log2(1 + x) and x together, the two are equal at x = 0 and x = 1, but log2(1 + x) is greater than x at every point in between. We can shift the linear function slightly upward to make the average approximation even closer. The amount that we shift it up by is 0.04504656:
Our approximation for the 32-bit decimal is now M/2²³ + E – 127 + 0.04504656. A little bit of algebra turns this into (1/2²³)(E*2²³ + M) – 126.9549534. If you are super observant, you will notice that E*2²³ + M is exactly what our 32-bit string would be valued at if it were interpreted as an integer. Imagine E is 13, or in 8-bit notation, 00001101. Multiplying by 2²³ just shifts it 23 bits to the left, so it becomes 00001101 followed by 23 zeroes. Adding M, a 23-bit number, effectively replaces those trailing 23 zeroes with our mantissa. We can then stick an extra zero onto the very beginning of the string without changing its value, and it will match our original 32-bit string, which always has a zero as its first bit.
Why is this important? It tells us that taking the logarithm of our float returns an approximation of its integer interpretation, just scaled by a factor of 1/2²³ and shifted down by 126.9549534. To reiterate: the integer interpretation of the 32-bit string is, up to that scale and shift, the logarithm of the float interpretation of the same string. This is very important.
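This claim is easy to sanity-check numerically. The C sketch below reads a float's bits as an integer and applies the scale and shift; even without the extra 0.045 correction, the result lands within about 0.09 of the true log2 (names are mine):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Read a positive float's bit pattern as an integer, then apply the
   scale (1/2^23) and shift (-127) derived above. The result should
   approximate log2(y). */
double log2_from_bits(float y) {
    uint32_t i;
    memcpy(&i, &y, sizeof i);              /* the integer interpretation */
    return (double)i / 8388608.0 - 127.0;  /* (1/2^23)*i - 127          */
}
```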
Let’s get back to the code.
		x2 = number * 0.5F;
		y  = number;
		i  = * ( long * ) &y;                       // evil floating point bit level hacking
		i  = 0x5f3759df - ( i >> 1 );               // what the f***?
		y  = * ( float * ) &i;
		y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
	//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

		return y;
	}
Ignore the first line for now, and focus instead on the second through fifth lines. The float, y, is set equal to the number inputted into the function, and then its bits are read as a long. This is done by casting the memory address of the float to a pointer-to-long and reading the same 32 bits through it into i. The number is then no longer interpreted as a decimal with a sign bit in front, eight bits for an exponent, and a 23-bit mantissa; instead, it is interpreted as a 32-bit integer whose ones and zeroes match one-to-one with what was there before. Remember, though, we have already shown that this 32-bit integer is the logarithm of its float interpretation, just scaled and shifted.
The code will now interpret and read the inputted float as though it were a long. This is incredibly useful, because when dealing with integers in binary, we can double or halve the value simply by shifting the string to the left or right. For example, if we shift 1110 to the left, we get 11100, which doubles the value from 14 to 28. If we shift 1110 to the right, we get 111, which halves it from 14 to seven (rounding down). This is called bit-shifting, and it lets us sneakily get around the computational heaviness that would normally come with dividing by two. Here's why we want to do that:
We are calculating log2(1/√y), which can be written as log2(y^(–1/2)). Per the laws of logarithms, this can be rewritten as -(1/2)*log2(y). As you can see, we are taking the logarithm of y, halving it, and negating it, but the halving is accomplished by bit-shifting, so it's not computationally damning. This is exactly what the "- ( i >> 1 )" code is doing, but what is that absolute mess to the left of it?
log2(1/√y) = -(1/2)*log2(y), so let’s insert log2(y) = (1/2²³)(E*2²³ + M) – 126.9549534 into it:
log2(1/√y) = -(1/2)*((1/2²³)(E*2²³ + M) – 126.9549534). The left side of the equation can also be written in that expanded form, as it’s just another number in a logarithm. It would just have different values for its exponent and mantissa, which we’ll call X and S respectively:
(1/2²³)(X*2²³ + S) – 126.9549534 = -(1/2)*((1/2²³)(E*2²³ + M) – 126.9549534). It’s time for a bit of algebra:
- (1/2²³)(X*2²³ + S) = -(1/2)*((1/2²³)(E*2²³ + M) – 126.9549534) + 126.9549534
- (1/2²³)(X*2²³ + S) = -(1/2)*(1/2²³)(E*2²³ + M) + (3/2)*126.9549534
- X*2²³ + S = (3/2)*126.9549534*2²³ – (1/2)*(E*2²³ + M)
The right-hand side has two terms. The second term, -(1/2)*(E*2²³ + M) is accomplished with the bit-shift, while the first term is literally just a constant. (3/2)*126.9549534*2²³ = 1597463007, or in hexadecimal representation, 0x5f3759df. This is the “magic number”, and is why the “i = 0x5f3759df – ( i >> 1 );” line is what it is. Why the developers chose to not name this constant, but did name 1.5F “threehalfs”, which is longer and harder to type, and also spelled wrong, we may never know.
The result is an integer equal to X*2²³ + S, which is then sent back through the memory address trick to be interpreted as a float again. Remember, the logarithm of a float is approximately equal to its integer interpretation, scaled and shifted; we just applied that relationship in reverse. X*2²³ + S is, up to the scale and shift, the logarithm of our answer, and by reading it as a float we undo that logarithm and isolate 1/√y. The Quake III algorithm has thus approximated the inverse square root.
There’s one more step to go over. Remember earlier when Newton’s method was briefly discussed? An initial guess, known to be close to the true value, is taken, and a calculus process is iterated to converge on a better approximation. Well, we now have an initial guess, and it’s so close to the actual value that a single iteration of Newton’s method is enough to get a great approximation. Newton’s method works by taking the value of a function at some point near a root (where the function equals zero) and calculating the derivative at that point. In other words, we are drawing a tangent line to the graph at that x-coordinate.
This creates a right triangle with a base length we call dx and a height we call dy. The top corner of the triangle is much closer to the true root of the function than our guess is. The slope (rise over run) is dy/dx, so the length dx equals the height divided by the slope: dx = dy/(dy/dx). dy is, by definition, f(x) at our initial guess, while dy/dx is the derivative of f at that point, written f'(x). By subtracting f(x)/f'(x) from our initial guess, we find the x-value of the corner of our triangle, closer to the true value we want.
This process could be iterated indefinitely, but the Quake III algorithm does it only once. The line “y = y * ( threehalfs - ( x2 * y * y ) );” applies Newton’s method to the function f(y) = 1/y² – x. This works because if y is a root, i.e. f(y) = 0, then 0 = 1/y² – x, which is the same as saying y = 1/√x. Working out y – f(y)/f'(y) for this function gives y*(3/2 – (x/2)*y²), which is exactly y * ( threehalfs - ( x2 * y * y ) ): the code precomputed x2 as half the inputted number on its first line, absorbing the 1/2 that appears in the expansion. Because the function was chosen in advance, the division in Newton’s method is never actually computed. Only subtraction and multiplication are used in this step to refine the approximation of the inverse square root to within a 1% error.
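Putting every step together, here is a runnable C sketch of the routine. It uses int32_t and memcpy instead of the original pointer cast, since long is 64 bits on many modern platforms and the cast would then read the wrong number of bytes; the arithmetic is otherwise identical to the code above:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Fast inverse square root, restated with well-defined type punning. */
float q_rsqrt(float number) {
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y  = number;
    int32_t i;
    memcpy(&i, &y, sizeof i);            /* read the float's bits as an integer */
    i = 0x5f3759df - (i >> 1);           /* magic constant minus half the bits  */
    memcpy(&y, &i, sizeof y);            /* reinterpret as a float: rough guess */
    y = y * (threehalfs - (x2 * y * y)); /* one iteration of Newton's method    */
    return y;
}
```

With the single Newton iteration, the relative error stays well under 1%, matching the claim above.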
In summary, this ingenious code manages to determine the inverse square root of a number using nothing more than subtraction, bit-shifting, and a little multiplication. No division or square roots are used, and the computational load is significantly decreased, allowing the game to normalize its vectors and run far more smoothly. For a comparison between using the fast inverse square root method and not using it, watch the following video. You can see how borderline unplayable the game becomes when the Quake III algorithm is removed.
The Mystery Behind the Quake III Algorithm
The origins of the Quake III algorithm have a long history. It first surfaced publicly when the source code was released, so many initially attributed it to id Software co-founder John Carmack. However, Rys Sommefeldt, the Senior Manager of the European Game Engineering Team at AMD RTG, who launched the investigation into the function’s origins, asked Carmack about this, and Carmack suggested that it was instead written by Terje Mathisen, a renowned assembly programmer. Mathisen stated that he had written an inverse square root function in the past, but it wasn’t the one in question, and he directed the investigation toward MIT.
More conversations eventually led Sommefeldt to NVIDIA, particularly to a programmer named Gary Tarolli. Tarolli acknowledged that he had used and seen the function on numerous occasions, but he was not the original author either. Sommefeldt went on to publish his findings, with the sad conclusion that the origins of the function were still unknown. Fortunately, the article gained enough traction to catch the attention of the original creator, Greg Walsh. Walsh wrote the algorithm while working at Ardent on the Titan graphics minicomputer in the 1980s, alongside Cleve Moler, the author of MATLAB. Moler took inspiration from code written by Velvel Kahan and K. C. Ng in 1986, which used the same trick, but with much more extensive code.
So the Quake III algorithm was written by Greg Walsh, though the mathematical trickery behind it goes back further still. Whether Kahan and Ng figured out the trick themselves, or took inspiration from someone else or some other code, will likely never be known. We know who wrote the algorithm, but the origin of the underlying mathematical trick remains a mystery.
There is one more unexplained mystery behind this code: where the magic number, 0x5f3759df, comes from. If you remember, this number equals 1.5*(127 – 0.04504656)*2²³, where 0.04504656 is the constant the linear function was shifted up by to improve the log2(1 + x) ≈ x approximation. The weird thing is that 0.04504656 is not the optimal shift; that would be 0.04503329. Where the magic number came from is totally unclear; it seemingly was pulled out of nowhere, likely as the result of a calculation error. Regardless, it’s close enough that the difference is effectively negligible.
Conclusion
While this method of normalizing vectors is long outdated, it serves as a unique mathematical artifact in gaming culture with no real comparison. Outdated or not, the math behind the algorithm is just as ingenious now as it was then. Though the algorithm itself predates the game by about a decade, it was the release of this source code that launched it into the mainstream and sparked communal interest and investigation. The humor of the confusion-filled code comments only adds to the greatness of the Quake III algorithm, and its sheer importance to the quality and playability of the game makes it legendary.
Quake III Arena is available to play for PC, Dreamcast, Xbox 360, and PlayStation 2.