Performance Numbers | Samuel Holloway

Here is a question for you. What is 1 + 1? It’s not a trick question. The answer is 2. If you add up whole numbers you get another whole number.

How about 0.1 + 0.1? Once again, not a trick question. The answer is 0.2. But what about 0.2 + 0.1? This one is tricky. The answer depends on how you calculate it.

If we use a pencil and paper 0.2 + 0.1 is 0.3. But, if we use a computer the answer is not so clear cut.

Number Problems#

In a Python REPL try running the below code snippet.

print(0.2 + 0.1)

You’ll see that the value written to STDOUT is 0.30000000000000004. So what’s up? Is it a bug with Python?

Try it in just your terminal.

echo $((0.2 + 0.1))

Same result. What’s the deal?

In computerland, 0.2 + 0.1 == 0.30000000000000004 is an example of a floating point error. The way of representing Real numbers doesn’t align with the actual mathematical properties.

A number like 0.3 is typically stored by a computer program as a floating point value. The number is stored in a fixed space of memory. The fixed nature can result in overflow errors and loss of precision.

This has been written about extensively so I’ll refer you to better resources for all the gory details.

You’ll Float Too#

If you program long enough you’ll eventually get burned by inaccuracies in floating point calculations. I was reminded of this recently while writing unit tests for my 3D engine.

I was testing calculating the length of a 3 dimensional vector using the formula, for a Vector V, composed of components i, j, and k:

$$ Length(V) = sqrt(V_i^2 + V_j^2 + V_k^2) $$

This formula can be implemented to to work in multiple dimensions (1D, 2D, 3D, 4D, etc) like so.

from functools import reduce

class Vector:
    """
    Example Vector class that supports working 
    in multiple dimensions.
    """
    def __init__(self, *components: float) -> None:
        self._components: list[float] = list(*components)

    def length(self) -> float:
        """Calculates the length of the vector."""
        sq_comps_sum: float = reduce(
            lambda a, b: a + b**2, self._components, 0
        )
        return math.sqrt(sq_comps_sum)

However, things get interesting when attempting to verify the implementation is correct.

class TestVector:
    def test_vector_length(self):
        # Note: The below line doesn't compile.
        assert Vector(1,1,1).length() == ?

Hmm…. Everything is straightforward until I try to assert the result of calculating the length of the vector. This is because the length of Vector(1,1,1) is the square root of 3, which is an irrational number. It trails on forever.

This is just one example of the type of gotchas that arise when dealing with floats.

Computer graphics make heavy use of arithmetic, square roots, trigonometric functions, and logarithms. It’s not possible to program graphic routines and avoid math. So what to do?

In the good ol’ days engine developers would leverage fixed-point arithmetic to reduce the issues that come with floating point. The gist is 3D software used to represented real numbers with a scheme that represents the number in multiple parts.

The Sign (+/-)
The Whole Part (left of the decimal point)
The Fractional Part (right of the decimal)

For example, the early ID Software games used a 32-bit integer space to represent numbers. The 32-bits where divided into two parts. The upper 16 bits store the whole part and sign. The lower 16 bits store the fractional part.

Consider that a bit can store two values, 0 and 1. Id Software took advantage of this property by realizing the following. $$ 2^{16} = 65536 $$ Which means that a 16-bit memory space can store numbers smaller than 65536.

To convert a number to a 16-bit, fixed-point number we just multiply it by 65536. Incidentally, this has the same affect as shifting the bits by 16.

# Where x is a whole number (0, 1, 2, 3...).
x * 65_536 == x << 16
# Returns True

The number 1.5 becomes 98,304 (i.e. 1.5 * 65,536) . This scheme reduces some of the challenges of working with floating point numbers but introduces new ones. Not all numbers fit in the fixed space. Also working with fixed-point numbers makes programming harder.

The cognitive load of the programmer is increased by having to deal with the nuances of the various mathematical operations.

Given what a pain in the butt this is, it was inevitable that fixed-point eventually fell out of fashion in 3D programming.

If you’re interested in this sort of thing, chapter 63 of Michael Abrash’s book Graphics Programming Black Book provides a bit of history for how the optimization of CPU instructions for floating point arithmetic led 3D programers to eventually abandon fixed-point techniques for floating point.

To my knowledge, modern game consoles and engines now leverage floating point for spatial calculations. How do they avoid errors?

Comparing Floats#

In practice, working with floats isn’t too bad. Challenges don’t arise until we need to compare two values. For example, depth tests or frustum culling. In these use cases the trick is to not compare the two values directly but rather take their difference and compare that to some threshold.

In Python the math.isclose() function is the way to do that in production code. For unit tests pytest provides the approx() function for asserting on floating point values.

With this knowledge, I can now test my Vector class.

from pytest import approx

class TestVector:
    def test_vector_length(self):
        assert Vector(1,1,1).length() == approx(1.73205) 
        # Evaluates to True

Alternatives to Floats#

Now, I should have probably stopped here. For my needs using Math.isclose and pytest.approx is good enough. However, I got curious. Could a different mathematical type work better for my engine?

For code running on the CPU, higher level languages offer multiple options. CPython has the following numeric types.

Type	Number Range	Examples
int	+- Infinity	x: int = 1
float	+- 1.7976931348623157e+308	x: float = 1.0
complex	N/A	x: complex = complex(‘1+5j’)
Decimal	+- 999999999999999999	x: Decimal = Decimal(1e-134)
Fraction	+- Infinity	x: Fraction = Fraction(1,2)

I’ve already established the trade-offs between using integers (via fixed-point) and floats. What about float vs Decimal vs Fraction?

The Cost of Precision#

The Decimal and Fraction numeric types don’t have the same challenges that floats do. However, the numerical precision comes at the cost of performance.

The package pytest-benchmark can be used to compare the performance of mathematical operations on the various numeric types.

Here is an example of how to compare the performance of addition.

import pytest
from decimal import Decimal, getcontext
from fractions import Fraction

# Set the Decimal precision to be 6. Why 6? 
# The number doesn't matter. 
# I just need it to be specific for benchmarking.
getcontext().prec = 6

class TestAddition:
    @pytest.mark.benchmark(group="Test Addition",disable_gc=True)
    def test_ints(self, benchmark):
        benchmark(add, 7, 9)

    @pytest.mark.benchmark(group="Test Addition",disable_gc=True)
    def test_floats(self, benchmark):
        benchmark(add, 7.0, 9.0)

    @pytest.mark.benchmark(group="Test Addition",disable_gc=True)
    def test_decimals(self, benchmark):
        benchmark(add, Decimal(7), Decimal(9))

    @pytest.mark.benchmark(group="Test Addition",disable_gc=True)
    def test_fractions(self, benchmark):
        benchmark(add, Fraction(3, 4), Fraction(1, 3))

When this benchmark test is run, each operation is run thousands of times. The performance timings are aggregated across the iterations.

The below table shows the average and median performance in nanoseconds. The numbers in () are the ratio between the rows. A value of (2) for example means that row was 2x slower than the row with a value of (1).

Type	Mean (ns)	Median (ns)
Int	33.8687 (1.0)	33.8866 (1.0)
Float	35.9602 (1.06)	35.3449 (1.04)
Decimal	56.9352 (1.68)	55.4201 (1.64)
Fraction	768.7710 (22.70)	750.0057 (22.13)

The table above shows that, on average, the time spent adding ints and floats is relatively the same while adding two fractions is about 22 times slower than doing the same operation with integers.

In this manner, I compared the performance of Addition, Subtraction, Multiplication, Division, Ceil, Floor, Round, Truncate, Initialization, Create From String, and Equality.

Here are the mean performance numbers for the arithmetic operations and square root.

Operation	Int (ns)	Float (ns)	Decimal (ns)	Fraction (ns)
Addition	33.8687 (1.0)	35.9602 (1.06)	56.9352 (1.68)	768.7710 (22.70)
Subtraction	33.0952 (1.0)	34.7292 (1.05)	55.1926 (1.67)	687.7079 (20.78)
Multiplication	34.3719 (1.0)	35.3614 (1.03)	54.6882 (1.59)	810.8374 (23.59)
Division	37.5791 (1.04)	36.0212 (1.0)	81.8910 (2.27)	707.5471 (19.64)
Square Root	46.1888 (1.22)	37.7584 (1.0)	153.9302 (4.08)	235.7727 (6.24)

The conclusion is that, on average, the performance of working with floats and ints is nearly identical. Decimal is faster than fractions in every case except for truncating.

Fraction is always at least an order of magnitude slower than ints and floats on all operations.

But don’t take my word for it. The benchmark test and raw performance data is here.

What does this mean? Are fractions too slow for general everyday use? Not at all. The average time of adding two fractions on my machine is 768 nanoseconds. That is still only 0.000000786 of a second (7.68e-7). However, in the context of real-time 3D simulations floats are the way to go.

Spatial Types#

Once again, I probably should have stopped at this point. However, I was having fun.

My engine is standardizing on floats. I’m using floats for everything. But… What would it be like if I used Decimal or Fractions?

Can I create a Vector class that supports floats, Decimals, and Fractions?

Totally. However, the challenge isn’t just vectors. There is a topology of spatial types that my little engine needs to work with.

Type	Purpose
Coordinate	A position in space.
Vector	A direction or physical property like velocity.
Matrix	A collection of numbers that can represent a spatial transformation.
Quaternion	Complex numbers that are used in spatial rotations and other operations.

The various spatial types are used together in a variety of combinations. For example, a matrix and vector can by multiplied together.

# Define a 4x4 translation matrix that moves anything multiplied
# by it +5 units along the x-axis, +2 units along the y-axis, 
# 3 and +1 units along the z-axis
m: Matrix = m4(
    1.0, 0.0, 0.0, 5.0, 
    0.0, 1.0, 0.0, 2.0, 
    0.0, 0.0, 1.0, 1.0, 
    0.0, 0.0, 0.0, 1.0
)

# Create a vector from two Coordinates.
# The vector will be of the direction: end_point - start_point
start_point = Coordinate(14.0, 7.0, 92.0)
end_point = Coordinate(13.0, 7.0, 0.0)

direction: Coordinate = end_point - start_point
v: Vector = vector(*direction, 1.0)
# Creates Vector4d(-1.0, 0, -92.0, 1.0)
# The extra 1.0 component makes the 3D vector homogeneous which 
# allows multiplying it with a 4x4 matrix.

# Translate the vector by multiplying it with the matrix.
translated_v: Vector = m * v

You may notice, that in the example above, all the numbers are explicitly declared as floats.

Combining numeric types (like multiplying a float and an int together) can result in the output of the operation being in a different type than expected.

For example, multiplying the fraction 1/2 by 2 results in the value of 1. But, check this out.

from fractions import Fraction

type(Fraction(1,2) * 2)
# <class 'fractions.Fraction'>

type(Fraction(1,2) * 2.0)
# <class 'float'>

Mixing a float with a different precision type results in a float. This is true for combining a float with an int or a Fraction. However, floats cannot interact with decimals. Defining spatial types that work with the various Python numeric types is not straightforward.

With that in mind, I decided to implement spatial types that are generic but don’t allow combining spatial types defined with different numeric types.

Being Generic#

My little engine implements coordinates, vectors, and matrices in a way that allows defining them agnostic of the dimension they’re in (1D, 2D, 3D, or 4D).

This is demonstrated below with the initialization method of the Coordinate class.

class Coordinate:
    """
    A coordinate represents a location in a coordinate space.
    It can be of any number of dimensions. 
    (e.g. 1D, 2D, 3D,...ND)
    """
    def __init__(self, *components: float) -> None:
        self._components: tuple[float, ...] = components

This makes it easy to spin up a coordinate.

# 1D: A coordinate on a line. For example between 0 and 1.
a = Coordinate(0.145)

# 2D: A coordinate on a plane or a location in a flat grid.
b = Coordinate(14, 5)

# 3D: A coordinate in 3D space.
c = Coordinate(172.456, 32.0, -78.26)

# 4D: A homogenous coordinate.
d = Coordinate(172.456, 32.0, -78.26, 1.0)

From a design perspective, I really like that. The class design enables me to think in the dimension I’m trying to work in and I don’t need to select an appropriate child class (Coordinate1d, Coordinate2d, etc).

As cool as that is, it comes at a complexity cost. Logically, spatial types of different dimensions shouldn’t be combined.

# This is Ok (1D + 1D).
new_coord: Coordinate = Coordinate(4) + Coordinate(32) 

# This is invalid (1D + 2D).
new_coord: Coordinate = Coordinate(4) + Coordinate(32, 73)

Preventing Mixing Dimensions
I choose to use a decorator to prevent scenarios like the above from happening.

The below snippet defines a custom error class and a decorator that raises an error if the passed value doesn’t have the same number of dimensions as the instance.

from functools import wraps

class CoordinateError(Exception):
    """
    An error that occurred while working with a coordinate.
    """
    def __init__(self, *args: object) -> None:
        super().__init__(*args)

def coordinates_must_be_same_size(func):
    """
    A decorator that guards against using another 
    coordinate of a different size.
    """
    @wraps(func)
    def _guard(*args, **kwargs):
        self: Coordinate = args[0]
        other: Coordinate = args[1]
        if len(self) == len(other):
            return func(*args, **kwargs)
        else:
            error_msg = 
                f"Coordinates must be the same dimension."
            raise CoordinateError(error_msg)

    return _guard

Here is how the decorator is used in practice.

import itertools

class Coordinate:
    @coordinates_must_be_same_size
    def __add__(
        self: Coordinate, 
        other: Coordinate
    ) -> Coordinate:
        """Add two coordinates with the + operator."""
        sums = itertools.starmap(
            operator.add, 
            zip(self, other))
        return Coordinate(*sums)

This is cool. Now, when I goof up and try to add coordinates of different dimensions an error will be raised.

Supporting Numeric Types
At this point, the Coordinate class is useful but only works with floats. To address this, I use the combination of TypeVar and Generic to make the class work with all of the appropriate numeric types.

from decimal import Decimal
from fractions import Fraction
from typing import Generic, TypeVar

# Define a custom type variable. 
# This is used by static type checkers (e.g. mypy) to verify 
# that a passed in value is either an int, float, Fraction, 
# or Decimal.
NumericType = TypeVar(
    "NumericType", 
    int, 
    float, 
    Fraction, 
    Decimal)

# Update the Coordinate class to be generic. 
class Coordinate(Generic[NumericType]):
    def __init__(self, *components: NumericType) -> None:
        self._components: tuple[NumericType, ...] = components

    @coordinates_must_be_same_size
    def __add__(
        self: Coordinate[NumericType], 
        other: Coordinate[NumericType]
    ) -> Coordinate[NumericType]:
        sums = itertools.starmap(
            operator.add, 
            zip(self, other))
        return Coordinate[NumericType](*sums)

This is great. I can now declare coordinates in multiple dimensions and of various numeric types. There just one problem. The __init__ method allows mixing numeric types.

Enforcing Homogenous Parameters
Another decorator can be used to enforce that an instance method only permits a single type for all the parameters.

def must_be_homogeneous(func):
    """
    A decorator that forces all parameters to 
    have the same type.

    Note: This is intended to be used on instance methods. 
    So the first parameter is expected to be the instance 
    of the class (i.e. "self"). With that convention, the 
    type of the second parameter is determined and then 
    used to check against the remaining types.
    """
    @wraps(func)
    def _guard(*args, **kwargs):
        first = args[1] # Skip "self" on the instance method.
        others = args[2:]

        expected_type = type(first)
        for value in others:
            if type(value) != expected_type:
                error_msg = "Cannot mix parameter types."
                raise NumericTypeError(error_msg)

        return func(*args, **kwargs)

    return _guard

And here is how it is used.

# Update the Coordinate class to be generic. 
class Coordinate(Generic[NumericType]):
    @must_be_homogeneous
    def __init__(self, *components: NumericType) -> None:
        self._components: tuple[NumericType, ...] = components

Now when a coordinate is initialized all of the dimension values must be the same type. That just leaves the final challenge of preventing coordinates of different types from interacting with each other. Once again a decorator can be used.

def coordinates_must_have_same_type(func):
    """A decorator that guards against using another coordinate of a different type.
    Prevents mixing integers, floats, and fractions.
    """

    @wraps(func)
    def _guard(*args, **kwargs):
        self: Coordinate = args[0]
        other: Coordinate = args[1]

        # Look at the first type in this coordinate. If any of the types are different
        # raise an error.
        expected_type = type(self[0])
        for value in itertools.chain(self, other):
            if type(value) != expected_type:
                error_msg = (
                    "Cannot mix coordinates of different types.",
                    f"Attempted to mix {expected_type.__name__} with {type(value).__name__}",
                )
                raise CoordinateError(error_msg)

        return func(*args, **kwargs)

    return _guard

Now I can easily define coordinates of type int, float, Decimal, or Fraction and prevent errors that arise from mixing coordinates of different dimensions or types. This is what the final implement of coordinate addition looks like.

class Coordinate(Generic[NumericType]):
    """
    A coordinate represents a location in a coordinate space.
    It can be of any number of dimensions (e.g. 1D, 2D, 3D,...ND)
    """
    @must_be_homogeneous
    def __init__(self, *components: NumericType) -> None:
        self._components: tuple[NumericType, ...] = components

    @coordinates_must_have_same_type
    @coordinates_must_be_same_size
    def __add__(
        self: Coordinate[NumericType], 
        other: Coordinate[NumericType]
    ) -> Coordinate[NumericType]:
        sums = itertools.starmap(operator.add, zip(self, other))
        return Coordinate[NumericType](*sums)

The general pattern can be repeated for vectors and matrices.

But Isn’t it Slow?#

The fastest code is the code that doesn’t run at all. Given that, it makes sense that adding decorators and generics will slow the Coordinate implementation down. The question then is, how much slower?

To determine this I can once again use the pytest-benchmark package to compare the performance of a vanilla Coordinate implementation with another one using all of the decorators.

Here are the results looking at just the mean time spent. Note that all of the timings are listed in nanoseconds.

Operation	Baseline (4D)	1D	2D	3D	4D
Initialization	162.7285 (1.0)	227.1296 (1.40)	261.0954 (1.60)	281.2447 (1.73)	303.2785 (1.86)
Addition	628.6151 (1.0)	1,632.9629 (2.60)	1,675.3439 (2.67)	1,786.2362 (2.84)	1,836.5976 (2.92)
Subtraction	550.7952 (1.0)	1,663.3733 (3.02)	1,654.2672 (3.00)	1,779.0331 (3.23)	1,880.1373 (3.41)
Multiplication	616.8384 (1.0)	1,622.5777 (2.63)	1,657.9908 (2.69)	1,741.7661 (2.82)	1,871.4839 (3.03)

The table above shows that adding the decorators comes at the cost of slowing down operations by sometimes as much as 3x. That sounds like a lot but keep in mind that the timings are in nanoseconds. The slowest timing in the table is 1,871.4839 ns. That is 0.0000018714839 seconds. It’s slower but it’s still pretty damn fast.

Wrapping Up#

Now, all of the above is relevant if computation is occurring on the CPU. In a game engine however, that is often not the case. The current trend in real-time graphics is to do the majority of mathematical operations on the GPU.

As I’m working through building my little engine the biggest challenge for me has been slowly shifting from programming for the CPU to the GPU. Designing an application architecture that leverages all of the hardware available in an efficient yet extensible way is nontrivial.

After working through all of the above my big takeaways are:

On hardware that supports it, floats beat everything else for pure performance.
Don’t compare floats directly. Rather use library functions to test the difference within a threshold.
It’s not hard to have generic spatial types but it comes at a significant performance cost.
The trend in real-time computing is to shift mathematical processing to the GPU to maximize parallel processing. The challenge shifts from how can I do this efficiently on the CPU to how can I efficiently get this on the GPU, compute it, and then get the results back to the CPU?

Until next time…