While playing with OpenGL, I was facing the situation where the common way to use matrices can quickly lead to CPU overload.

If you know the 3 common transformations in 2D (in homogeneous coordinates):

you also surely know that you can compose with these simple transformations to build more complex transformations. That is simply done with the use of matrix multiplication.

For example, if you have a sprite that you want to rotate and scale before translating it to have its own center exactly at the center of the screen. You would simply build 4 simple matrices T1, S, R, T2 and do the multiplication to obtain the final matrix M:

M = T2 * R * S * T1

with:

T1 the translation matrix moving your sprite to match his local center with the origin

S a scale matrix

R a rotation matrix

T2 a translation matrix moving your sprite at the center of the screen

That is a very common way to build complex transformations, but when you need to deal with thousands of them, you quickly overload your CPU.

Why ? Because generic matrix 3x3 multiplication involves 27 scalar multiplications and 18 scalar additions, as shown in the code below to compute C = AxB:

C[0][0] = A[0][0] * B[0][0] + A[1][0] * B[0][1] + A[2][0] * B[0][2];

C[1][0] = A[0][0] * B[1][0] + A[1][0] * B[1][1] + A[2][0] * B[1][2];

C[2][0] = A[0][0] * B[2][0] + A[1][0] * B[2][1] + A[2][0] * B[2][2];

C[0][1] = A[0][1] * B[0][0] + A[1][1] * B[0][1] + A[2][1] * B[0][2];

C[1][1] = A[0][1] * B[1][0] + A[1][1] * B[1][1] + A[2][1] * B[1][2];

C[2][1] = A[0][1] * B[2][0] + A[1][1] * B[2][1] + A[2][1] * B[2][2];

C[0][2] = A[0][2] * B[0][0] + A[1][2] * B[0][1] + A[2][2] * B[0][2];

C[1][2] = A[0][2] * B[1][0] + A[1][2] * B[1][1] + A[2][2] * B[1][2];

C[2][2] = A[0][2] * B[2][0] + A[1][2] * B[2][1] + A[2][2] * B[2][2];

Building M involved 3 matrix multiplications (T2 * R * S * T1), so

That could seem really complicated to set directly a single matrix that is the result of many matrices composition. But in fact that is not, because the 3 common transformation matrices (translate, rotation, scale) contain a lot of 0 and 1.

For example, here is the matrix M baked for T2 * R * S * T1:

with:

t1x, t1y : the 1st translation

sx, sy

a

t2x, t2y : the 2nd translation

This matrix looks complicated, but

Here is my code using float as scalars:

void mat3TRST(float& t2x, float &t2y, float &rot, float &sx, float &sy, float &t1x, float &t1y, Matrix3 &matRes)

{

float cRot, sRot;

cRot = cos(rot);

sRot = sin(rot);

matRes[0][0] = sx*cRot;

matRes[1][0] = sy*-sRot;

matRes[2][0] = t1x*sx*cRot + t1y*sy*-sRot + t2x;

matRes[0][1] = sx*sRot;

matRes[1][1] = sy*cRot;

matRes[2][1] = t1x*sx*sRot + t1y*sy*cRot + t2y;

matRes[0][2] = 0.0f;

matRes[1][2] = 0.0f;

matRes[2][2] = 1.0f;

}

How did I obtain that result ? I didn't bake the M matrix "by hand". Instead I used WolframAlpha for that.

My input was:

{{1,0,i},{0,1,j},{0,0,1}} . {{e,f,0},{g,h,0},{0,0,1}} . {{c,0,0},{0,d,0},{0,0,1}} . {{1,0,a},{0,1,b},{0,0,1}}

It is my T2 * R * S * T1 matrix composition, but using Wolfram syntax. From that I obtained the resulting matrix, still involving the constants a,b,c,d,e,f,g,h,i,j:

{{c e, d f, a c e + b d f + i}, {c g, d h, a c g + b d h + j}, {0, 0, 1}}

My job was only to copy/paste it in my code and replace the letters with the good variable names !

If you know the 3 common transformations in 2D (in homogeneous coordinates):

translation

scale

rotation

For example, if you have a sprite that you want to rotate and scale before translating it to have its own center exactly at the center of the screen. You would simply build 4 simple matrices T1, S, R, T2 and do the multiplication to obtain the final matrix M:

M = T2 * R * S * T1

with:

T1 the translation matrix moving your sprite to match his local center with the origin

S a scale matrix

R a rotation matrix

T2 a translation matrix moving your sprite at the center of the screen

That is a very common way to build complex transformations, but when you need to deal with thousands of them, you quickly overload your CPU.

Why ? Because generic matrix 3x3 multiplication involves 27 scalar multiplications and 18 scalar additions, as shown in the code below to compute C = AxB:

C[0][0] = A[0][0] * B[0][0] + A[1][0] * B[0][1] + A[2][0] * B[0][2];

C[1][0] = A[0][0] * B[1][0] + A[1][0] * B[1][1] + A[2][0] * B[1][2];

C[2][0] = A[0][0] * B[2][0] + A[1][0] * B[2][1] + A[2][0] * B[2][2];

C[0][1] = A[0][1] * B[0][0] + A[1][1] * B[0][1] + A[2][1] * B[0][2];

C[1][1] = A[0][1] * B[1][0] + A[1][1] * B[1][1] + A[2][1] * B[1][2];

C[2][1] = A[0][1] * B[2][0] + A[1][1] * B[2][1] + A[2][1] * B[2][2];

C[0][2] = A[0][2] * B[0][0] + A[1][2] * B[0][1] + A[2][2] * B[0][2];

C[1][2] = A[0][2] * B[1][0] + A[1][2] * B[1][1] + A[2][2] * B[1][2];

C[2][2] = A[0][2] * B[2][0] + A[1][2] * B[2][1] + A[2][2] * B[2][2];

Building M involved 3 matrix multiplications (T2 * R * S * T1), so

**the total for one sprite is 81 scalar multiplications and 53 scalar additions**. It is quite a lot when you have that amount of operations each frame for thousand of sprites.**The solution ? Bake your matrix !**That could seem really complicated to set directly a single matrix that is the result of many matrices composition. But in fact that is not, because the 3 common transformation matrices (translate, rotation, scale) contain a lot of 0 and 1.

For example, here is the matrix M baked for T2 * R * S * T1:

with:

t1x, t1y : the 1st translation

sx, sy

*:*the scale factorsa

*:*the rotation anglet2x, t2y : the 2nd translation

This matrix looks complicated, but

**it involves only 12 scalar multiplications and 4 scalar additions**. That is such an improvement if we compare with the 81 multiplications and 53 additions of the previous method. Putting that in your code instead of explicitly computing the result of T2 * R * S * T1 saves a lot of CPU resources.Here is my code using float as scalars:

void mat3TRST(float& t2x, float &t2y, float &rot, float &sx, float &sy, float &t1x, float &t1y, Matrix3 &matRes)

{

float cRot, sRot;

cRot = cos(rot);

sRot = sin(rot);

matRes[0][0] = sx*cRot;

matRes[1][0] = sy*-sRot;

matRes[2][0] = t1x*sx*cRot + t1y*sy*-sRot + t2x;

matRes[0][1] = sx*sRot;

matRes[1][1] = sy*cRot;

matRes[2][1] = t1x*sx*sRot + t1y*sy*cRot + t2y;

matRes[0][2] = 0.0f;

matRes[1][2] = 0.0f;

matRes[2][2] = 1.0f;

}

**Baking is easy**How did I obtain that result ? I didn't bake the M matrix "by hand". Instead I used WolframAlpha for that.

My input was:

{{1,0,i},{0,1,j},{0,0,1}} . {{e,f,0},{g,h,0},{0,0,1}} . {{c,0,0},{0,d,0},{0,0,1}} . {{1,0,a},{0,1,b},{0,0,1}}

It is my T2 * R * S * T1 matrix composition, but using Wolfram syntax. From that I obtained the resulting matrix, still involving the constants a,b,c,d,e,f,g,h,i,j:

{{c e, d f, a c e + b d f + i}, {c g, d h, a c g + b d h + j}, {0, 0, 1}}

My job was only to copy/paste it in my code and replace the letters with the good variable names !

## No comments:

## Post a Comment