# The Math Behind a Scanning App

### You only have to write the equations, SymPy does all the math for you

Join the DZone community and get the full member experience.

Join For FreeIn 2012, I made a simple camera app for Android that undoes the natural projection in a picture.

Of course, now we have all sorts of scanning apps on every phone, but at the time this was a novelty. Let's dissect the application and see what's it made of.

But first, a small recap of the projective transformation. The formula for the projective transformation is shown below.

x_{p} = (ax_{i} + by_{i} + c) / (gx_{i} + hy_{i} + i)

y_{p} = (dx_{i} + ey_{i} + f) / (gx_{i} + hy_{i} + i)

In homogeneous coordinates, all the polynomial components will become homogeneous too, and the denominator could be 1.

x_{p} = (ax_{i} + by_{i} + cw_{i}) / (gx_{i} + hy_{i} + iw_{i})

y_{p} = (dx_{i} + ey_{i} + fw_{i}) / (gx_{i} + hy_{i} + iw_{i})

w_{p} = 1

Unless the denominator is 0, and in real projective transformations it isn't, we can multiply all the coordinates by the denominator. Remember, (1, 0, 1) and (2, 0, 2) is the same point.

x_{p} = ax_{i} + by_{i} + cw_{i}

y_{p} = dx_{i} + ey_{i} + fw_{i}

w_{p} = gx_{i} + hy_{i} + iw_{i}

If you know the row-on-column rule, you probably see the resemblance with the matrix multiplication already. Indeed, in homogeneous coordinates, we can rewrite our projective transformation as matrix multiplication.

The projective is a four-point transformation so all the input we need is a picture, which the camera API provides us with, and four corner points of a rectangle we will ask the user to select.

Modern scanning apps do point selection automatically, although not always with great success. This automatic selection is an interesting problem on its own, but it has little to do with projective transformations so we'll set it aside for now.

Solving a system for an arbitrary 4 point transformation symbolically is a little problematic. It takes forever in SymPy, and the solution itself also appears to be unreasonably huge.

Of course, we always have an option to solve it numerically, but it's usually faster to compute things by simply running a ready-made formula than with some numeric solver. Let me show you how to get the formulas you want.

So, the scanning problem is as follows. For every pixel of a target rectangular picture, find the original pixel color from a photograph, given that the four corner points of the rectangular picture have their correspondent points in the picture from the camera.

This might seem like something that implies a convex quadrilateral to a rectangular transformation, but in fact, it's the opposite. For every pixel of the flat picture, we want to find where it comes from. The transformation is rectangular to the quadrilateral, not vice versa.

Now, why is this suddenly important? Because we can now call the standard cube for help.

A rectangular to standard cube transformation is just scaling. Let's say our rectangle has width “w” and height “h”. The transformation matrix that transforms its pixels onto the standard cube is this.

Now the second transformation – from the standard cube to the source picture is a little trickier. Let’s write down the system for SymPy to solve. In our input, x1, y1 ... x4, y4 are the initial points, xt1, yt1 ... xt4, yt4 are the transformed points, and a, b, c, d, e, f, g, h, i are the elements of the transformation matrix.

A four-point transformation gives us only 8 equations: two for each of the four points. Luckily, if we multiply a matrix by a number, it will still carry the same transformation. It’s scaling-proof. So let’s just pick a matrix with i == 1, and the rest will rescale automatically to form the transformation we want.

[

xt1 * x1 * g + xt1 * y1 * h + xt1 * i - x1 * a - y1 * b - c,

yt1 * x1 * g + yt1 * y1 * h + yt1 * i - x1 * d - y1 * e - f,

xt2 * x2 * g + xt2 * y2 * h + xt2 * i - x2 * a - y2 * b - c,

yt2 * x2 * g + yt2 * y2 * h + yt2 * i - x2 * d - y2 * e - f,

xt3 * x3 * g + xt3 * y3 * h + xt3 * i - x3 * a - y3 * b - c,

yt3 * x3 * g + yt3 * y3 * h + yt3 * i - x3 * d - y3 * e - f,

xt4 * x4 * g + xt4 * y4 * h + xt4 * i - x4 * a - y4 * b - c,

yt4 * x4 * g + yt4 * y4 * h + yt4 * i - x4 * d - y4 * e - f,

i - 1

], (a, b, c, d, e, f, g, h, i))

The system is large, sure, but let's apply our standard cube coordinates to it and see what will happen.

The exact order in which we pick our coordinates is irrelevant so this will do.

(x1, y1) = (0, 0)

(x2, y2) = (1, 0)

(x3, y3) = (1, 1)

(x4, y4) = (0, 1)

By applying these specific numbers to the equations we make them smaller and simpler.

[

xt1 * i - c,

yt1 * i - f,

xt2 * g + xt2 * i - a - c,

yt2 * g + yt2 * i - d - f,

xt3 * g + xt3 * h + xt3 * i - a - b - c,

yt3 * g + yt3 * h + yt3 * i - d - e - f,

xt4 * h + xt4 * i - b - c,

yt4 * h + yt4 * i - e - f,

i - 1

], (a, b, c, d, e, f, g, h, i))

This is a leaner version of the same system but, more importantly, it doesn't have to be a single system anymore! There are now 4 pieces we can solve one by one. Much faster, and with more compact solutions too.

There is the obvious “i”.

[i – 1 ], (i)

Then, with the known “i”, the first two equations become independent.

[xt1 * i – c] , (c)

and

[yt1 * i – f,], (f)

Then, knowing the "c" and "f", we can resolve the next two equations along with the two at the end in terms of "g" and "h".

c = xt1

f = yt1

i = 1

[xt2 * g + xt2 * i - a - c,

yt2 * g + yt2 * i - d – f,

xt4 * h + xt4 * i - b - c,

yt4 * h + yt4 * i - e – f] , (a, b, d, e)

The result will be rather compact:

{a: g*xt2 - xt1 + xt2, d: g*yt2 - yt1 + yt2, b: h*xt4 - xt1 + xt4, e: h*yt4 - yt1 + yt4}

And reusable! Let's finish up the whole system with these equations.

a = g*xt2 - xt1 + xt2

d = g*yt2 - yt1 + yt2

b = h*xt4 - xt1 + xt4

e = h*yt4 - yt1 + yt4

solution_3 = solve([

xt3 * g + xt3 * h + xt3 * i - a - b - c,

yt3 * g + yt3 * h + yt3 * i - d - e - f,

], (g, h))

The solution for this will be in "x"s and "y"s only. No other variables.

g: (xt1*yt3 - xt1*yt4 - xt2*yt3 + xt2*yt4 - xt3*yt1 + xt3*yt2 + xt4*yt1 - xt4*yt2)/(xt2*yt3 - xt2*yt4 - xt3*yt2 + xt3*yt4 + xt4*yt2 - xt4*yt3),

h: (xt1*yt2 - xt1*yt3 - xt2*yt1 + xt2*yt4 + xt3*yt1 - xt3*yt4 - xt4*yt2 + xt4*yt3)/(xt2*yt3 - xt2*yt4 - xt3*yt2 + xt3*yt4 + xt4*yt2 - xt4*yt3)

With these formulas, we can compute "g", and "h" numerically for every four points of the standard cube transformation, and use that to compute "a", "b", "d", and "e". Computing “c” and “f” from their simple formulas, and putting “i” as 1 will complete the solution.

So we now have a transformation that scales pixels to the points of the standard cube, and a transformation that transforms points from the standard cube to the source picture. We can compose them into a single transformation and then apply it to every point of the target rectangular picture to know where it should take its color from.

There are still a few less-relevant problems to solve.

First, how do we know the size of the rectangular picture? And the answer is, we don't. There is a projective transformation for every rectangular to every possible quadrilateral. Of course, most of them are improbable in practice, although I'm not sure if it's even grammatically correct to apply the word “most” to an infinite continuum.

So let's solve this constructively. We don't want our picture to be overly pixelated, meaning that a lot of pixels on the target pixels would correspond to a single pixel on the source, but we don't want to lose data either, meaning that there would be pixels on the source picture that would not pass their color to any pixels on the target.

The rule of thumb I came up with is: the height of the target picture should be the average of the segment lengths from point 1 to point 4 and point 2 to point 3. The width is then the average of the segment lengths from point 1 to 2, and from 3 to 4.

This is not math, this is engineering. It’s not proven to be optimal it’s just something that works most of the time. Please feel free to come up with your own solutions.

Another problem is how do we compute colors in between pixels so the target picture becomes nice and smooth. This is a completely independent problem and it is addressed in the later chapters of my book, *Geometry for Programmers*.

Opinions expressed by DZone contributors are their own.

Comments