# Even Boring Form Data Can Be Interesting (For A Developer)

# Even Boring Form Data Can Be Interesting (For A Developer)

Join the DZone community and get the full member experience.

Join For FreeHortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

*you can look them up online*), so you don't even need to encrypt it – difficult to get excited about that. Of course if that was the end of the story, this wouldn't be much of a blog post, so – as you might imagine –

**things are not as bland as they appear**.

## The Interesting Thing About A Credit Card Number…

At CrowdHired, we don't tend to deal much with credit card numbers, but ABNs are another matter entirely – since companies are one of the two types of users we have in the system (*by the way – as you may have deduced – I've been working for a startup for the last few months, I should really talk about how that happened, it's an interesting story*). Just like any piece of data, you want to validate the user input if you possibly can. When I started looking into this for ABNs I discovered that they had an interesting trait, it is a trait which credit card numbers share. **You see, both credit cards and ABNs are self-verifying numbers**.

I've been doing web development for many years now, but had no idea this was the case. So naturally – being the curious developer that I am – I had to dig a little further. It turns out that these kinds of numbers are quite common, with other well-known examples being ISBNs, UPCs and VINs. **Most of these use a variation of a check digit-based algorithm for both validation and generation**. Probably the most well-known of these algorithms is the Luhn algorithm which is what credit cards use. So, we'll use a credit card as an example.

## Validating And Generating Credit Cards

Let us say we have the following credit card number:

4870696871788604

It is 16 digits (*Visa and MasterCard are usually 16, but Amex is 15*). This number is broken down in the following way:

Issuer Number | Account Number | Check Digit 487069 | 687178860 | 4

You can read lots more about the anatomy of a credit card, but all we want to do is apply the Luhn algorithm to check if this credit card is valid. It goes something like this:

**1. Starting from the back, double every second digit**

4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4 8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4

**2. If the doubled numbers form a double digit number, add the two digits**

4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 0 | 4 8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 |00 | 4 8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 0 | 4

**3. Sum up all the digits of this new number**

8+8+5+0+3+9+3+8+5+1+5+8+7+6+0+4 = 80

**4. If the number is perfectly divisible by 10 it is a valid credit card number. Which in our case it is.**

You can see how we can use the same algorithm to generate a valid credit card number. All we have to do is set the check digit value to **X** and then perform all the same steps. During the final step we simply pick our check digit in such a way as to make sure the sum of all the digits is divisible by 10. Let's do this for a slightly altered version of our previous credit card number (*we simply set the digit before the check digit to 1 making the credit card number invalid*).

4 | 8 | 7 | 0 | 6 | 9 | 6 | 8 | 7 | 1 | 7 | 8 | 8 | 6 | 1 | X 8 | 8 |14 | 0 |12 | 9 |12 | 8 |14 | 1 |14 | 8 |16 | 6 | 2 | X 8 | 8 | 5 | 0 | 3 | 9 | 3 | 8 | 5 | 1 | 5 | 8 | 7 | 6 | 2 | X 8+8+5+0+3+9+3+8+5+1+5+8+7+6+2+X = 78+X X = (78%10 == 0) ? 0 : 10 - 78%10 X=2

As you can see **no matter what the other 15 digits are, we'll always be able to pick a check digit between 0 and 9 that will make the credit card number valid**.

Of course not every self-verifying number uses the Luhn algorithm, most don't use mod(10) to work out what the check digit should be, and for some numbers like the IBAN, the check digit actually consists of 2 digits. And yet, the most curious self-verifying number of the lot is the first one I learned about – the ABN. This is because, for the life of me, **I couldn't work out what the check digit of the ABN could be**.

## The Curious Case Of The ABN

Australia is certainly is not averse to using check digit-based algorithms. The Australian Tax File Number (*TFN*) and the Australian Company Number (*ACN*) are just two examples, but the ABN seems to be different. At first glance the ABN validation algorithm is just more of the same, it just has a larger than normal "*mod*" step at the end (mod(89)).

- Subtract 1 from the first (left) digit to give a new eleven digit number
- Multiply each of the digits in this new number by its weighting factor
- Sum the resulting 11 products
- Divide the total by 89, noting the remainder
- If the remainder is zero the number is valid

In-fact, here is some ruby code to validate an ABN which I appropriated from the Ruby ABN gem (*and then rolled it into a nice Rails 3, ActiveRecord validator so we could do *validates_abn_format_of* in all out models :)*) :

def is_integer?(number) Integer(number) true rescue false end def abn_valid?(number) raw_number = number number = number.to_s.tr ' ','' return false unless is_integer?(number) && number.length == 11 weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19] sum = 0 (0..10).each do |i| c = number[i,1] digit = c.to_i - (i.zero? ? 1 : 0) sum += weights[i] * digit end sum % 89 == 0 ? true : false end

But, **while validating ABNs is easy, generating them is a whole other matter**. As we've seen, with a check digit-based algorithm, generating the number is the same as validating the number, except we pick the digit in such a way as to make our 'mod' step evaluate to zero. But with a number such as the ABN, where there is no apparent check digit (*perhaps I am just having a bout of stupid, so if you can see an obvious check digit with ABNs do let me know*), how do you easily generate a valid number? In-fact, * why would you want to generate these numbers in the first place, isn't being able to validate them enough*?

Well, in the case of CrowdHired, we tend to create object trees that are quite deep, so we build an maintain some infrastructure code to allow us to create fake data for use during development (*another interesting thing to talk about at a later date*). Before we started using the self-validating properties of ABNs we simply generated any old 11 digit number as fake data for ABN fields, but once the validations started kicking in this was no longer an option. Being the pragmatic developers that we are (*even if we do say so ourselves*), we took some real ABNs (*like our own*) chucked them into an array and randomly picked from there. But, this offended the developer gods, or my developer pride – whichever, so one Saturday I decided to take a couple of hours to generate some truly random ABNs that were still valid. Here is the code I came up with (*it is now a proud part of our fake data generation script*):

def random_abn weights = [10,1,3,5,7,9,11,13,15,17,19] reversed_weights = weights.reverse initial_numbers = [] final_numbers = [] 9.times {initial_numbers << rand(9)+1} initial_numbers = [rand(8)+1, rand(7)+2] + initial_numbers products = [] weights.each_with_index do |weight, index| products << weight * initial_numbers[index] end product_sum = products.inject(0){|sum, value| sum + value} remainder = product_sum % 89 if remainder == 0 final_numbers = initial_numbers else current_remainder = remainder reversed_numbers = initial_numbers.reverse reversed_weights.each_with_index do |weight, index| next if weight > current_remainder if reversed_numbers[index] > 0 reversed_numbers[index] -= 1 current_remainder -= weight if current_remainder < reversed_weights[index+1] redo end end end final_numbers = reversed_numbers.reverse end final_numbers[0] += 1 final_numbers.join end

The idea is pretty simple. Let's go through an example to demonstrate:

#### 1. Firstly we randomly generate 11 digits between 0 and 9 to make up our probably ABN (*they are actually not all between 0 and 9 but more on that shortly*)

7 5 8 9 8 7 3 4 1 5 3

#### 2. We then perform the validation steps on that number

multiply the digits by their weights to get weight-digit products

7x10=70 5x1=5 8x3=24 9x5=45 8x7=56 7x9=63 3x11=33 4x13=52 1x15=15 5x17=85 3x19=57

- sum the products
70+5+24+45+56+63+33+52+15+85+57 = 505

- divide by 89 to get the remainder
505 mod 89 = 60

#### 3. Since we do mod(89) at worst we'll be off by 88 (*although if we get 0 as the remainder we lucked out with a valid ABN straight away*), we now use the weight-digit products to "*give change*", subtracting from the remainder as we go until we hit zero.

We start with the last digit where the weight is 19. We subtract 1 from this digit, which means we can subtract 19 from our remainder. We then move on to the next digit until the remainder hits zero

Initial | Change | Remainder ------------------------------- 7x10=70 | 7x10=70 | 0 5x1=5 | 5x1=5 | 0 8x3=24 | 8x3=24 | 0 9x5=45 | 9x5=45 | 0 8x7=56 | 8x7=56 | 0 7x9=63 |6x9=63 | 0 3x11=33 | 3x11=33 | 9 4x13=52 | 4x13=52 | 9 1x15=15 | 0x15=0 | 9 5x17=85 |4x17=68 | 24 3x19=57 |2x19=38 | 41

#### 4. This gives us our new number

7 5 8 9 8 6 3 4 0 4 2

#### 5. Now we just need to add 1 to the very first number (*as per the ABN validation steps*) and we have our valid ABN

85898634042

There are a couple of nuances to those steps.

**None of our initial generated digits can be zero**. Since we "*give change*" by subtracting 1 from each digit we need to ensure we can actually subtract 1 (otherwise things become much more complex). So we ensure our digits are randomly generated between 1 and 9 rather tahn between 0 and 9.**Even when all our initial digits are at least 1 we may still fail to "**, the easiest example of this is when our remainder is 2. The only digit when can use to give change is the one with the weight of 1 (*give change*" for some of the remainders*i.e. the second digit of the ABN*). If this digit is initially generated to be 1, we can only give change once at which point we end up with a remainder of 1 and there is nothing more we can do about it. In-fact this exact scenario can happen for a few other remainders namely 86, 77, 66, 53, 38, 21. The easies way to overcome this problem is to ensure that the digit that has a weight of 1, is always randomly generated to have a value of at least 2. This way we can use it to give change twice and our problematic remainders are covered.- Lastly,
**since we have to add 1 to the very first digit as our final step, we need to make sure that this digit is not already equal to 9**, so we generate that one to be between 1 and 8.

Given these nuances, this algorithm won't generate every possible ABN, but it will give you a large percentage of possible ABNs which is good enough for our needs. It took about an hour to get that working (*we won't mention the little bug where I forgot the remainder could be zero from the start, which caused much grief to our random data generator :)*), but it was a fun little exercise – time well spent as far as I am concerned. And to think, all this learning about self-validating numbers and algorithmic coding fun was triggered by trying to capture the most mundane piece of data on a form. It just goes to show that **you can learn and grow no matter where you are and what you're doing, you just need to see the opportunities** for what they are.

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Published at DZone with permission of Alan Skorkin , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}