
U.S. Laws vs. The Human Genome



Since you can download the U.S. Code, I thought it would be interesting to compare its size to that of the human genome, on the premise that the genome is the DNA of a living thing and the U.S. Code is the DNA of a nation.

I’ve charted this below. To reproduce it, plot the size of the compressed file for each genome. Using compressed rather than uncompressed sizes means the numbers approximate the amount of unique information encoded in each file, rather than counting superfluous data such as whitespace and repeated symbols. (Uncompressed, the U.S. Code would look quite large by comparison, at roughly 486 MB.)
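As a rough illustration of why compressed size is a better proxy for information content than raw size, the sketch below compresses two byte strings of equal length: one highly repetitive, one random. It uses Python's standard `gzip` module. The helper name `compressed_size` and the sample data are invented for this example; the article does not say which compressor produced its figures, so gzip here is an assumption.

```python
import gzip
import os


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of `data` after gzip compression."""
    return len(gzip.compress(data))


# Two inputs of identical raw length (100,000 bytes each).
repetitive = b"ACGT" * 25_000        # the same four symbols over and over
random_bytes = os.urandom(100_000)   # essentially incompressible

for name, data in [("repetitive", repetitive), ("random", random_bytes)]:
    print(f"{name}: {len(data)} bytes raw, {compressed_size(data)} bytes compressed")
```

Both inputs are the same size on disk, but the repetitive one shrinks to a small fraction of the random one, which is the effect the compressed figures are meant to capture.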

Here we see the sizes of genomic data for many species:

figure_1

If we zoom in on the left, we can add the U.S. Code:

figure_2

It looks like it’s near in size to a few types of fish. If you could obtain the relevant state laws, the total would likely jump quite a bit (my home state of Pennsylvania does not make it easy to download all of its laws at once).
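The "nearest neighbours" reading of the chart can also be checked numerically. The sketch below uses a small subset of the article's compressed sizes (in MB) and a hypothetical helper, `nearest`, that ranks the other entries by absolute difference from a target entry. Both the helper and the choice of subset are illustrative, not part of the original script.

```python
# Compressed sizes in MB, copied from the article's data set (subset only).
sizes = {
    "Cod": 238,
    "Fugu": 107,
    "Lamprey": 238,
    "Tetraodon": 98,
    "Tilapia": 261,
    "US Federal Code": 84,
    "Yeast": 2.9,
    "Zebrafish": 355,
}


def nearest(target: str, data: dict, n: int = 2) -> list:
    """Return the n entries closest in size to `target`, nearest first."""
    others = [(name, size) for name, size in data.items() if name != target]
    others.sort(key=lambda item: abs(item[1] - data[target]))
    return others[:n]


for name, size in nearest("US Federal Code", sizes):
    print(f"{name}: {size} MB")
```

The two closest entries are Tetraodon and Fugu, both pufferfish, which matches the observation that the U.S. Code sits among the fish.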

Should you wish to access this data or reproduce my results, it is available as a simple Python script:

sizes = {"Lizard": 492,
"Human": 778,
"Alpaca": 738,
"Armadillo": 902,
"Cod": 238,
"Baboon": 826,
"Budgerigar": 312,
"Bushbaby": 630,
"Cat": 615,
"Chicken": 296,
"Chimp": 823,
"Coelacanth": 796,
"Cow": 745,
"Dog": 603,
"Dolphin": 646,
"Elephant": 800,
"Ferret": 603,
"Fugu": 107,
"Gibbon": 737,
"Gorilla": 758,
"Hedgehog": 901,
"Kangaroo": 545,
"Lamprey": 238,
"Manatee": 775,
"Marmoset": 727,
"Ground Finch": 305,
"Megabat": 500,
"Microbat": 507,
"Mouse": 682,
"Lemur": 722,
"Naked Mole-rat": 653,
"Tilapia": 261,
"Monodelphis domestica": 907,
"Painted Turtle": 714,
"Panda": 577,
"Pig": 702,
"Pika": 844,
"Rabbit": 682,
"Rat": 725,
"Rhesus": 743,
"Rock Hyrax": 751,
"Sheep": 718,
"Shrew": 796,
"Sloth": 627,
"Squirrel Monkey": 652,
"Tasmanian Devil": 920,
"Tenrec": 947,
"Tetraodon": 98,
"Tree Shrew": 908,
"Turkey": 257,
"Wallaby": 787,
"Rhino": 613,
"US Federal Code": 84,
"Zebrafish": 355,
"Yeast": 2.9}
 
 
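import numpy as np
import matplotlib.pyplot as plt

# First chart: every entry in `sizes`, sorted by compressed size.
fig = plt.figure()

# Sort the labels by size so that each bar lines up with its tick label.
# (dict views have no .sort() in Python 3, so build sorted lists instead.)
keys = sorted(sizes, key=sizes.get)
values = [sizes[k] for k in keys]

ind = np.arange(len(sizes))
plt.bar(ind, values)

plt.ylabel("Megabytes (Compressed)")
plt.xticks(ind, keys)

# Rotate the tick labels so the species names do not overlap.
fig.autofmt_xdate(rotation=90)

plt.show()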
# Second chart: the same sorted bars, with the U.S. Code highlighted in red.
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)

keys = sorted(sizes, key=sizes.get)
values = [sizes[k] for k in keys]

ind = np.arange(len(sizes))
bars = ax1.bar(ind, values, color='blue', edgecolor='black')

# Look up the U.S. Code's position instead of hard-coding index 1.
bars[keys.index("US Federal Code")].set_facecolor('red')

plt.ylabel("Megabytes (Compressed)")
plt.xticks(ind, keys, rotation=90)

# Leave extra room at the bottom for the rotated labels.
plt.subplots_adjust(left=0.125, right=0.9, top=0.9, bottom=0.2, wspace=0.2, hspace=0.2)

plt.show()


Published at DZone with permission of Gary Sieling, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
