
Vector Based Languages


Vectorization is a staple of R, but it has applications in other languages as well. In this post, explore vectorization in Python.


After working in data science for a while, there is one concept that I began to take for granted: Vectorization.

The term Vectorization is what R users call it. It goes by other names, such as array programming, but I like Vectorization because it sounds cool.

In a normal programming language, if you want to multiply two arrays together element by element, it can be quite a grind.

Let’s say you want to do this in regular ol’ Python (or C, or any other ‘normal’ language). You would have to write out an explicit for-loop, like this:

d = [1, 2, 2, 3, 4]
e = [4, 5, 4, 6, 4]
f = []
# Multiply the lists element by element, one index at a time
for x in range(len(d)):
    f.append(d[x] * e[x])
print(f)
# [4, 10, 8, 18, 16]

That’s all fine and good, but now imagine doing that with 2D matrices. Or multiple arrays. Or performing even more complex math on any of them. Things can get very complicated very quickly. I've never seen a 5 or 6 deep for-loop that I liked.
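To make that concrete, here is a minimal sketch of the same element-wise multiplication on two small 2D lists in plain Python. The names (`rows_d`, `rows_e`, `product`) and the numbers are made up for illustration; the only point is that every extra dimension costs another level of loop.

```python
# Element-wise product of two 2D lists without NumPy.
# The data and names here are illustrative only.
rows_d = [[1, 2, 2],
          [3, 2, 8]]
rows_e = [[4, 5, 4],
          [13, 21, 21]]

product = []
for i in range(len(rows_d)):            # one loop over the rows
    row = []
    for j in range(len(rows_d[i])):     # a second loop over the columns
        row.append(rows_d[i][j] * rows_e[i][j])
    product.append(row)

print(product)
# [[4, 10, 8], [39, 42, 168]]
```

A third dimension would need a third loop, and so on.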

In a Vector Based Language, you don’t have to go through that whole rigmarole. Instead you can just do this:

import numpy as np

d = np.array([1, 2, 2, 3, 4])
e = np.array([4, 5, 4, 6, 4])
print(d * e)
# [ 4 10  8 18 16]

Vector Based Languages let you perform mathematical functions on entire lists or matrices as though they were single objects.

d = np.array([[1, 2, 2, 3, 4],
              [3, 2, 8, 7, 12],
              [11, 21, 26, 3, 43]])
e = np.array([[4, 5, 4, 6, 4],
              [13, 21, 21, 31, 24],
              [51, 12, 22, 31, 46]])

print(d * e)
# [[   4   10    8   18   16]
#  [  39   42  168  217  288]
#  [ 561  252  572   93 1978]]
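Multiplication is not special here. As a small sketch (the array values are chosen only for illustration), other arithmetic operators and NumPy functions such as `np.sqrt` also apply to every element at once:

```python
import numpy as np

# Illustrative values, not taken from the examples above.
d = np.array([1, 4, 9, 16])
e = np.array([2, 2, 2, 2])

total = d + e        # element-wise addition
scaled = d * 10      # a single number is applied to every element
roots = np.sqrt(d)   # a math function applied to the whole array

print(total)
print(scaled)
print(roots)
```

Each line replaces what would otherwise be its own for-loop.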

With a vectorized language, like R, or Python with NumPy, you can do these types of calculations simply, without worrying about the loops running underneath.
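Totals and averages over a matrix work the same way. The sketch below uses a made-up 2x3 matrix `m` and assumes NumPy's `axis` argument, where `axis=0` works down the columns and `axis=1` works across the rows; no explicit loop appears anywhere.

```python
import numpy as np

# A small illustrative matrix: 2 rows, 3 columns.
m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_totals = m.sum(axis=0)       # add down each column
row_totals = m.sum(axis=1)       # add across each row
centered = m - m.mean(axis=0)    # subtract each column's mean from that column

print(col_totals)
print(row_totals)
print(centered)
```

The last line subtracts a 3-element array of column means from a 2x3 matrix; NumPy stretches the smaller array across the rows for you.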

Thank Thor for this technology. Staring at endless nested for-loops would cause me to pull my eyeballs out.

As I said, I had completely lost my appreciation for this important construct, because being knee-deep in NumPy or R lets you take it for granted. Just wait until you get back to your C programming! Then you'll appreciate it again.

After many months, I've realized that this may be the most important thing that differentiates data-related languages from others. Language types are an interesting subject; I often find the categories a tad arbitrary, especially since some of the major themes of one type can bleed into others. R is technically a functional language and Java is object-oriented. But you're going to find them both listed under the same categories.

It's the way you use them....
