Vector Based Languages
Vectorization is a staple of R, but it has applications in other languages as well. In this post, we explore vectorization in Python.
After working in data science for a while, there is one concept that I began to take for granted: Vectorization.
I picked up the term Vectorization from R. It goes by other names (array programming, for one), but I like Vectorization because it sounds cool.
In a normal programming language, if you want to multiply two arrays element by element, it can be quite a grind.
Say you want to do this in regular ol’ Python (or C or any other ‘normal’ language). You have to write an explicit for-loop and handle every element yourself, like this:
d = [1,2,2,3,4]
e = [4,5,4,6,4]
f = []
for x in range(len(d)):
    f.append(d[x]*e[x])
print(f)
[4, 10, 8, 18, 16]
That’s all fine and good, but now imagine doing that with 2D matrices. Or multiple arrays. Or performing even more complex math on any of them. Things can get very complicated very quickly. I've never seen a 5 or 6 deep for-loop that I liked.
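To make that concrete, here is a rough sketch of the same element-wise product for two small 2D lists in plain Python. The helper name `multiply_2d` and the sample values are made up for this illustration, and it assumes both inputs have the same shape:

```python
def multiply_2d(a, b):
    """Element-wise product of two equally shaped lists of lists."""
    result = []
    for i in range(len(a)):            # walk the rows
        row = []
        for j in range(len(a[i])):     # walk the columns of each row
            row.append(a[i][j] * b[i][j])
        result.append(row)
    return result

d = [[1, 2, 2], [3, 2, 8]]
e = [[4, 5, 4], [13, 21, 21]]
print(multiply_2d(d, e))  # [[4, 10, 8], [39, 42, 168]]
```

Every extra dimension adds another level of nesting, which is where the 5 or 6 deep loops come from.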
In a Vector Based Language, you don’t have to go through that whole rigamarole. Instead you can just do this:
import numpy as np
d = np.array([1,2,2,3,4])
e = np.array([4,5,4,6,4])
print(d*e)
[ 4 10  8 18 16]
Vector Based Languages let you perform mathematical functions on entire lists or matrices as though they were single objects.
d = np.array([[1,2,2,3,4],
              [3,2,8,7,12],
              [11,21,26,3,43]])
e = np.array([[4,5,4,6,4],
              [13,21,21,31,24],
              [51,12,22,31,46]])
print(d*e)
[[   4   10    8   18   16]
 [  39   42  168  217  288]
 [ 561  252  572   93 1978]]
With a vectorized language, like R, or Python with NumPy, you can do these types of calculations simply and without concern about the underbelly of the process.
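As a hedged illustration of the "more complex math" case, the sketch below mixes scalar arithmetic, a NumPy math function, and a column sum on two small arrays. The array values and variable names are invented for the example; `np.sqrt` and the `sum` method with an `axis` argument are standard NumPy.

```python
import numpy as np

d = np.array([[3.0, 6.0], [5.0, 8.0]])
e = np.array([[4.0, 8.0], [12.0, 15.0]])

# Hypotenuse for every (d, e) pair at once, with no explicit loop
hyp = np.sqrt(d**2 + e**2)      # values: 5, 10 / 13, 17

# A scalar is applied to every element of the array
scaled = d * 2 + 1              # values: 7, 13 / 11, 17

# Aggregate down each column (axis=0)
col_totals = e.sum(axis=0)      # values: 16, 23

print(hyp)
print(scaled)
print(col_totals)
```

Each line reads like ordinary arithmetic on single numbers, which is the whole appeal.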
Thank Thor for this technology. Staring at endless nested for-loops would cause me to pull my eyeballs out.
As I said, I had completely lost my appreciation for this important construct, because spending enough time knee-deep in NumPy or R lets you take it for granted. Just wait until you get back to your C programming! Then you'll appreciate it again.
After many months I've realized that this may be the most important thing that differentiates data-related languages from others. Language types are an interesting subject; I often find the categories a tad arbitrary, especially since some of the major themes of one type can bleed into others. R is technically a functional language and Java is object-oriented, but you're going to find them both listed under the same categories.
It's the way you use them....