# Topological Data Analysis: Extracting Meaning From Big Data

### Not only does Topological Data Analysis have the potential to change how we, as humans, understand data; it might also greatly affect the way we perceive technology.

We live in an era of Big Data.

Businesses collect and analyze clients' records to drive growth. Healthcare companies use biometrics and stats from sensors to better patient care. Data examination is now a crucial part of development strategy for most firms worldwide.

However, as the world’s digitalization advances and the number of online transactions grows at a frenetic pace, the quantity and complexity of data sets grow, too — exponentially.

At this point, even the most sophisticated firms often find themselves lacking analytical capabilities to process and extract meaning from an overwhelming flow of data they receive.

So what are corporations to do? Is there a way for them to examine data beyond conventional hypothesis-driven analysis?

A group of experts in Topological Data Analysis (a relatively new field, practiced and commercialized by Ayasdi) claims to have a solution.

## Topological Data Analysis (TDA): Introduction

Topology, the discipline from which TDA originates, is a branch of mathematics concerned with the measurement and representation of shapes. As a part of mathematics, it has been studied for over 250 years. Only in the last 20 years or so, however, have scientists begun to research applications of topology to real-world problems.

The two main functions of the discipline — measurement and representation of shape — are especially relevant in the context of analyzing big, highly complex, feature-rich data sets.

And thus Topological Data Analysis became a thing.

## What Does Topological Data Analysis Do?

What do you think of when someone says *data*?

I’ve always imagined sets of numbers and maybe some mysterious abbreviations piled up in spreadsheets. Or, sometimes, percentages. Or mathematical fractions.

Whichever image it is, data has never looked simple or understandable in my head. I’ve thought of it as requiring decoding, and imagined that only trained analysts, people with a firm grip on advanced math concepts, could squeeze any insights out of it.

TDA is set to change the way we perceive data. It claims that data has shape.

Here’s how it works. Suppose there’s a large data set that is divided into groups.

But instead of conventional columns that are used to view data, the groups are represented by nodes, which are connected to one another reflecting the relationships between the groups.

Now, instead of a plethora of figures and columns and rows, which are all unstructured and incomprehensible, we have something more pleasant-looking: a network. Our data now has shape.

And since human visual perception is powerful, it is simple for us to identify features within such a network that correspond to patterns within our data set.

These patterns are precisely what TDA experts are referring to when they say that data has meaning.
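To make the groups-as-nodes idea a little more concrete, here is a minimal Python sketch. It assumes one simple way of forming groups (overlapping ranges of a single numeric column) and links two groups whenever they share at least one row. The function name `build_group_network` and the parameters `n_bins` and `overlap` are made up for this illustration; real TDA tools use more elaborate constructions.

```python
from itertools import combinations


def build_group_network(values, n_bins=4, overlap=0.25):
    """Split rows into overlapping value ranges (nodes) and link ranges that share rows (edges).

    Hypothetical helper for illustration only, not a production algorithm.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    groups = []
    for b in range(n_bins):
        # Each range is widened on both sides so that neighbouring ranges overlap.
        start = lo + b * width - overlap * width
        end = lo + (b + 1) * width + overlap * width
        members = {i for i, v in enumerate(values) if start <= v <= end}
        if members:
            groups.append(members)
    # Two groups are connected when at least one row belongs to both.
    edges = [(a, b) for a, b in combinations(range(len(groups)), 2)
             if groups[a] & groups[b]]
    return groups, edges


# Toy column of nine measurements (made-up data).
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
groups, edges = build_group_network(values)
print(groups)
print(edges)
```

With this toy column the four groups form a simple chain, because each group shares exactly one row with the next. A richer data set would produce branches and loops, which are the visual features an analyst would look for.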

## TDA’s Properties

There are three core properties of Topological Data Analysis that form its unique power to understand shapes.

### 1. Coordinate Invariance

TDA studies properties of shapes that are not dependent on the coordinate systems in which data is viewed. Therefore, rotations and possible switches of these systems do not in any way affect the shape’s examination.

TDA is concerned with properties that stay fixed no matter how the data is represented. It works with things like the distances between pairs of points within a data set, a structure mathematicians call a metric space.

What that means in layman’s terms is that TDA won’t let you get confused over the difference between 54 kg and 119 pounds (weight being a property here, and the units of mass used, our coordinate systems). It shows you properties that do not change when translations, scaling, and other minor modifications are applied.
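A small Python sketch can illustrate the point about distances. It assumes a toy set of 2D points (made up for this example), rotates and shifts them into a different coordinate system, and checks that the pairwise distances are unchanged. It also rescales the points, as a change of units would, and checks that every distance is multiplied by the same factor, so the relative picture stays the same. The helper names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def rotate_and_shift(points, angle, dx, dy):
    """Express the same points in a rotated and translated coordinate system."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


# Made-up sample points.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]
moved = rotate_and_shift(points, angle=math.radians(37), dx=10.0, dy=-5.0)

before = pairwise_distances(points)
after = pairwise_distances(moved)
unchanged = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))

# A change of units (roughly pounds per kilogram here) scales every distance equally.
unit_factor = 2.2
scaled = [(unit_factor * x, unit_factor * y) for x, y in points]
ratios = [b / a for a, b in zip(before, pairwise_distances(scaled))]
same_ratio = all(math.isclose(r, unit_factor) for r in ratios)

print(unchanged, same_ratio)
```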

### 2. Deformation Invariance

The properties of shapes that TDA studies are deformation invariant: they are insensitive to stretching, squashing, bending, and so forth.

As long as a shape has not undergone substantial modifications, and as long as no tearing has taken place, TDA’s perception of it won’t be affected.

To understand this concept, think of the letter A, and of how good you are at recognizing it.

No matter which font the A might be drawn in, no matter what size and color it is, your eyes will still be able to make it out, unless the letter has been changed radically and doesn’t look like itself anymore.
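The same idea can be tried on a shape simpler than the letter A: a ring of points. The sketch below is only an illustration under stated assumptions. It samples 24 points on a circle, stretches them into an ellipse, links each point to its two nearest neighbours, and then counts connected pieces and loops in the resulting network. The neighbour rule, the sample size, and the function names are choices made for this example, not a method prescribed by TDA.

```python
import math


def nearest_neighbor_edges(points, k=2):
    """Link every point to its k nearest neighbours (undirected, no duplicates)."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def shape_summary(points, k=2):
    """Return (connected pieces, independent loops) of the neighbour network."""
    edges = nearest_neighbor_edges(points, k)
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        parent[find(a)] = find(b)
    components = len({find(i) for i in range(len(points))})
    # For a graph: loops = edges - nodes + connected pieces.
    loops = len(edges) - len(points) + components
    return components, loops


circle = [(math.cos(2 * math.pi * i / 24), math.sin(2 * math.pi * i / 24))
          for i in range(24)]
stretched = [(3 * x, y) for x, y in circle]  # stretch horizontally into an ellipse

print(shape_summary(circle))
print(shape_summary(stretched))
```

Both calls should report the same summary, one connected piece containing one loop, even though every coordinate has changed. A much more extreme squash could eventually make points on opposite sides of the ring look like neighbours, which is the sketch's version of the shape having "changed radically."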

### 3. Compressed Representation

Rather than taking on objects in their entirety, TDA produces compressed representations.

The value of this property lies in giving analysts a chance to retrieve meaning from data, even if it’s high-dimensional and thus highly complex.

To get this, suppose you’re dealing with a circle, which contains, as circles do, an infinite number of points and pairwise distances. A comprehensive analysis of such a shape, point by point, seems next to impossible.

Now, imagine that the circle is turned into a hexagon, and the relationships between data points within it are boiled down to a list of nodes and edges, while the original shape’s loopy property is retained.

That is what TDA experts would call a summary: a representation of a complex shape that keeps the essential relationships between data points and therefore makes key patterns easy to observe.
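The circle-to-hexagon step can be sketched in a few lines of Python. This is a toy illustration with assumed names and sizes: 600 samples taken in order around a circle are bucketed into six sectors (the nodes), two sectors are linked when consecutive samples fall on either side of their boundary (the edges), and the loop is counted from the nodes and edges alone.

```python
def compress_circle(n_samples=600, n_nodes=6):
    """Bucket samples ordered around a circle into sectors and link sectors that touch."""
    # Sample i sits at angle 2*pi*i/n_samples, so its sector follows from its index.
    sector_of = [i * n_nodes // n_samples for i in range(n_samples)]
    edges = set()
    for i in range(n_samples):
        a, b = sector_of[i], sector_of[(i + 1) % n_samples]  # wrap around the circle
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)


def count_loops(n_nodes, edges):
    """Independent loops in a graph: edges - nodes + connected pieces."""
    neighbors = {v: set() for v in range(n_nodes)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), 0
    for start in range(n_nodes):
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first walk over one connected piece
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(neighbors[v] - seen)
    return len(edges) - n_nodes + components


n_samples = 600
edges = compress_circle(n_samples, n_nodes=6)
pairwise_count = n_samples * (n_samples - 1) // 2  # distances in the raw samples

print(pairwise_count, len(edges), count_loops(6, edges))
```

In this toy case, 179,700 pairwise distances are reduced to six nodes and six edges, and the summary still contains exactly one loop.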

## Conclusion

Not only does Topological Data Analysis have the potential to change how we, as humans, understand data; it might also greatly affect the way we perceive technology.

As more data scientists become accustomed to TDA, we should expect a considerable deepening of automation within logistics and marketing software and, maybe, a drastic increase in the capabilities of technology when it comes to discovery. After quite successfully automating routine tasks for years, we might finally be able to teach our software to conduct research and management as well.

Opinions expressed by DZone contributors are their own.
