Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Disappearing Data Projections

DZone's Guide to

Disappearing Data Projections

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Suppose you have data in an N-dimensional space where N is large and consider the cube [-1, 1]N. The coordinate basis vectors start in the center of the cube and poke out through the middle of the faces. The diagonals of the cube run from the center to one of the corners.

If your points cluster along one of the coordinate axes, then projecting them to that axis will show the full width of the data. But if your points cluster along one of the diagonal directions, the projection along every coordinate axis will be a tiny smudge near the origin. There are a lot more diagonal directions than coordinate directions, 2N versus N, and so there are a lot of orientations of your points that could be missed by every coordinate projection.

Here’s the math behind the loose statements above. The diagonal directions of the form (±1, ±1, …, ±1). A unit vector in one of these directions will have the form (1/√N)(±1, ±1, …, ±1) and so its inner product with any of the coordinate basis vectors is 1/√N, which goes to zero as N gets large. Said another way, taking a set of points along a diagonal and projecting it to a coordinate axis divides its width by √N.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}