Mining Patent Data to Understand the Nature of Invention

News broke yesterday that the United States Patent and Trademark Office canceled the Washington Redskins' trademark registrations on the grounds that the team's controversial name is disparaging to Native Americans. While the consequences of this action are still up in the air, especially if the team's ownership challenges the cancellation, it does provide a convenient launching point to talk about one major source of data, both archival and continuing, that is often overlooked: patent records.

In an article in the MIT Technology Review, we get a glimpse of one project in which patent data, with records that go back more than two centuries, is used to gain a better understanding of the process of invention.

That opens up the possibility of an interesting study, they say. Since the US Patent Office records go back to 1790, it ought to be possible to see how the combination of codes has changed over time. In particular, these records should reveal to what extent invention is the refinement of existing combinations of technologies and to what extent it is the result of new combinations of technologies.

By looking at the technology codes associated with each patent, the study was able to see whether an invention reused a combination of codes that had already appeared in earlier patents or introduced a new combination altogether.

The results give an interesting insight into this question. They suggest that some 40 per cent of new inventions rely on previously existing combinations of technologies while about 60 per cent introduce entirely new combinations of technologies.

That has important implications. One idea is that new inventions can come about through a random walk through the space of all possible permutations of technologies. But the fact that 40 per cent reuse previously existing combinations suggests that invention is not the result of this kind of random search.
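To make the reuse-versus-new distinction concrete, here is a minimal Python sketch of how such a tally might be computed. It is not the study's actual method or data: the function name `classify_combinations`, the record layout (year plus a set of codes), and the single-letter codes are all hypothetical, chosen only to illustrate the idea of checking each patent's code combination against those seen earlier.

```python
from collections import Counter


def classify_combinations(patents):
    """Count patents whose set of technology codes is new vs. already seen.

    `patents` is an iterable of (year, codes) pairs. This record layout is
    an assumption made for illustration, not the patent office's format.
    """
    seen = set()        # code combinations encountered so far
    counts = Counter()  # tallies under the keys "new" and "existing"
    # Process patents in chronological order so "seen" means "seen earlier".
    for year, codes in sorted(patents, key=lambda p: p[0]):
        combo = frozenset(codes)  # order of codes within a patent is ignored
        label = "existing" if combo in seen else "new"
        counts[label] += 1
        seen.add(combo)
    return counts


# Toy records with made-up codes; real technology codes look different.
toy_patents = [
    (1850, {"A", "B"}),
    (1860, {"A", "B"}),
    (1870, {"A", "C"}),
    (1880, {"B", "A"}),
    (1890, {"C"}),
]

result = classify_combinations(toy_patents)
print(result["new"], result["existing"])
```

On real records, the same loop could be grouped by decade to see how the share of new combinations changes over time, which is the kind of question the article describes.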

There is much more work to be done even in just analyzing these technology codes and the stories that they tell about the evolution of technology.

Published at DZone with permission of Whitney Baker. See the original article here.

Opinions expressed by DZone contributors are their own.
