How to Harness Wikipedia's Location Data with MongoDB

There is a lot of information on Wikipedia (basically all the information there is, right?), but one particularly interesting kind of data that can be pulled from Wikipedia articles is location data. Some articles, particularly those that refer to a specific place, are geotagged with latitude and longitude. And when you have a boatload of data with no real purpose (you know, like a bunch of basketball stats, or something), what do you do with it? Put it into MongoDB, of course.

From Andy Jenkins at Gear11.org comes a walkthrough to help you do just that. He makes it sound easy, splitting the process into four broad steps:

  1. How to download everything from Wikipedia
  2. How to extract the location data from the articles
  3. How to load the data into MongoDB
  4. How to query the data you've loaded
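Jenkins's own code is not reproduced here, so what follows is only a rough Python sketch of what steps 2 and 3 might look like. It assumes geotagged articles carry a `{{coord|...}}` template in their wikitext, in either decimal or degrees/minutes/seconds form, and that each place is stored as a GeoJSON point. The names `parse_coord` and `to_document` and the `title` and `location` fields are made up for illustration.

```python
import re

# Matches a {{coord|...}} template that contains no nested templates.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def _to_decimal(parts, hemisphere):
    """Convert [deg, min, sec] (min and sec optional) to signed decimal degrees."""
    value = sum(p / 60 ** i for i, p in enumerate(parts))
    return -value if hemisphere in ("S", "W") else value


def parse_coord(wikitext):
    """Return (lat, lon) from the first {{coord}} template, or None if absent."""
    match = COORD_RE.search(wikitext)
    if match is None:
        return None
    # Keep positional fields only; drop named ones such as display=title
    # and region:/type: style parameters.
    fields = [f.strip() for f in match.group(1).split("|")]
    fields = [f for f in fields if f and "=" not in f and ":" not in f]

    numbers, converted = [], []
    for field in fields:
        if field.upper() in ("N", "S", "E", "W"):
            converted.append(_to_decimal(numbers, field.upper()))
            numbers = []
            if len(converted) == 2:
                break
        else:
            try:
                numbers.append(float(field))
            except ValueError:
                continue  # ignore anything that is not a number

    if len(converted) == 2:
        lat, lon = converted
    elif not converted and len(numbers) >= 2:
        lat, lon = numbers[0], numbers[1]  # plain signed decimal form
    else:
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


def to_document(title, lat, lon):
    """Build a document with a GeoJSON point; GeoJSON order is [lon, lat]."""
    return {
        "title": title,
        "location": {"type": "Point", "coordinates": [lon, lat]},
    }


if __name__ == "__main__":
    sample = "{{Coord|57|18|22|N|4|27|32|W|display=title}}"
    lat, lon = parse_coord(sample)
    print(to_document("Sample article", lat, lon))
```

Documents shaped like this could then be passed to a MongoDB driver's bulk insert call. Real dumps contain more template variants than this sketch handles, so expect to extend the parser.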

It looks pretty straightforward, but there is a fairly substantial time commitment when it comes to downloading and processing the data. Just pulling everything from Wikipedia will take you a few hours, Jenkins says. Ultimately, though, it's a cool little project to try out yourself, or to expand with alternative tools or datasets:

Wikipedia is certainly not the only body of location-relevant content, but it is a good one to start with. And MongoDB is not the only database with a GeoSpatial index, but it is easy to get up and running with.
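To give a feel for what that geospatial index buys you, here is a hedged Python sketch of a "what is near this point" query. It assumes each document stores a GeoJSON point in a field named `location` (a hypothetical name) and that `collection` is a pymongo `Collection`. It uses a `2dsphere` index with the `$near` operator, which may differ from the index type used in the original walkthrough.

```python
def near_query(lon, lat, max_meters):
    """Build a $near filter; with GeoJSON points, $maxDistance is in meters."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_meters,
            }
        }
    }


def find_nearby(collection, lon, lat, max_meters=5000):
    """Return documents within max_meters of (lon, lat), nearest first.

    `collection` is assumed to be a pymongo Collection; $near needs a
    geospatial index, so one is created on the assumed `location` field.
    """
    collection.create_index([("location", "2dsphere")])
    return list(collection.find(near_query(lon, lat, max_meters)))
```

Calling `find_nearby(places, -0.1276, 51.5072)` on a populated collection (here a hypothetical one named `places`) would return geotagged articles within five kilometers of central London, nearest first.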

Check out the full article for all the details.
