How to Harness Wikipedia's Location Data with MongoDB
There is a lot of information on Wikipedia - basically all the information there is, right? - but one particularly interesting bit of data that can be pulled from Wikipedia articles is location data. Some articles, particularly those that refer to a specific place, are geotagged with latitude and longitude information. And when you have a boatload of data with no real purpose - you know, like a bunch of basketball stats, or something - what do you do with it? Put it into MongoDB, of course.
From Andy Jenkins at Gear11.org comes a walkthrough to help you do just that. He makes it sound easy, splitting the process into four broad steps:
- How to download everything from Wikipedia
- How to extract the location data from the articles
- How to load the data into MongoDB
- How to query the data you've loaded
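To give a feel for the extraction step, here is a minimal Python sketch. It is not Jenkins's code: it assumes that geotagged articles carry a `{{coord|...}}` template in their wikitext and handles only the most common forms of it (signed decimals, or degrees/minutes/seconds with a hemisphere letter). The function names `parse_coord` and `to_document` and the `title`/`location` field names are made up for illustration.

```python
import re

# Matches a {{coord|...}} template and captures everything after the first pipe.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)
HEMISPHERES = {"N": 1, "S": -1, "E": 1, "W": -1}


def parse_coord(params):
    """Return (lat, lon) in decimal degrees from the template parameters, or None."""
    values = []  # finished decimal values: latitude first, then longitude
    parts = []   # degree/minute/second numbers collected for the current value
    for raw in params.split("|"):
        token = raw.strip()
        if token.upper() in HEMISPHERES and parts:
            # Degrees + minutes/60 + seconds/3600, signed by the hemisphere.
            degrees = sum(p / 60 ** i for i, p in enumerate(parts))
            values.append(HEMISPHERES[token.upper()] * degrees)
            parts = []
        else:
            try:
                parts.append(float(token))
            except ValueError:
                break  # named options such as display=title end the numbers
    if not values and len(parts) == 2:
        values = parts  # plain signed decimal form: {{coord|lat|lon}}
    if len(values) != 2:
        return None
    lat, lon = values
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


def to_document(title, wikitext):
    """Build a MongoDB-ready document with a GeoJSON point, or None if untagged."""
    match = COORD_RE.search(wikitext)
    coords = parse_coord(match.group(1)) if match else None
    if coords is None:
        return None
    lat, lon = coords
    # GeoJSON stores longitude first, then latitude.
    return {"title": title, "location": {"type": "Point", "coordinates": [lon, lat]}}
```

Storing the position as a GeoJSON point keeps the documents ready for a geospatial index later on. Real articles use more template variants than this handles, so treat it as a starting point only.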
The process looks pretty straightforward, but there is a fairly substantial time commitment when it comes to downloading and processing the data. Just pulling everything from Wikipedia will take you a few hours, Jenkins says. Ultimately, though, it's a cool little project to try out yourself, or to expand with alternative tools or datasets:
Wikipedia is certainly not the only body of location-relevant content, but it is a good one to start with. And MongoDB is not the only database with a GeoSpatial index, but it is easy to get up and running with.
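For a rough idea of what the geospatial index buys you, the sketch below shows one way loading and querying might look from Python. It assumes a pymongo collection object is passed in and that each document stores a GeoJSON point under a `location` field next to a `title` field; those field names and the helper names are illustrative assumptions, not taken from Jenkins's walkthrough.

```python
def near_query(lon, lat, max_meters):
    """Build a $near filter for GeoJSON points stored in the 'location' field."""
    return {
        "location": {
            "$near": {
                # GeoJSON order is longitude, then latitude.
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                # With GeoJSON points, the distance is given in meters.
                "$maxDistance": max_meters,
            }
        }
    }


def load_articles(collection, documents):
    """Create a 2dsphere index on 'location', then bulk-insert the documents."""
    collection.create_index([("location", "2dsphere")])
    collection.insert_many(documents)  # expects a non-empty list


def titles_near(collection, lon, lat, max_meters, limit=10):
    """Return the titles of the closest articles, nearest first."""
    cursor = collection.find(near_query(lon, lat, max_meters)).limit(limit)
    return [doc["title"] for doc in cursor]
```

With a local server running, something like `MongoClient()["wikipedia"]["articles"]` from pymongo (the database and collection names here are hypothetical) would provide the `collection` argument, and `titles_near(collection, -0.1275, 51.5072, 5000)` would list articles tagged within roughly five kilometers of central London.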
Check out the full article for all the details.