Foursquare Moves to the Future With a Geospatial Knowledge Graph
In this interview, learn more about what kind of data Foursquare deals with, what it does with that data, and how using a knowledge graph is going to help.
If the name Foursquare rings a bell, it means you were around in the 2010s. Your only resort to plausible deniability would be if you are a data professional – although that’s not an either/or proposition.
In the 2010s, Foursquare was a consumer-oriented mobile application. The premise was simple: people would check in at different locations and get gamified rewards. Their location data would be shared with Foursquare and used for services such as recommendations.
Facebook and Yelp got the lion’s share of that market, but Foursquare is still around. In addition to having 9 billion-plus visits monthly from 500 million unique devices, Foursquare’s data is used to power the likes of Apple, Uber, and Coca-Cola.
Today the company announced Foursquare Graph, what it dubs the industry’s first application of graph technology to geospatial data.
“The Foursquare Graph will harmonize the company’s full product suite, allowing for unprecedented querying, visualization capabilities, and advanced analytics to solve complex technical challenges that enable customers to unlock key business insights with ease and speed,” said Gary Little, President and CEO of Foursquare.
I caught up with Vikram Gundeti, Distinguished Engineer at Foursquare, to learn more about what kind of data Foursquare deals with, what it does with that data, and how using the graph is going to help.
The Evolution of Foursquare
Gundeti was the first engineer to work on Amazon’s Alexa and spent about 10 years working on different systems within Alexa. He has been with Foursquare for about a year and a half and was instrumental in building Foursquare Graph.
As Gundeti shared, Foursquare has a huge database of all places in the real world. Those data points have been accrued by check-ins, which makes Foursquare’s data different from many other location datasets. In addition, Foursquare has developed expertise in the way people move across places. The combination of data and expertise enables Foursquare to do a number of things for its clients.
Foursquare’s core products are aptly named Places and Visits. These are datasets that can be used by clients via APIs. Foursquare also offers solutions such as Attribution and Proximity that build on top of the core datasets. Foursquare uses a number of different tools for its online and offline data stacks.
Furthermore, Places and Visits have been enriched via Foursquare’s acquisitions over time. The most notable ones were Factual, Placed, and Unfolded. Each acquisition brought something to Foursquare’s data, but making that happen was also challenging, as Gundeti explained.
Foursquare Data Pain Points: Integration, Composability, and Schema
Over a series of iterations, data coming from the acquisitions was integrated with the Places and Visits datasets. But that did not spell the end of Foursquare’s data woes:
“We had the Places data set that was in one vertical stack serving the Places product. We had the Visits data set that we are generating, which has a loose dependency on Places because you need Places to snap to a location, and it was in a separate vertical stack. These stacks were very siloed, using different technologies. As a result, when we wanted to add a new attribute, we had to do the work two times,” Gundeti said.
This was one of the first symptoms that led to the evolution of Gundeti’s thought process. In data parlance, what Foursquare needed was composability. But this is not the whole story. Gundeti, coming from the world of software services, noticed something else too:
“In services, you have interfaces and APIs. In the data world, we haven’t been as disciplined about interfaces. It’s a natural progression of how evolution of technologies took place in the 2010s. There was heavy usage of key-value stores and then MapReduce and Spark technologies.
That eliminated the emphasis on asking – ‘Hey, what is the schema of your data?’ – because you can manage it at the application level. You make adaptations at the application level, and you throw more compute at a problem because the data is stationary,” Gundeti said.
Reducing Time To Value
The way data was managed at Foursquare meant that there were many denormalized copies of the same data around. That led not only to confusion but also to duplicated effort. For example, when using additional data from OpenStreetMap, the data had to be ingested twice: once for Places and once for Visits.
To make matters worse, Foursquare customers were also facing similar issues. As Gundeti noted, customers also use Foursquare data combined with more datasets such as demographics or weather, and integrating those datasets requires effort and takes time.
That got Foursquare thinking about how to reduce the time to value for customers. The questions that Foursquare’s customers are looking to answer revolve around two key dimensions: space and time.
Where do people living in this neighborhood typically go for groceries? How far is the typical commute distance? Where do they spend their time? Which categories of places do they spend time over weekends? Is this a pet-friendly neighborhood?
How does the visitation of businesses change when there is a football game going on nearby? What consumption patterns do clients have then? Do I need to stock more pizza? Do I need to stock some other thing?
These are the types of questions Foursquare customers are looking to answer. Addressing them, from both a customer perspective and an internal acceleration perspective, is what led to the idea of the Foursquare Graph.
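To make the space and time dimensions concrete, here is a minimal Python sketch of the football-game question. Everything in it is an assumption made for illustration: the record layout, the sample numbers, the coordinates, and the helper names `haversine_km` and `visits_near_event` are hypothetical and do not reflect Foursquare's schema or API. It only shows the shape of such a query: filter aggregated visit counts by distance from an event and by a time window, then compare a game day with a quiet day.

```python
import math
from datetime import datetime


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical aggregated records (no individual visitors):
# (place, category, lat, lng, hour bucket, visit count)
VISITS = [
    ("Pizza A", "pizza", 40.8135, -74.0745, datetime(2023, 9, 10, 13), 120),
    ("Pizza A", "pizza", 40.8135, -74.0745, datetime(2023, 9, 17, 13), 45),
    ("Cafe B", "coffee", 40.8140, -74.0730, datetime(2023, 9, 10, 13), 60),
    ("Pizza C", "pizza", 40.7051, -74.2033, datetime(2023, 9, 10, 13), 80),
]


def visits_near_event(visits, lat, lng, radius_km, start, end):
    """Sum visit counts per category within a radius and a time window."""
    totals = {}
    for _place, category, p_lat, p_lng, hour, count in visits:
        in_space = haversine_km(lat, lng, p_lat, p_lng) <= radius_km
        in_time = start <= hour < end
        if in_space and in_time:
            totals[category] = totals.get(category, 0) + count
    return totals


stadium = (40.8128, -74.0742)  # made-up event location
game_day = visits_near_event(
    VISITS, *stadium, 1.0, datetime(2023, 9, 10, 12), datetime(2023, 9, 10, 18))
quiet_day = visits_near_event(
    VISITS, *stadium, 1.0, datetime(2023, 9, 17, 12), datetime(2023, 9, 17, 18))

print(game_day)   # pizza and coffee counts near the stadium during the game
print(quiet_day)  # the same window one week later
```

In this toy data, the nearby pizza place sees more visits during the game window than a week later, which is the kind of signal behind the "Do I need to stock more pizza?" question.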
Knowledge Graph to the Rescue
Addressing those issues seemed like a typical graph use case in many ways, as data integration and capturing relationships that can be used to derive insights are both things in which knowledge graphs excel. But in this case, there was a catch or two: time and space.
The Foursquare team started looking at different graph database options, but they realized that spatiotemporal modeling and analytics on graphs were rather challenging, so they ended up with a hybrid model.
Foursquare worked with a technology partner to build the temporal aspect, which seems a lot like a traditional data warehouse. In addition, information is mined from relationships by applying graph algorithms. That information can be leveraged to provide better recommendations, for example, as Gundeti said:
“We will now be able to provide justifications around why we are recommending this place in a more explainable way. Rather than just saying that this has a five-star rating, we can say that people working in this particular locality typically go here for lunch, which is a much stronger signal. We can surface those insights via our mobile apps."
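The idea in that quote can be sketched in a few lines of Python. This is a toy illustration, not Foursquare's method: the edge table, the locality and place names, the ratings, and the `recommend` helper are all hypothetical. It assumes relationships are stored as aggregated edges from a work locality to a place, weighted by visit counts per daypart, and it derives a human-readable justification from those weights instead of from a star rating.

```python
# Hypothetical aggregated edges: (work locality, place, daypart) -> visit count
EDGES = {
    ("Midtown", "Noodle Bar", "lunch"): 340,
    ("Midtown", "Salad Stop", "lunch"): 210,
    ("Midtown", "Noodle Bar", "dinner"): 40,
    ("Harbor", "Salad Stop", "lunch"): 25,
}

# Hypothetical star ratings, kept only to contrast with the graph signal
RATINGS = {"Noodle Bar": 4.2, "Salad Stop": 4.8}


def recommend(edges, locality, daypart):
    """Pick the most visited place for a locality and daypart, with a reason."""
    candidates = {
        place: count
        for (loc, place, part), count in edges.items()
        if loc == locality and part == daypart
    }
    if not candidates:
        return None
    place = max(candidates, key=candidates.get)
    share = candidates[place] / sum(candidates.values())
    reason = (f"{share:.0%} of {daypart} visits by people working in "
              f"{locality} go to {place}")
    return place, reason


best_by_rating = max(RATINGS, key=RATINGS.get)
best_by_graph = recommend(EDGES, "Midtown", "lunch")

print(best_by_rating)  # the rating-only pick
print(best_by_graph)   # the relationship-based pick and its justification
```

The two picks differ in this sample data: the rating alone favors one place, while the locality-to-place relationship favors another and comes with an explanation that can be shown to the user.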
Gundeti also emphasized the privacy aspect. Clearly, working with location data is sensitive. However, Foursquare does not provide individual location data, only aggregates centered around specific locations. The knowledge graph is going to help with that, Gundeti said.
Foursquare Graph Is Both Typical and Atypical
A key concern for Foursquare was whether making their data available as a graph would have an impact on customers who are not familiar with the paradigm. That bridge has not been crossed yet.
Foursquare Graph is not an external-facing product at this point. It’s an internal platform that accelerates development and helps Foursquare better serve existing customers and reach new customers, as per Gundeti. Foursquare Graph is being incrementally rolled out internally.
“The first step was getting all the assets in a single location. Now teams will benefit from consuming those assets. In next iterations, we’ll be using a lot more of that data from the Foursquare Graph. That creates a pattern that automatically eliminates existing sources.
If there was a dependency on a bespoke table, that’s going to be removed and replaced with a dependency on the graph. Some of this is going to be incremental, and some is going to be driven by new product development,” Gundeti said.
As is typical for knowledge graph initiatives, getting it underway may have been the hardest part for Foursquare. It meant that the organization had to explore, document, and question its data landscape and practices, and make hard decisions.
What is atypical about the Foursquare Graph is that it is centered around geospatial data. Gundeti said that the team was “naive” initially: this looks like a graph problem, so let’s find a graph database for it.
However, after exploring different options, they realized that they needed something bespoke and turned to the H3 open-source framework. Even so, the key benefits of knowledge graph technology appear to be there.
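H3 partitions the globe into hierarchical hexagonal cells, each with a stable identifier, so a cell ID can act as a shared join key across datasets. The Python sketch below illustrates that idea only. It deliberately uses a plain square latitude/longitude grid as a stand-in for H3 cells rather than the H3 library itself, and the `cell_id` helper, the place names, the coordinates, and the weather layer are all hypothetical. The point is that an external layer keyed by cell is ingested once and can then enrich both place records and visit aggregates.

```python
import math


def cell_id(lat, lng, cell_deg=0.01):
    """Snap a coordinate to a square grid cell (a stand-in for an H3 cell)."""
    return (math.floor(lat / cell_deg), math.floor(lng / cell_deg))


# Hypothetical place records: name -> (lat, lng)
PLACES = {
    "Pizza A": (40.8135, -74.0745),
    "Cafe B": (40.8140, -74.0730),
    "Pizza C": (40.7051, -74.2033),
}

# Hypothetical aggregated visits: (lat, lng, visit count)
VISITS = [
    (40.8135, -74.0745, 120),
    (40.8140, -74.0730, 60),
    (40.7051, -74.2033, 80),
]

# An external layer, ingested once and keyed by cell rather than by dataset
WEATHER_BY_CELL = {cell_id(40.8128, -74.0742): "rain"}

# Both datasets resolve to the same cell keys...
place_cells = {name: cell_id(lat, lng) for name, (lat, lng) in PLACES.items()}

visits_by_cell = {}
for lat, lng, count in VISITS:
    key = cell_id(lat, lng)
    visits_by_cell[key] = visits_by_cell.get(key, 0) + count

# ...so the single weather layer enriches places and visits alike
places_weather = {
    name: WEATHER_BY_CELL.get(cell) for name, cell in place_cells.items()
}
visits_weather = {
    cell: (count, WEATHER_BY_CELL.get(cell))
    for cell, count in visits_by_cell.items()
}

print(places_weather)
print(visits_weather)
```

A square grid keeps the example dependency-free; real H3 cells are hexagonal and hierarchical, which this stand-in does not reproduce.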
Published at DZone with permission of George Anadiotis. See the original article here.