Bulk Geocode Addresses Using Google Maps and GeoPy

Learn how to ingest and analyze big data sets of geographical coordinates with Python.

By Cedric Brun · Updated Jun. 26, 19 · Tutorial

Geocoding is the process of converting addresses (like a street address) into geographic coordinates (like latitude and longitude). With Woosmap, you can request nearby locations or display many geographic elements, such as stores or other points of interest, on a map. To take advantage of these features, you first have to push geocoded locations to our system, as we discussed in this previous post. Most of the time, however, your dataset has addresses but no location information.

The following script, hosted on our Woosmap GitHub organization, is a basic utility for geocoding CSV files that include address data. It parses the file and adds coordinate information as well as some metadata on the geocoded results, such as the location type, as discussed below. It calls the Geocoding API through the GeoPy Python client.

The source code is open to view and change, and contributions are welcomed.

GeoPy

GeoPy is a Python client for several popular geocoding web services. It makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources. It includes geocoder classes for the OpenStreetMap Nominatim, ESRI ArcGIS, Google Geocoding API (V3), Baidu Maps, Bing Maps API, Mapzen Search, Yandex, IGN France, GeoNames, NaviData, OpenMapQuest, What3Words, OpenCage, SmartyStreets, geocoder.us, and GeocodeFarm geocoder services.

Geocoding API

The Geocoding API is a service that provides geocoding and reverse geocoding of addresses. In this Python script we use the GeoPy wrapper for this API, but a good alternative would be the Python Client for Google Maps Services, available on Google's GitHub organization. Adapting the script to that client should be easy, as its geocode method accepts similar parameters and returns almost the same data.
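Here is a minimal sketch of that adaptation. It assumes the `googlemaps` package's `Client.geocode()`, which returns a list of result dicts rather than a single GeoPy `Location`. The helper name `geocode_with_googlemaps` is hypothetical. `FakeClient` only stands in for `googlemaps.Client(key=...)` so the example runs offline, and its values are placeholders, not real geocoding results.

```python
from typing import Any, Dict, Optional


def geocode_with_googlemaps(client: Any, line_address: str,
                            component_restrictions: Optional[Dict[str, str]] = None
                            ) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: geocode with a googlemaps-style client and return
    the same fields the script writes to its output CSV (None if no match)."""
    # googlemaps.Client.geocode returns a list of result dicts (possibly empty)
    results = client.geocode(line_address, components=component_restrictions or None)
    if not results:
        return None
    # keep only the first match, like GeoPy does with exactly_one=True
    first = results[0]
    return {
        "Lat": first["geometry"]["location"]["lat"],
        "Long": first["geometry"]["location"]["lng"],
        "Error": "",
        "formatted_address": first["formatted_address"],
        "location_type": first["geometry"]["location_type"],
    }


class FakeClient:
    """Offline stand-in for googlemaps.Client; values are placeholders."""

    def geocode(self, address, components=None):
        if not address:
            return []
        return [{
            "formatted_address": "placeholder formatted address",
            "geometry": {"location": {"lat": 1.5, "lng": 2.5},
                         "location_type": "APPROXIMATE"},
        }]


print(geocode_with_googlemaps(FakeClient(), "Room For Hair, 20 Jull Street, ARMADALE"))
```

With the real package, you would pass `googlemaps.Client(key=GOOGLE_API_KEY)` instead of `FakeClient()`.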

API Key

Each Google Maps Web Service request requires an API key, which is freely available with a Google Account at the Google Developers Console. The type of API key you need is a server key.

To get an API key (see the guide to API keys):

  1. Visit the Google Developers Console and log in with a Google Account.
  2. Select one of your existing projects, or create a new project.
  3. Enable the Geocoding API.
  4. Create a new Server key.
  5. If you’d like to restrict requests to a specific IP address, do so now.

Important: This key should be kept secret on your server (though you can revoke a key and generate a new one if needed).

Client ID + Digital Signature (Google Maps Platform Users)

A Client ID is given to you when you sign up as a Google Maps Platform customer. The Digital Signature is generated using a cryptographic key provided to you by Google, and it is used when authenticating with a Client ID.

To get your Client ID and Crypto Key (see the guide to API keys):

  1. Visit the Google Maps Platform Support Portal and log in with your administrative account.
  2. Go to the Maps: Manage Client ID menu entry.
  3. Select your Client ID in the drop-down list.
  4. Click the Show Crypto Key button.

Important: This Crypto Key must be kept secret, as you can't revoke it!
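GeoPy computes the digital signature for you when `client_id` and `secret_key` are set. For reference, here is a minimal sketch of the URL signing scheme described in Google's URL signing documentation: an HMAC-SHA1 over the request path and query, keyed with the URL-safe Base64-decoded crypto key, then re-encoded as URL-safe Base64. The function name `sign_path_and_query`, the client ID `gme-demo`, and the demo key are illustrative placeholders, not real credentials.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode


def sign_path_and_query(path_and_query: str, crypto_key: str) -> str:
    """Return the URL-safe Base64 HMAC-SHA1 signature of a request path plus
    query string, e.g. '/maps/api/geocode/json?address=...&client=...'."""
    # the crypto key is delivered as URL-safe Base64; decode it to raw bytes
    decoded_key = base64.urlsafe_b64decode(crypto_key)
    digest = hmac.new(decoded_key, path_and_query.encode("utf-8"), hashlib.sha1).digest()
    # the signature itself is re-encoded as URL-safe Base64
    return base64.urlsafe_b64encode(digest).decode("utf-8")


# placeholder credentials for illustration only
demo_key = base64.urlsafe_b64encode(b"not-a-real-crypto-key").decode("utf-8")
query = urlencode({"address": "20 Jull Street, ARMADALE", "client": "gme-demo"})
path_and_query = "/maps/api/geocode/json?" + query
signature = sign_path_and_query(path_and_query, demo_key)
print(path_and_query + "&signature=" + signature)
```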

Script Usage

The script takes an input CSV file containing the addresses you need to geocode. It produces an output CSV file that contains all values from the original CSV, with the following fields appended:

  • Latitude
  • Longitude
  • Location_Type (see Geocoding Accuracy Labels below)
  • Formatted_Address
  • Error (if needed, for addresses that failed to geocode)

Download the script to your local machine. Then run:

python google_batch_geocoder.py

To make the script work with your own CSV file and Google keys, set the following parameters at the top of the script.

Mandatory Parameters

  • ADDRESS_COLUMNS_NAME - List - the columns whose values are merged into one comma-separated string to build the Google geocoding query. It depends on your CSV input file.
  • NEW_COLUMNS_NAME - List - names of the columns appended to the processed CSV (no need to change this, but you can add new columns based on the Google geocoding results).
  • DELIMITER - String - delimiter for your input CSV file.
  • INPUT_CSV_FILE - String - path and name for input CSV file.
  • OUTPUT_CSV_FILE - String - path and name for output CSV file.

Optional Parameters

  • COMPONENTS_RESTRICTIONS_COLUMNS_NAME - Dict - used to define component restrictions for Google geocoding. See the Google Component Restrictions doc for details.
  • GOOGLE_SECRET_KEY - String - Google Secret Key, used by GeoPy to generate the Digital Signature that lets Google Maps API for Work users geocode.
  • GOOGLE_CLIENT_ID - String - Google Client ID, used to track and analyse requests for Google Maps API for Work users. If used, you must also provide GOOGLE_SECRET_KEY.
  • GOOGLE_API_KEY - String - Google API Server Key. It will become a mandatory parameter soon.
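As an illustration, the parameter block below is one possible configuration for the hairdresser sample file. The address columns are inferred from the address-line example later in this article, and the `IsoCode` column name and `;` delimiter come from the destination header shown below. The output file name and the environment variable names are hypothetical. Reading credentials from the environment is one way to keep them out of the script.

```python
import os

# Mandatory parameters (values assumed for the hairdresser sample file)
ADDRESS_COLUMNS_NAME = ["name", "addressline1", "town"]
NEW_COLUMNS_NAME = ["Lat", "Long", "Error", "formatted_address", "location_type"]
DELIMITER = ";"
INPUT_CSV_FILE = "hairdresser_sample_addresses.csv"
OUTPUT_CSV_FILE = "hairdresser_sample_addresses_processed.csv"  # hypothetical name

# Optional parameters
# maps a Google component name to the CSV column that holds its value
COMPONENTS_RESTRICTIONS_COLUMNS_NAME = {"country": "IsoCode"}
# hypothetical environment variable names; None when unset
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")
GOOGLE_CLIENT_ID = os.environ.get("GOOGLE_CLIENT_ID")
GOOGLE_SECRET_KEY = os.environ.get("GOOGLE_SECRET_KEY")
```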

Input Data

The sample data (hairdresser_sample_addresses.csv) supported by default is a CSV file representing various hairdressers around the world. You can see below a subset of what the file looks like:

name;addressline1;town;postalcode;isocode
Hairhouse Warehouse;Riverlink Shopping centre;Ipswich;4305;AU
Room For Hair;20 Jull Street;ARMADALE;6112;AU
D'luxe Hair & Beauty;Burra Place Shellharbour City Plaza;Shellharbour;2529;AU

Script Explanation

Prepare the Destination File

As described above, the destination file contains all the original fields plus the geocoded data (latitude, longitude, and an error when one occurs) and some metadata (formatted_address and location_type). For our sample file, the header of the destination CSV file looks like this:

name;addressline1;town;IsoCode;Lat;Long;Error;formatted_address;location_type

To build it, open the input and output CSV files and create a new header to append to the destination file:

import csv

with open(INPUT_CSV_FILE, 'r', newline='') as csvinput:
    with open(OUTPUT_CSV_FILE, 'w', newline='') as csvoutput:
        # new csv based on same dialect as input csv
        # (assumes the "ga" dialect was registered earlier,
        # e.g. csv.register_dialect("ga", delimiter=DELIMITER))
        writer = csv.writer(csvoutput, dialect="ga")

        # create a proper header with stripped fieldnames for new CSV
        header = [h.strip() for h in next(csvinput).split(DELIMITER)]

        # read the remaining rows as dicts keyed by the stripped header
        reader = csv.DictReader(csvinput, dialect="ga", fieldnames=header)

        # 2-dimensional data variable used to write the new CSV
        processed_data = []

        # append new columns, to receive geocoded information,
        # to the header of the new CSV
        header = list(reader.fieldnames)
        for column_name in NEW_COLUMNS_NAME:
            header.append(column_name.strip())
        processed_data.append(header)
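To see the header step in isolation, here is a small sketch that runs the same logic on an in-memory copy of the input (`io.StringIO` instead of files) and prints the resulting destination header. Registering `"ga"` with only a custom delimiter is an assumption about how the script defines its dialect. The stray spaces in the sample header are deliberate, to show why the field names are stripped.

```python
import csv
import io

DELIMITER = ";"
NEW_COLUMNS_NAME = ["Lat", "Long", "Error", "formatted_address", "location_type"]

# in-memory stand-in for the input file; note the stray spaces in the header
csvinput = io.StringIO("name ; addressline1 ; town ; IsoCode\n"
                       "Room For Hair;20 Jull Street;ARMADALE;AU\n")
csv.register_dialect("ga", delimiter=DELIMITER)

# strip the header fields, then read the remaining rows as dicts
header = [h.strip() for h in next(csvinput).split(DELIMITER)]
reader = csv.DictReader(csvinput, dialect="ga", fieldnames=header)
output_header = list(reader.fieldnames) + [c.strip() for c in NEW_COLUMNS_NAME]

# write the destination header with the same dialect
csvoutput = io.StringIO()
writer = csv.writer(csvoutput, dialect="ga", lineterminator="\n")
writer.writerow(output_header)
print(csvoutput.getvalue(), end="")
# name;addressline1;town;IsoCode;Lat;Long;Error;formatted_address;location_type
```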

Build an Address Line

For each row, the script builds an address line by merging the values of the ADDRESS_COLUMNS_NAME columns into one string, which it then passes to the Google geocoder. For instance, the first line of our CSV becomes the address line "Hairhouse Warehouse, Riverlink Shopping centre, Ipswich".

# iterate through each row of input CSV
for record in reader:
    # build a line address
    # based on the merge of multiple field values to pass to Google Geocoder
    line_address = ', '.join(
        str(record[column_name]) for column_name in ADDRESS_COLUMNS_NAME)
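Here is a self-contained sketch of the address-line step, run against the first two sample rows. It uses an in-memory, semicolon-delimited copy of the data and joins values with `", "` so the result reads like the example above. The `ADDRESS_COLUMNS_NAME` value is the one implied by that example.

```python
import csv
import io

ADDRESS_COLUMNS_NAME = ["name", "addressline1", "town"]

sample = io.StringIO(
    "name;addressline1;town;postalcode;isocode\n"
    "Hairhouse Warehouse;Riverlink Shopping centre;Ipswich;4305;AU\n"
    "Room For Hair;20 Jull Street;ARMADALE;6112;AU\n"
)

line_addresses = []
for record in csv.DictReader(sample, delimiter=";"):
    # merge the configured columns into one comma-separated address line
    line_address = ", ".join(str(record[column_name]) for column_name in ADDRESS_COLUMNS_NAME)
    line_addresses.append(line_address)

print(line_addresses)
# ['Hairhouse Warehouse, Riverlink Shopping centre, Ipswich', 'Room For Hair, 20 Jull Street, ARMADALE']
```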

Apply the Component Restrictions

The Geocoding API can restrict results to a specific area. In the script, this restriction is specified with the COMPONENTS_RESTRICTIONS_COLUMNS_NAME dict, which maps a Google component name to the CSV column holding its value. The resulting filter is a list of component:value pairs. You can leave the dict empty ({}) if you don't want to restrict results to an area.

# if you want to use the componentRestrictions feature,
# COMPONENTS_RESTRICTIONS_COLUMNS_NAME maps
# {'googleComponentRestrictionField': 'yourCSVColumnName'};
# build the matching dict of values from the current row to pass to Google Geocoder
component_restrictions = {}
if COMPONENTS_RESTRICTIONS_COLUMNS_NAME:
    for key, value in COMPONENTS_RESTRICTIONS_COLUMNS_NAME.items():
        component_restrictions[key] = record[value]
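As a sketch of what the filter becomes on the wire: Google's Geocoding API expects components as pipe-separated `component:value` pairs (for example `country:AU`), and GeoPy builds that parameter from the dict you pass. The two helpers below are hypothetical and only show the shape of the values. The mapping `{"country": "isocode", "postal_code": "postalcode"}` is an assumed configuration for the sample file.

```python
def build_component_restrictions(record, restrictions_columns):
    """Map each Google component name to the value of its CSV column."""
    return {component: record[column] for component, column in restrictions_columns.items()}


def to_components_param(component_restrictions):
    """Format the dict as Google's 'component:value|component:value' string."""
    return "|".join(f"{component}:{value}" for component, value in component_restrictions.items())


record = {"name": "Hairhouse Warehouse", "addressline1": "Riverlink Shopping centre",
          "town": "Ipswich", "postalcode": "4305", "isocode": "AU"}
restrictions = build_component_restrictions(record, {"country": "isocode", "postal_code": "postalcode"})
print(restrictions)                       # {'country': 'AU', 'postal_code': '4305'}
print(to_components_param(restrictions))  # country:AU|postal_code:4305
```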

Geocode the Address

Before calling GeoPy's geocoding method, instantiate a new GoogleV3 geocoder with your Google credentials (at minimum, the server API key).

from geopy.geocoders import GoogleV3

geo_locator = GoogleV3(api_key=GOOGLE_API_KEY,
                       client_id=GOOGLE_CLIENT_ID,
                       secret_key=GOOGLE_SECRET_KEY)

Then call the geocoding function, passing the geocoder instance, the built address line, and the optional component restrictions.

# geocode the built line_address, passing optional componentRestrictions
location = geocode_address(geo_locator, line_address, component_restrictions)

def geocode_address(geo_locator, line_address, component_restrictions=None, retry_counter=0):
    # the geopy GoogleV3 geocoding call
    location = geo_locator.geocode(line_address, components=component_restrictions)

    # build a dict to append to output CSV
    # (a None location and exceptions are handled as described in the next sections)
    location_result = None
    if location is not None:
        location_result = {"Lat": location.latitude, "Long": location.longitude, "Error": "",
                           "formatted_address": location.raw['formatted_address'],
                           "location_type": location.raw['geometry']['location_type']}
    return location_result

Retry on Failure

The script retries geocoding the given address line when intermittent failures occur, that is, when GeoPy raises a GeocoderQueryError exception. GeoPy's GoogleV3 geocoder raises this exception for several error statuses returned by the API, including transient server-side ones. By default, the retry counter (RETRY_COUNTER_CONST) is set to 5:

# retry, because intermittent failures sometimes occur
# (except clause of the try block wrapping the geocode call in geocode_address)
except GeocoderQueryError as error:
    if retry_counter < RETRY_COUNTER_CONST:
        return geocode_address(geo_locator, line_address, component_restrictions, retry_counter + 1)
    else:
        location_result = {"Lat": 0, "Long": 0, "Error": str(error), "formatted_address": "",
                           "location_type": ""}

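The recursive retry above can also be written as a loop. The sketch below is a generic variant, not the script's code. `geocode_fn` is any callable, and `TransientGeocodeError` is a hypothetical stand-in for `GeocoderQueryError` so the example runs without GeoPy. The optional exponential backoff between attempts is an addition that the snippet above does not have, and it defaults to no delay.

```python
import time


class TransientGeocodeError(Exception):
    """Hypothetical stand-in for geopy.exc.GeocoderQueryError."""


def geocode_with_retry(geocode_fn, line_address, max_retries=5, base_delay=0.0):
    """Call geocode_fn(line_address), retrying up to max_retries times on
    TransientGeocodeError. Returns (result, error_message)."""
    for attempt in range(max_retries + 1):
        try:
            return geocode_fn(line_address), ""
        except TransientGeocodeError as error:
            if attempt == max_retries:
                # out of retries: report the message, as the script's Error column does
                return None, str(error)
            # optional exponential backoff (no delay when base_delay is 0)
            time.sleep(base_delay * (2 ** attempt))


calls = {"count": 0}


def flaky_geocoder(address):
    """Simulated geocoder (no network): fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientGeocodeError("Unknown error.")
    return (1.5, 2.5)


print(geocode_with_retry(flaky_geocoder, "Room For Hair, 20 Jull Street, ARMADALE"))
# ((1.5, 2.5), '')
```

As in the script, `max_retries=5` allows up to six attempts in total: the first call plus five retries.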
Handle Generic and Geocoding Exceptions

Other exceptions can occur, for example when you exceed your daily quota or the per-second request limit. To handle them, import the GeoPy exceptions and catch each error around the geocode call. The script also raises an error when no geocoded address is found. The error message is written to the Error field of the output CSV.

# import Exceptions from GeoPy
from geopy.exc import (
    GeocoderQueryError,
    GeocoderQuotaExceeded,
    ConfigurationError,
    GeocoderParseError,
)

# after geocode call, if no result found, raise a ValueError
if location is None:
    raise ValueError("None location found, please verify your address line")

# To catch generic and geocoder errors.
except (ValueError, GeocoderQuotaExceeded, ConfigurationError, GeocoderParseError) as error:
    location_result = {"Lat": 0, "Long": 0, "Error": str(error), "formatted_address": "", "location_type": ""}
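Here is a minimal sketch of how a caught exception ends up in the Error column. It assumes each output row is the input record's values followed by the NEW_COLUMNS_NAME fields in order. The helper names `error_result` and `to_output_row` are hypothetical. `str(error)` is used because Python 3 exceptions have no `.message` attribute.

```python
NEW_COLUMNS_NAME = ["Lat", "Long", "Error", "formatted_address", "location_type"]


def error_result(error):
    """Placeholder result written when geocoding fails."""
    return {"Lat": 0, "Long": 0, "Error": str(error),
            "formatted_address": "", "location_type": ""}


def to_output_row(record, input_fieldnames, location_result):
    """Original values first, then the geocoding fields in NEW_COLUMNS_NAME order."""
    return ([record[name] for name in input_fieldnames]
            + [location_result[name] for name in NEW_COLUMNS_NAME])


record = {"name": "Room For Hair", "town": "ARMADALE"}
try:
    raise ValueError("None location found, please verify your address line")
except ValueError as error:
    row = to_output_row(record, ["name", "town"], error_result(error))

print(row)
# ['Room For Hair', 'ARMADALE', 0, 0, 'None location found, please verify your address line', '', '']
```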

Query-per-Second

If you're not a Google Maps Platform customer, you have to wait between two API calls; otherwise, you will get an OVER_QUERY_LIMIT error because of the per-second request quota.

# for non-customers, sleep 500 ms between requests
if not GOOGLE_SECRET_KEY:
    time.sleep(0.5)
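The fixed 500 ms sleep is the script's approach. A slightly tighter variant sleeps only for whatever remains of the interval since the previous request, so time already spent on the request itself counts toward the wait. The sketch below assumes only the spacing between request starts matters, and `Throttle` is a hypothetical helper. GeoPy also ships `geopy.extra.rate_limiter.RateLimiter`, which wraps a geocode function with a minimum delay between calls.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds between the starts of consecutive calls."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last_call = None

    def wait(self):
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


throttle = Throttle(0.5)
for line_address in ["address 1", "address 2"]:
    throttle.wait()
    # geo_locator.geocode(line_address) would go here
    print("request", line_address)
```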

Results

Apart from the Latitude and Longitude fields, here are the three main results appended to the destination CSV file.

Formatted Address 

The formatted_address field contains a human-readable address of the matched place, which you can compare with the original address line. Most of the time, this address is equivalent to the "postal address."

Geocoding Accuracy Labels

For each successfully geocoded address, a geocoding accuracy result is returned in the location_type field. It comes from the Geocoding API service (see the Google geocoding Results doc for more information). The following values are currently supported:

  • “ROOFTOP” indicates that the returned result is a precise geocode for which we have location information accurate down to street address precision.
  • “RANGE_INTERPOLATED” indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
  • “GEOMETRIC_CENTER” indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
  • “APPROXIMATE” indicates that the returned result is approximate.
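Accuracy varies from row to row. One quick way to triage the output is to count the location_type values and list every row that is not ROOFTOP for manual review. The sketch below assumes a semicolon-delimited output file like the one produced above. The rows are made-up placeholders, not real geocoding results, and treating an empty location_type as a failed geocode is also an assumption.

```python
import csv
import io
from collections import Counter

# illustrative output rows (placeholder values, not real geocoding results)
output_csv = io.StringIO(
    "name;Lat;Long;Error;formatted_address;location_type\n"
    "shop A;1.0;2.0;;addr A;ROOFTOP\n"
    "shop B;1.1;2.1;;addr B;APPROXIMATE\n"
    "shop C;0;0;None location found, please verify your address line;;\n"
)

rows = list(csv.DictReader(output_csv, delimiter=";"))
# an empty location_type means the row failed to geocode
counts = Counter(row["location_type"] or "FAILED" for row in rows)
to_review = [row["name"] for row in rows if row["location_type"] != "ROOFTOP"]

print(counts)
print(to_review)  # ['shop B', 'shop C']
```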

Error

When no location is found for an address line, the script writes the error message “None location found, please verify your address line.” For all other errors, the message raised by GeoPy is written to the field. See GeoPy Exceptions for more details.

Useful Links

  • This script on the Woosmap Github Organization
  • GeoPy
  • Alternative to GeoPy : Python Client for Google Maps Services
  • Google Geocoding Strategies
  • Google geocoding Results doc

This article was first published on woosmap.com.


Published at DZone with permission of Cedric Brun. See the original article here.
