Over a million developers have joined DZone.

PostgreSQL's Foreign Data Wrapper

DZone's Guide to

PostgreSQL's Foreign Data Wrapper

Have you ever needed to write a query that combined data from two PostgreSQL databases?

Free Resource

Running out of memory? Learn how Redis Enterprise enables large dataset analysis with the highest throughput and lowest latency while reducing costs over 75%! 

Have you ever needed to write a query that combined data from two PostgreSQL databases? Our client Healthify has.

Since Healthify works with healthcare and stores some patient information, they’re very careful not to send any personally identifiable information to third party services.

One such service is Segment, which Healthify uses to track user activity on the site. Conveniently, Segment can send your data to a PostgreSQL data warehouse, so Healthify has two PostgeSQL databases full of useful information:

  • Their app database, which their Rails app uses, and
  • a reporting database, with anonymised data from Segment.

To better understand how people are using the application, we wanted to answer some questions that needed data from both sources. We wanted to filter the reporting data using attributes from the app data, without having to pass those attributes to Segment. This blog post explains what we did.

Foreign Data Wrapper

PostgreSQL has a useful feature called Foreign Data Wrapper, which lets you create foreign tables in a PostgreSQL database that are proxies for some other data source. When you make a query against a foreign table, the Foreign Data Wrapper will query the external data source and return the results as if they were coming from a table in your database.

There are two foreign data wrappers that ship with PostgreSQL:

  • file_fdw to create foreign tables that represent flat files (Postgres 9.1 and later), and
  • postgres_fdw to create foreign tables that represent tables in another PostgreSQL database (Postgres 9.3 and later).

You can also define your own wrapper, or use a third-party wrapper (we’ve used tds_fdw on other thoughtbot projects), but for Healthify’s situation postgres_fdw was exactly what we needed.

Before we could create foreign tables in our reporting database to proxy the tables in our app database, we needed to do a bit of setup in the reporting database.

  1. Install the postgres_fdw extension:
    CREATE EXTENSION postgres_fdw;

  2. Create a server:
    CREATE SERVER app_database_server
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'postgres.example.com', dbname 'my_app');

  3. Create a user mapping, which defines the credentials that a user on the local server will use to make queries against the remote server:
      SERVER app_database_server
      OPTIONS (user 'reporting', password 'secret123');

    Since we’re using this for reporting, we chose to create a separate user in our app database specifically for Foreign Data Wrapper to use. That user only has read permissions, so a badly written reporting query can’t accidentally INSERT or UPDATE data.

Now we’ve told Foreign Data Wrapper where the server is, and the credentials it can use to connect, we’re ready to define the foreign tables.

We could use CREATE FOREIGN TABLE for this, but we’d end up specifying all of the fields for each table we want to access. We’d effectively end up repeating our whole schema. Fortunately, Postgres 9.5 introduced the IMPORT FOREIGN SCHEMA command:


  FROM SERVER app_database_server
  INTO app;

This will create foreign tables for all of the tables from our app database’s public schema into our reporting database’s app schema.

A schema in PostgreSQL is just a namespace for grouping tables. If you haven’t explicitly specified a schema then you’re implicitly using the public schema.

We could import from the app database’s public schema into the reporting database’s public schema, but keeping all of the foreign tables in a separate schema means we don’t have to worry about naming collisions.

Segment also recommends that if you’re going to put any additional tables in a PostgreSQL database that you’re using as a Segment warehouse you should make sure it’s in a separate schema:

If you want to insert custom data into your warehouse, create new schemas that are not associated with an existing source, since these may be deleted upon a reload of the Segment data in the cluster.

Now that we’ve imported our schema, we can query the foreign tables as if they were normal, local tables.

SELECT COUNT(*) FROM app.users;

We can even JOIN between a local table and a foreign table.

  JOIN app.users
  ON production.events.user_id::INT = app.users.id
  app.users.company_id = 10;

It Ain’t All Good 

Handling Schema Changes 

When we ran IMPORT FOREIGN SCHEMA we created foreign tables based on the current structure of the app database. If the app database changes  —if we add a table, or drop a column — our foreign tables will need updating too.

To work around this we added some code to update our reporting database’s foreign tables every time we run the migrations on the production app.

Rake::Task["db:migrate"].enhance do

The UPDATE_FOREIGN_TABLES_AFTER_MIGRATE environment variable lets us control which environments use this task. It’s always going to be useful in production, but rarely in development.

The db:export_schema_for_reporting task runs all of the queries we’ve previously seen, slightly modified to make them idempotent:


Rebuilding the server config and foreign tables every time we migrate keeps our reporting server up-to-date. Wrapping the whole thing in a transaction ensures queries against the foreign tables that are run while we’re running the db:export_schema_for_reporting task will be blocked until the task finishes, but won’t fail.

There were some fringe benefits from running this from our Rails application: we can work out what arguments to pass to CREATE SERVER by reading our application’s database configuration.

Once we had everything set up we learnt that we couldn’t always treat foreign tables exactly like local tables. In particular, the performance of queries that JOIN over multiple foreign tables was very poor.

Behind the scenes, PostgreSQL was making separate queries to the app database and then JOINing them after the fact in the reporting database.

Re-writing our JOINs as sub-queries improved things significantly.

Now that Postgres 9.6 has been released, this should be less of a problem. The Postgres 9.6 release notes say:

postgres_fdw now supports remote joins, sorts, UPDATEs, and DELETEs

See the original here: PostgreSQL's Foreign Data Wrapper.

Running out of memory? Never run out of memory with Redis Enterprise databaseStart your free trial today.

foreign data wrappers ,postgresql ,database ,schema

Published at DZone with permission of George Brocklehurst, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}