How to Create Data Lineage With the Tableau GraphQL Metadata API

Tableau has a rich metadata API exposed through a GraphQL interface that you can use to extract your data lineage.

By Grant Seward · Dec. 08, 2020 · Tutorial

I love data. The ways it can be used to curate value and express relationships never cease to amaze me. Visualizing data is often one of the most powerful ways to share insights, and Tableau is certainly one of the most popular data visualization tools on the market, if not the most popular. Its fairly intuitive UI makes it simple for non-technical users to build rich, meaningful graphs, and under the hood there are some nice features that speed up query performance when extracts are stored within Tableau.

My absolute favorite Tableau feature is that you can query your metadata using the same GraphQL API that Tableau itself uses. The exposed metadata includes lineage for the fields, sheets, tables, and data stores that exist within your Tableau Site. Exposing metadata through an API this extensive is a forward-thinking move by the team behind Tableau.

How to Use the Tableau Metadata API

The Tableau Metadata API is exposed via GraphQL, and Tableau wraps it in a Python library, the Tableau Server Client (TSC). This library is one of the easiest APIs to use: Tableau has simplified authentication and serialization so that you can focus on the query you want to execute.
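
To get a sense of what the Tableau Server Client handles for you, here is a minimal sketch of the raw HTTP request it wraps. It assumes the Metadata API endpoint path `/api/metadata/graphql` and a session token from an earlier REST API sign-in, sent in the `X-Tableau-Auth` header; `build_metadata_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request


def build_metadata_request(server_url, auth_token, query):
    """Build (but do not send) a POST request to the Tableau Metadata API.

    Assumes the endpoint is <server>/api/metadata/graphql and that auth_token
    is a session token obtained from a prior REST API sign-in.
    """
    url = server_url.rstrip('/') + '/api/metadata/graphql'
    # GraphQL over HTTP: the query travels as a JSON document in the body
    body = json.dumps({'query': query}).encode('utf-8')
    headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
        # The session token is passed in this header rather than as a cookie
        'X-Tableau-Auth': auth_token,
    }
    return urllib.request.Request(url, data=body, headers=headers, method='POST')


req = build_metadata_request(
    'https://your-server',
    'session-token',
    '{ calculatedFieldsConnection(first: 10) { nodes { name } } }',
)
# Sending it would be urllib.request.urlopen(req); TSC does all of this for you
```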

Pros:

  • The graph enables many different entities and data assets within Tableau to be queried
  • The API performance is really good, even when requesting a large number of multidimensional relationships
  • The Python client is extremely simple and intuitive to use, handling the authentication and serialization for the user

Cons:

  • The documentation is sparse: it's not clear when upstream or downstream lineage assets will be provided and when they will be null
  • A "full" lineage for each data asset is not available; you can only extract lineage one step upstream or one step downstream (at least, as far as I could tell from using the API)
  • Tableau releases a new API version roughly every quarter, but the docs do not indicate which features are available in which version

Let's look at some code that can be used to query your Tableau metadata.

Authentication

You can authenticate to the Tableau API with your username and password, but the more secure and recommended approach is to use a personal access token. The code below sets up token authentication and defines a simple helper function that signs in and executes queries.

```python
import os
import tableauserverclient as TSC

TOKEN_NAME = os.environ.get('TOKEN_NAME', 'some-token')
TOKEN = os.environ.get('TOKEN', 'your-token-value')
# The site's content URL (the site name that appears in its URL)
SITE_NAME = os.environ.get('SITE_NAME', 'your-site')

# If using Tableau Online this might be 'https://prod-useast-b.online.tableau.com'
SERVER = os.environ.get('SERVER', 'your-server')
SERVER_VERSION = os.environ.get('SERVER_VERSION', '3.9')

tableau_auth = TSC.PersonalAccessTokenAuth(TOKEN_NAME, TOKEN, SITE_NAME)
server = TSC.Server(SERVER)
server.version = SERVER_VERSION


# Helper function to run queries
def run_query(query):
    # Sign in for the duration of the query; the context manager signs out on exit
    with server.auth.sign_in(tableau_auth):
        resp = server.metadata.query(query)
        resp = resp['data']
        if isinstance(resp, list):
            resp = resp[0]
        return resp
```
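
One thing `run_query` does not do is check for GraphQL errors. Under the GraphQL specification, a response can carry an `errors` list alongside (or instead of) `data`, so a bad field name can come back as an error payload with null data, which then surfaces later as a confusing `TypeError`. A minimal sketch of a stricter unwrapping step follows; `unwrap_graphql_response` is a hypothetical helper, and it assumes it receives the parsed JSON dict of the response.

```python
def unwrap_graphql_response(resp):
    """Return the 'data' payload of a GraphQL response, raising on errors.

    Assumes resp is the parsed JSON dict of a GraphQL response.
    """
    errors = resp.get('errors')
    if errors:
        # Each GraphQL error object carries a human-readable 'message'
        messages = '; '.join(err.get('message', str(err)) for err in errors)
        raise RuntimeError('GraphQL query failed: ' + messages)
    data = resp.get('data')
    if data is None:
        raise RuntimeError('GraphQL response contained no data')
    return data
```

To use it, swap `resp = resp['data']` in `run_query` for `resp = unwrap_graphql_response(resp)`. Treating any error as fatal is a deliberate simplification: GraphQL allows partial `data` alongside `errors`, and you may prefer to log those and continue.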



Define the Query

The Tableau Metadata API is a fantastic way to start learning GraphQL, since Tableau handles all of the serialization for you and its graph follows a consistent, easy-to-understand set of conventions.

The function below executes a query that returns all calculated fields in your Site, paging through them in batches. The beautiful thing about GraphQL is that, in the same query, we can ask Tableau for all of the fields that reference each calculated field, and go one level deeper to request all of the sheets that use each of those referencing fields.

```python
def get_all_calculated_fields(batch_size=100):
    all_calculated_fields = []
    has_next = True
    start = 0
    while has_next:
        query = """
        {
            calculatedFieldsConnection (first: %s, offset: %s){
                nodes {
                    id
                    name
                    formula
                    referencedByFields {
                        fields {
                          id
                          name
                          sheets {
                            id
                            name
                          }
                        }
                    }
                }
                pageInfo {
                    hasNextPage
                    endCursor
                }
            }
        }
        """ % (batch_size, start)
        resp = run_query(query)
        connection = resp['calculatedFieldsConnection']
        all_calculated_fields.extend(connection['nodes'])
        start += batch_size
        has_next = connection['pageInfo']['hasNextPage']

    return all_calculated_fields
```
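
The offset loop above can also be factored into a small reusable generator, which makes it easier to page through other connections in the graph. This is a sketch, not part of the Tableau client: `paginate_connection` and its `fetch_page(first, offset)` callable are hypothetical, and it assumes every page has the same `nodes` / `pageInfo.hasNextPage` shape that `calculatedFieldsConnection` returns.

```python
def paginate_connection(fetch_page, batch_size=100):
    """Yield every node from an offset-paginated GraphQL connection.

    fetch_page(first, offset) must return a dict shaped like
    {'nodes': [...], 'pageInfo': {'hasNextPage': bool}}.
    """
    offset = 0
    while True:
        page = fetch_page(batch_size, offset)
        # Hand nodes back one at a time so callers can stream large Sites
        yield from page['nodes']
        if not page['pageInfo']['hasNextPage']:
            break
        offset += batch_size
```

With the helper from earlier, `fetch_page` could be something like `lambda first, offset: run_query(QUERY % (first, offset))['calculatedFieldsConnection']`, where `QUERY` (a hypothetical name) holds the query string from `get_all_calculated_fields`.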



Create Your Data Lineage

Now that you have your metadata from Tableau, how you structure and use the output is completely up to you. This example defines nodes (the data assets) and edges (the directed relationships between them), the fundamental building blocks of network relationships and data lineage.

```python
def format_nodes_and_edges(calc_fields):
    nodes = []
    edges = []
    # Ids of all calculated fields, so one that also shows up as a
    # referencing field reuses its CalcField node instead of getting a new one
    calc_ids = {calc['id'] for calc in calc_fields}
    for calc in calc_fields:
        # Add each calculated field to the nodes
        calc_field_name = 'CalcField - ' + calc['name']
        nodes.append(calc_field_name)

        # For each field that references the calculated field, add a node
        # (the API may return null for these lists, so fall back to empty)
        for ref_field in calc['referencedByFields'] or []:
            for field in ref_field['fields'] or []:
                if field['id'] in calc_ids:
                    field_name = 'CalcField - ' + field['name']
                else:
                    field_name = 'Field - ' + field['name']
                nodes.append(field_name)
                edges.append((calc_field_name, field_name))

                # Create a reference to each sheet that uses this field
                for sheet in field['sheets'] or []:
                    sheet_name = 'Sheet - ' + sheet['name']
                    nodes.append(sheet_name)
                    edges.append((field_name, sheet_name))
    # Deduplicate nodes; a field or sheet can be reached from several calculated fields
    return list(set(nodes)), edges
```
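
Because the API only returns one hop of lineage at a time, the full downstream impact of a calculated field has to be assembled from these edges yourself. A minimal sketch, using a plain breadth-first search over an adjacency map built from the `(source, target)` tuples that `format_nodes_and_edges` returns; `downstream_of` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def downstream_of(start, edges):
    """Return every node reachable from start by following edges forward."""
    # Build an adjacency map: source -> set of direct targets
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph[node]:
            # Skip visited nodes so shared fields (or cycles) are not re-walked
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For upstream lineage, reverse each tuple first: `downstream_of(node, [(t, s) for s, t in edges])`.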



View the Edges and Nodes

Running the functions above creates the objects required to visualize your data lineage. The nodes and edges can be plugged into just about any network visualization tool, such as NetworkX, to view the output.

```python
calculated_fields = get_all_calculated_fields()
nodes, edges = format_nodes_and_edges(calculated_fields)

nodes
# [
#   ...
#   'CalcField - Click-to-Open',
#   'Sheet - Sheet 5',
#   'CalcField - Minutes of Delay per Flight',
#   'Sheet - Opportunities ',
#   ...
# ]

edges
# [
#   ...
#   ('CalcField - Difference from Region', 'Field - State'),
#   ('Field - State', 'Sheet - Obesity Scatter Plot'),
#   ('Field - State', 'Sheet - Obesity Map'),
#   ...
# ]
```
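
If you'd rather not pull in a graph library, the same lists can be written out as Graphviz DOT text and rendered with the `dot` command-line tool. A minimal sketch; `to_dot` is a hypothetical helper, and it only escapes embedded double quotes, which covers names like the ones above but may not handle every possible label (for example, one ending in a backslash).

```python
def to_dot(nodes, edges, graph_name='lineage'):
    """Render nodes and (source, target) edges as a Graphviz DOT digraph."""
    def quote(name):
        # Inside DOT quoted strings, embedded double quotes must be escaped
        return '"' + name.replace('"', '\\"') + '"'

    # rankdir=LR lays the lineage out left to right: calc field -> field -> sheet
    lines = ['digraph ' + graph_name + ' {', '    rankdir=LR;']
    for node in sorted(nodes):
        lines.append('    ' + quote(node) + ';')
    for source, target in edges:
        lines.append('    ' + quote(source) + ' -> ' + quote(target) + ';')
    lines.append('}')
    return '\n'.join(lines)
```

Writing the result to `lineage.dot` and running `dot -Tsvg lineage.dot -o lineage.svg` produces an image you can open in a browser.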



Closing Thoughts

I applaud Tableau for enabling this form of data access, even though I believe it is a highly underutilized benefit: many companies do not make full use of the metadata Tableau exposes. Understanding how data moves, and the dependencies between data assets, is critical, especially as organizations try to maintain well-managed practices and controls around how their data is used. As you look to leverage Tableau metadata and data lineage within your company, make sure you take the extra step of connecting that lineage with upstream processes to get a complete, comprehensive view of your lineage.


Published at DZone with permission of Grant Seward. See the original article here.

Opinions expressed by DZone contributors are their own.
