Inside a Large Retailer’s Web Architecture: Data Extraction and Analytical Insights

In this article, I will go over how to use Costco's API to get purchase information, build some tooling around it, and share some insights.

By Rajesh Vakkalagadda · Dec. 26, 25 · Analysis

Recently, I was trying to understand my daily intake of sugar and other nutrients, so I looked for ways to sort my bills and identify specific product items. Whether shopping at Whole Foods, QFC, or Costco, I wanted to get at my purchase data in a usable form.


To solve this problem, I probably need an app that reads my bills and categorizes the items accordingly. There must be an existing app for that, or we can now use AI to automate most of it.

To start somewhere, I looked into my Costco purchases (around the same time, I also got their Citi credit card, which is awesome for Costco purchases). I browsed their website to get my receipts. To my surprise, Costco only allows downloading up to three months of data at a time. This was not sufficient for me, as I wanted my full purchase history.

After looking at the site's network calls, I found that it uses GraphQL to fetch this information (the endpoint is authenticated, so this is not a security hole on their end). I was eager to explore this API, which is visible to any logged-in user.

Plain Text
 
https://ecom-api.costco.com/ebusiness/order/v1/orders/graphql


To make any API call, you also need the costco-x-authorization header. If you are already logged in, there is no need to work out how it is generated: the value is visible in the same network calls. We only need it once to download all the information.

The payload used to fetch the information is surprising: it carries the full text of the query, including its schema of requested fields, along with the start and end dates as variables.

JSON
 
{
  "query": "query receiptsWithCounts($startDate: String!, $endDate: String!,$documentType:String!,$documentSubType:String!) {\n    receiptsWithCounts(startDate: $startDate, endDate: $endDate,documentType:$documentType,documentSubType:$documentSubType) {\n    inWarehouse\n    gasStation\n    carWash\n    gasAndCarWash\n    receipts{\n    warehouseName receiptType  documentType transactionDateTime transactionBarcode warehouseName transactionType total \n    totalItemCount\n    itemArray {  \n      itemNumber\n    }\n    tenderArray {   \n      tenderTypeCode\n      tenderDescription\n      amountTender\n    }\n    couponArray {  \n      upcnumberCoupon\n    }  \n  }\n}\n  }",
  "variables": {
    "startDate": "9/01/2025",
    "endDate": "11/30/2025",
    "text": "Last 3 Months",
    "documentType": "all",
    "documentSubType": "all"
  }
}
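As a rough illustration, the same payload could be built in Python for any date range. This is only a sketch: the names `build_payload` and `format_date` are hypothetical, the query text is the captured one reflowed for readability, and the M/DD/YYYY date format and the free-form `text` label are inferred from the request above rather than from any documentation.

```python
import json
from datetime import date

# Query text copied from the captured request, reflowed for readability.
RECEIPTS_QUERY = """
query receiptsWithCounts($startDate: String!, $endDate: String!,
                         $documentType: String!, $documentSubType: String!) {
  receiptsWithCounts(startDate: $startDate, endDate: $endDate,
                     documentType: $documentType,
                     documentSubType: $documentSubType) {
    inWarehouse gasStation carWash gasAndCarWash
    receipts {
      warehouseName receiptType documentType transactionDateTime
      transactionBarcode transactionType total totalItemCount
      itemArray { itemNumber }
      tenderArray { tenderTypeCode tenderDescription amountTender }
      couponArray { upcnumberCoupon }
    }
  }
}
"""


def format_date(d: date) -> str:
    # Assumption: the API expects M/DD/YYYY, as seen in "9/01/2025".
    return f"{d.month}/{d.day:02d}/{d.year}"


def build_payload(start: date, end: date, label: str = "Custom range") -> dict:
    """Build the request body for a given date range (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "query": RECEIPTS_QUERY,
        "variables": {
            "startDate": format_date(start),
            "endDate": format_date(end),
            "text": label,  # appears to be a display label only (assumption)
            "documentType": "all",
            "documentSubType": "all",
        },
    }


payload = build_payload(date(2025, 9, 1), date(2025, 11, 30))
print(json.dumps(payload["variables"], indent=2))
```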

Now that we know the request takes parameters, the obvious next step is to change the dates to some arbitrary values and see whether it still works.


Sample output:

JSON
 
{
  "data": {
    "receiptsWithCounts": {
      "inWarehouse": "",
      "gasStation": "",
      "carWash": "",
      "gasAndCarWash": "",
      "receipts": [
        {
          "warehouseName": "",
          "receiptType": "",
          "documentType": "",
          "transactionDateTime": "",
          "transactionBarcode": "",
          "transactionType": "",
          "total": "",
          "totalItemCount": "",
          "itemArray": [
            {
              "itemNumber": ""
            }
          ],
          "tenderArray": [
            {
              "tenderTypeCode": "",
              "tenderDescription": "",
              "amountTender": ""
            }
          ],
          "couponArray": []
        }
      ]
    }
  }
}


If we look at the response structure, we find itemNumber values, which are Costco's internally tracked item numbers, for example 1823420. A quick Google search for "costco 1823420" returns, as its first result, a page we can use to identify the product. In this case, it is "Kirkland Signature Rustic Italian Bread, 32 oz." Now that we have a way to determine which product an item number represents, we can figure out how to get its ingredients and sugar content.
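The manual lookup described here could be made a little less tedious by generating one search link per item number. A minimal sketch, assuming the "costco <itemNumber>" search phrase keeps working as it did for the bread example; the helper name `search_url` is hypothetical, and the product still has to be identified by eye from the results.

```python
from urllib.parse import urlencode


def search_url(item_number: str) -> str:
    """Return a Google search link for a Costco item number."""
    return "https://www.google.com/search?" + urlencode({"q": f"costco {item_number}"})


# Item number taken from the example in the text.
print(search_url("1823420"))
```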

Because the start and end dates are exposed to users, I tried expanding the range to see how far back it could go. By just changing the startDate to some date in 2018, I was able to get my entire purchase history. I decided to write a script to parse this information and get some meaningful insights.


Shell
 
curl 'https://ecom-api.costco.com/ebusiness/order/v1/orders/graphql' \
  -H 'Accept: */*' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json-patch+json' \
  -H 'Origin: https://www.costco.com' \
  -H 'Pragma: no-cache' \
  -H 'Referer: https://www.costco.com/' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-site' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36' \
  -H 'client-identifier: ' \
  -H 'costco-x-authorization: ' \
  -H 'costco-x-wcs-clientId: ' \
  -H 'costco.env: ecom' \
  -H 'costco.service: restOrders' \
  -H 'sec-ch-ua: "Not:A-Brand";v="24", "Chromium";v="134"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  --data-raw $'{"query":"query receiptsWithCounts($startDate: String\u0021, $endDate: String\u0021,$documentType:String\u0021,$documentSubType:String\u0021) {\\n    receiptsWithCounts(startDate: $startDate, endDate: $endDate,documentType:$documentType,documentSubType:$documentSubType) {\\n    inWarehouse\\n    gasStation\\n    carWash\\n    gasAndCarWash\\n    receipts{\\n    warehouseName receiptType  documentType transactionDateTime transactionBarcode warehouseName transactionType total \\n    totalItemCount\\n    itemArray {  \\n      itemNumber\\n    }\\n    tenderArray {   \\n      tenderTypeCode\\n      tenderDescription\\n      amountTender\\n    }\\n    couponArray {  \\n      upcnumberCoupon\\n    }  \\n  }\\n}\\n  }","variables":{"startDate":"6/01/2017","endDate":"8/31/2025","text":"2025 June - August","documentType":"all","documentSubType":"all"}}'

This is the sample shell command I used to get the order information; changing the start date to a very old date returns the complete history. Saving the response to a file such as output.json gives the input for the script below.
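For readers who prefer to stay in Python, the curl call could be approximated with the standard library. This is a sketch under assumptions: `build_request` and `fetch_receipts` are hypothetical helpers, the header values must be copied from your own logged-in session, and it is not known which of the captured headers the server actually requires, so only the Costco-specific ones are reproduced.

```python
import json
import urllib.request

URL = "https://ecom-api.costco.com/ebusiness/order/v1/orders/graphql"


def build_request(payload: dict, token: str, client_id: str = "",
                  client_identifier: str = "") -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    headers = {
        # Content type as captured from the browser request.
        "Content-Type": "application/json-patch+json",
        "costco-x-authorization": token,
        "costco-x-wcs-clientId": client_id,
        "client-identifier": client_identifier,
        "costco.env": "ecom",
        "costco.service": "restOrders",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(URL, data=body, headers=headers, method="POST")


def fetch_receipts(payload: dict, token: str, out_path: str = "output.json") -> dict:
    """Send the request and save the JSON response for later analysis."""
    req = build_request(payload, token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    with open(out_path, "w") as f:
        json.dump(data, f, indent=2)
    return data
```

Calling `fetch_receipts(payload, token)` with the request body shown earlier and your own header value would write output.json, the file the analysis script reads.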


Python
 
import json
from collections import Counter

# Load JSON from file
with open("output.json", "r") as f:
    data = json.load(f)

receipts = data["data"]["receiptsWithCounts"]["receipts"]

# -----------------------------
# 1. Compute total purchase sum
# -----------------------------
# "total" may be missing, null, or a string on some receipts; treat empty as 0
total_purchase = sum(float(r.get("total") or 0) for r in receipts)

# -------------------------------------
# 2. Count item numbers across receipts
# -------------------------------------
item_counter = Counter()

for r in receipts:
    for item in r.get("itemArray") or []:  # also tolerates a null itemArray
        item_num = item.get("itemNumber")
        if item_num:
            item_counter[item_num] += 1

# Get top N items (default: 20)
top_items = item_counter.most_common(20)

# -----------------------------
# Print results
# -----------------------------
print("Total Purchase Amount:", total_purchase)
print("\nTop Purchased Item Numbers:")
for item, count in top_items:
    print(f"ItemNumber: {item}, Count: {count}")


Output from the above code (excerpt):

Plain Text
 
ItemNumber: 96716, Count: 39
ItemNumber: 1659031, Count: 37
ItemNumber: 782796, Count: 36

My top purchases from Costco were spinach, milk, and water bottles. I could also dig deeper into the locations, gas purchases, and other data. Even so, this was enough to help me understand my purchasing patterns at Costco.
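Digging into locations could start with a per-warehouse summary. The sketch below uses the `warehouseName` and `total` fields from the response; the function name `spend_by_warehouse` is hypothetical, the sample receipts are made up for illustration, and `total` is assumed to be numeric or numeric-like.

```python
from collections import defaultdict


def spend_by_warehouse(receipts):
    """Return {warehouseName: (visit_count, total_spend)}."""
    visits = defaultdict(int)
    spend = defaultdict(float)
    for r in receipts:
        name = r.get("warehouseName") or "UNKNOWN"
        visits[name] += 1
        spend[name] += float(r.get("total") or 0)
    return {name: (visits[name], round(spend[name], 2)) for name in visits}


# Made-up receipts, shaped like the API response.
sample = [
    {"warehouseName": "WAREHOUSE A", "total": 10.5},
    {"warehouseName": "WAREHOUSE B", "total": 20.25},
    {"warehouseName": "WAREHOUSE A", "total": 4.0},
]

summary = spend_by_warehouse(sample)
for name, (count, amount) in sorted(summary.items()):
    print(f"{name}: {count} visits, {amount:.2f} spent")
```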


To make sure this is not a bug, I had already filed a bug report for it and was told that it is not a bug. That's why I waited a long time before posting this article.

There are a few lessons we can take from this information. One interesting pattern I observed is that Costco can tell when I was away on vacation (because I would not buy my gas or groceries there). It would also know when I was on a road trip (again, from gas and store patterns). This shows that such information is very powerful and can be interpreted in different ways.
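The vacation pattern can be illustrated by looking for long gaps between consecutive receipts. A minimal sketch: `find_gaps` is a hypothetical helper, the 14-day threshold is arbitrary, the sample timestamps are made up, and `transactionDateTime` is assumed to be an ISO 8601 string, which the article does not confirm.

```python
from datetime import datetime, timedelta


def find_gaps(receipts, min_gap_days=14):
    """Return (last_visit, next_visit, days) for each long gap between visits."""
    stamps = sorted(
        datetime.fromisoformat(r["transactionDateTime"])
        for r in receipts
        if r.get("transactionDateTime")
    )
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev >= timedelta(days=min_gap_days):
            gaps.append((prev.date(), cur.date(), (cur - prev).days))
    return gaps


# Made-up timestamps, assuming ISO 8601 formatting.
sample = [
    {"transactionDateTime": "2025-06-01T10:00:00"},
    {"transactionDateTime": "2025-06-08T11:30:00"},
    {"transactionDateTime": "2025-07-02T09:15:00"},
]

for last_visit, next_visit, days in find_gaps(sample):
    print(f"No purchases between {last_visit} and {next_visit} ({days} days)")
```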

As for implementation lessons, I would still consider these very minor bugs:

  • Do not expose start and end dates in your API.
  • Add validation around the API calls.
  • Put a wrapper around the GraphQL layer to avoid exposing that this system is used internally.
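To make the validation point concrete, a server could reject date ranges wider than the window its own UI offers. This is a hypothetical sketch, not Costco's implementation: `validate_date_range`, `InvalidDateRange`, and the 92-day limit (roughly three months) are all assumptions chosen for illustration.

```python
from datetime import date

MAX_RANGE_DAYS = 92  # roughly the three months the website offers (assumption)


class InvalidDateRange(ValueError):
    """Raised when a requested receipt range is not acceptable."""


def validate_date_range(start, end, today=None, max_days=MAX_RANGE_DAYS):
    """Raise InvalidDateRange unless start..end is a sane, bounded range."""
    today = today or date.today()
    if start > end:
        raise InvalidDateRange("startDate must not be after endDate")
    if end > today:
        raise InvalidDateRange("endDate must not be in the future")
    if (end - start).days > max_days:
        raise InvalidDateRange(f"range must not exceed {max_days} days")


# The three-month request from the UI passes...
validate_date_range(date(2025, 9, 1), date(2025, 11, 30), today=date(2025, 12, 1))

# ...while the multi-year request is rejected.
try:
    validate_date_range(date(2017, 6, 1), date(2025, 8, 31), today=date(2025, 12, 1))
except InvalidDateRange as exc:
    print("rejected:", exc)
```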

Thanks for reading this far. In my next article, I will share a personal surveillance hack I built using my vacuum cleaner and security camera (this is a work in progress and will take some time).

