Downloading GCP Docs as a PDF

Here's a handy open sourced Node app that lets you compile and download GCP docs into PDFs.

By Monotosh K · Jul. 18, 19 · Tutorial

Google Cloud Platform has done a nice job documenting its many products. But to read the documentation, we have only one option: visit the website and click through all the links in the left-hand navigation pane. That works, but it can be cumbersome, especially for offline reading.

If you are like me and have felt that pain, I may have a solution. I have created a small Node.js app that downloads GCP documentation by product as a PDF. The source code for the app is here. It's fairly simple to use: clone the repo and run npm link. Once you have done that, run the gcp-docs command and answer a few questions.


The app uses Puppeteer, which is a ‘Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.’ For a given product, it goes to the docs page, e.g. https://cloud.google.com/iam/docs/, and saves the navigation links in a JSON file. Then it visits all the links and saves each page as a PDF. In the end, it merges all these PDFs into one PDF and deletes the individual files and their containing directory. Here are the output logs from the app:

[Image: output logs from the app]
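
The crawl-and-print flow described above could be sketched roughly as follows. This is not the app's actual source: the `nav a[href]` selector, the `links.json` file name, and the `pdfPathFor` and `savePagesAsPdf` helpers are all assumptions made for illustration, and the final merge step is left out because the article does not say which library performs it.

```javascript
// Sketch only: the selector and file layout below are assumptions,
// not taken from the gcp-docs source.
const path = require('path');

// Hypothetical helper: zero-padded file name so pages sort in nav order.
function pdfPathFor(outDir, index) {
  return path.join(outDir, String(index).padStart(4, '0') + '.pdf');
}

async function savePagesAsPdf(docsUrl, outDir, navSelector = 'nav a[href]') {
  // Loaded lazily so the helper above works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const fs = require('fs');
  fs.mkdirSync(outDir, { recursive: true });

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(docsUrl, { waitUntil: 'networkidle2' });

    // Collect absolute URLs from the navigation pane, de-duplicated.
    const links = await page.$$eval(navSelector, (anchors) =>
      [...new Set(anchors.map((a) => a.href))]
    );
    fs.writeFileSync(
      path.join(outDir, 'links.json'),
      JSON.stringify(links, null, 2)
    );

    // Visit each link in turn and print it to its own PDF file.
    for (const [i, link] of links.entries()) {
      await page.goto(link, { waitUntil: 'networkidle2' });
      await page.pdf({ path: pdfPathFor(outDir, i), format: 'A4' });
    }
    return links.length;
  } finally {
    await browser.close();
  }
}

// Example call (requires `npm install puppeteer`):
// savePagesAsPdf('https://cloud.google.com/iam/docs/', './iam-pdfs');
```

Numbering the files by their position in the navigation keeps them in reading order, which should make a later merge step simpler whatever tool is used for it.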

As you can see, the app excludes a few links, like the starting page of a section, which only contains links to the content of that section. It also excludes links that are not from cloud.google.com, as well as links that have ref or reference in them (there are a lot of them in some products and they are not that useful in my opinion).
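
As a rough illustration of the host and ref/reference rules, a filter might look like the sketch below. The function name `shouldInclude` is hypothetical, and treating ref and reference as whole path segments (rather than as substrings anywhere in the link) is an assumption; the app's real matching logic may differ. The section-start rule is not covered here.

```javascript
// Sketch of the two exclusion rules described above; the real app's
// matching logic may differ.
function shouldInclude(link) {
  let url;
  try {
    url = new URL(link);
  } catch {
    return false; // not an absolute URL
  }

  // Rule 1: keep only links hosted on cloud.google.com.
  if (url.hostname !== 'cloud.google.com') return false;

  // Rule 2 (assumption): "ref" / "reference" is matched as a path segment,
  // so a page such as /docs/preferences is not excluded by accident.
  const segments = url.pathname.split('/').filter(Boolean);
  return !segments.some((s) => s === 'ref' || s === 'reference');
}

const sample = [
  'https://cloud.google.com/iam/docs/overview',
  'https://cloud.google.com/iam/docs/reference/rest/',
  'https://github.com/example/repo',
];
const kept = sample.filter(shouldInclude);
```

Applied to the list of navigation links before any page is visited, a filter like this avoids rendering pages that would be dropped anyway.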

I hope it helps a few of you, as it helped me.

Credit: A special shout out to Amit Malhotra for polishing the app and getting his hands dirty fixing a lot of bugs.
