Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Running Chrome on AppEngine

DZone's Guide to

Running Chrome on AppEngine

The next generation of scraper’s and tools will all be built by running a real web browser against a web page and introspecting everything about it.

· Integration Zone ·
Free Resource

SnapLogic is the leading self-service enterprise-grade integration platform. Download the 2018 GartnerMagic Quadrant for Enterprise iPaaS or play around on the platform, risk free, for 30 days.

I’ll let you into a little secret. Seven years ago when I joined Google, I thought I was going to be doing Developer Relations for App Engine. At the time, I was a full-stack web developer building a huge number of web, iPhone, and Android apps all hosted on AppEngine. Instead, I was assigned to work 50% on iGoogle and 50% on Chrome. I loved Chrome as a product, but I really disliked iGoogle as a product — I almost considered quitting before I even started, but Chrome was cool. We had just started a small team, and that was what I fell in love with. Within months I was on it full-time.

App Engine was brilliant. I loved that I could build a paid for service and then let it scale and not have to worry about what we now call DevOps. The problem with AppEngine was that it only supported Python (soon it would support Java) and it was heavily sandboxed, meaning that a lot of the interesting things that I wanted to scale, I couldn’t.

Jump forward seven years and there has been a huge change in the industry especially around virtualization. Docker is almost ubiquitous, and best of all, you can combine it with App Engine so that you can now host Google Chrome on App Engine in these three simple steps.

1. Set Up an App

Create an app.yaml file that says that you want a custom runtime and you want to use the flex environment.

runtime: custom env: flex 

2. Create a Docker Image That Launches Headless Chrome

This image originally came from Justin Riberio, and I’ve modified it a bit, but it takes a build of Headless Chrome (that is, a version of Chrome that can run entirely on the command line) and a simple nodeJS file that connects to Chrome via the DevTools protocol.

# Base docker image FROM ubuntu:16.04 MAINTAINER Paul Kinlan 
# <paulkinlan@google.com> 
# MAINTAINER Justin Ribeiro <justin@justinribeiro.com> 
# Experimental! # # To run: # docker run -d --net host --name headless headless_chrome 
# 
#  Access: 
# http://localhost:9222/ 
# Pull my chrome-headless build ADD chrome-headless.deb /src/chrome-headless.deb 
# Setup deps, install chrome-headless 
RUN apt-get update && apt-get install -y \ build-essential \ software-properties-common \ ca-certificates \ byobu curl git htop man unzip vim wget \ sudo \ gconf-service \ libcurl3 \ libexif-dev \ libgconf-2-4 \ libglib2.0-0 \ libgl1-mesa-dri \ libgl1-mesa-glx \ libnspr4 \ libnss3 \ libpango1.0-0 \ libv4l-0 \ libxss1 \ libxtst6 \ libxrender1 \ libx11-6 \ libxft2 \ libfreetype6 \ libc6 \ zlib1g \ libpng12-0 \ wget \ apt-utils \ xdg-utils \ --no-install-recommends && \ dpkg -i '/src/chrome-headless.deb' && \ curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash - && \ sudo apt-get install -y nodejs && \ sudo apt-get install -y libnss3 && \ rm -rf /var/lib/apt/lists/* COPY ./node_modules /opt/stickmanventures/node_modules ADD ./index.js /opt/stickmanventures/index.js ADD ./package.json /opt/stickmanventures/package.json WORKDIR /opt/stickmanventures/ # expose 8080 so we can connect to it EXPOSE 8080 CMD ["node", "index.js", "/opt/stickmanventures/chrome-headless/headless_shell" ] 

3. Scale and Profit

OK, it's a little more than three steps; you have to have a simple node web app that can take HTTP requests and forward them on to Chrome. For example, I have a /list end point that will connect to Chrome and return a list of open tabs.

app.get('/list', (req, res) => 
        { chrome.List((err, tabs) => 
                      { if (!err) 
                      { console.log(tabs); res.json(tabs); } 
                       else { res.json(tabs); } }); }); 

You also need to implement the health monitoring API so that App Engine knows it should spin up a new instance of your service, but I hope you at least get the idea.

Why?

The next generation of scraper’s and tools will all be built by running a real web browser against a web page and introspecting everything about it.

  • Want to work out what CSS is used on the first render? No problem.
  • Want to get accurate first paint times? No biggie.
  • Want to analyze all the network requests made by a page? Simple.
  • Want to see if there any CSP violations? Great. Not too hard.
  • Want to host a version of Lighthouse? Yep. Cool.

The possibilities are endless. I want to play around with the idea of zero traditional servers. As a user, you are always running a version of your web app client-side, but it just so happens the client is on the server.

Where can I see this running?

The source for my app is here.

What’s Next?

I’ve been playing around with Google Cloud Functions (think Amazon Lambda but on Google) a fair bit recently and I think in the near future it might be able to have Chrome running to service one request. The benefit is that Chrome would only live for the lifetime of the function call, and that means that you could secure and isolate the storage (and more) from other users.

With SnapLogic’s integration platform you can save millions of dollars, increase integrator productivity by 5X, and reduce integration time to value by 90%. Sign up for our risk-free 30-day trial!

Topics:
nodejs ,integration ,google chrome ,appengine ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}