Capturing a Web Page Without Stylesheets

It is amazing to live in an environment where the Internet connection is ubiquitous and fast. But if the tube is having a problem and the bits from the web server arrive in random broken pieces, what does the web site look like? If the content degrades gracefully, the missing style sheets may make the page less attractive, but they should not significantly hamper the experience. Fortunately, there is a way to automatically check how a web page looks under that circumstance.

Some time ago, I demonstrated the use of PhantomJS, the headless WebKit, to capture web pages programmatically. The example was also extended to capture just a particular portion of the page via clipping. For a CSS-less capture, we only need to extend it with a new feature in PhantomJS 1.9 (implemented by Vitaliy Slobodin): the ability to abort network requests.

There is an example, loadurlwithoutcss.js, that demonstrates this feature. Combining this idea with the previous BBC News site capture, we get the following screenshots. The left side shows the normal page (see my previous blog post on web clipping), while the right side shows what happens when none of the CSS files are loaded.


The script which produces the above image is as follows:

var page = require('webpage').create();
page.settings.userAgent = 'WebKit/534.46 Mobile/9A405 Safari/7534.48.3';
page.settings.viewportSize = { width: 400, height: 600 };

// Abort every request for a plain http:// URL ending in .css (PhantomJS 1.9+).
page.onResourceRequested = function (requestData, request) {
    if ((/http:\/\/.+?\.css$/i).test(requestData['url'])) {
        console.log('Skipping', requestData['url']);
        request.abort();
    }
};

page.open('http://m.bbc.co.uk/news/health', function (status) {
    if (status !== 'success') {
        console.log('Unable to load BBC!');
        phantom.exit();
    } else {
        // Give the page a moment to finish rendering before capturing it.
        window.setTimeout(function () {
            page.clipRect = { left: 0, top: 0, width: 400, height: 600 };
            page.render('bbc-nocss.png'); // output file name is an assumption
            phantom.exit();
        }, 1000);
    }
});

It is quite similar to the previous version. The new addition is a handler for onResourceRequested, which detects a style sheet URL and aborts its loading. When the script runs, it prints:

Skipping http://static.bbci.co.uk/frameworks/barlesque/2.45.9/mobile/3.5/style/main.css
Skipping http://static.bbci.co.uk/bbcdotcom/0.3.184/style/mobile/bbccom.css
Skipping http://static.bbci.co.uk/news/1.7.1-259/stylesheets/core.css
Skipping http://static.bbci.co.uk/news/1.7.1-259/stylesheets/compact.css

which indicates that these four style sheets won't be part of the rendered output.
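The regular expression in the handler only matches plain http:// URLs whose path ends exactly in .css. A style sheet served over HTTPS, or one with a cache-busting query string such as main.css?v=2, would slip through, while a URL such as page?file=a.css would be aborted even though it is not a style sheet. Below is a minimal sketch of a slightly stricter test, written as a hypothetical helper isStylesheetUrl (it is not part of PhantomJS or of the original script). It sticks to ES5 syntax, so it should also work inside a PhantomJS 1.9 script, though that is an untested assumption.

```javascript
// Hypothetical helper (not a PhantomJS API): returns true when a request URL
// points at a .css file over http:// or https://. The path must end in ".css",
// optionally followed by a query string or fragment.
function isStylesheetUrl(url) {
    // [^?#]+ stops at the first "?" or "#", so ".css" inside a query string
    // (e.g. "page?file=a.css") does not count as a style sheet.
    return /^https?:\/\/[^?#]+\.css(?:[?#]|$)/i.test(url);
}

// Quick check in Node.js against a few sample URLs.
[
    'http://static.bbci.co.uk/news/1.7.1-259/stylesheets/core.css',
    'https://example.com/main.css?v=2',
    'http://example.com/page?file=a.css',
    'http://example.com/app.js'
].forEach(function (url) {
    console.log(isStylesheetUrl(url) ? 'Skipping' : 'Loading ', url);
});
```

Inside the PhantomJS script, the condition in onResourceRequested would then become isStylesheetUrl(requestData['url']), with request.abort() called as before.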

The entire process is rather straightforward. Because PhantomJS is headless and needs no display, it can even run on an Amazon EC2 instance. It should not be too difficult to include this type of spartan rendering of your web site as another layer in the defensive development workflow.
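As a rough illustration of that last idea, here is a minimal Node.js sketch of a driver that runs a CSS-less capture for several pages, for instance as one step of a build or CI job. Several things here are assumptions rather than part of the original post: the PhantomJS script is saved as nocss.js and adapted to read the page URL and output file from require('system').args, it exits with a non-zero code on failure (e.g. phantom.exit(1)), and the phantomjs binary is on the PATH.

```javascript
// Minimal sketch of a Node.js driver for CSS-less captures.
// Assumptions: "nocss.js" is a hypothetical PhantomJS script that reads the
// page URL and output file from its arguments and exits non-zero on failure;
// "phantomjs" is on the PATH.
var childProcess = require('child_process');

// Derive a file-system-friendly PNG name from a page URL.
function outputNameFor(pageUrl) {
    return pageUrl
        .replace(/^https?:\/\//i, '')      // drop the scheme
        .replace(/[^a-z0-9]+/gi, '_')      // collapse other characters to "_"
        + '-nocss.png';
}

// Run PhantomJS once per page; return the list of pages whose capture failed.
function captureAll(pageUrls) {
    var failed = [];
    pageUrls.forEach(function (pageUrl) {
        var result = childProcess.spawnSync(
            'phantomjs',
            ['nocss.js', pageUrl, outputNameFor(pageUrl)],
            { stdio: 'inherit' }
        );
        // result.error is set when the binary could not be started at all;
        // otherwise a non-zero exit status signals a failed capture.
        if (result.error || result.status !== 0) {
            failed.push(pageUrl);
        }
    });
    return failed;
}
```

Under those assumptions, calling captureAll(['http://m.bbc.co.uk/news/health']) would write m_bbc_co_uk_news_health-nocss.png and return an empty array if the capture succeeded. Reviewing the resulting images is still left to a separate step.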

What do you plan to de-CSS-ify today?

Published at DZone with permission of Ariya Hidayat, DZone MVB.

