
Analysis of a Mobile Redirection Framework and Obfuscated Regular Expressions


I don’t often read Haaretz, but from time to time friends share its articles on Facebook, or one comes up in search results, and I find myself on the Haaretz website. Often enough this happens on my mobile phone, and every time I am redirected to a very primitive version of the website. Compare for yourself:

[Two screenshots: the desktop version on the left, the mobile version on the right]
(Screenshot on the right obtained by changing the user agent in the Chrome Canary build. Very nice built-in feature.)

I was curious what criteria the Haaretz website uses to decide on this redirect, and started sniffing the traffic with Fiddler. After most of the Haaretz front page had been downloaded, the browser suddenly issued a request for g.watap.net/w2w/haaretz, which issues not one but two 302 redirects and eventually lands on the crippled mobile version.


Interestingly, I tried more than one mobile user agent, and the resulting mobile website was pretty much the same (so I get the same experience with an iPhone, an Android phone, or a feature-phone). I believe this is a poor choice on Haaretz’s part, so I started investigating a little.

I started by running a whois query on watap.net, and found that it’s registered through Go Daddy for PassCall Advanced Technologies. Then I turned my attention to PassCall, and found on their website that they provide a platform that adapts existing websites to mobile browsing. Indeed, Haaretz appears in their list of customers. From what I could tell, all the customers are Israeli companies, and the g.watap.net host resolves to an Israeli IP address, probably hosted by NetVision, a major Israeli ISP.

What is the precise process used by PassCall to determine whether or not to redirect my browser to the dumbed-down mobile version? I was brave enough to start reading through the ~8500 lines of HTML and script that is the Haaretz front page. Very close to the beginning there’s a copyright notice by PassCall with a minified script. I won’t paste the whole thing, but here’s a start:

var passcall_pcmdt={i$i:function(){try{eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--){d[e(c)]=k[c]||e(c)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('3 f=D(p,q,o,2,e){4(6.7.a(\'5=0\')>-1)m;3 b=s r();b.t(b.n()+B);3 d=s r();d.t(d.n()+1);4(9.c.a(\'y=1\')>-1){6.7=\'5=0; j=/; i=\'+d.l();m}E 4(9.c.a(\'G=1\')>-1){6.7=\'5=0; j=/; i=\'+b.l();m}3 8=6.7.a(\'5=1\')>-1;4(!8){3 v=p.h(k.g);3 x=q.h(k.g);3 u=!o.h(k.g);8=(v||x)&&u;6.7=\'5=\'+F(I(8))+"; j=/; i="+b.l()}4(8){2=9.c.w(9.H,2);4(C e!=\'z\'&&e.A){2+=2.a(\'?\')>-1?\'&\':\'?\';2=2.w(\'?&\',\'?\');2+=e}9.c=2}}',45,45,'||r\x65\x64\x69rt\x6f|\x76\x61r|\x69\x66|___\x70\x63\x6d\x64\x74___|\x64\x6f\x63\x75men\x74|\x63\x6f\x6f\x6b\x69\x65|\x72\x65\x64\x69\x72|l\x6f\x63\x61ti\x6fn|\x69n\x64\x65x\x4ff||\x68\x72\x65\x66|\x62\x62|p\x61r\x61ms||u\x73\x65rAge\x6e\x74|\x74\x65\x73\x74|\x65\x78\x70\x69\x72\x65\x73|\x70a\x74h|\x6e\x61\x76\x69\x67\x61t\x6fr|toU\x54\x43S\x74\x72in\x67|\x72e\x74\x75\x72\x6e|g\x65\x74D\x61te|\x723|\x721|r2|\x44\x61\x74\x65|\x6e\x65\x77|\x73\x65\x74D\x61\x…

I took this beauty to jsbeautifier.org where it got a much prettier shape. Here’s the first part, beautified:

var passcall_pcmdt = {
    i$i: function () {
        try {
            eval(function (p, a, c, k, e, d) {
                e = function (c) {
                    return (c < a ? '' : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))
                };
                if (!''.replace(/^/, String)) {
                    while (c--) {
                        d[e(c)] = k[c] || e(c)
                    }
                    k = [function (e) {
                        return d[e]
                    }];
                    e = function () {
                        return '\\w+'
                    };
                    c = 1
                };
                while (c--) {
                    if (k[c]) {
                        p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c])
                    }
                }
                return p
            }(…

Okay, this is obviously an unpacker: it even says function (p, a, c, k, e, d) right there. Thanks for the hint. So the first part is a slightly minified unpacker, and the hex-encoded strings (not shown in the beautified excerpt) are probably the actual code. Instead of trying to run the unpacking algorithm with pen and paper, I simply put a breakpoint at the beginning of the script and stepped in and out until I got this beautiful function, called f, which does the interesting part:

var f = function(r1, r2, r3, redirto, params) {
  if (document.cookie.indexOf('___pcmdt___=0') > -1)
    return;
  var b = new Date();
  b.setDate(b.getDate() + 360);
  var bb = new Date();
  bb.setDate(bb.getDate() + 1);
  if (location.href.indexOf('snopcmdt=1') > -1) {
    document.cookie = '___pcmdt___=0; path=/; expires=' + bb.toUTCString();
    return
  } else if (location.href.indexOf('nopcmdt=1') > -1) {
    document.cookie = '___pcmdt___=0; path=/; expires=' + b.toUTCString();
    return
  }
  var redir = document.cookie.indexOf('___pcmdt___=1') > -1;
  if (!redir) {
    var b1 = r1.test(navigator.userAgent);
    var b2 = r2.test(navigator.userAgent);
    var b3 = !r3.test(navigator.userAgent);
    redir = (b1 || b2) && b3;
    document.cookie = '___pcmdt___=' + parseInt(Number(redir)) + "; path=/; expires=" + b.toUTCString()
  }
  if (redir) {
    redirto = location.href.replace(location.host, redirto);
    if (typeof params != 'undefined' && params.length) {
      redirto += redirto.indexOf('?') > -1 ? '&' : '?';
      redirto = redirto.replace('?&', '?');
      redirto += params
    }
    location.href = redirto
  }
}
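
Stepping through in a debugger is one way to get at f. Another option, sketched here, is to give the unpacker a name and look at the string it returns instead of handing it to eval. The unpack function below is copied from the beautified listing earlier; the payload is made up for illustration, and the trailing arguments 0 and {} are an assumption, since the real call is truncated in the excerpt.

```javascript
// The unpacker from the beautified listing, named so that its return value
// can be inspected rather than passed straight to eval.
function unpack(p, a, c, k, e, d) {
  e = function (c) {
    return (c < a ? '' : e(parseInt(c / a))) +
      ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36));
  };
  if (!''.replace(/^/, String)) {
    // Build a token -> word dictionary; empty entries map a token to itself.
    while (c--) {
      d[e(c)] = k[c] || e(c);
    }
    k = [function (e) { return d[e]; }];
    e = function () { return '\\w+'; };
    c = 1;
  }
  while (c--) {
    if (k[c]) {
      // Replace every word-like token in the payload with its dictionary entry.
      p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c]);
    }
  }
  return p;
}

// Made-up payload in the same shape: numbered tokens plus a '|'-separated dictionary.
const unpacked = unpack('0 1=2.3', 4, 4, 'var|ua|navigator|userAgent'.split('|'), 0, {});
console.log(unpacked); // var ua=navigator.userAgent
```

Presumably the same token substitution, applied to the real payload and its 45-entry dictionary, is what yields the readable f above.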

Note how f is no longer obfuscated, and is perfectly readable. The script starts by checking whether a cookie already tells it to do the mobile redirect or not. The interesting part is the if (!redir) block, which is what runs when a new decision has to be made; the redirect itself simply replaces location.href with a new location. The whole redirect-or-not logic boils down to three regular expressions (r1, r2, r3). Let’s take a look at them. Here is r1:

^(((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))|((X|6|E)Z(O|2|j)S)|((9|H|6)D(q|2|h)_(h|3|T))|((H|7|1)D_m(i|0|j)n)|((9|I|6)C(v|1|o)p(q|8|p)i(q|e|j)d(v|8|F)r(v|o|q)m(4|P|9)a(s|2|0)s(c|4|X)a(v|7|l)l(q|7|C)o(q|6|d)e(j|4|0)h(a|6|0)a(r|0|q)e(t|5|x)z)|((v|8|L)G(E|7|3)?[-/_])|((0|M|6)a(u|3|Q)i B(r|7|x)o(w|4|0)s(q|6|e)r)|((P|0|h)C(L|2|q)[4-6][4-6])|((q|6|S)E(h|C|Q)-)|((v|S|j)G(H|2|3)-)|((S|0|2)I(E|X|h)-)|((j|4|S)K_)|((q|6|S)O(j|4|N)I(M|x|q))|((8|S|4)e(0|n|4)d(j|o|Q))|((j|8|T)e(j|l|X)i(4|t|6))|((h|p|q)o(q|r|5)t(q|a|h)l(q|m|v)m(h|8|m)))

Looks like a bad-ass regular expression? Not at all. In fact, this is just a light attempt at obfuscating the regular expression without changing its meaning too much. Note that the whole thing is just a big disjunction over a bunch of strings. Here’s the first component:

((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))

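What could it possibly be? Obviously, it’s “Alcatel”: take A, c, t, and l from the four groups, and the literal l, a, and e between them fill in the rest. The other alternatives in each group are just decoys.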

How about this guy?

((h|p|q)o(q|r|5)t(q|a|h)l(q|m|v)m(h|8|m))

This one is “portalmmm”, which apparently is a mobile user agent used by i-mode mobile browsers. Finally, what is this:

((9|I|6)C(v|1|o)p(q|8|p)i(q|e|j)d(v|8|F)r(v|o|q)m(4|P|9)a(s|2|0)s(c|4|X)a(v|7|l)l(q|7|C)o(q|6|d)e(j|4|0)h(a|6|0)a(r|0|q)e(t|5|x)z)

Fairly long to be a mobile user agent. Indeed, it becomes ICopiedFromPasscallCode?haaretz – which is a rudimentary copy-protection mechanism.
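
The reading of these components can be checked mechanically. The sketch below is not PassCall's code: readings is a hypothetical helper that expands one component into every string it can match, and the user agent strings are made up for illustration. It also assumes the expressions are used without flags; whether PassCall passes the "i" flag is not visible in the excerpt.

```javascript
// The two components from the text, anchored at the start like r1.
const alcatel = /^((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))/;
const portalmmm = /^((h|p|q)o(q|r|5)t(q|a|h)l(q|m|v)m(h|8|m))/;

// Hypothetical helper: expand "(A|3|Q)l(v|5|c)..." into all matching strings.
// Only handles flat groups and single-character literals, which is all
// these components use.
function readings(component) {
  const parts = component.match(/\([^()]*\)|[^()]/g);
  return parts.reduce((prefixes, part) => {
    const options = part.startsWith('(') ? part.slice(1, -1).split('|') : [part];
    return prefixes.flatMap(prefix => options.map(option => prefix + option));
  }, ['']);
}

const all = readings('(A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x)');
console.log(all.length, all.includes('Alcatel'));            // 81 true
console.log(alcatel.test('Alcatel-OT-708/1.0'));             // true
console.log(alcatel.test('3l5a3e3'));                        // true: a decoy reading matches too
console.log(portalmmm.test('portalmmm/2.0 N410i(c20;TB)'));  // true
```

Of the 81 possible readings only one is a recognizable word, which is why the obfuscation is so easy to see through. It also means the obfuscated expression accepts 80 junk prefixes that the plain one would not.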

I can tell you right away that r2 is no different:

((a|3|Q)n(v|5|d)r(o|3|2)i(d|3|x))|((X|6|b)l(a|2|j)c(k|X|q)B(4|e|8)r(r|Q|0)y)|((G|x|h)T-P(v|9|1)0(q|0|x)0)|((H|7|Q)T(q|6|C))|((Q|6|H)u(6|a|5)w(X|5|e)i[u/-])|((i|1|0)p(h|5|a)d)|((q|i|v)p(h|9|0)o(h|6|n)e)|((j|2|m)o(5|t|8)o(4|r|8)o(l|2|q)a)|((4|M|7)O(T|7|0)[-_])|((8|n|6)o(k|3|x)i(h|2|a))|((s|Q|1)o(x|4|n)y(x|8|e)r(q|8|i)c(s|5|Q)s(o|3|Q)n)|((h|6|s)a(4|m|5)s(h|u|x)n(g|6|3))|((j|3|P)a(l|X|h)m)|((x|5|p)h(h|7|i)l(i|7|2)p(3|s|5))|((v|U|Q)P.(v|6|B)r(o|0|j)w(s|7|Q)e(r|3|q))|((w|5|2)i(7|n|5)d(q|3|o)w(6|s|4) (((9|p|5)h(o|X|q)n(e|x|q))|((h|c|q)e)))|((8|I|7)E(X|8|M)o(h|5|b)i(h|6|l)e)|((9|V|7)o(d|Q|j)a(f|8|Q)o(X|4|n)e)|((X|9|o)p(q|e|j)r(h|a|j) (v|m|q)o(v|b|x)i)|((4|o|5)p(e|0|2)r(v|4|a) (2|m|5)i(v|9|n)i)|((q|7|s)y(q|m|0)b(q|6|i)a(n|0|q))|((X|8|1)o(o|X|0)m)|( P(q|r|h)e[/])

This yields stuff like “android”, “ipad”, “iphone”, “windows phone”, “Xoom”, and many others. And then there’s r3, which will disqualify a user agent from redirecting to the mobile site:

((i|3|Q)p(v|5|a)d)|((q|9|V)i(7|e|8)w(j|P|X)a(Q|8|d))|((M|7|Q)Z(q|6|9)0(1|X|q))|((q|8|G)T-P(q|2|1)0(3|0|8)0)|((q|6|G)T-P(v|7|Q)5(v|8|0)0)|((j|5|X)o(3|o|5)m)

…and here we have “ipad” and “Xoom” again, which is quite silly because we just saw them in r2: they are matched as mobile devices there and then vetoed here. Probably the obfuscation layer makes it hard for the PassCall developers to make changes :-)

All in all, here is a (partial) set of user agents that PassCall will redirect to the mobile website and a (partial) set of user agents that they won’t (the list is based on what I’ve seen on Haaretz’s website on January 31, 2012):

Will redirect if starts with: Alcatel, ICopiedFromPasscallCode?haaretz, LGE-, Maui Browser, SEC-, SGH-, SIE-, SK_, SONIM, Sendo, Telit, portalmmm

Will redirect if contains: android, blackBerry, GT-P1000, HTC, Huawei, ipad, iphone, motorola, MOT-, nokia, sonyericsson, samsung, Palm, philips, UP.Browser, windows phone, windows ce, IEMobile, Vodafone, opera mobi, opera mini, symbian, Xoom, Pre

Will not redirect if contains: ipad, ViewPad, MZ601, GT-P1000, GT-P7500, Xoom

I am curious about other tablets, such as the Samsung Galaxy Tab, meeting the criteria for a mobile device. Indeed, with the Galaxy Tab user agent (“Mozilla/5.0 (Linux; U; Android 3.0; xx-xx; GT-P7100 Build/HRI83) AppleWebkit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13”) we are getting the mobile version. And the most annoying thing? I don’t see a way on the mobile version to switch back to the desktop version if I’d like. And that’s the number one fallback you should have if you’re using blunt regular expressions to determine which website to show me.
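
The decision that f makes, (b1 || b2) && b3, can be sketched as a small pure function. The three expressions below are simplified, de-obfuscated stand-ins covering only a few of the entries listed above, not PassCall's real ones. The "i" flag is an assumption (the Galaxy Tab being redirected suggests the match is case-insensitive, but the flags are not visible in the excerpt), and the iPad and desktop user agent strings are illustrative.

```javascript
// Simplified stand-ins for r1, r2 and r3 (a subset of the decoded entries).
const r1 = /^(Alcatel|LGE-|portalmmm)/i;
const r2 = /(android|blackberry|ipad|iphone|nokia|windows (phone|ce)|Xoom)/i;
const r3 = /(ipad|ViewPad|MZ601|Xoom)/i;

// Mirrors redir = (b1 || b2) && b3 from f, where b3 is the negated r3 test.
function shouldRedirect(userAgent) {
  const looksMobile = r1.test(userAgent) || r2.test(userAgent);
  const excluded = r3.test(userAgent);
  return looksMobile && !excluded;
}

const galaxyTab = 'Mozilla/5.0 (Linux; U; Android 3.0; xx-xx; GT-P7100 Build/HRI83) ' +
  'AppleWebkit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13';
const iPad = 'Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 ' +
  '(KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3';
const desktop = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 ' +
  '(KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7';

console.log(shouldRedirect(galaxyTab)); // true: "Android" matches r2, nothing in r3 matches
console.log(shouldRedirect(iPad));      // false: matched by r2 but vetoed by r3
console.log(shouldRedirect(desktop));   // false: no match in r1 or r2
```

Any tablet whose user agent mentions Android but is not named in r3 falls through to the mobile site, which is what happens with the GT-P7100 string above.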

To summarize:

  • Haaretz is using PassCall Advanced Technologies to redirect its mobile visitors to a crippled mobile version of the website
  • PassCall is using a set of regular expressions to determine whether a user’s user agent represents a mobile device and performs an unconditional redirect
  • PassCall provides the same mobile experience for a 2011 iPhone 4S and a 2005 Nokia feature-phone
  • There doesn’t seem to be a way to get back to the desktop version from the dumbed-down mobile one
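
One detail in f is worth noting here: before doing anything else it looks for nopcmdt=1 or snopcmdt=1 in location.href and, if found, sets the ___pcmdt___ cookie to 0 (for 360 days or 1 day, respectively) and skips the redirect. So the script itself appears to have an opt-out hook, even though the mobile site does not seem to expose it. The helper below is hypothetical and only builds such a URL; I have not verified how the live site behaves, and the example URLs are placeholders.

```javascript
// Hypothetical helper: add the opt-out flag that f looks for in location.href.
// oneDayOnly = true uses "snopcmdt" (cookie set for 1 day in f);
// otherwise "nopcmdt" (cookie set for 360 days in f).
function withOptOut(url, oneDayOnly = false) {
  const result = new URL(url);
  result.searchParams.set(oneDayOnly ? 'snopcmdt' : 'nopcmdt', '1');
  return result.toString();
}

console.log(withOptOut('http://www.example.com/news?id=7'));
// http://www.example.com/news?id=7&nopcmdt=1
console.log(withOptOut('http://www.example.com/', true));
// http://www.example.com/?snopcmdt=1
```

Because f uses indexOf, and "snopcmdt=1" contains "nopcmdt=1" as a substring, the order of the two checks in f matters: the one-day variant has to be tested first, as it is.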

I would greatly appreciate any comments and corrections. This research has been performed for personal purposes and does not represent the position of my employer.


Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
