
Compete.com Webstats Scrape Groovy

This script collects web statistics from compete.com. It takes a list of domains to analyze as input and writes the compete.com data for every domain to a single CSV file, one row per data line, prefixed with the domain name.

import com.gargoylesoftware.htmlunit.WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion

// Input: one domain per line. Output: a single CSV, rebuilt on every run.
def domainList = new File("/root/Desktop/Morningstar/AlexaTop3000.txt").readLines()
def outFile = new File("/root/Desktop/Morningstar/CompeteStats3000.csv")
outFile.delete()
def wc = new WebClient( BrowserVersion.FIREFOX_3_6 )

domainList.each {
  def domainName = it.trim()
  if ( !domainName ) return   // skip blank lines in the input file
  println domainName
  def url = "http://siteanalytics.compete.com/export_csv/${domainName}/"
  def page = wc.getPage( url )
  def pageLines = page.getContent().split("\n")

  // Skip the first four lines of each export, then prefix every
  // remaining row with the quoted domain name.
  pageLines.eachWithIndex { line, lineCount ->
    if ( lineCount > 3 ) {
      outFile.append( "\"${domainName}\",${line}\n" )
    }
  }
  sleep( 400 )   // pause 400 ms between requests
}
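
The row-building step in the script can be tried without any network access. The sketch below is not part of the original script: `toOutputRows` is a hypothetical helper name, and the sample text is made-up placeholder data, not real compete.com output. It assumes, as the script does, that the first four lines of each export are to be skipped.

```groovy
// Hypothetical helper: skip the leading lines of one CSV export and
// prefix every remaining line with the quoted domain name.
List<String> toOutputRows(String domainName, String csvText, int skipLines = 4) {
    def lines = csvText.split("\n")
    def dataLines = lines.drop(skipLines)   // drop() returns the remaining elements
    return dataLines.collect { line -> ("\"${domainName}\",${line}").toString() }
}

// Made-up sample standing in for a downloaded export (assumption, not real data).
def sample = [
    "preamble 1",
    "preamble 2",
    "preamble 3",
    "column header",
    "2011-01,100",
    "2011-02,120"
].join("\n")

def rows = toOutputRows("example.com", sample)
rows.each { println it }
```

Keeping this step in a function of its own makes it easy to adjust the number of skipped lines if the export format turns out to differ.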