Compete.com Webstats Scrape Groovy

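This Groovy script collects web statistics from compete.com using HtmlUnit. It reads the domains to analyze from a text file (one domain per line), downloads the CSV export that compete.com provides for each domain, and appends the data rows to a single CSV file, with the domain name as the first column. It pauses 400 ms between requests.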

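import com.gargoylesoftware.htmlunit.WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion

// Input: one domain per line. Output: one CSV file for all domains.
def domainList = new File("/root/Desktop/Morningstar/AlexaTop3000.txt").readLines()
def outFile = new File("/root/Desktop/Morningstar/CompeteStats3000.csv")
outFile.delete()   // start each run with an empty output file
def wc = new WebClient( BrowserVersion.FIREFOX_3_6 )

// Trim each entry and skip blank lines so no request is made for an empty domain
domainList.collect { it.trim() }.findAll { it }.each { domainName ->
  println domainName
  def url = "http://siteanalytics.compete.com/export_csv/${domainName}/"
  try {
    def page = wc.getPage( url )
    // readLines() handles both \n and \r\n line endings,
    // so no stray \r characters end up in the output file
    def pageLines = page.webResponse.contentAsString.readLines()
    // Skip the first four lines of the export; copy the rest, prefixed with the domain
    pageLines.drop( 4 ).each { line ->
      outFile.append( "\"${domainName}\",${line}\n" )
    }
  } catch ( e ) {
    // Log and continue: one failed domain should not abort the whole run
    println "Skipping ${domainName}: ${e.message}"
  }
  sleep( 400 )   // pause 400 ms between requests
}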
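The per-domain work in the script (building the export URL and turning one response into output rows) can be pulled out into two small functions and checked without a network connection. The following is a minimal sketch under stated assumptions: `exportUrl` and `toCsvRows` are hypothetical names that do not appear in the original, and the sample response text is made up, not real compete.com output. The `skip = 4` default mirrors the script's choice of dropping the first four lines of each export.

```groovy
// Hypothetical helpers, split out of the scraping script so they can be checked offline.

// Build the compete.com export URL for one domain (same pattern the script uses).
String exportUrl(String domainName) {
  "http://siteanalytics.compete.com/export_csv/${domainName}/"
}

// Turn one export response into output rows: drop the first `skip` lines
// and prefix each remaining line with the quoted domain name.
List<String> toCsvRows(String domainName, String responseText, int skip = 4) {
  responseText.readLines().drop(skip).collect { line ->
    "\"${domainName}\",${line}".toString()
  }
}

// Usage with made-up response text (NOT real compete.com data), using \r\n endings
def sample = "h1\r\nh2\r\nh3\r\nh4\r\n2010-01,100\r\n2010-02,200\r\n"
def rows = toCsvRows("example.com", sample)
rows.each { println it }
```

Keeping the network call out of these helpers means the row formatting can be tried on saved or sample exports before running the script over the full domain list.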
Opinions expressed by DZone contributors are their own.
