Trac Subversion Repository Scraper

Dean Holdren · Jan. 27, 07 · Code Snippet

Ruby script that pulls code from a Trac repository browser. Useful if you are behind a firewall or proxy and the server does not expose the Subversion repository over HTTP. (I'm sure it's not the prettiest, best, or rubiest way to do it, but it worked for me.)
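
For context, the script below leans on two Trac behaviours: links in the repository browser are site-absolute paths, and appending `format=raw` to a file's browser URL returns the file contents rather than the HTML view. Here is a minimal sketch of that URL construction using only Ruby's standard `uri` library. The helper name `raw_file_url` and the `trac.example.org` host are made up for illustration, and unlike the script it also keeps any query string already present on the link.

```ruby
require 'uri'

# Hypothetical helper (not part of the original script): resolve a
# site-absolute href from a Trac browser page against the page URL and
# request the raw file contents via format=raw.
def raw_file_url(page_url, href)
  uri = URI.join(page_url, href)
  # Preserve any query parameters already on the link, then add format=raw.
  params = URI.decode_www_form(uri.query.to_s) << %w[format raw]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts raw_file_url('http://trac.example.org/project/browser/trunk',
                  '/project/browser/trunk/README')
```

Resolving with `URI.join` avoids having to pull the server name out of the URL by hand, which is the job the regular expression in `initialize` does in the script.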


#!/usr/bin/env ruby

require 'rubygems'
require 'hpricot'
require 'open-uri'


# A Trac repo scraper. Pass the URL to scrape (the root of a repo)
# and, optionally, the local path to write to (defaults to '.').

class TracRepoScraper

  def initialize(trac_url, local_path='.')
    @trac_url = trac_url

    # Capture the scheme and host (http or https); Trac hrefs are site-absolute.
    trac_url =~ %r{\A(https?://[^/]+)}
    @trac_server = $1
    @local_path = local_path
  end

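  def getallfiles(url, cur_localpath)
    # Create the target directory unless it already exists (e.g. '.' or a re-run).
    Dir.mkdir(cur_localpath) unless File.directory?(cur_localpath)
    # URI.open comes from open-uri; bare Kernel#open stopped accepting URLs in Ruby 3.0.
    doc = Hpricot(URI.open(url).read)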
    doc.search("//tbody//tr//td//a[@class='file']").each do |file_anchor|
      # The href is an absolute path (excluding the domain), so prepend the server.
      # format=raw asks Trac for the file contents instead of the HTML view.
      actual_file_url = @trac_server + file_anchor['href'] + '?format=raw'
      local_file = File.join(cur_localpath, file_anchor.inner_html)
      puts "Saving #{actual_file_url} to #{local_file}"
      # Copy the bytes unchanged; binary mode keeps non-text files intact.
      File.open(local_file, 'wb') do |f|
        f.write(URI.open(actual_file_url).read)
      end
    end

    doc.search("//tbody//tr//td//a[@class='dir']").each do |dir_anchor|
      # Recurse into the subdirectory; inner_html is its (relative) name.
      dir_url = @trac_server + dir_anchor['href']
      puts "*** stepping into #{dir_url}"
      getallfiles(dir_url, File.join(cur_localpath, dir_anchor.inner_html))
    end
  end

  def start
    getallfiles(@trac_url, @local_path)
  end
end


#### main

trac_url = ARGV[0]
if trac_url.nil? || trac_url.strip.empty?
  abort "Usage: #{$0} TRAC_URL [LOCAL_PATH]"
end

localpath = '.'
localpath = ARGV[1].strip if ARGV[1] && !ARGV[1].strip.empty?

TracRepoScraper.new(trac_url, localpath).start
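
The traversal in `getallfiles` is a depth-first walk: save every file listed on the page, then recurse into every subdirectory. The sketch below isolates that pattern so it can be run without a Trac server or Hpricot. Everything in it is an assumption made for illustration: the `Listing` struct stands in for a parsed browser page, `mirror` stands in for `getallfiles`, and the two lambdas replace the HTTP fetches with an in-memory fake repository.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for one parsed Trac browser page:
# the file names and subdirectory names it lists.
Listing = Struct.new(:files, :dirs)

# Depth-first mirror: write the files of the current directory first,
# then recurse into each subdirectory (the same order as getallfiles).
# fetch_listing and fetch_file are injected callables, so no network is needed.
def mirror(path, local_dir, fetch_listing, fetch_file)
  FileUtils.mkdir_p(local_dir)
  listing = fetch_listing.call(path)
  listing.files.each do |name|
    # binwrite copies the content byte for byte.
    File.binwrite(File.join(local_dir, name), fetch_file.call("#{path}/#{name}"))
  end
  listing.dirs.each do |name|
    mirror("#{path}/#{name}", File.join(local_dir, name), fetch_listing, fetch_file)
  end
end

# Fake in-memory repository used only for this demonstration.
PAGES = {
  '/trunk'     => Listing.new(['README'], ['lib']),
  '/trunk/lib' => Listing.new(['scraper.rb'], [])
}.freeze

Dir.mktmpdir do |tmp|
  mirror('/trunk', tmp, ->(p) { PAGES.fetch(p) }, ->(p) { "contents of #{p}\n" })
  puts Dir.glob('**/*', base: tmp).sort
end
```

Passing the fetchers in as arguments is a design choice for this sketch only; it keeps the recursion testable on its own, whereas the script calls Hpricot and open-uri directly.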