An S3 File Bucket Downloader Written in Ruby

Today I wanted to download files from a website that, as I happened to find out, stored all of its files in S3. When I accessed the website root, I realized that it simply returned the response of an S3 ListBucket API call. For instance:

    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">  
       <Name>foo.com</Name>  
       <Prefix/>  
       <Marker/>  
       <MaxKeys>1000</MaxKeys>  
       <IsTruncated>true</IsTruncated>  
       <Contents>  
          <Key>file/1</Key>  
          <LastModified>2011-06-09T06:29:02.000Z</LastModified>  
          <ETag>"5cb3930839817ff4a5c1ddf08e3fea1e"</ETag>  
          <Size>1440231</Size>  
          <StorageClass>STANDARD</StorageClass>  
       </Contents>  
       <Contents>  
          <Key>file/2</Key>  
          <LastModified>2011-06-09T06:29:18.000Z</LastModified>  
          <ETag>"96fdc94d14b6d9817f80ac1e9e2049b4"</ETag>  
          <Size>1310</Size>  
          <StorageClass>STANDARD</StorageClass>  
       </Contents>  
    </ListBucketResult>  
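Since the listing is plain XML, the object keys (and their sizes) can be pulled out with REXML from Ruby's standard library. The following is a minimal sketch that assumes the response looks like the sample above. The method name `list_keys` is hypothetical, not part of any library:

```ruby
require 'rexml/document'

# Hypothetical helper: parse a ListBucketResult document and return
# an array of [key, size] pairs plus the IsTruncated flag as a boolean.
def list_keys(xml)
  doc = REXML::Document.new(xml)
  entries = []
  # Each <Contents> element describes one object in the bucket
  doc.elements.each('ListBucketResult/Contents') do |contents|
    key  = contents.elements['Key'].text
    size = contents.elements['Size'].text.to_i
    entries << [key, size]
  end
  # "true" means S3 returned only part of the listing
  truncated = doc.elements['ListBucketResult/IsTruncated']&.text == 'true'
  [entries, truncated]
end
```

Unprefixed XPath names such as `Contents` still match here even though the document declares a default `xmlns`, which is why no namespace mapping is passed.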

To download all the files quickly, I wrote the following Ruby program, which fetches the listing and then downloads every object it names. I hope it is useful to others:

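    require 'net/http'
    require 'rexml/document'
    require 'fileutils'

    baseurl = 'foo.com'

    # fetch the bucket listing (ListBucketResult XML) served at the site root
    xml_data = Net::HTTP.get_response(URI.parse("http://" + baseurl)).body

    # parse the listing and download every object named by a <Key> element
    doc = REXML::Document.new(xml_data)
    FileUtils.mkdir_p("images")  # create the output directory if it is missing
    Net::HTTP.start(baseurl) do |http|
      doc.elements.each('ListBucketResult/Contents/Key') do |ele|
        puts "Downloading " + ele.text
        resp = http.get("/" + ele.text)
        unless resp.is_a?(Net::HTTPSuccess)
          warn "Skipping #{ele.text}: HTTP #{resp.code}"
          next
        end
        # turn the key into a flat file name ("file/1" -> "file_1"); the
        # ".jpg" suffix assumes every object in the bucket is a JPEG image
        File.open("images/" + ele.text.gsub("/", "_") + ".jpg", "wb") do |file|
          file.write(resp.body)
        end
      end
    end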
    puts "Done"  
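The sample listing has `IsTruncated` set to `true`. S3's (version 1) ListBucket call returns at most 1000 keys per response, so a single request only covers the first page. One way to fetch the rest is to repeat the request with the `marker` query parameter set to the last key seen. When no delimiter is given, S3 omits `NextMarker`, and the last `Key` serves that purpose. This is a sketch under the assumption that the site passes query strings through to S3. The helpers `next_page_path` and `all_keys` are hypothetical names:

```ruby
require 'net/http'
require 'rexml/document'
require 'erb'

# Hypothetical helper: given one page of ListBucketResult XML, return the
# request path for the next page, or nil when the listing is complete.
def next_page_path(xml)
  doc = REXML::Document.new(xml)
  return nil unless doc.elements['ListBucketResult/IsTruncated']&.text == 'true'
  keys = REXML::XPath.match(doc, 'ListBucketResult/Contents/Key')
  return nil if keys.empty?
  # No delimiter was sent, so there is no NextMarker: use the last key,
  # percent-encoded so that "/" and other characters survive the query string
  "/?marker=" + ERB::Util.url_encode(keys.last.text)
end

# Hypothetical helper: collect every key by following markers page by page.
def all_keys(host)
  keys = []
  path = "/"
  Net::HTTP.start(host) do |http|
    while path
      resp = http.get(path)
      resp.value  # raises unless the response is 2xx
      xml = resp.body
      REXML::XPath.each(REXML::Document.new(xml), 'ListBucketResult/Contents/Key') do |k|
        keys << k.text
      end
      path = next_page_path(xml)
    end
  end
  keys
end
```

With `all_keys('foo.com')`, the download loop can iterate over the returned array of key strings instead of the REXML elements from a single listing page.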

Published at DZone with permission of Rodrigo De Castro, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.