An S3 File Bucket Downloader Written in Ruby

By Rodrigo De Castro · Jun. 07, 12 · Cloud Zone · Interview

Today I wanted to download files from a website that, as I discovered, stores all of its files in S3. When I accessed the website root, I realized that the response was simply the output of an S3 ListBucket API call. For instance:

    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">  
       <Name>foo.com</Name>  
       <Prefix/>  
       <Marker/>  
       <MaxKeys>1000</MaxKeys>  
       <IsTruncated>true</IsTruncated>  
       <Contents>  
          <Key>file/1</Key>  
          <LastModified>2011-06-09T06:29:02.000Z</LastModified>  
          <ETag>"5cb3930839817ff4a5c1ddf08e3fea1e"</ETag>  
          <Size>1440231</Size>  
          <StorageClass>STANDARD</StorageClass>  
       </Contents>  
       <Contents>  
          <Key>file/2</Key>  
          <LastModified>2011-06-09T06:29:18.000Z</LastModified>  
          <ETag>"96fdc94d14b6d9817f80ac1e9e2049b4"</ETag>  
          <Size>1310</Size>  
          <StorageClass>STANDARD</StorageClass>  
       </Contents>  
    </ListBucketResult>  
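
To make the structure of this response concrete, here is a minimal sketch that parses a trimmed copy of the listing with REXML and reports each key, its size, and whether the listing was truncated. The helper name `summarize_listing` and the variable `sample_listing` are hypothetical and used only for illustration.

```ruby
require 'rexml/document'

# Trimmed copy of the listing shown above (illustrative sample data).
sample_listing = <<~XML
  <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Name>foo.com</Name>
    <MaxKeys>1000</MaxKeys>
    <IsTruncated>true</IsTruncated>
    <Contents>
      <Key>file/1</Key>
      <Size>1440231</Size>
    </Contents>
    <Contents>
      <Key>file/2</Key>
      <Size>1310</Size>
    </Contents>
  </ListBucketResult>
XML

# Hypothetical helper: returns the object keys with their sizes in bytes,
# plus a flag telling whether the listing was cut off at MaxKeys.
def summarize_listing(xml)
  doc = REXML::Document.new(xml)
  entries = doc.get_elements('ListBucketResult/Contents').map do |contents|
    { key: contents.elements['Key'].text,
      size: contents.elements['Size'].text.to_i }
  end
  truncated = doc.elements['ListBucketResult/IsTruncated'].text == 'true'
  { entries: entries, truncated: truncated }
end

summary = summarize_listing(sample_listing)
summary[:entries].each { |e| puts "#{e[:key]} (#{e[:size]} bytes)" }
puts "More keys remain" if summary[:truncated]
```

Note that `IsTruncated` is `true` in the sample, which means the bucket holds more keys than this single response lists.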

To download all the files quickly, I wrote the following Ruby program, which fetches the listing and downloads every file in it. I hope it is useful to others:

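    require 'net/http'
    require 'rexml/document'
    require 'uri'
    require 'fileutils'

    baseurl = 'foo.com'

    # get the bucket listing (XML) as a string
    # note: if IsTruncated is true, this holds only the first MaxKeys entries
    xml_data = Net::HTTP.get_response(URI.parse("http://" + baseurl)).body

    # extract the object keys and download each one
    doc = REXML::Document.new(xml_data)
    FileUtils.mkdir_p("images")  # make sure the output directory exists
    Net::HTTP.start(baseurl) do |http|
      doc.elements.each('ListBucketResult/Contents/Key') do |ele|
        puts "Downloading " + ele.text
        resp = http.get("/" + ele.text)
        # flatten the key into one file name, e.g. file/1 -> images/file_1.jpg
        File.open("images/" + ele.text.gsub("/", "_") + ".jpg", "wb") { |file|
          file.write(resp.body)
        }
      end
    end
    puts "Done"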
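
The sample listing has `IsTruncated` set to `true`, so a single request returns at most `MaxKeys` entries. One way to walk the whole bucket, sketched here under the assumption that the site answers the classic ListBucket `marker` query parameter, is to request the next page starting after the last key already seen. The helpers `list_all_keys`, `http_fetcher`, and `listing_xml` are hypothetical names, and the demonstration uses canned pages instead of a live bucket so that it runs offline.

```ruby
require 'net/http'
require 'rexml/document'
require 'uri'

# Hypothetical helper: collects every key by following truncated listings.
# `fetch_page` is any callable that takes a marker (nil for the first page)
# and returns the listing XML as a string.
def list_all_keys(fetch_page)
  keys = []
  marker = nil
  loop do
    doc = REXML::Document.new(fetch_page.call(marker))
    page = doc.get_elements('ListBucketResult/Contents/Key').map(&:text)
    keys.concat(page)
    truncated = doc.elements['ListBucketResult/IsTruncated']
    break unless truncated && truncated.text == 'true' && !page.empty?
    marker = page.last # the next page starts after the last key seen
  end
  keys
end

# Fetcher for a real bucket (defined here, not called in the demo below).
# The host is whatever site serves the listing, e.g. 'foo.com'.
def http_fetcher(host)
  lambda do |marker|
    uri = URI("http://#{host}/")
    uri.query = URI.encode_www_form(marker: marker) if marker
    Net::HTTP.get(uri)
  end
end

# Builds a small canned listing for the offline demonstration.
def listing_xml(keys, truncated)
  contents = keys.map { |k| "<Contents><Key>#{k}</Key></Contents>" }.join
  %(<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">) +
    "<IsTruncated>#{truncated}</IsTruncated>#{contents}</ListBucketResult>"
end

pages = {
  nil      => listing_xml(%w[file/1 file/2], true),
  'file/2' => listing_xml(%w[file/3], false)
}
all_keys = list_all_keys(->(marker) { pages.fetch(marker) })
puts all_keys.inspect
```

Against a real site, `list_all_keys(http_fetcher('foo.com'))` would return the full key list, which could then be fed to the download loop above.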

Published at DZone with permission of Rodrigo De Castro, DZone MVB. See the original article here.
