File Handling in Amazon S3 With Python Boto Library

Read on and understand how to use the Python Boto library for standard S3 workflows.

1. Introduction

Amazon Web Services (AWS) Simple Storage Service (S3) is a storage-as-a-service offering from Amazon. It is a general-purpose object store in which objects are grouped under namespaces called “buckets.” Bucket names are unique across all of AWS S3.

Boto is a Python SDK for Amazon Web Services [1]. It provides APIs to work with AWS services like EC2, S3, and others.

In this article, we will focus on how to use Amazon S3 for regular file handling operations using Python and the Boto library.

2. Amazon S3 & Work Flows

In Amazon S3, the user first has to create a bucket. A bucket is a namespace whose name is unique across AWS. Users can set access privileges on it according to their requirements. Buckets contain objects. Each object is stored as a key-value pair, where the key is the identifier used to operate on the object and must be unique within the bucket. The value can be of any type: strings, integers, JSON, text files, sequence files, binary files, pictures, and videos. To learn more about Amazon S3, refer to the Amazon documentation [2].

The following is a possible work flow for operations in Amazon S3:

  • Create a Bucket
  • Upload file to a bucket
  • List the contents of a bucket
  • Download a file from a bucket
  • Move files across buckets
  • Delete a file from bucket
  • Delete a bucket
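
The "List the contents of a bucket" step has no dedicated section later in this article, so here is a minimal sketch of it. It assumes a boto v2.x bucket object, whose list() method accepts an optional prefix and yields key objects with name and size attributes. The helper name list_keys is hypothetical and is not part of boto.

```python
def list_keys(bucket, prefix=""):
    """Return (name, size) pairs for every key under `prefix`.

    `bucket` is assumed to be a boto v2.x bucket object; `list_keys`
    itself is an illustrative helper, not a boto API.
    """
    return [(key.name, key.size) for key in bucket.list(prefix=prefix)]


# Usage (needs boto installed and valid credentials):
#   import boto
#   bucket = boto.connect_s3().get_bucket("mybucket001")
#   for name, size in list_keys(bucket, prefix="logs/"):
#       print(name, size)
```

Passing the bucket in as a parameter keeps the helper independent of how the connection was made.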

3. Python Boto Library

The code snippets in this article were developed using boto v2.x, which supports Python 2.7. Work on Python 3.x support is ongoing. To install the boto library, the pip command can be used as below:

pip install -U boto

In the code snippets below, I have used the connect_s3() API, passing the access credentials as arguments. This returns the connection object to work with. If you don’t want to hard-code the access credentials in your program, there are other ways to supply them. One is to set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The other is to create a file named “credentials” under the .aws directory in the user's home directory. The file should contain the following:

File Name : ~/.aws/credentials

[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY
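
To make the two credential sources concrete, the sketch below reads the access keys from the environment variables first and falls back to the credentials file. Boto resolves credentials on its own when connect_s3() is called without arguments, so this is purely illustrative. The helper name load_credentials, its parameters, and the lookup order are assumptions for the example, not boto's API.

```python
import os

try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2.7


def load_credentials(path=None, profile="default", environ=None):
    """Return (access_key, secret_key) from the environment or a file.

    Hypothetical helper for illustration only; not part of boto.
    """
    environ = os.environ if environ is None else environ
    access = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if access and secret:
        return access, secret

    # Fall back to the credentials file described above.
    path = path or os.path.expanduser("~/.aws/credentials")
    parser = ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        raise RuntimeError("no [%s] section found in %s" % (profile, path))
    return (parser.get(profile, "aws_access_key_id"),
            parser.get(profile, "aws_secret_access_key"))
```

The returned pair can be passed straight to boto.connect_s3(access_key, secret_key).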

4. S3 Work Flow Automation

4.1 Create a Bucket

Before any other operation on S3, you need to create a bucket. The create_bucket() API of the connection object does this. The bucket is the namespace under which all of the user's objects are stored.

import boto

keyId = "your_aws_key_id"
sKeyId = "your_aws_secret_key_id"
#Connect to S3 with access credentials
conn = boto.connect_s3(keyId, sKeyId)

#Create the bucket in a specific region.
bucket = conn.create_bucket('mybucket001',location='us-west-2')

In the create_bucket() API, the bucket name (‘mybucket001’) is the mandatory parameter. The location is optional; if it is not given, boto v2.x creates the bucket in the default US Standard region (us-east-1).

The create_bucket() call may raise an error if a bucket with the same name already exists, because bucket names are unique across the whole system. The naming rules for a bucket depend on the AWS region. Generally, a bucket name must be in lower case.
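
Since both a bad name and an existing bucket can make create_bucket() fail, it can help to check for them first. The sketch below assumes the DNS-compliant naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit) and uses the connection's lookup() method, which returns None when the bucket cannot be found. The helper names is_valid_bucket_name and create_bucket_if_missing are hypothetical, not part of boto.

```python
import re

# Lowercase letters, digits, dots and hyphens; 3-63 characters;
# the first and last character must be a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]\Z")


def is_valid_bucket_name(name):
    """Check a name against the assumed DNS-compliant naming rules."""
    return bool(_BUCKET_NAME.match(name)) and ".." not in name


def create_bucket_if_missing(conn, name, location=""):
    """Return the bucket called `name`, creating it only if needed.

    `conn` is assumed to be a boto v2.x S3 connection. Note that
    lookup() also returns None when the bucket exists but is not
    accessible with these credentials; create_bucket() will then
    still raise an error.
    """
    if not is_valid_bucket_name(name):
        raise ValueError("invalid bucket name: %r" % name)
    bucket = conn.lookup(name)
    if bucket is None:
        bucket = conn.create_bucket(name, location=location)
    return bucket
```

Calling create_bucket_if_missing(conn, 'mybucket001', location='us-west-2') twice should create the bucket only on the first call.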

4.2 Upload a File

To upload a file into S3, we can use the set_contents_from_file() API of the Key object. A Key object is created for a given bucket object.

import boto
from boto.s3.key import Key

keyId = "your_aws_key_id"
sKeyId = "your_aws_secret_key_id"
fileName = "abc.txt" #Name of the local file to upload
bucketName = "mybucket001" #Name of the target bucket

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName)
#Get the Key object of the bucket
k = Key(bucket)
#Create a new key with id as the name of the file
k.key = fileName
#Upload the file (opened in binary mode)
with open(fileName, "rb") as f:
    result = k.set_contents_from_file(f)
#result contains the size of the file uploaded
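
The same upload can be wrapped in a small reusable function. This is a minimal sketch that assumes a boto v2.x bucket object and uses its new_key() method together with the key's set_contents_from_filename() method, which opens the local file itself. The helper name upload_file and the choice of the file's base name as the default key are assumptions for the example.

```python
import os


def upload_file(bucket, path, key_name=None):
    """Upload the local file at `path` and return the key name used.

    `bucket` is assumed to be a boto v2.x bucket object. When no
    `key_name` is given, the base name of the file becomes the key.
    `upload_file` is an illustrative helper, not a boto API.
    """
    key_name = key_name or os.path.basename(path)
    key = bucket.new_key(key_name)
    key.set_contents_from_filename(path)
    return key_name
```

For example, upload_file(bucket, "/tmp/abc.txt") would store the object under the key "abc.txt".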

4.3 Download a File

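To download a file, we can use the get_contents_to_file() API of the Key object.

import boto
from boto.s3.key import Key

keyId = "your_aws_key_id"
sKeyId = "your_aws_secret_key_id"
srcFileName = "abc.txt" #Key of the object to download
destFileName = "abc_copy.txt" #Local file to write to (placeholder name)
bucketName = "mybucket001" #Name of the bucket, where the file resides

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName)

#Get the Key object of the given key, in the bucket
k = Key(bucket, srcFileName)

#Get the contents of the key into a file
with open(destFileName, "wb") as f:
    k.get_contents_to_file(f)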
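
A download is more robust if it first checks that the key exists. The sketch below assumes a boto v2.x bucket object, whose get_key() method returns None for a missing key, and uses the key's get_contents_to_filename() method to write directly to a local path. The helper name download_file is hypothetical.

```python
def download_file(bucket, key_name, dest_path):
    """Download `key_name` to `dest_path`; return False if it is missing.

    `bucket` is assumed to be a boto v2.x bucket object.
    `download_file` is an illustrative helper, not a boto API.
    """
    key = bucket.get_key(key_name)  # None when the key does not exist
    if key is None:
        return False
    key.get_contents_to_filename(dest_path)
    return True
```

Returning a boolean lets the caller decide how to handle a missing object instead of catching an S3 error.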

4.4 Move a File From One Bucket to Another

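We can move a file from one bucket to another only by copying the object to the destination bucket and then deleting it from the source. The copy_key() API of the bucket object copies an object from a given source bucket into the bucket on which it is called.

import boto

keyId = "your_aws_access_key_id"
sKeyId = "your_aws_secret_key_id"

conn = boto.connect_s3(keyId, sKeyId)
srcBucket = conn.get_bucket('mybucket001') #Source Bucket Object
dstBucket = conn.get_bucket('mybucket002') #Destination Bucket Object
fileName = "abc.txt"
#Call the copy_key() from destination bucket
dstBucket.copy_key(fileName, srcBucket.name, fileName)
#Delete the original to complete the move
srcBucket.delete_key(fileName)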
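
A move can be packaged as a copy followed by a delete. The sketch below assumes a boto v2.x connection and relies on copy_key(new_key_name, src_bucket_name, src_key_name) on the destination bucket and delete_key() on the source bucket. The helper name move_key and its optional new_key_name parameter are assumptions for the example.

```python
def move_key(conn, src_bucket_name, dst_bucket_name, key_name,
             new_key_name=None):
    """Copy an object into the destination bucket, then delete the original.

    `conn` is assumed to be a boto v2.x S3 connection. The source is
    deleted only after copy_key() returns without raising.
    `move_key` is an illustrative helper, not a boto API.
    """
    src = conn.get_bucket(src_bucket_name)
    dst = conn.get_bucket(dst_bucket_name)
    target = new_key_name or key_name
    dst.copy_key(target, src_bucket_name, key_name)
    src.delete_key(key_name)
    return target
```

Deleting only after the copy succeeds means a failed copy leaves the source object untouched.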

4.5 Delete a File

To delete a file inside a bucket, we have to retrieve the key of the object and call the delete() API of the Key object. The Key object can be obtained by calling Key() with the bucket object and the object name.

import boto
from boto.s3.key import Key

keyId = "your_aws_access_key"
sKeyId = "your_aws_secret_key"
srcFileName="abc.txt" #Name of the file to be deleted
bucketName="mybucket001" #Name of the bucket, where the file resides

conn = boto.connect_s3(keyId,sKeyId) #Connect to S3
bucket = conn.get_bucket(bucketName) #Get the bucket object

k = Key(bucket,srcFileName) #Get the key of the given object

k.delete() #Delete the object

4.6 Delete a Bucket

The delete_bucket() API of the connection object deletes the bucket named in its parameter.

import boto

keyId = "your_aws_access_key_id"
sKeyId= "your_aws_secret_key_id"
conn = boto.connect_s3(keyId,sKeyId)
conn.delete_bucket('mybucket002')

The delete_bucket() call will fail if there are objects inside the bucket.

4.7 Empty a Bucket

Emptying a bucket can be achieved by deleting all the objects inside the bucket. The list() API of the bucket object (bucket.list()) iterates over all the objects inside the bucket. By calling the delete() API on each of those objects, we can delete them.

import boto

keyId = "your_aws_access_key_id"
sKeyId= "your_aws_secret_key_id"
bucketName = "mybucket001" #Name of the bucket to empty

conn = boto.connect_s3(keyId,sKeyId) #Connect to S3
bucket = conn.get_bucket(bucketName) #Get the bucket Object

for i in bucket.list():
    i.delete() #Delete the object
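
Because delete_bucket() fails on a non-empty bucket, emptying and deleting are often done together. The sketch below combines the two steps and assumes a boto v2.x connection and a bucket without versioning enabled. The helper name empty_and_delete_bucket is hypothetical.

```python
def empty_and_delete_bucket(conn, bucket_name):
    """Delete every object in the bucket, then the bucket itself.

    `conn` is assumed to be a boto v2.x S3 connection and the bucket
    is assumed to be unversioned. Returns the number of objects
    deleted. Illustrative helper, not a boto API.
    """
    bucket = conn.get_bucket(bucket_name)
    deleted = 0
    for key in bucket.list():
        key.delete()
        deleted += 1
    conn.delete_bucket(bucket_name)
    return deleted
```

This is destructive, so it is worth double-checking the bucket name before calling it.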

4.8 List All Buckets

The get_all_buckets() API of the connection object returns a list of all the buckets owned by the user. This can be used to verify that a bucket exists after you have created it, or that it is gone after you have deleted it.

import boto

keyId = "your_aws_access_key_id"
sKeyId= "your_aws_secret_key_id"

conn = boto.connect_s3(keyId,sKeyId) #Connect to S3
buckets = conn.get_all_buckets() #Get the bucket list
for i in buckets:
    print(i.name) #Print the name of each bucket
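
The validation mentioned above can be written as a small check on the bucket list. This sketch assumes a boto v2.x connection, whose get_all_buckets() method returns bucket objects with a name attribute. The helper names bucket_names and bucket_exists are hypothetical, and the check only covers buckets owned by the connected account.

```python
def bucket_names(conn):
    """Return the names of all buckets owned by the connected account."""
    return [bucket.name for bucket in conn.get_all_buckets()]


def bucket_exists(conn, name):
    """Return True if `name` is among the account's own buckets.

    `conn` is assumed to be a boto v2.x S3 connection. Both helpers
    are illustrative and not part of boto.
    """
    return name in bucket_names(conn)
```

For example, bucket_exists(conn, 'mybucket002') should return False once delete_bucket('mybucket002') has succeeded.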

5. Summary

The boto library provides connection, bucket, and key objects, which map directly onto the design of S3. By understanding the methods of these objects, we can perform all the possible operations on S3 using the boto library.

Hope this helps.

6. References

[1] Boto S3 API Documentation – http://boto.cloudhackers.com/en/latest/ref/s3.html

[2] Amazon S3 Documentation – https://aws.amazon.com/documentation/s3/

Published at DZone with permission of Saravanan Subramanian, DZone MVB. See the original article here.
