Boto3: Amazon S3 as Python Object Store

Use Amazon Simple Storage Service (S3) as an object store to manage Python data structures.

By Saravanan Subramanian · Jan. 21, 19 · Tutorial


Introduction

Amazon S3 is extensively used as a file storage system to store and share files across the internet. At its core, Amazon S3 is a simple key-value store: it can hold objects of any type, created in any programming language, such as Java, JavaScript, or Python. AWS DynamoDB recommends using S3 to store items larger than 400 KB. This article focuses on using S3 as an object store from Python.

Prerequisites

Boto3 is the official AWS SDK for accessing AWS services from Python. Please ensure Boto3 and awscli are installed on your system.

$pip install boto3
$pip install awscli

Also, configure the AWS credentials using the "aws configure" command, or set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to store your keys in the environment. Please DO NOT hard-code your AWS keys inside your Python program.
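For example, the credentials can be supplied through environment variables like this (the values below are placeholders, not real keys — substitute your own):

```shell
# Placeholder values -- replace with your own IAM access keys.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"
export AWS_SECRET_ACCESS_KEY="exampleSecretKey"
```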

To configure AWS credentials, first install awscli and then use the "aws configure" command to set them up. For more details, refer to AWS CLI Setup and Boto3 Credentials.

Configure the AWS credentials using the following command:

$aws configure

Do a quick check to ensure you can reach AWS.

$aws s3 ls

The above command must list the S3 buckets created in your AWS account. The account is selected based on the configured credentials. If multiple AWS accounts are configured, use the "--profile" option in the AWS CLI; if you omit the "--profile" option, the CLI uses the profile named "default".

Use the commands below to configure a development profile named "dev" and validate the settings.

$aws configure --profile dev
$aws s3 ls --profile dev

The above commands configure the "dev" profile and list the S3 buckets in the account that profile belongs to.

Connecting to S3

Connecting to Default Account (Profile)

The client() API connects to the specified service in AWS. The below code snippet connects to S3 using the default profile credentials and lists all the S3 buckets.

import boto3

s3 = boto3.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])

Connecting to Specific Account (Profile)

To connect to a specific account, first create a session using the Session() API. Session() allows you to specify the profile name and region, as well as explicit AWS credentials.

The below code snippet connects to an AWS account configured using "dev" profile and lists all the S3 buckets.

import boto3

session = boto3.Session(profile_name="dev", region_name="us-west-2")
s3 = session.client('s3')
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])

Storing and Retrieving a Python LIST

Boto3 supports the put_object() and get_object() APIs to store and retrieve objects in S3, but objects must be serialized before storing. The Python pickle module supports serialization and deserialization of objects and is available by default in every Python installation.

The pickle.dumps() and pickle.loads() APIs are used to serialize and deserialize Python objects.
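As a quick local illustration of the round trip (no S3 involved yet), dumps() turns an object into bytes — exactly what gets stored as an S3 object body — and loads() restores an equal object:

```python
import pickle

myList = [1, 2, 3, 4, 5]

# Serialize to bytes -- this is what put_object() will receive as Body
data = pickle.dumps(myList)
print(type(data))            # <class 'bytes'>

# Deserialize back into an equal Python object
restored = pickle.loads(data)
print(restored == myList)    # True
```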

Storing a List in S3 Bucket

Ensure the Python object is serialized before writing it into the S3 bucket. The list object must be stored under a unique "key"; if the key is already present, the stored object will be overwritten.

import boto3
import pickle

s3 = boto3.client('s3')
myList = [1, 2, 3, 4, 5]

# Serialize the object
serializedListObject = pickle.dumps(myList)

# Write to the bucket named 'mytestbucket',
# storing the list under the key 'myList001'
s3.put_object(Bucket='mytestbucket', Key='myList001', Body=serializedListObject)

The put_object() API may raise a "NoSuchBucket" exception if the bucket does not exist in your account.

NOTE: Please change the bucket name to your own S3 bucket name. I don't own this bucket.

Retrieving a List From S3 Bucket

The list is stored as a stream object inside Body. It can be read using the read() API on the get_object() return value. A "NoSuchKey" exception is raised if the key is not present.

import boto3
import pickle

#Connect to S3
s3 = boto3.client('s3')

#Read the object stored in key 'myList001'
object = s3.get_object(Bucket='mytestbucket',Key='myList001')
serializedObject = object['Body'].read()

#Deserialize the retrieved object
myList = pickle.loads(serializedObject)

print(myList)

Storing and Retrieving a Python Dictionary

Python dictionary objects can be stored and retrieved in the same way using put_object() and get_object() APIs.

Storing a Python Dictionary Object in S3

import boto3
import pickle


#Connect to S3 default profile
s3 = boto3.client('s3')

myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
#Serialize the object
serializedMyData = pickle.dumps(myData)

#Write to S3 using unique key - EmpId007
s3.put_object(Bucket='mytestbucket', Key='EmpId007', Body=serializedMyData)

Retrieving Python Dictionary Object From S3 Bucket

Use the get_object() API to read the object. The data is stored as a stream inside the Body object and can be read using the read() API.

import boto3
import pickle

s3 = boto3.client('s3')

object = s3.get_object(Bucket='mytestbucket', Key='EmpId007')
serializedObject = object['Body'].read()

myData = pickle.loads(serializedObject)

print(myData)

Working With JSON

When working with a Python dictionary, it is recommended to store it as JSON if the consumer applications are not written in Python or do not support the pickle format.

The json.dumps() API converts a Python dictionary into JSON, and json.loads() converts JSON back into a Python dictionary.
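The same round trip can be checked locally before touching S3. Note that json.dumps() produces a str (unlike pickle.dumps(), which produces bytes); boto3's put_object() accepts either as the Body value:

```python
import json

myData = {'firstName': 'Saravanan', 'empId': '007'}

# Serialize to a JSON string
serialized = json.dumps(myData)
print(type(serialized))        # <class 'str'>

# Deserialize back into a dictionary
restored = json.loads(serialized)
print(restored == myData)      # True
```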

Storing a Python Dictionary Object As JSON in S3 Bucket

import boto3
import json

s3 = boto3.client('s3')

myData = {'firstName':'Saravanan','lastName':'Subramanian','title':'Manager', 'empId':'007'}
serializedMyData = json.dumps(myData)

s3.put_object(Bucket='mytestbucket', Key='EmpId007', Body=serializedMyData)

Retrieving a JSON From S3 Bucket

import boto3
import json

s3 = boto3.client('s3')
object = s3.get_object(Bucket='mytestbucket',Key='EmpId007')
serializedObject = object['Body'].read()

myData = json.loads(serializedObject)

print(myData)

Upload and Download a Text File

Boto3 supports the upload_file() and download_file() APIs to copy files between your local file system and S3. Per S3 conventions, keys containing "/" (forward slash) are treated as if they contained subfolders.

Uploading a File

import boto3

s3 = boto3.client('s3')
s3.upload_file(Bucket='mytestbucket', Key='subdir/abc.txt', Filename='./abc.txt')

Download a File From S3 Bucket

import boto3

s3 = boto3.client('s3')
s3.download_file(Bucket='mytestbucket',Key='subdir/abc.txt',Filename='./abc.txt')

Error Handling

The Boto3 APIs can raise various exceptions depending on the condition; "DataNotFoundError", "NoSuchKey", "HttpClientError", "ConnectionError", and "SSLError" are a few of them. Boto3 exceptions inherit from the Python Exception class, so they can be handled with standard try/except error handling in the code.

import boto3

try:
    s3 = boto3.client('s3')
except Exception as e:
    print("Exception", e)

Summary

Storing Python objects in an external store has many use cases. For example, a game developer can store intermediate game state and fetch it when the player resumes from where they left off, and an API developer can use an S3 object store as a simple key-value store. Please refer to the URLs in the References section to learn more. Thanks!

References

  1. Boto3
  2. Boto3 S3 API
  3. AWS CLI
  4. AWS Boto3 Credentials
  5. Python 2.7 Pickle Library
  6. Boto3 Exceptions

Published at DZone with permission of Saravanan Subramanian, DZone MVB.
