Migrating File Storage to Amazon S3
If you are looking to migrate files to S3, check out some of the methods for doing so, along with the best use case for each.
If you have implemented an application that stores files on a web server, you have probably run into challenges when it comes to scaling. As the application grows, uploaded files not only consume more and more server storage but also degrade performance when they are accessed at scale. This is one reason many teams choose cloud object storage services such as Amazon S3 to store these files.
This article discusses three potential paths for migrating an existing application's file uploads to Amazon S3, along with the kind of change each one requires in your application, to help you make the right decision.
First of all, it is important to understand that Amazon S3 is a highly durable, highly available, distributed object storage service provided by AWS. In addition to the basic features of an object store, it provides an access control model, file organization using buckets and paths, file access via HTTP(S), metadata configuration, versioning, lifecycle management, and integration with other AWS services, both natively and through SDKs and APIs.
Frequent File Updates?
One limitation of Amazon S3 is that you cannot partially update a file: for every change, you need to apply the change outside Amazon S3 and upload the whole file again. Therefore, if your application modifies files frequently and needs to save or share those partial changes through Amazon S3, this can become expensive both in cost (more S3 requests) and in performance, since the entire file must be sent over the network for each change. For these kinds of workloads, consider Amazon EFS instead of S3.
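To make the cost concrete, the sketch below simulates S3's whole-object semantics with a hypothetical in-memory store (`FakeObjectStore` and `append_line` are illustrative names, not AWS APIs). Appending a single short line to a 1 MB file still requires downloading the full object and uploading all of it again, so the bytes transferred scale with the file size rather than with the size of the change.

```python
class FakeObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket (not an AWS API).

    Like S3, it only supports reading or replacing a whole object;
    there is no way to patch bytes in place.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self.reset_counters()

    def reset_counters(self) -> None:
        self.requests = 0
        self.bytes_downloaded = 0
        self.bytes_uploaded = 0

    def get(self, key: str) -> bytes:
        self.requests += 1
        body = self._objects[key]
        self.bytes_downloaded += len(body)
        return body

    def put(self, key: str, body: bytes) -> None:
        # Every write replaces the entire object.
        self.requests += 1
        self.bytes_uploaded += len(body)
        self._objects[key] = body


def append_line(store: FakeObjectStore, key: str, line: str) -> None:
    """'Appending' means downloading the object and re-uploading all of it."""
    current = store.get(key)
    store.put(key, current + line.encode("utf-8") + b"\n")


store = FakeObjectStore()
store.put("logs/app.log", b"x" * 1_000_000)  # an existing 1 MB object
store.reset_counters()                         # measure only the append

append_line(store, "logs/app.log", "one more line")
print(store.requests, store.bytes_downloaded, store.bytes_uploaded)
# -> 2 1000000 1000014
```

With a real bucket the same pattern is a GET followed by a PUT of the full object, and each of those requests is billed; a file system such as Amazon EFS lets the application write only the changed bytes.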
Potential Paths in Migration
Choosing a migration path for your file uploads can be challenging, and the right choice depends on how file handling is already implemented in your application.
Without Modifying the Code
You can mount an Amazon S3 bucket as an NFS share on the storage path where your application stores files. This can be done using the file interface of AWS Storage Gateway, known as File Gateway. Since File Gateway caches data locally, it improves read performance for frequently used files by providing low-latency access, and data transfer between your data center and AWS is fully managed and optimized by the gateway. However, the challenge with this approach is that it requires setting up a virtual on-premises file server (the gateway appliance), which acts as the file system interface for the NFS mount.
Alternatively, there are third-party implementations that mount an S3 bucket directly as a file system on your application's storage path, such as S3FS-FUSE, ObjectiveFS, and RioFS. Choose among them based on the operating systems they support and the individual characteristics of each file system. This approach is better suited as an interim step in a migration than as a permanent solution.
The main advantage of these approaches is that they require no changes to your application code, so they pose little risk to application stability and need the least amount of effort.
These approaches should work well if your application mainly creates and retrieves files, without other operations such as editing or appending to files or searching the file system itself.
Directly Moving to Amazon S3
When your application uses Amazon S3 directly, you can tailor how files are accessed to your application's access patterns. For example, you can put Amazon CloudFront (a CDN) in front of S3 to speed up file downloads and to support custom access control mechanisms.
Another advantage of this approach is that it can reduce your server load, since clients can upload files directly to Amazon S3 without routing them through your server. This is especially beneficial when handling a large number of files or large files at scale, both to improve performance and to reduce costs.
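A common way to let clients upload directly is a presigned URL: your server signs a short-lived PUT URL and the client sends the file straight to S3. The sketch below implements AWS Signature Version 4 query-string signing with only the standard library, to show roughly what an SDK does under the hood; in practice you would normally call your SDK's presign helper instead (for example, boto3's `generate_presigned_url`). The bucket, key, region, and credentials are placeholders, and the sketch assumes virtual-hosted-style URLs, a bucket name without dots, and long-term credentials (no session token).

```python
import datetime
import hashlib
import hmac
import urllib.parse
from typing import Optional


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_url(method: str, bucket: str, key: str, region: str,
                   access_key: str, secret_key: str, expires: int = 3600,
                   now: Optional[datetime.datetime] = None) -> str:
    """Return a SigV4 presigned URL for one S3 object (sketch, not an SDK)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    # 1. Canonical request. S3 object keys are URI-encoded once, keeping '/'.
    canonical_uri = "/" + urllib.parse.quote(key, safe="/")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        method,
        canonical_uri,
        canonical_query,
        f"host:{host}\n",    # canonical headers, each ending with a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # the body is unknown when the URL is created
    ])

    # 2. String to sign.
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # 3. Derive the signing key from the secret and sign.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, "s3")
    k_signing = _hmac_sha256(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"


# Placeholder values; never ship real credentials to the client.
print(presign_s3_url("PUT", "my-app-uploads", "uploads/report.pdf",
                     "us-east-1", "AKIAEXAMPLE", "example-secret", expires=900))
```

The server hands the returned URL to the client, which uploads the file bytes with a plain HTTP PUT before the URL expires (S3 accepts at most 604800 seconds, or seven days, for `X-Amz-Expires`). Only the `host` header is signed here, which keeps the client request simple.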
The main challenge of this approach is that you need to modify your code to upload files to and retrieve files from S3. If an AWS SDK is available for your application's programming language, it simplifies the implementation; if not, you can use the Amazon S3 REST API.
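One way to contain these code changes is to put file storage behind a small interface, so that switching from local disk to S3 touches a single class. In the sketch below, `FileStorage`, `LocalFileStorage`, `S3FileStorage`, `store_avatar`, and the bucket name are hypothetical. `S3FileStorage` expects a boto3 S3 client (created with `boto3.client("s3")`) and uses its `put_object` and `get_object` calls; to keep the example runnable without AWS access, it is exercised with an in-memory stand-in that mimics just those two calls.

```python
import io
import os
import tempfile
from typing import Protocol


class FileStorage(Protocol):
    def save(self, path: str, data: bytes) -> None: ...
    def load(self, path: str) -> bytes: ...


class LocalFileStorage:
    """Existing behaviour: files live under a directory on the web server."""

    def __init__(self, root: str) -> None:
        self.root = root

    def save(self, path: str, data: bytes) -> None:
        full_path = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "wb") as f:
            f.write(data)

    def load(self, path: str) -> bytes:
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()


class S3FileStorage:
    """New behaviour: the same relative path becomes the S3 object key."""

    def __init__(self, s3_client, bucket: str) -> None:
        self.s3 = s3_client  # e.g. boto3.client("s3")
        self.bucket = bucket

    def save(self, path: str, data: bytes) -> None:
        self.s3.put_object(Bucket=self.bucket, Key=path, Body=data)

    def load(self, path: str) -> bytes:
        response = self.s3.get_object(Bucket=self.bucket, Key=path)
        return response["Body"].read()


class InMemoryS3Client:
    """Hypothetical test double mimicking only the two boto3 calls used above."""

    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], bytes] = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self.objects[(Bucket, Key)] = Body

    def get_object(self, Bucket: str, Key: str) -> dict:
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


def store_avatar(storage: FileStorage, user_id: int, data: bytes) -> str:
    """Application code depends only on the interface, not on the backend."""
    path = f"avatars/{user_id}.png"
    storage.save(path, data)
    return path


local_storage = LocalFileStorage(tempfile.mkdtemp())
s3_storage = S3FileStorage(InMemoryS3Client(), bucket="my-app-uploads")
for storage in (local_storage, s3_storage):
    saved = store_avatar(storage, 42, b"\x89PNG...")
    print(type(storage).__name__, saved, storage.load(saved))
```

Keeping paths relative and using forward slashes means they map one-to-one onto S3 object keys, which also makes it easier to copy existing files into the bucket under the same names.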
When reading files from S3, Amazon CloudFront can simplify the access control changes you need by using signed cookies (or signed URLs). This is helpful when you want to change the upload operation while keeping file reads unchanged. In addition, if the path a file is uploaded to differs from the URL used to read it (for example, when your application maps URLs to files), you can use a Lambda@Edge function to perform the URL mapping according to the existing application implementation.
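As an illustration, suppose (hypothetically) the existing application reads files from URLs such as `/files/<name>` and `/files/thumbs/<name>`, while the objects are stored in S3 under the `uploads/` and `thumbnails/` prefixes. A Lambda@Edge function attached to the origin-request (or viewer-request) event can rewrite the URI before CloudFront forwards the request to S3. The sketch uses the Python runtime and the CloudFront event shape `event["Records"][0]["cf"]["request"]`; the prefix mapping itself is an assumption to adapt to your own URL scheme.

```python
# Hypothetical mapping from legacy read paths to S3 key prefixes.
PREFIX_MAP = {
    "/files/": "/uploads/",
    "/files/thumbs/": "/thumbnails/",
}


def lambda_handler(event, context):
    """Lambda@Edge request handler that rewrites legacy URLs to S3 keys."""
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    # Try longer prefixes first so the most specific rule wins.
    for old, new in sorted(PREFIX_MAP.items(), key=lambda kv: len(kv[0]), reverse=True):
        if uri.startswith(old):
            request["uri"] = new + uri[len(old):]
            break
    # Returning the (possibly modified) request lets CloudFront continue.
    return request


sample_event = {"Records": [{"cf": {"request": {
    "uri": "/files/report.pdf", "method": "GET", "headers": {}}}}]}
print(lambda_handler(sample_event, None)["uri"])  # /uploads/report.pdf
```

Because the rewrite happens at the edge, links already stored in the application's pages keep working while the objects themselves live under the new key layout.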
Opinions expressed by DZone contributors are their own.