Integrating Alfresco With Amazon S3: A Challenge No Longer
Integrating Alfresco and Amazon S3 used to be difficult, but it's no longer as challenging as it once was. Here's an overview of why it's easier, with some warning signs and tips.
We have seen a huge rise in companies, both enterprises and start-ups, moving their data from on-premise infrastructure to on-demand cloud platforms such as Amazon, Google, and Microsoft Azure.
Alfresco also offers its own cloud-based products, Alfresco One and Alfresco in the Cloud. Both come with a few challenges: they are paid products, and they are not as customizable as the Alfresco Community edition. To overcome these challenges, Alfresco and independent developers came up with solutions that integrate Alfresco with cloud-based file storage services. We will focus on:
- How to integrate Alfresco with Amazon Simple Storage Service (Amazon S3)
- Why integrate Alfresco with Amazon Simple Storage Service (Amazon S3)
Before moving forward, here is a warning. If you visit the official Alfresco website, you will find this disclaimer:
“The Alfresco S3 Connector module can be applied to Alfresco Enterprise 4.1.1 or later. It requires an Alfresco instance running on Amazon’s Elastic Compute Cloud (EC2), connected to Amazon's Simple Storage Service(SSS). Other devices or services that advertise as being S3 compatible have not been tested and are therefore not supported.”
This suggests that Alfresco cannot be integrated with S3 unless it is installed on an EC2 instance. What we discuss here shows that this is not the case: with some custom coding, you can integrate Alfresco with S3 even when Alfresco is not installed on EC2. The approach described below has been tried on both the Alfresco Community and Alfresco Enterprise editions, and on both private servers and Amazon EC2.
Let's first understand where the idea of integrating Alfresco and S3 comes from:
Integration has pros and cons. Whether it is the best solution depends on the case and on the goal you are trying to achieve. Let's take a use case to understand why you might need both Alfresco and S3.
For example, suppose you are a startup with a highly customer-centric web application where users store high-quality images and documents, and you use Alfresco as the backend to manage those documents. You are running an average server with a fixed configuration and fixed storage. A sudden surge in users may force you to buy a larger configuration from your service provider, which is an expensive option. The investment is also risky, because the number of customers can drop just as suddenly, leading to large-scale deletion of data. In a new business, then, both your user base and the data stored on your application servers can fluctuate widely.
For a stable solution, you can move your storage to cloud-based services that make it easy to scale up and down. There are two ways to do this:
- Buy an Alfresco in the Cloud or Alfresco One license
- Or invest in Amazon S3 or a similar service and integrate it with your own Alfresco installation.
With the second option, you get Alfresco's document management capabilities together with the scalability and flexibility of the cloud and the security of Amazon's servers. As discussed earlier, the Alfresco cloud platform is an expensive and fairly rigid option, so the combination of S3 and Alfresco gives more flexibility.
Let's quickly look at why you might choose the Alfresco and Amazon S3 combination:
- Speed, authenticity, scalability, and security.
- Personalization, flexibility, and document management capabilities.
- Dependability of a cloud solution.
Process of Integrating Amazon S3 with Alfresco
For Alfresco Enterprise Edition
If you are already an Alfresco Enterprise Edition user then the integration of Alfresco and Amazon S3 is quite simple and easy.
Step 1: Download the Alfresco-s3-connector-188.8.131.52-7.zip file from the Alfresco Support Portal, then extract the Alfresco-s3-connector-184.108.40.206-7.amp file.
Step 2: Install the AMP file in the Alfresco repository WAR using the Module Management Tool (MMT) and restart the server.
Step 3: Configure the connector by editing the /alfresco-global.properties file.
For detailed instructions on installing and configuring the S3 Connector, refer to the official Alfresco documentation.
Instructions For Alfresco Community Editions
- Unlike the Enterprise edition, the Community edition has no plug-and-play module.
- You need to use a custom Alfresco content store that allows documents to be stored in Amazon S3.
- Hosting your Alfresco instance on Amazon EC2 servers is suggested, but you can also use your on-premise servers.
Our integration of Amazon S3 and Alfresco Community is proprietary code, so here is an explanation of how it works without sharing the actual code.
- The Alfresco ContentStore abstraction lets you manage how and where document binary files are stored by Alfresco. Developers use this highly customizable feature extensively to fine-tune storage. One ContentStore implementation, the CachingContentStore class, is used to speed up content retrieval.
- ContentService uses the fileContentStore bean to perform content read and write operations. We created a custom class, CustomCachingContentStore, and registered it as a bean that overrides fileContentStore. CustomCachingContentStore extends the CachingContentStore class, so the caching behavior is inherited automatically.
- We needed to override the getWriter() method. This ensures that every newly uploaded document is first written to the caching store, after which a separate thread writes the document to a backing store. The backing store is implemented by a custom bean named s3ContentStore, whose class extends AbstractContentStore and overrides the following methods: getReader(), getWriterInternal(), delete(), and isWriteSupported().
- getReader(): Returns an instance of the S3ContentReader class, which extends the AbstractContentReader class.
- S3ContentReader: Fetches a document based on its content URL and exposes it through a readable byte channel.
- getWriterInternal(): Returns an instance of the S3ContentWriter class, which extends the AbstractContentWriter class.
- S3ContentWriter: Provides a writable byte channel and adds a ContentStreamListener that writes the uploaded file to Amazon S3. The S3StreamListener class implements the ContentStreamListener interface and overrides the contentStreamClosed() method, where the files are written to Amazon S3.
- delete(): Deletes a document based on its content URL.
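The write path described in the list above can be sketched without any Alfresco classes. This is only an illustration under stated assumptions: SimpleStore, InMemoryStore, ListeningWriter, and WriteThroughCachingStore are hypothetical names, the in-memory map stands in for Amazon S3, and CompletableFuture stands in for the separate upload thread. It is not the proprietary implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-in for a content store; NOT Alfresco's ContentStore interface.
interface SimpleStore {
    void put(String contentUrl, byte[] data);
    byte[] get(String contentUrl);      // returns null when the URL is unknown
    boolean delete(String contentUrl);  // true if something was removed
}

// In-memory store, used here both as the local cache and as a fake "S3".
class InMemoryStore implements SimpleStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();
    public void put(String contentUrl, byte[] data) { blobs.put(contentUrl, data.clone()); }
    public byte[] get(String contentUrl) { return blobs.get(contentUrl); }
    public boolean delete(String contentUrl) { return blobs.remove(contentUrl) != null; }
}

// Buffers written bytes and notifies a listener exactly once on close,
// loosely mirroring the contentStreamClosed() callback idea.
class ListeningWriter implements AutoCloseable {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Consumer<byte[]> onClosed;
    private boolean closed;

    ListeningWriter(Consumer<byte[]> onClosed) { this.onClosed = onClosed; }

    void write(byte[] data) { buffer.write(data, 0, data.length); }

    @Override
    public void close() {
        if (closed) return;  // fire the callback only once
        closed = true;
        onClosed.accept(buffer.toByteArray());
    }
}

// Writes go to the cache synchronously, then to the backing store asynchronously.
class WriteThroughCachingStore {
    private final SimpleStore cache;
    private final SimpleStore backing;
    private final Map<String, CompletableFuture<Void>> uploads = new ConcurrentHashMap<>();

    WriteThroughCachingStore(SimpleStore cache, SimpleStore backing) {
        this.cache = cache;
        this.backing = backing;
    }

    ListeningWriter getWriter(String contentUrl) {
        return new ListeningWriter(data -> {
            cache.put(contentUrl, data);  // available for reads immediately
            uploads.put(contentUrl,
                CompletableFuture.runAsync(() -> backing.put(contentUrl, data)));
        });
    }

    // Serve from the cache when possible; otherwise fetch from the backing store and cache it.
    byte[] read(String contentUrl) {
        byte[] cached = cache.get(contentUrl);
        if (cached != null) return cached;
        byte[] remote = backing.get(contentUrl);
        if (remote != null) cache.put(contentUrl, remote);
        return remote;
    }

    // Blocks until the background upload for this URL (if any) has finished.
    void awaitUpload(String contentUrl) {
        CompletableFuture<Void> upload = uploads.get(contentUrl);
        if (upload != null) upload.join();
    }
}

class WriteThroughDemo {
    public static void main(String[] args) {
        SimpleStore fakeS3 = new InMemoryStore();
        WriteThroughCachingStore store = new WriteThroughCachingStore(new InMemoryStore(), fakeS3);
        String url = "store://demo/doc.bin";  // made-up content URL
        try (ListeningWriter writer = store.getWriter(url)) {
            writer.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        store.awaitUpload(url);
        System.out.println(new String(fakeS3.get(url), StandardCharsets.UTF_8));  // prints: hello
    }
}
```

A real backing store would replace InMemoryStore with calls to an S3 client, and would need to decide what happens when the background upload fails, which this sketch does not handle.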
<bean id="fileContentStore" class="com.xseed.alfresco.repo.content.caching.CustomCachingContentStore" init-method="init">
    <property name="backingStore" ref="s3ContentStore" />
    ...
</bean>
<bean id="s3ContentStore" class="com.xseed.alfresco.repo.content.s3.S3ContentStore">
    ...
</bean>
The bean definition below overrides the deletedContentStore bean, which handles deleted content. Documents that are permanently deleted from the user's trash are pushed into this store and cleaned up later.
<bean id="deletedContentStore" class="com.xseed.alfresco.repo.content.s3.S3DeletedContentStore"> ... </bean>
Here we used the S3DeletedContentStore class, which extends the S3ContentStore class and overrides the isContentUrlSupported() and exists() methods to suit our needs.
This may seem a little vague, but the actual code is proprietary and still under testing. We may release the code in the near future.
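The article does not say what the two overrides actually do, so the following is only a guess at the general shape: a subclass that narrows which content URLs it accepts and checks existence only within that range. SketchS3Store, SketchS3DeletedStore, the "s3://" scheme, the "deleted/" prefix, and purge() are all hypothetical, and an in-memory map again stands in for Amazon S3.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the article's S3ContentStore (in-memory, no AWS calls).
class SketchS3Store {
    protected final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    // Assumed rule: this store only understands URLs with an "s3://" scheme.
    boolean isContentUrlSupported(String contentUrl) {
        return contentUrl != null && contentUrl.startsWith("s3://");
    }

    boolean exists(String contentUrl) {
        return contentUrl != null && objects.containsKey(contentUrl);
    }

    void put(String contentUrl, byte[] data) {
        if (!isContentUrlSupported(contentUrl)) {
            throw new IllegalArgumentException("Unsupported content URL: " + contentUrl);
        }
        objects.put(contentUrl, data.clone());
    }
}

// Hypothetical deleted-content variant: accepts only URLs under an assumed "deleted/" prefix.
class SketchS3DeletedStore extends SketchS3Store {
    private static final String DELETED_PREFIX = "s3://deleted/";

    @Override
    boolean isContentUrlSupported(String contentUrl) {
        return contentUrl != null && contentUrl.startsWith(DELETED_PREFIX);
    }

    @Override
    boolean exists(String contentUrl) {
        // Skip the lookup entirely for URLs this store could never hold.
        return isContentUrlSupported(contentUrl) && super.exists(contentUrl);
    }

    // Clean-up step: drops everything held in the deleted store and reports how many items were removed.
    int purge() {
        int removed = objects.size();
        objects.clear();
        return removed;
    }
}
```

With a real S3 client, an early return in exists() for unsupported URLs would avoid a network round trip, which may be one reason to override it, though that is speculation.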
Before Stepping Ahead
Amazon S3 is a paid service, so analyze the pros and cons before investing. Before you go ahead, it is important to understand whether you really need to expand your application's storage capabilities. It is always advisable to seek expert help.