It has been ten years since Amazon launched EC2. The cloud is very much real, and thousands of systems run in it.
Multi-tenancy is a concept tied to the cloud from its inception. It lets multiple untrusting parties share resources while giving each the illusion of its own space. The best example of this idea is an apartment complex, which gives each tenant their own space while sharing resources like plumbing, common areas, security, and maintenance. For multi-tenancy to be useful, the resource cost per tenant should be less than the cost of owning the resource (a house) exclusively.
A multi-tenant app in the cloud needs to manage three types of resources: OLTP executions, data storage, and OLAP batch executions (analytics). For example, think of an app that has a web or mobile front end, a set of services that provide the backend for the app, a database, and a system to process its analytics. Since state-of-the-art architectures are built using stateless servers, separate databases, and separate analytics systems, we can tackle multi-tenancy separately for each aspect. Let’s explore each of these aspects and discuss techniques for supporting them.
Cloud can implement sharing at different levels.
1. Give each tenant their own machine.
2. Give each tenant their own VM.
3. Give each tenant their own container (e.g. Docker).
4. Let multiple tenants share the same process (e.g. JVM).
As you go down the list, each option offers more resource sharing but less isolation (security). IaaS (Infrastructure as a Service) platforms provide level two in the above list. PaaS and SaaS providers should choose among 2-3.
One of the key advantages of the cloud, if it is properly done, is that a user only “pays for what she uses”. Basically, there are no costs for being available; you pay only when the app gets used. For IaaS such as AWS, this comes almost for free, as it shares hardware across users through VMs.
However, “Pay for what you use” does not come free for PaaS and SaaS. Let’s assume a cloud provider “XCloud” that has 1 million users, who have all deployed their PaaS/SaaS apps in XCloud.
At a given time, only a few hundred or a few thousand of those apps will be active. However, we do not know which apps. So when a user arrives at XCloud for AppY, out of the blue, we need to be able to serve that user. We have two choices.
1. Keep at least one instance of each app running.
2. Boot up the app quickly when a user arrives.
Option one is really wasteful. Think of AppY as Gmail. Google does not want to keep a VM running for each Gmail account all the time. If they keep one instance running for each user, it is hard for them to let users pay only when their app is used.
This is the difference between VMs and containers or in-process multi-tenancy. VMs take seconds to minutes to boot, and unless your app is really simple, there is no way you can boot it up fast enough to serve the user who just arrived. Typically, you need to respond to the user in less than ten seconds.
Containers like Docker solve this problem. They behave like lightweight VMs and can boot up in milliseconds. Given the Gmail example, it is no wonder containers came from Google, which faces the same problem. Note that using Docker does not guarantee that your app will start fast; slow initialization in your app’s code can still ruin the start time. However, with careful coding, you can often hit the mark. Hence, containers remove a major overhead added by the infrastructure.
Another interesting point is that this is the reason microservices and Docker-based systems care so much about start time. If your system can boot up fast enough to serve the user, that will lead to major cost savings, as you do not need to keep an instance running all the time for each app. Moreover, it will also significantly simplify the autoscaling algorithms: the algorithm does not need to predict the load and keep a buffer of instances running to compensate for startup time.
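The "boot on arrival" choice can be sketched as a small pool that starts an instance the first time an app is requested and stops instances that have been idle for a while. This is a minimal sketch, not how any particular platform does it: `OnDemandPool`, `start_app`, and the 60-second idle limit are hypothetical names and values chosen for illustration, and a real platform would start a container where this code calls a plain function.

```python
import time
from typing import Callable, Dict, List, Tuple


class OnDemandPool:
    """Starts an app instance on first request and evicts idle ones."""

    def __init__(self, start_app: Callable[[str], object], idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._start_app = start_app        # hypothetical boot function
        self._idle_seconds = idle_seconds
        self._clock = clock
        # app_id -> (instance, time of the last request)
        self._running: Dict[str, Tuple[object, float]] = {}

    def get(self, app_id: str) -> object:
        """Return a running instance, booting one if the app is cold."""
        entry = self._running.get(app_id)
        instance = entry[0] if entry is not None else self._start_app(app_id)
        self._running[app_id] = (instance, self._clock())
        return instance

    def evict_idle(self) -> int:
        """Stop instances that have not served a request recently."""
        now = self._clock()
        idle = [app_id for app_id, (_, last) in self._running.items()
                if now - last > self._idle_seconds]
        for app_id in idle:
            del self._running[app_id]
        return len(idle)

    def running_count(self) -> int:
        return len(self._running)


# Usage with a fake clock so the example does not need to sleep.
fake_now = [0.0]
boots: List[str] = []


def start_app(app_id: str) -> str:
    boots.append(app_id)                   # record each cold start
    return f"instance-of-{app_id}"


pool = OnDemandPool(start_app, idle_seconds=60, clock=lambda: fake_now[0])
pool.get("AppY")                           # cold start
pool.get("AppY")                           # served by the running instance
fake_now[0] = 120.0
evicted = pool.evict_idle()                # AppY has been idle for 120 s
```

The faster `start_app` is, the shorter the idle limit can be, which is where the cost savings come from.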
Comparing containers with in-process multi-tenancy (level 4 in our list), the latter can be even more efficient. We have done a lot of work in the area [1,2]. However, sharing a process looks scary in retrospect, as you need to trust the programmers of the platform not to have made mistakes. Also, performance isolation is tricky with in-process multi-tenancy.
It seems cloud computing has settled on container-based multi-tenancy for executions, the middle ground. Containers provide acceptable performance and isolation. In retrospect, it is a sensible choice.
Finally, there is a simpler solution that works in some cases.
It is worth noting that, if your servers are stateless, which is the case in most enterprise deployments, then you can do without multi-tenancy at all. If a server is stateless and it can look at the request, figure out the tenant, and just in time do what is required for that tenant, then you do not need multi-tenancy. Instead, you can run a pool of servers common to all tenants and assign one to each app as users arrive. For data, you can use a multi-tenant database as described in the next section. If this solution is possible, it will reduce the complexity of your architecture significantly.
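A minimal sketch of such a stateless handler follows. It assumes the tenant can be derived from the first label of the `Host` header and that per-tenant settings live in a shared registry; `TENANT_CONFIGS`, `resolve_tenant`, `handle_request`, and the `xcloud.example` domain are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TenantConfig:
    greeting: str


# Hypothetical tenant registry; in practice this would be a shared store
# (for example the multi-tenant database) that any server can read.
TENANT_CONFIGS: Dict[str, TenantConfig] = {
    "acme": TenantConfig(greeting="Welcome to Acme"),
    "globex": TenantConfig(greeting="Hello from Globex"),
}


def resolve_tenant(headers: Dict[str, str]) -> str:
    """Derive the tenant from the request itself (assumed: Host subdomain)."""
    host = headers.get("Host", "")
    tenant_id = host.split(".", 1)[0].lower()
    if tenant_id not in TENANT_CONFIGS:
        raise KeyError(f"unknown tenant for host {host!r}")
    return tenant_id


def handle_request(headers: Dict[str, str], user: str) -> str:
    """Serve any tenant from any server: no per-tenant state is kept here."""
    config = TENANT_CONFIGS[resolve_tenant(headers)]
    return f"{config.greeting}, {user}"


reply = handle_request({"Host": "acme.xcloud.example"}, "alice")
```

Because the handler keeps nothing between requests, any server in the common pool can serve any tenant.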
After a restart, both VMs and containers lose their local disks. Hence, making stateful servers such as databases and message brokers multi-tenant is much more complicated than running them in a container. One option is to use attached storage, such as S3 or block devices. However, this could be slow. For most applications, we need to use a database system that runs on top of proper hardware.
Even on top of hardware, we need to share storage across tenants. There are several choices. The article Multi-Tenant Data Architecture provides a great outline of these choices except for the following choice #5, which did not exist at the time of writing.
1. Database server per tenant.
2. Database per tenant (the same server is shared between multiple tenants).
3. Table per tenant.
4. Table shared among tenants.
5. Natively multi-tenant database server.
Approaches #1 and #2, giving each tenant its own database server or its own database, are acceptable in an IaaS setup, but prohibitively expensive for PaaS or SaaS setups. For example, it is not practical to keep a million databases, one per tenant.
Approach #3, giving a table per tenant, is better than #1 and #2, but still prohibitive if we are talking about millions of tenants.
Approach #4, having one table shared between many tenants, provides the necessary performance. However, all tenants need to share the same schema with this method, which is acceptable for most PaaS and SaaS scenarios. Just like in-process multi-tenancy, for isolation, we need to trust the programmers of the system not to have made any mistakes. However, it is easier to verify SQL filtering than to verify the Java or C++ code used with in-process multi-tenancy.
The fifth approach (e.g. Oracle now has a multi-tenant database server) will provide the best of all worlds. If carefully implemented, it can match or exceed the performance of approach #4 while enforcing isolation at the database level. Although we would have to trust the RDBMS developers not to make a mistake, I believe that within the RDBMS, there is a better chance of handling isolation. Moreover, database developers are likely to understand the domain much better, which will let them do a better job.
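The shared-table approach can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the `orders_for_tenant` helper are made up for this example; the point is that every query goes through one function that always filters on the tenant column, which keeps the SQL filtering easy to audit.

```python
import sqlite3
from typing import List, Tuple

# One table shared by all tenants; tenant_id marks the owner of each row.
shared_db = sqlite3.connect(":memory:")
shared_db.execute(
    "CREATE TABLE orders ("
    " tenant_id TEXT NOT NULL,"
    " item TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)
shared_db.executemany(
    "INSERT INTO orders (tenant_id, item, amount) VALUES (?, ?, ?)",
    [
        ("acme", "rocket", 120.0),
        ("acme", "anvil", 30.0),
        ("globex", "laser", 999.0),
    ],
)


def orders_for_tenant(db: sqlite3.Connection,
                      tenant_id: str) -> List[Tuple[str, float]]:
    """Return only the rows owned by one tenant.

    The tenant filter is a bound parameter, so it cannot be bypassed by
    crafting the tenant_id string.
    """
    cursor = db.execute(
        "SELECT item, amount FROM orders WHERE tenant_id = ? ORDER BY item",
        (tenant_id,),
    )
    return cursor.fetchall()


acme_orders = orders_for_tenant(shared_db, "acme")
```

Isolation here rests entirely on application code never issuing a query without the `tenant_id` filter, which is the trust assumption discussed above.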
Thanks to Big Data, everything must have analytics. For most SaaS applications, analytics have become a competitive advantage. Multi-tenancy requirements for analytics are different from the OLTP use cases that we have considered so far. Implementing them in a PaaS or a SaaS environment needs answers for several challenges.
Analytics includes data collection, data storage, running analysis, and providing controlled access to results. Cloud providers can do the following to facilitate data collection.
- Add instrumentation to track transactions.
- Provide data collectors that users can place within their apps.
- Provide a data sink API (e.g. REST/JSON API) to which users can publish events.
Each method should track tenant and user information with each event.
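As a sketch of what such an event might look like, the envelope below carries the tenant and user alongside the payload and serializes to JSON, ready to be posted to a data sink. The `Event` class, its field names, and `to_json` are assumptions for illustration, not the format of any particular platform.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Event:
    """Hypothetical event envelope: ownership travels with every event."""
    tenant_id: str
    user_id: str
    event_type: str
    payload: Dict[str, Any] = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


def to_json(event: Event) -> str:
    """Serialize an event, rejecting events without ownership information."""
    if not event.tenant_id or not event.user_id:
        raise ValueError("every event must carry tenant_id and user_id")
    return json.dumps(asdict(event), sort_keys=True)


login = Event(tenant_id="acme", user_id="alice", event_type="login",
              payload={"device": "mobile"}, timestamp=0.0)
wire = to_json(login)
```

Rejecting events without ownership at the collection point means later storage and analysis stages can rely on those fields being present.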
For the rest, the solutions for storage, analysis, and access to results are intertwined with each other. In my opinion, the answer depends on several questions about what a user needs.
Do all tenants share the same schema? Should they be able to define their own event types?
Do users need user-level data isolation, or can they live with tenant-level isolation?
Do users and tenants need to run their own queries (analysis) or can they live with pre-canned queries?
Based on these requirements, solutions change. Let’s look at each solution.
Use Super-Tenant Space
If the PaaS controls the data generation, data analysis, and presentation of the final results, then the system can push data into a super-tenant store and handle permissions itself. Although this might be a boring solution, it is a common use case for most SaaS apps, which give their users pre-cooked analytics and nothing else. If users want to do their own analysis, the system can offer a way to export their data.
Shared Tables Filtered by a Column
Just like with data storage, if all users can share the same set of tables (schema), then we can store data by just adding user and tenant columns to each table. This works pretty well. Assigning ownership to the results calculated by the analysis, however, has some complexity. Hence the analysis logic (e.g. Hadoop jobs) has to resolve the ownership of each record by adding ownership data as part of the data record. Another advantage is that with this method, a PaaS can run a single set of analytics queries once on all the data, partitioned by tenant using “group by” operators.
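A small sketch of the single-pass idea, again using Python's built-in `sqlite3` as a stand-in for the analytics store: one query computes a per-tenant result for every tenant at once, and the tenant column in each output row records who owns that result. The `events` table and the `events_per_tenant` helper are hypothetical.

```python
import sqlite3
from typing import Dict

analytics_db = sqlite3.connect(":memory:")
analytics_db.execute(
    "CREATE TABLE events (tenant_id TEXT, user_id TEXT, event_type TEXT)"
)
analytics_db.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("acme", "alice", "login"),
        ("acme", "bob", "login"),
        ("acme", "alice", "purchase"),
        ("globex", "carol", "login"),
    ],
)


def events_per_tenant(db: sqlite3.Connection) -> Dict[str, int]:
    """One query for all tenants; each result row carries its owner."""
    rows = db.execute(
        "SELECT tenant_id, COUNT(*) FROM events"
        " GROUP BY tenant_id ORDER BY tenant_id"
    ).fetchall()
    return dict(rows)


totals = events_per_tenant(analytics_db)
```

Each tenant is then shown only the row whose `tenant_id` matches its own, so the job runs once regardless of the number of tenants.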
Tenant’s Own space
Providing a private space for each tenant is the most flexible option. It gives the end user the most control. Each tenant can have their own schema and their own analytics queries.
As usual, this flexibility needs to be paid for. It is expensive for several reasons.
If there are a lot of tenants, giving a database table to each tenant is a problem if you are using conventional databases, as they are not designed to handle millions of tables or databases. As discussed before, one solution is to use a natively multi-tenant database (e.g. Oracle 12c). Otherwise, the PaaS needs to handle many database servers and partition tables between those servers, which will be very complicated.
Unlike the earlier method, where a SaaS can run a single analytics job to process data across all tenants (e.g. partitioned using “group by”), giving each tenant their own space requires separate analytics jobs for each tenant. Initiating an analytics job often has significant overhead. Hence, compared to running a single job, running many jobs will be significantly slower. In this setup, a SaaS needs a much larger computing cluster. Another challenge is that a user can run a heavy job that hogs the cluster and slows down others. Hence, the SaaS needs a way to limit the amount of computing power used by a single tenant.
It is possible to build a hybrid solution where the SaaS provides a shared schema for shared data and provides tenants their own space for user-defined data. The PaaS can charge users extra for private space and thereby limit the number of users who ask for it. In either case, costs would be proportional to the data generated by tenants, and cost structures need to be adjusted accordingly.
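One simple way to cap a tenant's share of the cluster is an admission check in front of the job scheduler. The sketch below assumes each job declares an estimated cost in CPU-seconds and each tenant has a fixed budget per accounting period; `TenantQuota`, `try_reserve`, and the budget of 100 are hypothetical, and real schedulers offer richer mechanisms such as queues and fair sharing.

```python
from collections import defaultdict
from typing import DefaultDict


class TenantQuota:
    """Admits a job only if the tenant stays within its compute budget."""

    def __init__(self, max_cpu_seconds: float) -> None:
        self._max = max_cpu_seconds
        self._used: DefaultDict[str, float] = defaultdict(float)

    def try_reserve(self, tenant_id: str, cpu_seconds: float) -> bool:
        """Reserve budget for a job; return False if it would exceed the cap."""
        if self._used[tenant_id] + cpu_seconds > self._max:
            return False
        self._used[tenant_id] += cpu_seconds
        return True

    def remaining(self, tenant_id: str) -> float:
        return self._max - self._used[tenant_id]


quota = TenantQuota(max_cpu_seconds=100.0)
first = quota.try_reserve("acme", 80.0)    # fits in the budget
second = quota.try_reserve("acme", 50.0)   # would exceed 100, so rejected
other = quota.try_reserve("globex", 50.0)  # budgets are per tenant
```

A rejected job could be queued until the next accounting period or billed at a higher rate, depending on the pricing model.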
After ten years of cloud computing, there are thousands of systems that run in the cloud. This article explored multi-tenancy, a key idea in the cloud. Multi-tenancy is the ability to run multiple tenants (users) within the same system. While realizing multi-tenancy, a cloud may choose between multiple levels of isolation, ranging from sharing the same hardware using VMs to sharing the same processes through clever programming. We explored how to build a multi-tenant app. A multi-tenant app in the cloud needs to manage three types of resources: OLTP executions, data storage, and OLAP batch executions (analytics). We explored each and saw that executions are converging toward containers and that databases are converging toward multi-tenant databases. Furthermore, the article discussed several solutions to support multi-tenant analytics.
[1] Milinda Pathirage, Srinath Perera, Sanjiva Weerawarana, Indika Kumara, A Multi-tenant Architecture for Business Process Execution, 9th International Conference on Web Services (ICWS), 2011
[2] Afkham Azeez, Srinath Perera, Dimuthu Gamage, Ruwan Linton, Prabath Siriwardana, Dimuthu Leelaratne, Sanjiva Weerawarana, Paul Fremantle, Multi-Tenant SOA Middleware for Cloud Computing, 3rd International Conference on Cloud Computing, Florida, 2010