Multi-Tenant Cassandra Clusters With Spring Data Cassandra

What options does Cassandra provide for operating multi-tenant clusters? Do you have your services prepared to do the work properly? Read on to find out.

By Oresztesz Margaritisz · Dec. 04, 17 · Tutorial

Apache Cassandra is getting more and more popular these days. Some companies may want to operate a multi-tenant cluster. What options does Cassandra provide for this? Do you have your services prepared to do the work properly?

In this article, we will go through a practical example using Spring Boot and spring-data-cassandra.

How to Prepare Cassandra for Multitenancy

When thinking about multi-tenancy, we have to decide how to manage tenants inside our cluster. There are multiple options, each with their own pros and cons.

Row-Level Multitenancy

Each table has a tenant_id column in its partition key that determines which tenant the record belongs to.

Pros

  • Easy to apply, as you just have to add a new column to partition keys.

  • Less maintenance cost. Everything above table level looks like one single cluster.

Cons

  • Schema changes for tenants need to be deployed at once for all tenants.

  • Hard to optimize for tenant-specific usage (compaction, caching, etc.).

  • Cassandra doesn't allow for altering of primary keys. We can't just add this column to existing tables.
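
As a rough illustration of row-level isolation, here is a minimal sketch that only builds CQL text and does not talk to a cluster. The users table and its username and email columns are hypothetical; only tenant_id comes from the description above.

```java
// Sketch only: builds CQL strings for a row-level multi-tenant table.
// The "users" table and its columns are hypothetical examples.
final class RowLevelTenancyCql {

    // tenant_id is part of the composite partition key, so rows of
    // different tenants land in different partitions of the same table.
    static String createUsersTable() {
        return "CREATE TABLE IF NOT EXISTS users ("
                + "tenant_id text, "
                + "username text, "
                + "email text, "
                + "PRIMARY KEY ((tenant_id, username)))";
    }

    // Every query has to restrict on tenant_id, here via bind markers.
    static String selectUser() {
        return "SELECT * FROM users WHERE tenant_id = ? AND username = ?";
    }

    public static void main(String[] args) {
        System.out.println(createUsersTable());
        System.out.println(selectUser());
    }
}
```

Because the key is defined when the table is created, an existing table without tenant_id would have to be recreated and its data migrated.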

Table-Level Multitenancy

Each table has a tenant_id suffix in its name. Tenants are isolated by tables.

Pros

  • Different table schema allowed for tenants.

  • More options for fine-tuning table properties according to tenant-specific usage. You can change bloom filter settings, GC grace periods, etc. on each table.
  • Permissions can be granted and revoked on a per-table basis.

Cons

  • CQL commands have to be manipulated to append tenant_id to the table's name.

  • Increased number of tables causes maintenance overhead. For instance, more MemTables created and maintained means that more flush operations have to be handled in the background and more random IO has to be performed.

  • The soft limit for number of tables inside a cluster is around 10,000.
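
The name manipulation mentioned in the cons could look roughly like the sketch below. The class, the users base table, and the allowed character set are assumptions made for illustration, not part of any Cassandra or Spring API.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Sketch only: derives a per-tenant table name for table-level multitenancy.
final class TenantTableNames {

    // Assumption: tenant IDs are limited to letters, digits and underscores
    // so they can be embedded in an unquoted CQL identifier.
    private static final Pattern SAFE_ID = Pattern.compile("[A-Za-z0-9_]{1,20}");

    static String tableFor(String baseTable, String tenantId) {
        if (tenantId == null || !SAFE_ID.matcher(tenantId).matches()) {
            throw new IllegalArgumentException("Invalid tenant ID: " + tenantId);
        }
        // Unquoted CQL identifiers are case-insensitive, so normalize to lower case.
        return baseTable + "_" + tenantId.toLowerCase(Locale.ROOT);
    }

    // Table names cannot be bound as query parameters, so they are
    // concatenated into the statement text.
    static String selectAllFrom(String baseTable, String tenantId) {
        return "SELECT * FROM " + tableFor(baseTable, tenantId);
    }
}
```

Every statement that touches a tenant's data would have to go through a helper like this, which is where the extra complexity comes from.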

Keyspace-Level Multitenancy

Tenants use different keyspaces. Each keyspace name maps to a single tenant_id.

Pros

  • Different kinds of replication strategies are available per tenant.

  • We can control which datacenter is available for each tenant.

Cons

  • Similar to table-level multitenancy.
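
To make the per-tenant replication point concrete, the following sketch generates a CREATE KEYSPACE statement with NetworkTopologyStrategy. The keyspace name, datacenter names, and replication factors are hypothetical, and the helper only produces CQL text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: builds a CREATE KEYSPACE statement for one tenant.
final class TenantKeyspaceCql {

    // replicasPerDc maps a datacenter name to its replication factor.
    // Datacenters that are left out hold no replicas for this tenant.
    static String createKeyspace(String keyspace, Map<String, Integer> replicasPerDc) {
        String dcs = replicasPerDc.entrySet().stream()
                .map(e -> "'" + e.getKey() + "': " + e.getValue())
                .collect(Collectors.joining(", "));
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace
                + " WITH replication = {'class': 'NetworkTopologyStrategy', " + dcs + "}";
    }

    public static void main(String[] args) {
        // Hypothetical tenant whose data lives in two datacenters.
        Map<String, Integer> tenantA = new LinkedHashMap<>();
        tenantA.put("DC1", 3);
        tenantA.put("DC2", 3);
        System.out.println(createKeyspace("tenant_a", tenantA));
    }
}
```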

Higher-Level Multitenancy

Other levels of isolation can be introduced for better separation. For example, each tenant can have its own cluster in Cassandra. This would mean that we have to map cluster_name in cassandra.yaml to our tenant_id. With this approach, each node can belong to only one tenant. We'll skip discussing these options in detail.

How to Prepare Your Services for Accessing the Cluster

We will discuss keyspace-level multitenancy in detail with Spring Data Cassandra.

The picture below shows a simplified dataflow diagram. Let's say that in this particular setup, DC1 and DC2 contain Tenant A's dataset, while DC2 and DC3 contain Tenant B's dataset. If you deploy your services to DC1 and DC3, you could just wire the target keyspace into the application at deployment time, as all the incoming traffic will be from the same tenant. But we might want to set up a shared datacenter for multiple tenants (like the one named DC2). We could still deploy our services by defining the keyspace name at deployment time (called spring.data.cassandra.keyspace-name in the Spring Boot configuration file), but that would mean that we have to route incoming traffic to services based on tenant ID.

[Image: simplified dataflow diagram of Tenant A and Tenant B across datacenters DC1, DC2, and DC3]

Allowing every service instance to serve any tenant would not only help spread the load and increase scalability but would also keep our services stateless and simplify routing logic. With this approach, we can no longer define the tenant ID during deployment. We have to handle tenant isolation on a per-request basis.

Requests should encapsulate tenant ID, so we'll introduce a special HTTP header for storing it.

Tenant-ID: ACME

Next, we should be able to get the defined HTTP header inside our services. In Spring, we can define a request-scoped bean that will encapsulate our tenant ID and bind it to our incoming requests. This bean will be injected into any other Spring component that should be tenant-aware.

@Component
@Scope(scopeName = "request", proxyMode = ScopedProxyMode.TARGET_CLASS)
public class TenantId {

    private String tenantId;

    public void set(String id) {
        this.tenantId = id;
    }

    public String get() {
        return tenantId;
    }
}

We have to set the TenantId's state in our controllers whenever an HTTP request is handled.

@RestController
public class UserController {

    @Autowired
    private UserRepository userRepo;
    @Autowired
    private TenantId tenantId;

    @RequestMapping(value = "/userByName")
    public ResponseEntity<String> getUserByUsername(
            @RequestHeader("Tenant-ID") String tenantId,
            @RequestParam String username) {
        // Setting the tenant ID
        this.tenantId.set(tenantId);
        // Finding user
        User user = userRepo.findOne(username);
        ...
    }
}
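
Since the header value ends up as a keyspace name inside queries, it is probably worth validating it before it is stored in TenantId. The following is a minimal plain-Java sketch; the TenantKeyspaceResolver class and its conservative pattern (a leading letter, at most 48 characters in total) are assumptions for illustration and not part of Spring or the driver.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

// Sketch only: maps a raw Tenant-ID header value to a keyspace name.
final class TenantKeyspaceResolver {

    // Assumption: a deliberately strict pattern. It starts with a letter and
    // allows at most 48 letters, digits and underscores in total.
    private static final Pattern VALID = Pattern.compile("[A-Za-z][A-Za-z0-9_]{0,47}");

    // Returns the lower-cased keyspace name, or empty if the header is
    // missing or contains anything that should not reach a CQL statement.
    static Optional<String> resolve(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        String trimmed = headerValue.trim();
        if (!VALID.matcher(trimmed).matches()) {
            return Optional.empty();
        }
        return Optional.of(trimmed.toLowerCase(Locale.ROOT));
    }
}
```

A controller could call resolve on the header and answer with a client error when the result is empty, instead of passing the raw value on to the repository layer.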

To use TenantId in our repositories, we have to extend the existing repository functionality in Spring Data Cassandra. We have the option to override common operations that are defined in the CrudRepository interface, like save, findOne, or findAll. We'll register our own repository base class in the Application class of Spring Boot.

@SpringBootApplication
@EnableCassandraRepositories(
  repositoryBaseClass = KeyspaceAwareCassandraRepository.class)
public class DemoApplication {
    ...
}

Then we override the common operations one by one in our KeyspaceAwareCassandraRepository.

public class KeyspaceAwareCassandraRepository<T, ID extends Serializable>
        extends SimpleCassandraRepository<T, ID>  {
    ...
    @Autowired
    private TenantId tenantId;
    ...
    @Override
    public T findOne(ID id) {
        injectDependencies();
        CqlIdentifier primaryKey = operations.getConverter()
                .getMappingContext()
                .getPersistentEntity(metadata.getJavaType())
                .getIdProperty().getColumnName();

        Select select = QueryBuilder.select().all()
                .from(tenantId.get(),
                        metadata.getTableName().toCql())
                .where(QueryBuilder.eq(primaryKey.toString(), id))
                .limit(1);

        return operations.selectOne(select, metadata.getJavaType());
    }
    ...
}

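Note that @Autowired is not processed automatically in this class, as it's not container-managed. That is why findOne calls injectDependencies() first: we have to get the servlet context through some static methods and trigger the injection ourselves.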

For domain-specific queries, like getting a user by name and e-mail, we have to write the operations one by one for each class so that the keyspace in each query is defined by our injected TenantId. For example, here's a repository extension that will find a User by username and e-mail. Note the KeyspaceAwareUserRepository that we're using as an extension for the custom operation.

public interface UserRepository
        extends CrudRepository<User, String>, KeyspaceAwareUserRepository {
}
public interface KeyspaceAwareUserRepository {
    User findByUsernameAndEmail(String username, String email);
}

In the implementation, we write our select using the TenantId from our request-scoped bean.

public class UserRepositoryImpl
        implements KeyspaceAwareUserRepository {

    @Autowired
    private TenantId tenantId;
    @Autowired
    private CassandraOperations operations;

    @Override
    public User findByUsernameAndEmail(String username, String email) {
        Select select = QueryBuilder.select().all()
                .from(tenantId.get(),"user")
                .where(QueryBuilder.eq("username", username))
                .and(QueryBuilder.eq("email", email))
                .limit(1);

        return operations.selectOne(select, User.class);
    }
}

How to Ensure That Things Are Working Properly

To test that our services use keyspaces consistently, a short stress test that puts a decent load on a small cluster is sufficient. Make sure that enough context switches happen inside the Spring Boot application during test execution, so that requests from different tenants actually interleave.
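
The idea behind such a test can be sketched without a cluster. In the simplified simulation below, a ThreadLocal stands in for the request-scoped TenantId bean and a string-building method stands in for the repository; all names, including the tenant_a and tenant_b keyspaces, are hypothetical. A real stress test would send HTTP requests with different Tenant-ID headers and verify the returned data instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: simulates concurrent requests from two tenants.
final class TenantIsolationCheck {

    // Stand-in for the request-scoped TenantId bean: one value per thread.
    private static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    // Stand-in for a repository call: builds a keyspace-qualified query.
    static String buildQuery() {
        return "SELECT * FROM " + CURRENT_TENANT.get() + ".user WHERE username = ?";
    }

    // Runs many simulated requests concurrently and counts those whose
    // query was built against a keyspace other than their own tenant's.
    static int countMismatches(int requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                String tenant = (i % 2 == 0) ? "tenant_a" : "tenant_b";
                Callable<Boolean> request = () -> {
                    CURRENT_TENANT.set(tenant);
                    try {
                        Thread.yield(); // encourage context switches
                        return buildQuery().startsWith("SELECT * FROM " + tenant + ".");
                    } finally {
                        CURRENT_TENANT.remove();
                    }
                };
                results.add(pool.submit(request));
            }
            int mismatches = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (!result.get()) {
                        mismatches++;
                    }
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }
}
```

Any non-zero result from countMismatches would indicate that tenant state leaked between requests.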

For running in production, I suggest having good monitoring in place in your Spring Boot and Cassandra instances. You should be tracking at least whether each tenant is:

  • Putting too much stress on the cluster 

  • Exceeding quota

  • Exceeding table limit
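
On the service side, per-tenant tracking can start as simply as counting requests per tenant ID. The sketch below is an assumption-laden illustration: TenantRequestCounter and its fixed request quota are hypothetical, and in practice the counts would be exported through whatever metrics library you already use and reset per time window.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: thread-safe per-tenant request counting with a simple quota.
final class TenantRequestCounter {

    private final ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final long quota; // hypothetical maximum number of requests per tenant

    TenantRequestCounter(long quota) {
        this.quota = quota;
    }

    // Called once per handled request.
    void record(String tenantId) {
        counts.computeIfAbsent(tenantId, id -> new LongAdder()).increment();
    }

    long count(String tenantId) {
        LongAdder adder = counts.get(tenantId);
        return adder == null ? 0L : adder.sum();
    }

    boolean overQuota(String tenantId) {
        return count(tenantId) > quota;
    }
}
```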

Summary

In its current state, spring-data-cassandra does not make it easy to introduce multi-tenancy. The DataStax Java driver handles all requests asynchronously, so just throwing in a USE keyspace command somewhere in our code won't do the trick. This is also explained in the official driver documentation:

"Be very careful though: if the session is shared by multiple threads, switching the keyspace at runtime could easily cause unexpected query failures. Generally, the recommended approach is to use a single session with no keyspace, and prefix all your queries."

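The failure mode described in the quote can be shown with a toy model. FakeSession below is not the driver's API; it is a hypothetical stand-in that only records which keyspace each query would hit, and the interleaving of the two requests is written out by hand so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: why switching keyspaces on a shared session is unsafe.
final class SharedSessionDemo {

    // Toy model of a session shared by all requests; not the real driver API.
    static final class FakeSession {
        private String currentKeyspace;
        private final List<String> executed = new ArrayList<>();

        void use(String keyspace) {
            currentKeyspace = keyspace;
        }

        // Unqualified table names resolve against the session's current keyspace.
        void selectFrom(String table) {
            executed.add("SELECT * FROM " + currentKeyspace + "." + table);
        }

        // Qualified names do not depend on session state.
        void selectFrom(String keyspace, String table) {
            executed.add("SELECT * FROM " + keyspace + "." + table);
        }

        List<String> executed() {
            return executed;
        }
    }

    // Request A switches keyspace, request B switches keyspace, then
    // A runs its query and silently reads from tenant B's keyspace.
    static List<String> interleavedWithUse() {
        FakeSession session = new FakeSession();
        session.use("tenant_a");    // request A
        session.use("tenant_b");    // request B arrives before A queries
        session.selectFrom("user"); // request A, now hitting tenant_b
        return session.executed();
    }

    // With prefixed queries the order of requests no longer matters.
    static List<String> interleavedWithPrefix() {
        FakeSession session = new FakeSession();
        session.selectFrom("tenant_b", "user"); // request B
        session.selectFrom("tenant_a", "user"); // request A
        return session.executed();
    }
}
```
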
We also have to ensure that our solution is somewhat future-proof and honors the intentional extension points in Spring Data. Currently, extending repositories as described in the official reference manual seems to me to be the only solid, acceptable implementation.
