My five years at Yesware have coincided with tremendous business and database growth — and all the growing pains that come with scaling. In those five years, our platform has evolved from one monolithic application to a collection of around 40 microservices, data has grown to many terabytes, and the engineering team has gone from five members to over 20 and is still growing. Experiencing this growth — and the accompanying challenges — taught us some valuable lessons along the way about how to manage a rapidly expanding quantity of data.
Lesson #1: Database-as-a-Service (DBaaS) Is a Valuable Asset for Startup Development Teams
For startups with limited resources, outsourcing database management to a Database-as-a-Service is an increasingly popular option. We’ve tried self-hosting in some limited use cases with very mixed results, and tend to prefer working with a DBaaS. You benefit from your provider's expertise and know that your database is configured from the beginning with best practices built in. At Yesware, we use a variety of databases across our platform, such as MongoDB, PostgreSQL, and Redis, and use several DBaaS providers. For example, we’ve used mLab's MongoDB solution since the beginning, and they provide architectural guidance and performance tuning as part of their support.
When selecting a DBaaS, look for reliability and a track record of satisfied customers. Quality support staff is key to avoiding missteps and managing fires if they happen.
Lesson #2: Query Indexing Is Critical for Smooth Operations and Peace of Mind
We use MongoDB as one of our primary data stores because of its flexible data model and querying language. However, a tradeoff of flexible querying is that developers need to ensure that the proper indexes are in place. Careless indexing is the culprit behind most performance issues, and maintaining a growing database will inevitably teach you the value of indexing efficiency.
Newer developers may be unaware of the importance of indexing because it's initially possible to scale without paying close attention to indexes. Development teams may choose to simply purchase more capacity to deal with increasing data sizes. However, in the long run, this isn’t economically feasible. From the beginning, you will want to ensure that all of your queries are indexed, which keeps query performance efficient and avoids full collection scans. Over time, additional benefits can be obtained by optimizing as many queries as possible to be fully covered — answered entirely from the index, with no need to fetch the underlying documents at all.
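As a minimal sketch of what this looks like in practice — collection and field names here are hypothetical, not from our actual schema — a compound index on an `emails` collection can support both a lookup by `user_id` and a covered query that projects only `subject`. With the official `mongo` Ruby driver, the calls would look roughly like the ones shown in the comments; the runnable part below just builds the index and query documents themselves:

```ruby
# Hypothetical driver calls (require a live MongoDB connection):
#
#   client[:emails].indexes.create_one({ user_id: 1, subject: 1 })
#   client[:emails].find({ user_id: 42 },
#                        projection: { _id: 0, subject: 1 })

# Compound index: supports any query on user_id alone (the index
# prefix) as well as queries on user_id + subject.
index_spec = { user_id: 1, subject: 1 }

# Query filter matching the index prefix.
query = { user_id: 42 }

# Projection for a covered query: return only subject, and explicitly
# exclude _id — if _id were returned, the server would have to fetch
# each document, and the query would no longer be covered by the index.
projection = { _id: 0, subject: 1 }
```

The detail that trips people up is the `_id: 0`: MongoDB returns `_id` by default, and since `_id` isn't in this index, forgetting to exclude it silently turns a covered query back into a document fetch.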
Lesson #3: Driver and Tool Selection Is Important
The drivers your applications use to connect to and query your database may abstract away important details about how your queries are actually executed. When selecting a driver or tool, make sure you review the documentation and the team behind it.
For example, we’ve long used a Ruby ODM (Object Document Mapper) called MongoMapper that worked well for our team. However, the library has not been consistently maintained, which has sometimes made it difficult for us to upgrade to newer versions of MongoDB. As a result, we tried using Mongoid in a few newer applications. Mongoid is the official MongoDB ODM library and as such seems to have better support, but we’ve run into features we had come to rely on in MongoMapper that are missing from Mongoid. In addition, some versions of Mongoid use a different raw Ruby driver to interact directly with the database. Ultimately, ODM selection can be a difficult but important choice.
ODMs make it very convenient to work with objects that are mapped from a MongoDB document. However, they are designed for ease of development — not for sophisticated querying. Often, developers will write software using an ODM without knowing what query is actually being generated and issued to the database. This leaves query optimizations undiscovered and can obscure whether indexes are being used at all. In our most performance-sensitive applications, we sometimes use the Ruby driver to make raw driver calls for querying and updating. This gives us more direct control over the number of operations issued to the database, along with what those operations are doing. Understanding your tools' strengths and shortcomings helps you ensure optimal performance and avoid issues down the line.
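To illustrate the difference — again with hypothetical collection and field names, not our real code — consider archiving all of a user's emails. An ODM pattern that loads each record into an object and saves it back issues one round trip per document, while a single raw `update_many` does the same work in one operation. The driver calls appear in comments; the runnable part builds the filter and update documents the raw call would send:

```ruby
# ODM-style pattern (one database round trip per matching record):
#
#   Email.where(user_id: 42).each do |email|
#     email.archived = true
#     email.save
#   end
#
# Raw driver equivalent (a single operation on the server):
#
#   client[:emails].update_many({ user_id: 42 },
#                               { '$set' => { archived: true } })

# Filter document: which records to touch. Written by hand, it is
# obvious this can use an index on user_id.
filter = { user_id: 42 }

# Update document: exactly what the server will change.
update = { '$set' => { archived: true } }
```

Building these documents explicitly is the point: there is no hidden query generation, so you can check the filter against your indexes directly with `explain()` before shipping it.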
Lesson #4: Bringing New Engineers Up to Speed
As data grows, so does the team handling it. How do you bring new team members up to speed on working with the database? Early team members receive a kind of trial-by-fire education as they pioneer integrations with new services or build new features. I certainly have learned more about MongoDB indexing at 2 in the morning, after being paged to deal with a crisis situation, than a new engineer is likely to learn at 2 in the afternoon while writing new code.
As the organization matures, veterans inevitably create solutions and work patterns that newer engineers can duplicate and reuse. While this is great for productivity, it can also reduce the opportunities for newcomers to gain a deep knowledge of how the technologies work.
When bringing on new developers, we’ve found that it pays off to find small, self-contained jobs that challenge and teach newcomers about the ecosystem, platform, and different database technologies used, exposing them to one area of information at a time. Once new engineers have gained experience and perspective, then we attempt to fold them into larger projects that may touch multiple parts of the stack.
From the technology to the providers to the personnel and beyond, taking these lessons to heart can help any emerging company maintain high database performance while building toward the scalability it will hopefully need in the future.