Database Administration on Snowflake
This is part one of a series of articles that will unpack just what near zero management means.
Join the DZone community and get the full member experience.Join For Free
Snowflake is a cloud-based Data Warehouse platform that proudly boasts near zero management. This is part one of a series of articles that will unpack just what that means.
I was recently working for a major UK customer, where the system manager said, “Snowflake says they need Zero Management, but surely that’s just a marketing ploy.” Most Solution Architects use the term “Near Zero Management” which is probably more accurate. However, it got me thinking — is it true?
Will DBA’s go the way of the 1970s Computer Operators and become extinct as a species?
Listed below are some of the tasks needed for a System or Database Administrator (DBA), and why they are no longer needed with Snowflake.
- Disk Space Capacity Planning: While your administrators should keep an eye upon the total storage used, there’s no need to create spreadsheets with projections for disk space requirements and capacity planning because it’s impossible to run out of space. In short, your company would go bankrupt before it used up all the disk capacity of a major cloud provider like Amazon or Microsoft.
- Up Front Machine Sizing: Normally, when you deploy a new analytics application your DBA needs to agonize overestimates of compute processing, memory requirements, and estimates of initial user numbers and projected growth rates. These are no longer needed with Snowflake, where the administrator can define an unlimited number of Virtual Warehouses in one of a range of T-Shirt Sizes, and then if the compute processing is too small, or user numbers suddenly double, these can be adjusted on-the-fly, typically applied within milliseconds.
- The SQL statement above illustrates the command needed to resize a virtual warehouse. In this case, the size is set to LARGE, and although initially one cluster of 8 nodes is allocated, the system will automatically scale out to 4 clusters as additional users run queries, a total of 32 nodes.
- Provide a Disaster Recovery Data Centre: On both AWS and Azure, as data is committed, it is automatically written to three data centers in a given region, and the system can withstand the loss of up to two systems. This means, with no additional effort, your data warehouse is probably safer than your primary feeder systems.
- Physical or Network Security: Snowflake includes a huge array of built-in security features to guarantee the security of your data. This includes end-to-end 256-bit encryption, multi-factor authentication, an optional private link connection between the Snowflake and your private Virtual Private Cloud and IP address whitelisting or blacklisting of access using network access policies. Finally, for financial services applications, Snowflake provides the option of a Virtual Private Snowflake deployment using a dedicated and managed instance in addition to the standard security features described.
- Physical Database Design and Indexing: Oracle includes an entire armory of indexes including Bitmap, B-Tree, Clustered, Function-Based Indexes and Index Organised tables. Each index can then be globally or locally partitioned, and these need to be carefully managed to ensure they are not accidentally disabled, and they are correct for the workload applied. Every table over 2Gb in size is recommended to be partitioned, and this produces further design and maintenance overhead. Snowflake, however, has no indexes whatsoever and instead of defining partitioning on specific keys to maximize query performance, every column is automatically partitioned with no overhead or maintenance required.
- Apply Software Upgrades: A large corporate could host thousands of production, test and development databases with a constant stream of operating system, network and database system patch releases and upgrades. Every few years, database vendors release a major software release which often needs downtime and data migration to apply. Snowflake, however, silently applies changes without any effort, downtime or interruption to service. New features are constantly being rolled out, and improvements made. Even hardware upgrades are silently applied, and the only difference the customer knows is when tasks run faster.
No DBA? Really?
In reality, any large, and complex computer system will always need specialists to monitor and manage the system, and (if nothing else), to monitor the disk space used, and the cost of running the system. The screenshot below shows a pie chart showing the percentage cost allocated by each virtual warehouse.
The reality is, most running systems will still need specialists to manage the following:-
- Backup and Recovery: Although hugely simplified, it’s important to be able to quickly recover from system or human error, and quickly restore corrupted data.
- Manage Costs: Although it’s possible to spin up, resize or suspend a compute clusters within seconds, somebody must maintain control, and be accountable for costs. As billing is charged by the second, it becomes increasingly important to suspend warehouses when they are no longer needed, and while there’s the temptation to turn the Data Lake into an Ocean, somebody needs to organize the data, and ensure resources are the money is spent wisely.
- Software Development Lifecycle Management: Will always be a challenge for multi-terabyte systems. As compute power becomes increasingly available, business users will want to do more, but somebody will still need to carefully manage code and data migration from development, test, and production.
- Tune Warehouses: Snowflake gives the unique opportunity to adjust both the size of the cluster (to run faster, more complex queries) and the concurrency which means it can be adjusted to handle more users. The DBA needs to monitor the system, and adjust virtual warehouses when appropriate.
In the next article, I’ll explain in more detail which tasks still remain for DBAs and System Administrators. It’s fair to say, however, with the wholesale migration to the Cloud, the landscape is changing, and DBAs will be freed up from the more mundane tasks.
Disclaimer: The opinions expressed in my articles are my own, and will not necessarily reflect those of my employer (past or present) or indeed any client I have worked with.
Note: This article was first published on Analytics Today as Database Administration on Snowflake.
Published at DZone with permission of John Ryan, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.