Setting up a Redshift cluster that hangs on some number of query executions is always a hassle. When users run queries in Amazon Redshift, the queries are routed to query queues. Usually, the hangups could be mitigated in advance with a good Redshift query queues setup.
Amazon Redshift has implemented mechanism with which we can modify the queues to our advantage. In this article, you will learn the challenges and some best practices on how to modify query queues and execution of queries to maintain an optimized query runtime.
Before we go into the challenges, let’s start with discussing key components of Redshift.
Workload Manager (WLM)
Amazon Redshift workload manager is a tool for managing user-defined query queues in a flexible manner. We use Redshifts Workload Management console to define new user-defined queues and to define or modify their parameters. We can also use it to define the parameters of existing default queues.
3 Queue Types
By default, Amazon Redshift has three queues types: for super users, default queue, and user-defined queues.
1. Superuser Queue
The super user queue is reserved for running commands related to the system, troubleshooting or for some emergency manual operations. This queue cannot be configured and can only process one query at a time.
2. Default Queues
Every Redshift cluster has a default queue. The default queue comes with the default setting of concurrency level of 5. For default queues, you can change the concurrency, timeout and memory allocation. Any queries that are not routed to other queues run in the default queue.
3. User-Defined Queues
Besides the default queue, you can add other user-defined queues. For user-defined queues besides parameters listed in the default queues, you can change user groups parameters and query groups parameters.
5 Queue Parameters
Smart use of queues parameters allows users to optimize time and execution cost of a query. We will look at parameters of queues:
1. Concurrency level
Specifies the number of queries that run concurrently within a particular queue.
2. User Groups
Executing a query by a member of a user group runs the query inside the queue assigned to that user group.
3. Query Groups
Query group is a simple label. Users can assign queries to a particular queue on the fly using this label.
4. Memory Allocation
You have the option of changing the percentage of memory assigned to each queue by setting WLM memory percent parameter. The rate for all the queues adds up to 100%.
With this parameter, you specify the amount of time, in milliseconds, that the Redshift waits for a query to execute before canceling the query. Note that the timeout is based on query execution time which doesn’t include time spent waiting in a queue.
Now that we know what are the main points… let’s move to the challenges.
Unlike transactional systems which have queries of uniform size and execution cost, data warehouse queries vary greatly in execution cost, time and result-set. Optimal execution of these queries necessitates a balanced structure of execution queues configurations dedicated to different query size and/or priority. We want to make sure that the slow running queries are not blocking quickly running queries that execute in a manner of minutes or seconds.
Arriving at an optimal queue setting for the Redshift cluster is a challenge and needs to take into account the needs of the specific implementation of user requirements.
The WLM configuration properties are either dynamic or static. If you change any of the dynamic properties, you don’t need to reboot your cluster for the changes to take effect unlike the change of the static properties.
The following WLM properties are static:
- User groups.
- User group wild-card.
- Query groups.
- Query group wild-card.
With defined queue parameter a dynamic execution parameter can be set for specific queries that impact their performance.
The following WLM properties are dynamic:
- Percent of memory to use.
As mentioned above, the user can change dynamic properties without restarting the Redshift cluster. It allows dynamic memory management when needed, we will look at some examples in the tips section.
An example of a setup of the WLM configuration that handles a solid DWH/BI configuration looks something like this:
We defined the
fast_etl_execution query with the user group called
etl. This user group handles ETL executions. Another group is for BI-related queries. In this configuration, ad-hoc queries are handled by the default queue. It is important to define ETL and BI user groups beforehand, or you will have to restart your Redshift cluster as these parameters are static.
Assign all your ETL users to the
etl user group:
create group etl with user etl_execution;
Now when the user
etl_execution executes an
etl job, if it takes more than 2 minutes (3,000,000 milliseconds), the timeout parameter of the first defined queue of the user (
fast_etl_execution) will cancel the execution in that queue and route it to the long_etl_execution queue. The slow_etl_execution queue has more memory and lower concurrency level so each query has more power to finish the job.
We can check the memory allocation of our queues with the statement:
select name , service_class , num_query_tasks as slots , query_working_mem as memory from STV_WLM_SERVICE_CLASS_CONFIG
The result shows the memory and the available slots for different “Service class #x” queues, where x denotes a queue mapped to the Redshift console “Query X” queue.
You can also see the internal query queues which are not accessible to users,
service_class 1-4, and a super user query queue,
Queues Management Tips
Let’s look at some general tips on working with query queues.
Having only default execution queue can cause bottlenecks. If a large time-consuming query blocks the only default queue small, fast queries have to wait. Make sure you create at least one user-defined query besides the Redshift default query queue. I recommend creating a separate query queue for fast and slow queries — in our example,
fast_etl_execution. Mind the level of concurrent processes that run across all the query queues in Redshift. Another recommendation is having a level of concurrency of at least 2 in particular queues.
Dynamic Management for Loads
We can modify the dynamic properties to tune the execution of particular queries that execute within queue via memory allocation. The main way you control this is with wlm_query_slot_count parameter. You should change dynamically wlm_query_slot_count when you perform resource intensive statements like:
VACUUM, which reclaims space and resorts rows in either a specified table or all tables in the current database.
ANALYZE, which gathers table statistics for Redshifts optimizer.
COPY, which transfers data into Redshift.
You should set the statement to use all the available resources of the query queue. In this case where the concurrency setting of the queue is 10 we set the slot count to 10 (meaning the following query will use all the available slots of the queue):
set wlm_query_slot_count to 10; vacuum; set wlm_query_slot_count to 1;
After the statement finishes (
VACUUM will take some time if you have a large database), you reset the session to use the normal slot count of one.
It is wise to increase the query slot count for
COPY statements when ingesting data into your Redshift cluster.
COPY works best with maximal parallelism enabled so that Redshift can route all the data into the nodes simultaneously.
Superuser Queue for Housekeeping
You can switch the query group of a query to the superuser queue for housekeeping activities like analyzing of even killing a query.
Changing the queue is done with the set
query_group command. Command to gather statistics with the superuser queue is:
set query_group to 'superuser'; analyze; reset query_group;
As usual, there is no one universal setup to cover all the query setups of the Redshift cluster; it heavily depends on user requirements that you are implementing. Working with query queues without a predefined tool is more of an art form than a science. You will come to your optimal configuration after some trial and error.
We covered some rules that get you to a great Redshift cluster setup. More often than not, you will setup a different a separate user defined queue besides the default one. You will set the level of concurrency of at least 2 for a query queue. Finally, you will tune execution of your more demanding statements to use all the resources available in the query queue.