Azure in Action
EARLY ACCESS EDITION
Chris Hay and Brian H. Prince
MEAP Began: August 2009
Softbound print: Summer 2010 | 425 pages
This article is taken from the book Azure in Action. Message queues are the third part of the Azure storage system (blobs and tables are the other two). The concept of queues has been around a long time, and you’ve likely worked with some technology related to queues already. A common architectural goal during design is to produce a system that’s tightly integrated but also loosely coupled. The easiest way to provide a loosely coupled system is to provide a way for the components to talk to each other through messages. We want these messages to follow a “Tell, don’t ask” approach. We shouldn’t ask an object for a bunch of data, do some work, and then give the results back to the object for recording. We should tell the object what we want it to do. We should do that at the component and system levels as well. This approach helps us to create code that’s well abstracted and compartmentalized. This is the first part of a two-part article that discusses basic queue and message concepts.
You have many ways to decouple your system, but they usually center on messaging of some sort. One of the most common ways is to use queues in between the different parts of the system or between completely different systems that must work together. In this article, we’ll look at how queues help us decouple our systems.
How Queues Work
Before we discuss the specifics of queues, let’s take a high-level look at how they work. Queues have two ends. The producer end is where messages are put into the queue. This is usually represented as the bottom. The other end is the consumer end, where a consumer pulls messages off the top of the queue. A queue is a FIFO structure: first message in, first message out. This contrasts with a stack, which is LIFO, or last in, first out.
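The FIFO and LIFO contrast can be seen with the in-memory Queue<T> and Stack<T> collections from the .NET base class library. This is only a sketch of the ordering behavior; the class name FifoVersusLifo is made up for illustration and has nothing to do with the Azure queue service.

```csharp
using System.Collections.Generic;

public static class FifoVersusLifo
{
    // Puts the items into a queue in order, then removes them all.
    // The returned list shows the order in which a consumer would see them.
    public static List<string> DrainQueue(IEnumerable<string> items)
    {
        var queue = new Queue<string>();
        foreach (var item in items)
        {
            queue.Enqueue(item);            // producer end: join the back of the line
        }

        var result = new List<string>();
        while (queue.Count > 0)
        {
            result.Add(queue.Dequeue());    // consumer end: take from the head
        }
        return result;
    }

    // Same exercise with a stack, for contrast.
    public static List<string> DrainStack(IEnumerable<string> items)
    {
        var stack = new Stack<string>();
        foreach (var item in items)
        {
            stack.Push(item);               // newest item goes on top
        }

        var result = new List<string>();
        while (stack.Count > 0)
        {
            result.Add(stack.Pop());        // newest item comes off first
        }
        return result;
    }
}
```

Given the items first, second, and third, DrainQueue returns them in the order they went in, and DrainStack returns them reversed.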
A real-world example of a queue is the line for tickets at the movie theater on the opening day of a new blockbuster. As people arrive, they stand at the end of the line. As the consumer (the ticket booth) completes sales, it works with the next person at the head of the line. As people buy their tickets, the line moves forward. Figure 1 shows this FIFO structure.
Figure 1 A queue forms for tickets on the opening night of a new blockbuster movie. Moviegoers enter (while wearing their fan-boy outfits and light sabers) at the bottom, or end of the line. As the ticket booth processes each ticket request, each moviegoer moves forward in the queue until he’s at the head of the line.
At a busy movie theater, there may be many ticket booths consuming customers from the line. Management may open more ticket booths, based on the length of the line or on how long people have to wait in line. As the processing capacity of the ticket counter increases, the theater is able to sell tickets to more customers each minute. If the line is short at a particular time, the theater manager might close down ticket booths until only one or two are left open.
Figure 2 shows how your system can use this queue concept. As the producer side of your system (the shopping cart checkout process, for example) produces messages, they’re placed in the queue. The consumer side of the system (the ERP system that processes the orders and charges credit cards) pulls messages off the queue one by one.
Figure 2 Producers place messages into the queue, whereas consumers get them out. Each queue can have multiple producers and consumers.
Having a queue in between the two sides keeps them tightly integrated but loosely coupled.
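To make the producer and consumer roles concrete, here is a small in-memory sketch that uses BlockingCollection<T> from the .NET base class library as a stand-in for a real queue. One producer adds orders while several consumers (the "ticket booths") drain them. The class name CheckoutPipeline and the order-N message format are invented for this example; an Azure queue would sit between separate processes or machines rather than between threads.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class CheckoutPipeline
{
    // Runs one producer and several consumers against a shared in-memory queue
    // and returns every order that the consumers processed.
    public static List<string> Run(int orderCount, int consumerCount)
    {
        var queue = new BlockingCollection<string>(new ConcurrentQueue<string>());
        var processed = new ConcurrentBag<string>();

        // Consumer side: each task pulls orders until the queue is marked complete.
        var consumers = new Task[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = Task.Run(() =>
            {
                foreach (var order in queue.GetConsumingEnumerable())
                {
                    processed.Add(order);
                }
            });
        }

        // Producer side: knows nothing about who will handle the orders.
        for (int n = 1; n <= orderCount; n++)
        {
            queue.Add("order-" + n);
        }
        queue.CompleteAdding();   // signal that no more orders are coming

        Task.WaitAll(consumers);
        return new List<string>(processed);
    }
}
```

Changing consumerCount is the code equivalent of opening or closing ticket booths: the producer code doesn’t change, and each order is still handled exactly once.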
Queues are one-way by nature. A message goes in at the bottom, moves toward the top, and is eventually consumed. For the consumer to communicate back to the producer, a separate mechanism must be used. This could be a queue going in the opposite direction but is usually something else, like a shared storage location.
There’s an inherent order to a queue, but you can’t usually rely on queues for strict-ordered delivery. In some scenarios, message order can be important. A consumer processing checkouts from an ecommerce website won’t need the messages in a precise order, but a consumer processing a set of doctor’s orders for a patient might. For the ecommerce site, it won’t matter which checkout is processed first, as long as it’s in a reasonable order. But the tests, drugs, and surgeries for a patient likely need to be processed in precise order. We’ll explore some ways of handling this situation in the context of Azure in part 2 of this article.
Your Azure storage account can have multiple queues. At any time, a queue can have multiple messages, which we’ll discuss next.
What’s in a Message?
Messages are the lifeblood of a queue system and should represent what a producer is telling a consumer to do. You can think of a queue as having a name, some properties, and a collection of ordered messages.
In Azure, messages are limited to 8 KB in size. This low limit is designed for performance and scalability reasons. If a message could be up to 1 GB in size, then writing to and reading from the queue would take a long time. This would also make it hard for the queue to respond quickly when many different consumers are reading messages off the top of the queue.
Because of this limit, most Azure queue messages follow a work ticket pattern. The message usually doesn’t contain the data needed by the consumer itself. Instead, the message contains a pointer of some sort to the real work that needs to be done. Figure 3 depicts the flow of a work ticket for video compression.
Figure 3 Work tickets are used in queues to tell the consumer what work needs to be done. This keeps the messages light and the queue scalable and performant. The work ticket is usually a pointer to where the real work is.
A queue that contains messages for video compression won’t include the video that needs to be compressed. The producer will store the video in a shared storage location (1), perhaps a blob container or a table. Once the video is stored, the producer places a message in the queue with the name of the blob that needs to be compressed (2). There will likely be other jobs in the queue as well. The consumer will then pick up the work ticket, fetch the proper video from blob storage, compress the video (3), and then store the new video back in blob storage (4). Sometimes the process ends there, with the original producer being smart enough to look in the blob storage for the compressed version of the video, or perhaps a flag in a database is flipped to show that the processing has been completed.
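The four steps of the work ticket flow can be sketched in memory. In this illustration, a Dictionary stands in for blob storage and a Queue<string> stands in for the Azure queue; the VideoPipeline class, the ".compressed" naming convention, and the FakeCompress placeholder are all assumptions made for the example, not part of any Azure API.

```csharp
using System.Collections.Generic;

public class VideoPipeline
{
    // Stand-in for blob storage: blob name -> content.
    public readonly Dictionary<string, byte[]> BlobStore = new Dictionary<string, byte[]>();

    // Stand-in for the queue: each message is only the name of a blob.
    public readonly Queue<string> WorkTickets = new Queue<string>();

    // Producer: store the video (1), then enqueue a ticket that names it (2).
    public void Submit(string blobName, byte[] video)
    {
        BlobStore[blobName] = video;
        WorkTickets.Enqueue(blobName);
    }

    // Consumer: pick up a ticket, fetch and "compress" the video (3),
    // then store the result back in blob storage (4).
    public string ProcessNext()
    {
        string blobName = WorkTickets.Dequeue();
        byte[] original = BlobStore[blobName];
        byte[] compressed = FakeCompress(original);

        string outputName = blobName + ".compressed";   // hypothetical naming convention
        BlobStore[outputName] = compressed;
        return outputName;
    }

    // Placeholder for real compression: keeps every other byte.
    private static byte[] FakeCompress(byte[] data)
    {
        var result = new byte[(data.Length + 1) / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = data[i * 2];
        }
        return result;
    }
}
```

The message stays a few bytes long no matter how large the video is, which is the point of the pattern.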
The content of a message is always stored as a string. The string must be in a format that can be included in an XML message and be UTF-8 encoded, as shown in listing 1. This is because a message is returned from the queue in an XML format, with your content as part of that XML. It’s possible to store binary data, but you’d need to serialize and deserialize the data yourself. Keep in mind that when you’re deserializing, the content coming out of the message will be Base64 encoded.
Listing 1 A message in its native XML format
<MessageId>20be3f61-b70f-47c7-9f87-abbf4c71182b</MessageId> // #1
<InsertionTime>Fri, 07 Aug 2009 00:58:41 GMT</InsertionTime> // #2
<ExpirationTime>Fri, 14 Aug 2009 00:58:41 GMT</ExpirationTime>
<TimeNextVisible>Fri, 07 Aug 2009 00:59:16 GMT</TimeNextVisible>
#1 Each MessageId is unique
#2 Message placed in queue
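If you do need to carry a small binary payload in a message, the serialize-it-yourself step mentioned earlier might look like the following sketch. It relies only on Convert.ToBase64String and Convert.FromBase64String from the .NET base class library; the BinaryMessageCodec class name is made up for the example.

```csharp
using System;

public static class BinaryMessageCodec
{
    // Turns raw bytes into a string that is safe to place in a message.
    public static string ToMessageText(byte[] payload)
    {
        return Convert.ToBase64String(payload);
    }

    // Recovers the original bytes from the message text.
    public static byte[] FromMessageText(string messageText)
    {
        return Convert.FromBase64String(messageText);
    }
}
```

Base64 produces four characters for every three bytes, so the encoded text is about a third larger than the raw data. Keep that in mind when working against the message size limit.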
The content of the message isn’t the only part of the message that you may want to work with. Every message has several important properties.
The ID property #1 is assigned by the storage system and is unique. This is the only way to uniquely differentiate messages from each other, since several messages could contain the same content.
A message also includes the time and date #2 the message was inserted into the queue. It can be handy to see how long the message has been waiting to be processed. For example, you might use this to determine if the messages are becoming stale in the queue. The storage service also uses InsertionTime to determine if your message should be garbage-collected. Any message that’s about a week old in any queue will be collected and discarded.
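As a rough illustration of the staleness check, the timestamps in listing 1 can be parsed and compared with a few lines of C#. The MessageAge class and the idea of passing in your own threshold are assumptions for this sketch; the client library may already expose these values as typed properties.

```csharp
using System;
using System.Globalization;

public static class MessageAge
{
    // Parses timestamps in the format used in listing 1,
    // for example "Fri, 07 Aug 2009 00:58:41 GMT", and returns a UTC value.
    public static DateTime ParseTimestamp(string value)
    {
        return DateTime.ParseExact(
            value,
            "r",   // RFC 1123 pattern
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
    }

    // Returns true when the message has waited longer than the threshold.
    public static bool IsStale(string insertionTime, DateTime utcNow, TimeSpan threshold)
    {
        return utcNow - ParseTimestamp(insertionTime) > threshold;
    }
}
```

Subtracting the InsertionTime in listing 1 from its ExpirationTime gives exactly seven days, which matches the one-week garbage-collection window described above.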
Now that we’ve discussed the anatomy and process of queues and taken a look at the properties of the messages that they hold, we’re ready to discuss how you work with the queue itself.
Setting Up a Queue
To reiterate, the queue is the mechanism that holds the messages, in a rough order, until they’re consumed. The queue is replicated in triplicate throughout the storage service, just like tables and blobs, for redundancy and performance reasons.
Queues can be created in a static manner, perhaps as part of deploying your application. They can also be created and destroyed in a dynamic manner. This is handy when you need a way to organize and direct messages in different directions based on real-time data or user needs.
Each queue can have an unlimited number of messages. The only real limit is how fast you can process the messages and whether you can do so before they’re garbage-collected after one week’s time.
Naming a Queue
Because a queue’s name appears in the URI for the REST request, it needs to follow the constraints that DNS names have.
- It must start with a letter or number and can contain only letters, numbers, and the dash (-) character.
- The first and last characters in the queue name must be alphanumeric. The dash (-) can’t be the first or last character.
- All letters in a queue name must be lowercase. (This requirement gets me every time.)
- A queue name must be from 3 to 63 characters long.
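Checking a name locally before you call the service can save a round trip. The following sketch encodes only the four rules listed above in a regular expression; the QueueNames class is a hypothetical helper, and the service itself may enforce additional constraints.

```csharp
using System.Text.RegularExpressions;

public static class QueueNames
{
    // First and last characters: lowercase letter or digit.
    // Middle characters (1 to 61 of them): lowercase letter, digit, or dash.
    // Total length is therefore 3 to 63 characters.
    private static readonly Regex Pattern =
        new Regex(@"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]\z", RegexOptions.CultureInvariant);

    public static bool IsValid(string name)
    {
        return name != null && Pattern.IsMatch(name);
    }
}
```

With this helper, newordersqueue passes, whereas NewOrdersQueue (uppercase letters), -orders (leading dash), and ab (too short) are rejected.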
Attaching Metadata to a Queue
A queue also has a set of metadata associated with it. This metadata can be up to 8 KB and is a simple collection of name/value pairs. This metadata can help you track and manage your queues. Although the name of a queue can help you understand the use of the queue, the metadata can be useful in a more dynamic situation. The name of the queue might be the customer number that the queue is related to, but you could store the customer’s service level (tin, silver, molybdenum, or gold) as a piece of metadata. This metadata then lives with the queue and can be accessed by any producer or consumer of the queue.
Queues are both a reliable and a persistent way to store and move messages. They’re reliable in that you should never lose a message, no matter what happens. This means not only that the system won’t go down and you won’t lose data, but also that when a consumer fails, the message will reappear on the queue. Queues are also strict in how they persist your messages. If a server goes down, the messages aren’t lost; they remain in the queue. This would differ from a purely memory-based system in which all of the messages would be lost if the server were to have a failure. We’ll look at these issues in more detail when we discuss the message lifecycle. Let’s turn now to the mechanics of working with the queue API.
Working with Basic Queue Operations
To learn how to use the basic queue API operations, we’re going to use a Simple Queue Browser. This little tool (shown in figure 4) will help us debug any system we’re building by helping us look at the queues that are being used and see how they’re working.
Download the Code
You can download the code at my blog, http://brianhprince.com/blog/downloads. You’ll need to have VS2010 Beta 2 (http://www.microsoft.com/visualstudio) and the Azure SDK (http://www.microsoft.com/windowsazure/tools/) installed.
Figure 4 We’ll use this Simple Queue Browser to work with queue and message methods. Please note that the authors, although charming, aren’t graphic designers or user interface specialists. This tool will act as a vehicle for understanding the basics of working with queues.
You’ll need to be able to work with several basic queue operations (see table 1). We’ll focus on the simple code that’s needed to use these operations with the queue browser.
Table 1 Basic queue methods
|Method|Description|
|ListQueues()|Lists the queues that exist in your storage account|
|Create() or CreateIfNotExist()|Creates queues in your account|
|Clear()|Clears a queue of all its pending messages|
|Delete()|Deletes a queue or a message from the system|
We aren’t going to focus on how the Windows Presentation Foundation (WPF) works or the best application architecture for this little application. The queue browser is meant to be “me-ware,” something that works for you and doesn’t have to work for anyone else. You should use it as a harness to play with the different APIs and learn how they work. Think of it as a sandbox in which to explore.
The first queue operation that we’re going to look at is a method that’ll tell you what queues exist in your account. You may not always need this. Usually, you’ll know what the queue for your application is and provision it for yourself.
To get a list of the queues available to you, you need to first connect to the queue service and then call the method, as shown in the following code.
private CloudQueueClient Qsvc;
private IEnumerable<CloudQueue> qList;
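CloudStorageAccount storageAccount =
    CloudStorageAccount.FromConfigurationSetting("DataConnectionString"); // setting name is an assumed placeholder; use your own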
Qsvc = storageAccount.CreateCloudQueueClient(); // #1
qList = Qsvc.ListQueues();
#1 Connection to the queue service in the cloud
You’ll use something like #1 quite often. This creates a connection to the service, similarly to how you create a connection object to a database. You’ll want to create this once for a block of code and hold it in memory so that you aren’t always reconnecting to the service. In our application, we create this object in the constructor for the window itself and store it in a variable at the form level. This makes it available to every method in the form and saves us the trouble of having to continually re-create the object.
The CloudStorageAccount serves as a factory for service client objects; here it creates the CloudQueueClient, which represents the Azure queue storage service. You can create the CloudStorageAccount in several ways. The most common approach is the one we used here, the FromConfigurationSetting method. This looks into your configuration and sets up all of the URIs, usernames, account names, and the like. This is better than having to set four or five different parameters when you’re “newing up” the connection.
Once you have a handle to the queue service, you can call ListQueues(). This doesn’t just return a list of strings as you might expect but instead returns a collection of queue objects. Each queue object represents a queue in your account and is fully usable. The equivalent call in REST looks like this:
You can see that it’s a simple GET. If you don’t have any queues, you’ll get back an empty collection. Our next step is to create a queue.
Creating a Queue
Creating a queue in Windows Azure is relatively easy. Once you’ve used the previous code to get a connection to the queue service, you can call either Create() or CreateIfNotExist() to create a message queue:
CloudQueue q = Qsvc.GetQueueReference("newordersqueue"); // #1
q.CreateIfNotExist(); // #2
#1 Gets handle to specific queue
#2 If queue doesn’t exist, creates it
In #1, we create a CloudQueue object. This is an empty object, and this line doesn’t connect to the service. It’s merely an empty reference that doesn’t point to anything. Then the CreateIfNotExist method is called #2 on the queue object we just created. This will check to see if a queue with that name exists, and if it doesn’t, it will create one. This is very handy.
You can check whether a queue exists before you try to create it by using q.DoesQueueExist(). This method returns a Boolean value, telling you if the queue exists or not.
Our next step is to attach some metadata to the queue.
We can store up to 8 KB of data in the property bag for each queue. You might want to use SetMetadata() to track some core data about the queue or, perhaps, some core metrics on how often it should be polled. Figure 5 shows metadata that displays the back-off pace rate of a queue.
Figure 5 Displaying the metadata we set on a queue. You can attach up to 8 KB of metadata to a queue. Metadata, such as what the back-off pace rate should be, can help in managing a queue.
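One way a consumer might use a back-off setting like this is to wait longer between polls while the queue stays empty and to snap back to a short delay as soon as a message arrives. The following is only a sketch of that idea under assumed names (PollingBackoff, minDelay, maxDelay); it isn’t taken from the queue browser code, and the doubling policy is just one reasonable choice.

```csharp
using System;

public class PollingBackoff
{
    private readonly TimeSpan _minDelay;
    private readonly TimeSpan _maxDelay;

    public TimeSpan CurrentDelay { get; private set; }

    public PollingBackoff(TimeSpan minDelay, TimeSpan maxDelay)
    {
        _minDelay = minDelay;
        _maxDelay = maxDelay;
        CurrentDelay = minDelay;
    }

    // Call after each poll. Returns how long to wait before polling again.
    public TimeSpan Next(bool foundMessage)
    {
        if (foundMessage)
        {
            // Work is flowing again, so poll at the fastest rate.
            CurrentDelay = _minDelay;
        }
        else
        {
            // Queue was empty: double the wait, up to the configured ceiling.
            TimeSpan doubled = TimeSpan.FromTicks(CurrentDelay.Ticks * 2);
            CurrentDelay = doubled > _maxDelay ? _maxDelay : doubled;
        }
        return CurrentDelay;
    }
}
```

The minimum and maximum delays are the kind of values you could keep in the queue’s metadata so that every consumer of the queue polls at the same pace.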
The following code shows how you might add and remove queue properties, which is how you attach queue metadata:
CloudQueue q = Qsvc.GetQueueReference("newordersqueue");
q.Metadata.Add("ProjectName", "ElectronReintroductionPhasing"); // #1
q.Metadata.Remove("BadKeyNoDoughnut"); // #2
q.SetMetadata(); // #3
#1 Calls Add()and specifies key name and value
#2 Calls Remove() on key specified
#3 Applies changes to queue in the cloud
The metadata for a queue is attached as a property on the CloudQueue object #1. You work with it as you would any other name/value collection. At #1 we’re adding a new entry to the metadata called ProjectName, with a value of ElectronReintroductionPhasing. This new entry won’t be saved back to the queue service, though, until we call SetMetadata() #3. This connects to the service and uploads our metadata for the queue in the cloud.
You can remove existing properties from the bag if you no longer need them. At #2, you can see how easy it is for us to remove the BadKeyNoDoughnut entry. Removing an item from the metadata collection must also be followed by a SetMetadata() call to persist the changes to the cloud.
Now that we’ve created a queue and set its metadata, let’s look at how to delete a queue.
Deleting a Queue
It’s good practice to clear a queue before you delete it. This removes all of the messages from the queue. The clear-queue method is handy for resetting a system or clearing out some poison messages that may have stopped up the flow. Clearing and deleting a queue is simple:
CloudQueue q = Qsvc.GetQueueReference("newordersqueue");
q.Clear();  // removes all pending messages
q.Delete(); // deletes the queue itself
q = null;
Deleting a queue is as simple as that, when using the client library. The equivalent REST call looks like this:
Being able to create and destroy queues with a single line of code makes them simple objects to work with. In the past, using a queue in your system would require days, if not weeks, of installing several queue servers (for redundancy purposes). They’d also require a lot of care and feeding. Queues in the cloud are much easier to work with and require no grooming or maintenance. The real power of queues, however, is the messages that flow through them, which we’ll delve into in the second part of this article.