Storage Architecture in Windows Azure Photo Mosaics Program
Prior to my vacation break, I presented the overall workflow of the Azure Photo Mosaics program, and now it's time to dig a little deeper. I'm going to start off looking at the storage aspects of the application and then tie it all together when I subsequently tackle the three roles that comprise the application processing. If you want to crack open the code as you go through this blog post (and subsequent ones), you can download the source for the Photo Mosaics application from where else but my Azure storage account!
For the Photo Mosaics application, the storage access hierarchy can be represented by the following diagram. At the top are the core Windows Azure Storage capabilities, accessed directly via a REST API but with some abstractions for ease of programming (namely the Storage Client API), followed by my application-specific access layers (in green). In this post I'll cover each of these components in turn.
Windows Azure Storage
As you likely know, Windows Azure Storage has four primary components:
- Tables - highly scalable, schematized, but non-relational storage,
- Blobs - unstructured data organized into containers,
- Queues - used for messaging between role instances, and
- Drives - (in beta) enabling NTFS-formatted VHDs, stored as page blobs, to be mounted as local drives
All of your Windows Azure Storage assets (blobs, queues, tables, and drives) fall under the umbrella of a storage account tied to your Windows Azure subscription, and each storage account has the following attributes:
- can store up to 100TB of data
- can house an unlimited number of blob containers
- with a maximum of 200GB per block blob and
- a maximum of 1TB per page blob (and/or Windows Azure Drive)
- can contain an unlimited number of queues
- with a maximum of 8KB per queue message
- can store an unlimited number of tables
- with a maximum of 1MB per entity ('row') and
- a maximum of 255 properties ('columns') per entity
- three copies of all data maintained at all times
- REST-based access over HTTP and HTTPS
- strongly (versus eventually) consistent
REST API Access
The native API for storage access is REST-based, which is awesome for openness to other application platforms - all you need is an HTTP stack. When you create a storage account in Windows Azure, the account name becomes the first part of the URL used to address your storage assets, and the second part identifies the type of storage. For example, in a storage account named azuremosaics:
- http://azuremosaics.blob.core.windows.net/flickr/001302.jpg uniquely identifies a blob resource (here an image) that is named 001302.jpg and located in a container named flickr. The HTTP verbs POST, GET, PUT, and DELETE when applied to that URI carry out the traditional CRUD semantics (create, read, update, and delete).
- http://azuremosaics.table.core.windows.net/Tables provides an entry point for listing, creating, and deleting tables; while a GET request for the URI http://azuremosaics.table.core.windows.net/jobs?$filter=id eq 'F9168C5E-CEB2-4faa-B6BF-329BF39FA1E4' returns the entities in the jobs table having an id property with value of the given GUID.
- A POST to http://azuremosaics.queue.core.windows.net/slicerequest/messages would add the message included in the HTTP payload to the slicerequest queue.
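For instance, here's a minimal sketch of that first example - fetching a blob with nothing more than an HTTP stack. The URI echoes the sample account above, and no Authorization header is needed only because the container is assumed to allow public read access.

```csharp
// Minimal sketch: GET a publicly readable blob using only the HTTP stack.
// Assumes the flickr container on the azuremosaics account permits public reads.
using System;
using System.IO;
using System.Net;

class RestGetSample
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create(
            "http://azuremosaics.blob.core.windows.net/flickr/001302.jpg");
        request.Method = "GET";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var body = response.GetResponseStream())
        using (var file = File.Create("001302.jpg"))
        {
            body.CopyTo(file);  // save the image locally
            Console.WriteLine("Status: {0}", response.StatusCode);
        }
    }
}
```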
Conceptually it's quite simple and even elegant, but in practice it's a bit tedious to program against directly. Who wants to craft HTTP requests from scratch and parse through response payloads? That's where the Windows Azure Storage Client API comes in, providing a .NET object layer over the core API. In fact, that's just one of the client APIs available for accessing storage; if you're using other application development frameworks there are options for you as well:
| Platform | Storage library |
|----------|------------------|
| Ruby | WAZ Storage Gem |
| PHP | Windows Azure SDK for PHP Developers |
| Java | Windows Azure SDK for Java Developers |
| Python | Python Client Wrapper for Windows Azure Storage |
Note too that, with the exception of blobs you've opted into a public or shared access policy for, every request to Windows Azure Storage must include an authorization header whose value is a hash calculated by applying the HMAC-SHA256 algorithm to a string formed by concatenating various properties of the HTTP request about to be sent. Whew! The good news is that the Storage Client API takes care of all that grunt work for you as well.
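To give a feel for that grunt work, here's an abbreviated sketch of the Shared Key scheme. Be aware this is not the full recipe - the real string-to-sign includes the HTTP verb, several standard headers, all of the x-ms-* headers, and the canonicalized resource - but the hashing step itself looks essentially like this.

```csharp
// Abbreviated sketch of Shared Key signing; the actual string-to-sign is far more
// involved (verb, standard headers, x-ms-* headers, canonicalized resource).
using System;
using System.Security.Cryptography;
using System.Text;

static class SharedKeySigner
{
    public static string Sign(string stringToSign, string accountName, string base64AccountKey)
    {
        using (var hmac = new HMACSHA256(Convert.FromBase64String(base64AccountKey)))
        {
            string signature = Convert.ToBase64String(
                hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));

            // Sent on the request as:  Authorization: SharedKey <account>:<signature>
            return String.Format("SharedKey {0}:{1}", accountName, signature);
        }
    }
}
```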
Storage Client API
The Storage Client API, specifically the Microsoft.WindowsAzure.StorageClient namespace, is where .NET developers will spend most of their time interfacing with Windows Azure Storage. The Storage Client API contains around 40 classes that provide a clean programmatic interface on top of the core REST API, and those classes roughly fall into five categories:
- Blob classes
- Queue classes
- Table classes
- Drive classes
- Classes for cross-cutting concerns like exception handling and access control
Each of the core storage types (blobs, queues, tables, and drives) has what I call an entry-point class that is initialized with account credentials and then used to instantiate additional classes for accessing individual items such as messages within queues and blobs within containers. For blob containers, that class is CloudBlobClient, and for queues it's CloudQueueClient. Table access is via both CloudTableClient and a TableServiceContext; the latter extends System.Data.Services.Client.DataServiceContext, which should ring a bell for those who have been working with LINQ to SQL or the Entity Framework. And, finally, CloudDrive is the entry point for mounting a Windows Azure Drive on the local file system - local, that is, to the Web or Worker Role that's mounting it.
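As a rough sketch of how those entry-point classes hang together (the connection string, container, and queue names below are placeholders rather than values from the Photo Mosaics code):

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class EntryPointSketch
{
    static void Main()
    {
        // Placeholder connection string; in the Photo Mosaics application the
        // value comes from role configuration rather than being hard-coded.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=azuremosaics;AccountKey=<key>");

        CloudBlobClient blobClient = account.CreateCloudBlobClient();    // blobs and containers
        CloudQueueClient queueClient = account.CreateCloudQueueClient(); // queues and messages
        CloudTableClient tableClient = account.CreateCloudTableClient(); // tables and entities

        // The clients then hand out references to individual items, e.g.:
        CloudBlobContainer container = blobClient.GetContainerReference("flickr");
        CloudQueue queue = queueClient.GetQueueReference("slicerequest");
    }
}
```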
While the Storage Client API can be used directly throughout your code, I'd recommend encapsulating it in a data access layer, which not only aids testability but also provides a single entry point to Azure Storage and, by extension, a single point for handling the credentials required to access that storage. In the Photo Mosaics application, that's the role of the CloudDAL project discussed next.
Cloud DAL
Within the CloudDAL, I've exposed a point of entry to each of the three storage types used in the Photo Mosaics application; I've called them TableAccessor, BlobAccessor, and QueueAccessor. These classes offer a repository interface for the specific data needed by the application and theoretically would allow me to replace Windows Azure Storage with some other storage provider altogether without pulling apart the entire application (though I haven't actually vetted that :)). Let's take a look at each of these 'accessor' classes in turn.
TableAccessor (in Table.cs)
The TableAccessor class has a single public constructor requiring the account name and credentials for the storage account that this particular instance of TableAccessor is responsible for. The class itself is essentially a proxy for a CloudTableClient and a TableServiceContext reference:
- _tableClient provides an entry point to test the existence of and create (if necessary) the required Windows Azure table, and
- _context provides LINQ-enabled access to the two tables, jobs and status, that are part of the Photo Mosaics application. Each table also has an object representation implemented via a descendant of the TableServiceEntity class that has been crafted to represent the schema of the underlying Windows Azure table; here those descendant classes are StatusEntity and JobEntity.
The methods defined on TableAccessor all share a similar pattern of testing for the table existence (via _tableClient) and then issuing a query (or update) via the _context and appropriate TableServiceEntity classes. In a subsequent blog post, we'll go a bit deeper into the implementation of TableAccessor (and some of its shortcomings!).
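To make that pattern concrete, here's a stripped-down sketch. The use of CloudTableClient and TableServiceContext mirrors the description above, but the method body and the bare-bones JobEntity are simplified stand-ins rather than a copy of what's in Table.cs.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class TableAccessorSketch
{
    private readonly CloudTableClient _tableClient;
    private readonly TableServiceContext _context;

    public TableAccessorSketch(string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        _tableClient = account.CreateCloudTableClient();
        _context = _tableClient.GetDataServiceContext();
    }

    // Typical method shape: make sure the table exists, then query via LINQ.
    public IEnumerable<JobEntity> GetJobsForClient(string clientId)
    {
        _tableClient.CreateTableIfNotExist("jobs");
        return _context.CreateQuery<JobEntity>("jobs")
                       .Where(j => j.PartitionKey == clientId)
                       .ToList();
    }
}

// Simplified stand-in for the real JobEntity defined in the CloudDAL.
public class JobEntity : TableServiceEntity { }
```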
BlobAccessor (in Blob.cs)
BlobAccessor is similarly a proxy for a CloudBlobClient reference and like the TableAccessor is instantiated using a connection string that specifies a Windows Azure Storage Account and its credentials. The method names are fairly self-descriptive, but I'll also have a separate post in the future on the details of blob access within the Photo Mosaics application.
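In the same spirit, here is a bare-bones sketch of the blob side; the method names are illustrative, not lifted from Blob.cs.

```csharp
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlobAccessorSketch
{
    private readonly CloudBlobClient _blobClient;

    public BlobAccessorSketch(string connectionString)
    {
        _blobClient = CloudStorageAccount.Parse(connectionString).CreateCloudBlobClient();
    }

    // Download a blob's contents given its container and name.
    public byte[] RetrieveImage(string containerName, string blobName)
    {
        CloudBlobContainer container = _blobClient.GetContainerReference(containerName);
        return container.GetBlobReference(blobName).DownloadByteArray();
    }

    // Upload (or overwrite) a blob and return its addressable URI.
    public Uri StoreImage(byte[] bytes, string containerName, string blobName)
    {
        CloudBlobContainer container = _blobClient.GetContainerReference(containerName);
        container.CreateIfNotExist();

        CloudBlob blob = container.GetBlobReference(blobName);
        blob.UploadByteArray(bytes);
        return blob.Uri;
    }
}
```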
QueueAccessor (in Queue.cs and Messages.cs)
The QueueAccessor class is the most complex (and the image above doesn't even show the associated QueueMessage classes!). Like BlobAccessor and TableAccessor, QueueAccessor is a proxy to a CloudQueueClient reference to establish the account credentials and manage access to the individual queues required by the application.
You may notice though that QueueAccessor is a bit lean on methods, and that's because the core queue functionality is encapsulated in a separate class, ImageProcessingQueue; QueueAccessor exposes a static reference to an instance of that type for each of the four application queues. It's through those ImageProcessingQueue references (e.g., QueueAccessor.ImageRequestQueue) that the web and worker roles read and submit messages to each of the four queues involved in the Photo Mosaics application. Of course, we'll look at how that works too in a future post.
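ImageProcessingQueue's actual members aren't shown here, but the CloudQueue operations it wraps look roughly like the following sketch (member names are illustrative, and the real class works with the typed message classes in Messages.cs rather than raw strings).

```csharp
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class ImageProcessingQueueSketch
{
    private readonly CloudQueue _queue;

    public ImageProcessingQueueSketch(string connectionString, string queueName)
    {
        CloudQueueClient queueClient =
            CloudStorageAccount.Parse(connectionString).CreateCloudQueueClient();
        _queue = queueClient.GetQueueReference(queueName);
        _queue.CreateIfNotExist();
    }

    // Submit a message to the queue (the real class serializes typed message objects).
    public void Submit(string payload)
    {
        _queue.AddMessage(new CloudQueueMessage(payload));
    }

    // Read the next message, if any; the caller deletes it once processing succeeds.
    public CloudQueueMessage Read()
    {
        return _queue.GetMessage();
    }

    public void Delete(CloudQueueMessage message)
    {
        _queue.DeleteMessage(message);
    }
}
```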
With the CloudDAL layer, I have a well-defined entry point into the three storage types used by the application, and we could stop at that; however, there's one significant issue in doing so: authentication. The Windows Forms client application needs to access the blobs and tables in Windows Azure storage, and if it were to do so by incorporating the CloudDAL layer directly, it would need access to the account credentials in order to instantiate a TableAccessor and BlobAccessor reference. That may be viable if you expect each and every client to have his/her own Windows Azure storage account, and you're careful to secure the account credentials locally (via protected configuration in app.config or some other means). But even so, by giving clients direct access to the storage account key, you could be giving them rope to hang themselves. It's tantamount to giving an end-user credentials to a traditional client-server application's SQL Server database, with admin access to boot!
Unless you're itching for a career change, such an implementation isn't in your best interest, and instead you'll want to isolate access to the storage account to specifically those items that need to travel back and forth to the client. An excellent means of isolation here is via a Web Service, specifically implemented as a WCF Web Role; that leads us to the StorageBroker service.
StorageBroker Service
StorageBroker is a C# WCF service implementing the IStorageBroker interface, and each method of the IStorageBroker interface is ultimately serviced by the CloudDAL (for most of the methods the correspondence in names is obvious). Here, for instance, is the code for GetJobsForClient, which essentially wraps the identically named method of TableAccessor.
```csharp
public IEnumerable<JobEntry> GetJobsForClient(String clientRegistrationId)
{
    try
    {
        String connString = new StorageBrokerInternal().
            GetStorageConnectionStringForClient(clientRegistrationId);
        return new TableAccessor(connString).GetJobsForClient(clientRegistrationId);
    }
    catch (Exception e)
    {
        Trace.TraceError("Client: {0}{4}Exception: {1}{4}Message: {2}{4}Trace: {3}",
            clientRegistrationId, e.GetType(), e.Message, e.StackTrace, Environment.NewLine);
        throw new SystemException("Unable to retrieve jobs for client", e);
    }
}
```
Since this is a WCF service, the Windows Forms client can simply use a service reference proxy (Add Service Reference… in Visual Studio) to invoke the method over the wire, as in JobListWindow.vb below. This is done asynchronously to provide a better end-user experience in light of the inherent latency when accessing resources in the cloud.
```vb
Private Sub JobListWindow_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
    Dim client = New AzureStorageBroker.StorageBrokerClient()

    AddHandler client.GetJobsForClientCompleted,
        Sub(s As Object, e2 As GetJobsForClientCompletedEventArgs)
            ' details elided
        End Sub

    client.GetJobsForClientAsync(Me._uniqueUserId)
End Sub
```
Now, if you look closely at the methods within StorageBroker, you'll notice calls like the following:
```csharp
String connString = new StorageBrokerInternal().GetStorageConnectionStringForClient(clientRegistrationId);
```
What's with StorageBrokerInternal? Let's find out.
StorageBrokerInternal Service
Although the StorageBrokerInternal service may flirt with the boundaries of YAGNI, I incorporated it for two major reasons:
I disliked having to replicate configuration information across each of the (three) roles in my cloud service. It's not rocket science to cut and paste the information within the ServiceConfiguration.cscfg file (and that's easier than using the Properties UI for each role in Visual Studio), but it just seems 'wrong' to have to do that. StorageBrokerInternal allows the configuration information to be specified once (via the AzureClientInterface Web Role's configuration section) and then exposed to any other roles in the cloud application via a WCF service exposed on an internal HTTP endpoint – you might want to check out my blog post on that adventure as well.
I wanted some flexibility in terms of what storage accounts were used for various facets of the application, with an eye toward multi-tenancy. In the application, some of the storage is tied to the application itself (the queues implementing the workflow, for instance), some storage needs are more closely aligned to the client (the jobs and status tables and imagerequest and imageresponse containers), and the rest sits somewhere in between.
Rather than 'hard-code' storage account information for data associated with clients, I wanted to allow that information to be more configurable. Suppose I wanted to launch the application as a bona fide business venture. I may have some higher-end clients that want a higher level of performance and isolation, so perhaps they have a dedicated storage account for the tables and blob containers, but I may also want to offer a free or trial offering and store the data for multiple clients in a single storage account, with each client's information differentiated by a unique id.
The various methods in StorageBrokerInternal provide a jumping off point for such an implementation by exposing methods that can access storage account connection strings tied to the application itself, or to a specific blob container, or indeed to a specific client. In the current implementation, these methods all do return information 'hard-coded' in the ServiceConfiguration.cscfg file, but it should be apparent that their implementations could easily be modified, for example, to query a SQL Azure database by client id and return the Windows Azure storage connection string for that particular client.
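As a hedged sketch of that extensibility point (the setting name and the lookup helper below are hypothetical, not from the actual project), GetStorageConnectionStringForClient might evolve from simply reading role configuration to consulting a per-client mapping:

```csharp
using Microsoft.WindowsAzure.ServiceRuntime;

public class StorageBrokerInternalSketch
{
    // Today: one connection string 'hard-coded' in ServiceConfiguration.cscfg.
    // Later: look the client up (e.g., in SQL Azure) and fall back to a shared account.
    public string GetStorageConnectionStringForClient(string clientRegistrationId)
    {
        string dedicated = LookupDedicatedConnectionString(clientRegistrationId);
        return dedicated ??
               RoleEnvironment.GetConfigurationSettingValue("ClientStorageConnectionString");
    }

    private string LookupDedicatedConnectionString(string clientRegistrationId)
    {
        // Placeholder for a per-client lookup (a SQL Azure table, another Azure table, etc.).
        return null;
    }
}
```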
Within the implementation of the public StorageBroker service, you'll see instantiations of the StorageBrokerInternal class itself to get the information (since they are in the same assembly and role), but if you look at the other roles within the cloud service (AzureImageProcessor and AzureJobController) you'll notice they access the connection information via WCF client calls implemented in the InternalStorageBrokerClient.cs file of the project with the same name. For example, consider this line in WorkerRole.cs of the AzureJobController:
```csharp
// instantiate storage accessors
blobAccessor = new BlobAccessor(
    StorageBroker.GetStorageConnectionStringForAccount(requestMsg.ImageUri));
```
It's clearly instantiating a new BlobAccessor (part of the CloudDAL) but what account is used is determined as follows:
- The worker role accesses the URI of the image reference within the queue message that it's currently processing.
- It makes a call to the StorageBrokerInternal service (encapsulated by the static StorageBroker class), passing the image URI.
- The method of the internal storage broker (GetStorageConnectionStringForAccount) does a lookup to determine what connection string is appropriate for the given reference. (Again the current lookup is more or less hard-coded, but the methods of StorageBrokerInternal do provide an extensibility point to partition the storage among discrete accounts in just about any way you'd like).
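To illustrate that last step, a hypothetical lookup might simply key off the account name embedded in the blob URI and map it to a connection string held in role configuration (the mapping scheme and setting names here are mine, not the project's):

```csharp
using System;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class AccountLookupSketch
{
    // Hypothetical mapping: azuremosaics.blob.core.windows.net -> "azuremosaics",
    // which is then used to select a connection string from role configuration.
    public static string GetStorageConnectionStringForAccount(Uri imageUri)
    {
        string accountName = imageUri.Host.Split('.')[0];
        return RoleEnvironment.GetConfigurationSettingValue(
            "ConnectionString-" + accountName);   // setting name is illustrative
    }
}
```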
You may also have noticed that nothing here authenticates the callers of these services. I'd like to say I meant for it to be that way, but this is clearly not a complete implementation. There's a reason for the apparent oversight! First of all, domain authentication won't really work here: it's a public service and there is no Active Directory in the cloud. Some type of forms-based authentication is certainly viable though, and that would require setting up forms authentication in Windows Azure using either Table Storage or SQL Azure as the repository. There are sample providers available to do that (although I don't believe any are officially supported), but doing so means managing users as part of my application and incurring additional storage costs, especially if I bring in SQL Azure just for the membership support.
For this application, I'd prefer to offload user authentication to someone else, such as by providing customers access via their Twitter or Facebook credentials, and it just so happens that the Windows Azure AppFabric Access Control Service will enable me to do just that. That's not implemented yet in the Photo Mosaics application, partly because I haven't gotten to it and partly because I didn't want to complicate the sample that much more at this point in the process of explaining the architecture. So consider fixing this hole in the implementation as something on our collective 'to do' list, and we'll get to it toward the end of the blog series.
Client
At long last we reach the bottom of our data access stack: the client. Here, the client is a simple Windows Forms application, but since the public storage interface is exposed via a WCF service over a BasicHttpBinding, providing a Web or mobile client in .NET or any Web Service-capable language should be straightforward. In the Windows Forms client distributed with the sample code, I used the Add Service Reference… mechanism in Visual Studio to create a convenient client proxy and insulate myself from the details of channels, bindings, and the like.
One important thing to note is that all of the access to storage from the client is done asynchronously to provide a responsive user experience regardless of the latency in accessing Windows Azure storage.
Key Takeaways
I realize that was quite a bit to wade through and it's mostly not even specific to the application itself; it's all been infrastructure so far! That observation though is a key point to take away. When you're building applications for the cloud – for scale and growth – you'll want to tackle key foundational aspects first (and not back yourself into a corner later on!):
- How do I handle multi-tenancy?
- How do I expose and manage storage and other access credentials?
- How do I future-proof my application for devices and form factors I haven't yet imagined?
There's no one right approach to all of these concerns, but hopefully by walking you through how I tackled them, I've provided you some food for thought as to how to best architect your own applications as you head to the cloud.
Next up: we'll look at how blob storage is used in the context of the Azure Photo Mosaics application.