Windows Azure Storage provides three ways to store your unstructured data: BLOBs, queues, and tables.
All three forms use the same backend infrastructure. They are all accessible through the .NET Storage Client Library (provided with the SDK) or directly with REST. Because you can access all storage with REST, the code using the storage doesn’t have to be running in Windows Azure. It could be running on a phone, in a browser, or on a server in your data center.
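Because public BLOBs are plain URLs, reading one doesn't even require the client library. As a minimal sketch (the account, container, and file names here are hypothetical), any HTTP client will do:

using System;
using System.Net;

// Read a BLOB over raw HTTP from a container with public read access.
// Any HTTP-capable client (a phone, a browser, a server) can do this.
using (WebClient http = new WebClient())
{
    byte[] data = http.DownloadData(
        "http://myaccount.blob.core.windows.net/mydocs/birthday.mpg");
    Console.WriteLine("Downloaded {0} bytes", data.Length);
}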
Windows Azure stores all data in triplicate. When data is written, a response is not returned until two replicas write the data successfully. The third replica is written asynchronously.
All data lives in a storage account, which can contain any combination of storage types. A single storage account can hold up to 100TB of data and has an account name and two account keys. The account name is like your user name, and the account key is like your password. You should never share these with anyone.
If you are using the .NET Storage Client Library, you can store your credentials in configuration and they will be picked up automatically. The setting's value goes in ServiceConfiguration.cscfg, and the setting itself is declared in ServiceDefinition.csdef:
<!-- In ServiceConfiguration.cscfg -->
<Setting name="DataConnectionString"
  value="DefaultEndpointsProtocol=https;AccountName=[YOUR_ACCOUNT_NAME];AccountKey=[YOUR_ACCOUNT_KEY]" />

<!-- In ServiceDefinition.csdef -->
<ConfigurationSettings>
  <Setting name="DataConnectionString" />
</ConfigurationSettings>
The first object you will work with is CloudStorageAccount. This represents the storage account you created in the portal and contains the credentials needed to use storage. From the storage account, you create a client object. There is one type of client object for each storage type.
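For example, here is a sketch that creates all three client types from one account, assuming the DataConnectionString setting shown above:

CloudStorageAccount account =
    CloudStorageAccount.FromConfigurationSetting("DataConnectionString");

CloudBlobClient blobClient = account.CreateCloudBlobClient();    // BLOBs
CloudQueueClient queueClient = account.CreateCloudQueueClient(); // queues
CloudTableClient tableClient = account.CreateCloudTableClient(); // tables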
Use BLOBs for Video, Images and Other Digital Files
BLOB stands for Binary Large Object. You can think of BLOB storage as a file system in the cloud. A storage account can contain any number of BLOB containers. Each container is like a root folder; containers cannot contain child containers. A container has one of three access levels (a sketch of setting them in code follows this list):
- Private: This is the default setting. All reads and writes must use the account name and account key.
- Full public read: This provides full anonymous read permissions. The reader can read BLOBs and list the container’s contents.
- Public read only: This is similar to Full Public Read, but the user does not have permissions to list the contents of the container.
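You set these levels through BlobContainerPermissions. The following sketch assumes a CloudBlobContainer reference like the one created in the next listing:

// BlobContainerPublicAccessType.Off       = private (the default)
// BlobContainerPublicAccessType.Container = full public read (read + list)
// BlobContainerPublicAccessType.Blob      = public read only (read, no listing)
BlobContainerPermissions permissions = new BlobContainerPermissions();
permissions.PublicAccess = BlobContainerPublicAccessType.Blob;
container.SetPermissions(permissions);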
To connect to and work with BLOBs, you need to create a CloudStorageAccount object and then a CloudBlobClient object. This will let you get a reference to a BLOB container, which is like a root folder. With all of the storage client objects, you get a reference to an object before it is created in the cloud. For example, you make a reference to a container and then call container.CreateIfNotExist() to create it. To upload a file, you create a CloudBlockBlob object from your container reference and then call one of the upload methods. As an example, use the following code to create a container and upload a local file.
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;
using Microsoft.WindowsAzure.ServiceRuntime;

// Read the connection string from the role configuration.
CloudStorageAccount storageAccount =
    CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
CloudBlobClient blobStorage = storageAccount.CreateCloudBlobClient();

// Get a reference first, then create the container if it doesn't exist yet.
CloudBlobContainer container = blobStorage.GetContainerReference("mydocs");
container.CreateIfNotExist();

// Upload a local file as a block BLOB.
CloudBlockBlob blob = container.GetBlockBlobReference("birthday.mpg");
blob.UploadFile(@"c:\tempfiles\birthday.mpg");
Queues Will Decouple Front End from Back End
Windows Azure Storage queues are very similar to queues you have probably used before. They are most commonly used to communicate from a front-end server to a back-end server. Queues are very handy in decoupling the front end from the back end of your system.
Queues allow you to send small messages (up to 8KB each) from producers (commonly the front end) to consumers (commonly the back end). Queues are approximately First In, First Out (FIFO): the first message in is generally the first message out, though strict ordering is not guaranteed.
Queues can store an essentially unlimited number of messages at one time. A queue can have any number of producers submitting messages and any number of consumers taking messages.
Because of the size limitation of a queue message, you will usually use a work ticket pattern: store the real data to be worked on in a BLOB container or a table, and put a pointer to that data in the message, as sketched below.
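Here is a minimal sketch of the pattern. It assumes the CloudBlobContainer (container) from the BLOB section and the CloudQueue (q) created in the next listing; the blob name and the orderXml payload are hypothetical:

// Store the large payload in BLOB storage...
CloudBlockBlob payload = container.GetBlockBlobReference("order-1337.xml");
payload.UploadText(orderXml);

// ...and queue only a small pointer to it (the work ticket).
q.AddMessage(new CloudQueueMessage("mydocs/order-1337.xml"));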
Messages go through a specific lifecycle in the queue. A consumer will read a message and provide a timeout (defaults to 30 seconds and can be as long as 2 hours). The queue will mark the message as invisible and lock it for this period of time. Before the timeout expires, the reader of the message must call back with a delete message command. If it doesn’t, then the queue assumes the consumer has failed, cancels the lock, and marks the message as visible on the queue again.
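This lifecycle leads naturally to a consumer loop like the following sketch. It assumes the CloudQueue object q created in the next listing and a hypothetical ProcessOrder method:

while (true)
{
    // Read a message and hide it from other consumers for one minute.
    CloudQueueMessage msg = q.GetMessage(TimeSpan.FromMinutes(1));
    if (msg == null)
    {
        System.Threading.Thread.Sleep(1000); // queue is empty; back off
        continue;
    }

    ProcessOrder(msg.AsString);

    // Delete before the timeout expires, or the message becomes
    // visible again and another consumer will pick it up.
    q.DeleteMessage(msg);
}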
To work with a queue, you need to create a CloudStorageAccount and a CloudQueueClient object to manage queues and messages. To connect to and create a queue:
CloudStorageAccount storageAccount =
    CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
CloudQueue q = queueClient.GetQueueReference("newordersqueue");
q.CreateIfNotExist();
To add a message to the queue, you simply call the AddMessage method of the queue object and pass in the string value of the message you want to send.
CloudQueueMessage theNewMessage = new CloudQueueMessage("shoppingcart:1337");
q.AddMessage(theNewMessage);
To get the next message off of the queue, you simply call GetMessage. You can also retrieve more than one message at a time, as sketched below. If the queue is empty, GetMessage returns null.
CloudQueueMessage currentMsg = q.GetMessage();
Finally, to delete a message you have finished working with, call DeleteMessage.
q.DeleteMessage(currentMsg);
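To retrieve several messages in a single round trip, there is also GetMessages. A small sketch (32 is the most the service will return per call):

foreach (CloudQueueMessage msg in q.GetMessages(32))
{
    Console.WriteLine(msg.AsString);
    q.DeleteMessage(msg);
}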
Tables: Up to 100TB of Non-SQL Data
You can have several tables per storage account. Each table can hold up to 100TB of data and is meant for highly scalable non-relational data. Each table is composed of entities (like rows in a normal database), and each entity is composed of many properties. Entities in the same table can have different schemas, giving Windows Azure tables a great deal of flexibility. Tables do not have any relationships with other tables; there are no joins or foreign keys.
Each entity in a table must have a RowKey and a PartitionKey property. These two properties together act as a sort of composite primary key for the entity. A Timestamp property is also required; the table service maintains its value for you.
The entities in tables are grouped into partitions using the PartitionKey. Partitions are how the table service scales. As a particular partition becomes busy, it is spun out to a new storage server so that it has more resources to handle the requests. This could happen to all of your partitions at once, fanning out to different machines to make sure that the system is scaling to the demands put on it. Choosing a PartitionKey strategy is important when designing any tables that might require high performance and scale.
The RowKey is a unique row identifier for that row in its partition. Both the RowKey and the PartitionKey can be anything you want them to be (both are stored as strings), but it is best to keep them simple, such as integer IDs or short strings. Tables can be accessed directly with REST or through the .NET Storage Client Library. Tables present an OData endpoint, which makes it easy to work with them as a data source.
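For illustration, the OData addresses look like the following (hypothetical account and table names; requests must still be signed with the account key):

// all entities in one partition:
http://myaccount.table.core.windows.net/ShoppingCartEntry()?$filter=PartitionKey%20eq%20'carts'
// one entity addressed by its composite key:
http://myaccount.table.core.windows.net/ShoppingCartEntry(PartitionKey='carts',RowKey='1337')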
To work with tables, you need a few working pieces. You need a class that represents your data; in most modern architectures, this is the data transfer object (DTO) or plain old .NET object (PONO) that has just properties (i.e., no methods). This class needs to inherit from Microsoft.WindowsAzure.StorageClient.TableServiceEntity and must provide the RowKey and PartitionKey values.
public class ShoppingCartEntry :
    Microsoft.WindowsAzure.StorageClient.TableServiceEntity
{
    // WCF Data Services needs a parameterless constructor to
    // materialize query results.
    public ShoppingCartEntry() { }

    public ShoppingCartEntry(int shoppingCartID)
    {
        PartitionKey = "carts";
        RowKey = shoppingCartID.ToString(); // keys are strings
    }

    public int CustomerID { get; set; }
    public string Sku { get; set; }
    public int Quantity { get; set; }
}
The Storage Client Library uses WCF Data Services to work with Windows Azure tables, which means you will need a context class. This is a class that sits between the entity class (shown above) and the table itself, like any other WCF Data Services context class. This class must inherit from TableServiceContext. This base class provides all the plumbing you need.
public class ShoppingCartDataContext : TableServiceContext
{
    public ShoppingCartDataContext(string baseAddress,
        StorageCredentials credentials)
        : base(baseAddress, credentials)
    { }

    public IQueryable<ShoppingCartEntry> ShoppingCartEntry
    {
        get
        {
            return this.CreateQuery<ShoppingCartEntry>("ShoppingCartEntry");
        }
    }
}
Once you have these two classes, you can start working with the data in your table. For example, we will add a shopping cart called aNewShoppingCart. To start, you need a storage account and a data context object. You should reuse the data context object where you can instead of creating a new one on each call.
CloudStorageAccount storageAccount =
    CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
ShoppingCartDataContext context = new ShoppingCartDataContext(
    storageAccount.TableEndpoint.AbsoluteUri, storageAccount.Credentials);

// Build the entity to store (the cart ID here is arbitrary).
ShoppingCartEntry aNewShoppingCart = new ShoppingCartEntry(1337);
context.AddObject("ShoppingCartEntry", aNewShoppingCart);
context.SaveChanges();
You must always remember to call SaveChanges on the context object once you have made changes. If you don’t, the changes are never sent to the cloud. A similar approach is used for updating data.
aNewShoppingCart.Sku = "31415";
context.UpdateObject(aNewShoppingCart);
context.SaveChanges();
You can batch many operations against the context object before calling SaveChanges.
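For example, a sketch of batching inserts (all entities in one batch must share a PartitionKey, and a batch is limited to 100 operations):

for (int i = 0; i < 10; i++)
{
    context.AddObject("ShoppingCartEntry", new ShoppingCartEntry(i));
}
// Send all ten inserts as a single entity group transaction.
context.SaveChanges(System.Data.Services.Client.SaveChangesOptions.Batch);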
Querying against the table is easy using the context object and LINQ.
var results = from g in this.context.ShoppingCartEntry
              where g.Sku == "31415"
              select g;
This will return a list of objects that have their SKU numbers set to 31415.
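Note that the query does not execute until you enumerate it, for example:

foreach (ShoppingCartEntry entry in results)
{
    Console.WriteLine("Cart {0} holds SKU {1}", entry.RowKey, entry.Sku);
}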