Comparing Storage Options in Azure

by Rob Sanders · Sep. 13, 12 · Interview

How do you choose a storage approach for Azure applications? First, make sure you're asking the right questions. Important considerations include data type (Relational? Structured? Unstructured?), lifecycle, APIs, and scalability. Is the cost based on capacity or bandwidth? Take a close look at each system and compare these details in-depth.

1. SQL Azure

Pros

Ticks a lot of the boxes: structured, API access, and random access with good scalability, although horizontal and vertical scaling are manual. Cost is based on database size.

Cons

Throttling can be inconsistent, backups don't work the same way as in SQL Server, thresholds under load can be unpredictable, and the feature set is not identical to SQL Server. You need to actively police retry attempts and manage outages.
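One way to police retry attempts is to wrap each database call in a bounded retry loop with exponential backoff. The following is a minimal sketch, not the article's own implementation: `TransientError`, `with_retries` and their parameters are hypothetical names, and a real application would map its database driver's throttling and connection errors onto the exception caught here.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a throttling or dropped-connection error."""


def with_retries(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the outage to the caller
            # Double the wait each time and add jitter so that workers
            # do not all retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Capping the number of attempts matters as much as the backoff: an unbounded retry loop against a throttled database only adds to the load that caused the throttling.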

2. Table storage (NoSQL)

Pros

Structured, permanent, web service API (plus managed APIs for many languages), random access, programmatic vertical and horizontal scale (easy to arrange data). You need to understand how to design for it. No relationships (key and column based). Data is flat, like an indexable CSV file. Cost is based on size and transactions.

Cons

Records are identified by table name, partition key and row key (typed columns, very flat). No secondary indexes.
Records are limited to around 1 MB, with single columns at 64 KB each. Limited data types (mainly primitives). Data modelling can be difficult compared to relational modelling: how is a domain model mapped into rows and columns, with no relationships?
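One common answer to the mapping question is to encode relationships in the keys themselves. The sketch below flattens a hypothetical order with line items into flat records that share a partition key. The `order_to_entities` function, the field names and the row key format are all illustrative assumptions, not part of any Azure SDK.

```python
def order_to_entities(order):
    """Flatten one order and its line items into flat table records."""
    customer = order["customer_id"]
    order_id = order["order_id"]
    # All rows for one customer share a partition, so they can be read together.
    header = {
        "PartitionKey": customer,
        "RowKey": f"order:{order_id}",
        "Total": order["total"],
    }
    lines = [
        {
            "PartitionKey": customer,
            # The row key carries the parent/child link the store cannot express.
            "RowKey": f"order:{order_id}:line:{index:04d}",
            "Sku": line["sku"],
            "Quantity": line["quantity"],
        }
        for index, line in enumerate(order["lines"], start=1)
    ]
    return [header] + lines
```

Because there are no secondary indexes, the key layout decides which queries are cheap. Here, everything belonging to one order can be fetched with a row key prefix scan inside a single partition.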

3. Blob Storage

Pros

Unstructured, permanent, random access, web service API, a bunch of data with metadata, automatic scaling (H & V) and cost is based on size and transactions.  Could store, for example, a serialized object into BLOB storage.

Two types: block blobs and page blobs.

Block: Up to 200 GB, sequential write, not easy to update.

Page: Up to 1 TB, made up of individually addressable 512-byte pages.
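Random access to a page blob works on whole pages, so a write to an arbitrary byte range has to be widened to page boundaries first. The sketch below shows that arithmetic, assuming a 512-byte page size. `PAGE_SIZE` and `aligned_range` are hypothetical names used only for illustration.

```python
PAGE_SIZE = 512  # bytes; assumed page size for page blobs


def aligned_range(offset, length):
    """Return the (start, end) byte range, end exclusive, widened to page boundaries."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE  # ceiling division
    return start, end
```

For example, updating 100 bytes at offset 1000 touches two pages, so the caller has to read, patch and rewrite the range from byte 512 up to byte 1536.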

4. Queues

Pros

Distributes workload. Unstructured, permanent, sequential ordered access (FIFO), web service based, managed API, and cost based on size and transactions. Great failure recovery support: dequeue and delete are separate operations, so if a worker fails to delete a message, the message returns to the queue. Supports multiple readers and writers.
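The separate dequeue and delete steps can be illustrated with a small in-memory model. This is a sketch of the behaviour only, not the Azure API: `SketchQueue`, its method names and the explicit `now` parameter are assumptions made so the example is deterministic.

```python
import itertools


class SketchQueue:
    """In-memory stand-in for a queue with separate dequeue and delete steps."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time at which it is visible again]
        self._ids = itertools.count(1)

    def enqueue(self, body, now):
        self._messages[next(self._ids)] = [body, now]

    def dequeue(self, now):
        """Return (id, body) for the first visible message and hide it, or None."""
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def delete(self, message_id):
        """Remove a message once the worker has finished processing it."""
        self._messages.pop(message_id, None)
```

A worker that crashes between `dequeue` and `delete` never removes the message, so it becomes visible again after the timeout and another worker picks it up. The flip side is that processing should be idempotent, because a slow worker and its replacement may both handle the same message.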

Cons

No notification mechanism, but supports polling. Be wary of overusing polling, since cost is transaction based. Performance is not brilliant (especially enqueuing), and messages are small (~64 KB). Can dequeue 32 messages at a time. FIFO behaviour is not guaranteed. A larger message needs to carry a pointer to a blob instead.
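The blob pointer approach for larger messages can be sketched as follows. The dictionary `blob_store` stands in for blob storage, and `make_message`, `read_message` and the message layout are hypothetical. The size check uses the approximate limit quoted above and ignores any encoding overhead a real queue might add.

```python
import uuid

MAX_MESSAGE_BYTES = 64 * 1024  # approximate queue message limit quoted above


def make_message(payload, blob_store):
    """Return a queue message, moving oversized payloads into blob_store."""
    if len(payload) <= MAX_MESSAGE_BYTES:
        return {"kind": "inline", "body": payload}
    blob_name = f"payload-{uuid.uuid4()}"
    blob_store[blob_name] = payload  # stand-in for a blob upload
    return {"kind": "blob", "blob_name": blob_name}


def read_message(message, blob_store):
    """Resolve a message back to its payload, following a blob pointer if present."""
    if message["kind"] == "inline":
        return message["body"]
    return blob_store[message["blob_name"]]
```

The consumer is then also responsible for deleting the blob once the message has been processed, otherwise orphaned payloads accumulate and are billed as storage.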

5. Cache

Transient, unstructured (key/value store), web service API, .NET SDK, auto or manual scale, shared and dedicated options available, cost based on size (128 MB to 4 GB). Comparable to memcached.

Stored in memory, with a local in-memory copy if required and a distributed notification model to invalidate local copies. Automatically purges entries when the quota is reached. There are limitations on bandwidth and connections.
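The interplay between a local in-memory copy and invalidation notifications can be modelled in a few lines. This is a conceptual sketch, not the Azure Cache client: `SharedCache` and `LocalCache` are hypothetical classes, and the notification is a plain in-process callback rather than a network message.

```python
class SharedCache:
    """Stand-in for the distributed cache; notifies subscribers on every write."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
        for listener in self._listeners:
            listener(key)  # tell every local cache that its copy is stale


class LocalCache:
    """Per-process cache that drops its copy of a key when notified."""

    def __init__(self, shared):
        self._shared = shared
        self._local = {}
        shared.subscribe(self._invalidate)

    def _invalidate(self, key):
        self._local.pop(key, None)

    def get(self, key):
        if key not in self._local:
            self._local[key] = self._shared.get(key)
        return self._local[key]
```

Reads served from the local copy avoid a network round trip, which also helps with the bandwidth and connection limits, at the price of a short window in which a local copy can be stale before the notification arrives.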

6. Content Delivery Network (CDN)

Geo-distributed cache, offered by providers such as Akamai, and not strictly part of Azure. The only way to deliver to a targeted geography. Mirrors HTTP(S) content, with availability controlled by HTTP headers. Generally cheaper than delivering content through Azure. The Microsoft CDN is perhaps easier to use. Blob storage can be connected to the CDN. Transient storage.

7. Apache Hadoop

Java based distributed processing of large data sets. Highly scalable, with an option to deploy a Hadoop cluster from Azure. Reliable computation, suitable for structured and unstructured data. Storage and processing capability.

8. Virtual Machine

Install anything you need (MySQL, memcache, Oracle?). Why would you resort to a VM? For legacy software, other dependencies, or a need for MySQL (and others). Possibly not the right approach for new greenfield apps.

Performance

Emulator environments are not reliable measures of performance. Test on real Azure. The platform is dynamic, so perform additional testing. Make sure high-volume situations are tested. Test beyond read/write scenarios. Test common scenarios and keep an eye on edge case scenarios.

Testing Methodologies

A sample test plan: Build a simple application, control API, use multiple workers and test at different levels of load.  Stress-testing the platform is okay.
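A test plan along these lines can be driven by a small harness that runs the same operation from several workers and records latencies. The sketch below is one possible shape for it. `run_load_test` and `summarize` are hypothetical helpers, and `operation` is whatever storage call is being profiled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(operation, workers, requests_per_worker):
    """Call operation() from several threads and return per-call latencies in seconds."""

    def worker():
        latencies = []
        for _ in range(requests_per_worker):
            started = time.perf_counter()
            operation()
            latencies.append(time.perf_counter() - started)
        return latencies

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        return [latency for future in futures for latency in future.result()]


def summarize(latencies):
    """Return the mean and the 95th percentile (nearest rank) latency."""
    ordered = sorted(latencies)
    p95_index = (len(ordered) * 95 + 99) // 100 - 1
    return {"mean": sum(ordered) / len(ordered), "p95": ordered[p95_index]}
```

Running the harness at several worker counts (for example 1, 4, 16 and 64) and comparing the 95th percentile rather than only the mean makes throttling visible, since throttled calls show up in the tail first.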

Results: for mid-sized batches, table storage seems to be the winner. The outcome depends on your own specific application, so you should profile multiple storage options. SQL Azure needs to be designed with sharding or caching to keep load manageable.
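Sharding comes down to routing each key to one of several databases in a stable way. A minimal sketch, assuming a fixed number of shards and simple hash-based routing (`shard_for` is a hypothetical helper):

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index in a way that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

A cryptographic hash is used here instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same customer to different shards on different workers. Note that plain modulo routing moves most keys when `shard_count` changes, so growing the number of shards needs a migration plan.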

Patterns for Performance

Tiered storage, output caching, queued updates. Use the right storage for the right data.

[Local Cache] (Transient)
[Azure Cache]
[Table Storage]
[SQL Azure] (Structured)
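Read access through such a stack can be sketched as a lookup that tries the fastest tier first and copies a hit into the faster tiers above it. In this sketch each tier is a plain dictionary standing in for a real store, and `tiered_get` is a hypothetical helper. Whether every tier should be back-filled is a design decision for the application.

```python
def tiered_get(key, tiers):
    """Look up key in tiers ordered fastest first, back-filling faster tiers on a hit."""
    for index, tier in enumerate(tiers):
        if key in tier:
            value = tier[key]
            for faster in tiers[:index]:
                faster[key] = value
            return value
    raise KeyError(key)
```

For example, with `tiers = [local_cache, azure_cache, table_storage]`, the first read of a key found only in table storage populates both caches, and later reads are served from the local cache.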

Architecturally challenging: performance, instrumentation, development support, etc. Table storage allows for denormalization (multiple copies of the same data). Queued updates can help. Output caching can be of benefit: cache generated JSON and similar output at the presentation tier or locally (IIS), but beware stale data. CDN edge caching reduces load on servers and delivers geographically targeted content.
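The stale data risk of output caching is usually bounded with a time-to-live. The following is a minimal sketch of that idea. `OutputCache`, `get_or_render` and the injectable `clock` are hypothetical, and this is not how IIS output caching is configured.

```python
import time


class OutputCache:
    """Cache rendered output for a fixed time-to-live to bound staleness."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, rendered output)

    def get_or_render(self, key, render):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, skip the expensive render
        output = render()
        self._entries[key] = (now + self._ttl, output)
        return output
```

The time-to-live is the maximum age a reader can observe, so it should be chosen per kind of content: seconds for a stock level, minutes or hours for a product description.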

How do you choose?

Understand your design needs, resources and environment. Ensure that the design is proportional to your scale needs. Determine your data complexity requirements.

Examples:

Line of business – transactional, small user base (<100), developers generally SQL-experienced, hybrid applications (online/offline) and very complex data.  Uses SQL Azure and Azure Cache for acceleration (simple – keep transactions/size down = lower cost).

Internet scale application – No transactions, read optimized, data partitionable, often simple data model.  Use Azure Table Storage, Apache Hadoop for analysis, Azure Cache for acceleration.

Published at DZone with permission of Rob Sanders, DZone MVB. See the original article here.
