Cloud Architecture Resources

The Latest Cloud Architecture Topics

Autoscaling your webserver tier is typically straightforward. Image your apache server with source code or without, then sync down files from S3 upon spinup. Roll that image into the autoscale configuration and you’re all set. With the database tier though, things can be a bit tricky. The typical configuration we see is to have a single master database where your application writes. But scaling out or horizontally on Amazon EC2 should be as easy as adding more slaves, right? Why not automate that process? Below we’ve set out to answer some of the questions you’re likely to face when setting up slaves against your master. We’ve included instructions on building an AMI that automatically spins up as a slave. Fancy! How can I autoscale my database tier? Build an auto-starting MySQL slave against your master. Configure those to spinup. Amazon’s autoscaling loadbalancer is one option, another is to use a roll-your-own solution, monitoring thresholds on servers, and spinning up or dropping off slaves as necessary. Does an AWS snapshot capture subvolume data or just the SIZE of the attached volume? In fact, if you have an attached EBS volume and you create an new AMI off of that, you will capture the entire root volume, plus your attached volume data. In fact we find this a great way to create an auto-building slave in the cloud. How do I freeze MySQL during AWS snapshot? mysql> flush tables with read lock;mysql> system xfs_freeze -f /data At this point you can use the Amazon web console, ylastic, or ec2-create-image API call to do so from the command line. When the server you are imaging off of above restarts – as it will do by default – it will start with /data partition unfrozen and mysql’s tables unlocked again. Voila! If you’re not using xfs for your /data filesystem, you should be. It’s fast! The xfsprogs docs seem to indicate this may also work with foreign filesystems. Check the docs for details. How do I build an AMI mysql slave that autoconnects to master? Install mysql_serverid script below. Configure mysql to use your /data EBS mount. Set all your my.cnf settings including server_id Configure the instance as a slave in the normal way. When using GRANT to create the ‘rep’ user on master, specify the host with a subnet wildcard. For example ’10.20.%’. That will subsequently allow any 10.20.x.y servers to connect and replicate. Point the slave at the master. When all is running properly, edit the my.cnf file and remove server_id. Don’t restart mysql. Freeze the filesystem as described above. Use the Amazon console, ylastic or API call to create your new image. Test it of course, to make sure it spins up, sets server_id and connects to master. Make a change in the test schema, and verify that it propagates to all slaves. How do I set server_id uniquely? As you hopefully already know, in MySQL replication environment each node requires a unique server_id setting. In my Amazon Machine Images, I want the server to startup and if it doesn’t find the server_id in the /etc/my.cnf file, to add it there, correctly! Is that so much to ask? Here’s what I did. Fire up your editor of choice and drop in this bit of code: #!/bin/shif grep -q “server_id” /etc/my.cnf then : # do nothing – it’s already set else # extract numeric component from hostname – should be internet IP in Amazon environment export server_id=`echo $HOSTNAME | sed ‘s/[^0-9]*//g’` echo “server_id=$server_id” >> /etc/my.cnf # restart mysql /etc/init.d/mysql restart fi Save that snippet at /root/mysql_serverid. Also be sure to make it executable: $ chmod +x /root/mysql_serverid Then just append it to your /etc/rc.local file with an editor or echo: $ echo "/root/mysql_serverid" >> /etc/rc.local Assuming your my.cnf file does *NOT* contain the server_id setting when you re-image, then it’ll set this automagically each time you spinup a new server off of that AMI. Nice! Can you easily slave off of a slave? How? It’s not terribly different from slaving off of a normal master. A. First enable slave updates. The setting is not dynamic, so if you don’t already have it set, you’ll have to restart your slave. log_slave_updates=true B. Get an initial snapshot of your slave data. You can do that the locking way: mysql> flush tables with read lock;mysql> show master status\G; mysql> system mysqldump -A > full_slave_dump.mysql mysql> unlock tables; You may also choose to use Percona’s excellent xtrabackup utility to create hotbackups without locking any tables. We are very lucky to have an open-source tool like this at our disposal. MySQL Enterprise Backup from Oracle Corp can also do this. C. On the slave, seed the database with your dump created above. $ mysql < full_slave_dump.mysql D. Now point your slave to the original slave. mysql> change master to master_user='rep', master_password='rep', master_host='192.168.0.1', master_log_file='server-bin-log.000004', master_log_pos=399;mysql> start slave; mysql> show slave status\G; Slave master is set as an IP address. Is there another way? It’s possible to use hostnames in MySQL replication, however it’s not recommended. Why? Because of the wacky world of DNS. Suffice it to say MySQL has to do a lot of work to resolve those names into IP addresses. A hickup in DNS can interrupt all MySQL services potentially as sessions will fail to authenticate. To avoid this problem do two things: A. Set this parameter in my.cnf skip_name_resolve = true Remove entries in mysql.user table where hostname is not an IP address. Those entries will be invalid for authentication after setting the above parameter. Doesn’t RDS take care of all of this for me? RDS is Amazon’s Relational Database Service which is built on MySQL. Amazon’s RDS solution presents MySQL as a service which brings certain benefits to administrators and startups: Simpler administration. Nuts and bolts are handled for you. Push-button replication. No more struggling with the nuances and issues of MySQL’s replication management. Simplicity of administration of course has it’s downsides. Depending on your environment, these may or may not be dealbreakers. No access to the slow query log. This is huge. The single best tool for troubleshooting slow database response is this log file. Queries are a large part of keeping a relational database server healthy and happy, and without this facility, you are severely limited. Locked in downtime window When you signup for RDS, you must define a thirty minute maintenance window. This is a weekly window during which your instance *COULD* be unavailable. When you host yourself, you may not require as much downtime at all, especially if you’re using master-master mysql and zero-downtime configuration. Can’t use Percona Server to host your MySQL data. You won’t be able to do this in RDS. Percona server is a high performance distribution of MySQL which typically rolls in serious performance tweaks and updates before they make it to community addition. Well worth the effort to consider it. No access to filesystem, server metrics & command line. Again for troubleshooting problems, these are crucial. Gathering data about what’s really happening on the server is how you begin to diagnose and troubleshoot a server stall or pileup. You are beholden to Amazon’s support services if things go awry. That’s because you won’t have access to the raw iron to diagnose and troubleshoot things yourself. Want to call in an outside consultant to help you debug or troubleshoot? You’ll have your hands tied without access to the underlying server. You can’t replicate to a non-RDS database. Have your own datacenter connected to Amazon via VPC? Want to replication to a cloud server? RDS won’t fit the bill. You’ll have to roll your own – as we’ve described above. And if you want to replicate to an alternate cloud provider, again RDS won’t work for you. Related posts: Deploying MySQL on Amazon EC2 – 8 Best Practices Review: Host Your Web Site In The Cloud, Amazon Web Services Made Easy 5 Ways to Boost MySQL Scalability Top MySQL DBA interview questions (Part 2) MySQL Cluster In The Cloud – Managers Guide

July 20, 2012

by Sean Hull

· 18,573 Views

Working with MongoDB MultiMaster

Learn all about working with MondoDB multimaster.

July 11, 2012

by Rick Copeland

· 28,289 Views · 2 Likes

Everything You Need To Know About Couchbase Architecture

After receiving a lot of good feedback and comment on my last blog on MongoDb, I was encouraged to do another deep dive on another popular document oriented db; Couchbase. I have been a long-time fan CouchDb and has wrote a blog on it many years ago. After it merges with Membase, I am very excited to take a deep look into it again. Couchbase is the merge of two popular NOSQL technologies: Membase, which provides persistence, replication, sharding to the high performance memcached technology CouchDB, which pioneers the document oriented model based on JSON Like other NOSQL technologies, both Membase and CouchDB are built from the ground up on a highly distributed architecture, with data shard across machines in a cluster. Built around the Memcached protocol, Membase provides an easy migration to existing Memcached users who want to add persistence, sharding and fault resilience on their familiar Memcached model. On the other hand, CouchDB provides first class support for storing JSON documents as well as a simple RESTful API to access them. Underneath, CouchDB also has a highly tuned storage engine that is optimized for both update transaction as well as query processing. Taking the best of both technologies, Membase is well-positioned in the NOSQL marketplace. Programming model Couchbase provides client libraries for different programming languages such as Java / .NET / PHP / Ruby / C / Python / Node.js For read, Couchbase provides a key-based lookup mechanism where the client is expected to provide the key, and only the server hosting the data (with that key) will be contacted. Couchbase also provides a query mechanism to retrieve data where the client provides a query (for example, range based on some secondary key) as well as the view (basically the index). The query will be broadcasted to all servers in the cluster and the result will be merged and sent back to the client. For write, Couchbase provides a key-based update mechanism where the client sends in an updated document with the key (as doc id). When handling write request, the server will return to client’s write request as soon as the data is stored in RAM on the active server, which offers the lowest latency for write requests. Following is the core API that Couchbase offers. (in an abstract sense) # Get a document by key doc = get(key) # Modify a document, notice the whole document # need to be passed in set(key, doc) # Modify a document when no one has modified it # since my last read casVersion = doc.getCas() cas(key, casVersion, changedDoc) # Create a new document, with an expiration time # after which the document will be deleted addIfNotExist(key, doc, timeToLive) # Delete a document delete(key) # When the value is an integer, increment the integer increment(key) # When the value is an integer, decrement the integer decrement(key) # When the value is an opaque byte array, append more # data into existing value append(key, newData) # Query the data results = query(viewName, queryParameters) In Couchbase, document is the unit of manipulation. Currently Couchbase doesn't support server-side execution of custom logic. Couchbase server is basically a passive store and unlike other document oriented DB, Couchbase doesn't support field-level modification. In case of modifying documents, client need to retrieve documents by its key, do the modification locally and then send back the whole (modified) document back to the server. This design tradeoff network bandwidth (since more data will be transferred across the network) for CPU (now CPU load shift to client). Couchbase currently doesn't support bulk modification based on a condition matching. Modification happens only in a per document basis. (client will save the modified document one at a time). Transaction Model Similar to many NOSQL databases, Couchbase’s transaction model is primitive as compared to RDBMS. Atomicity is guaranteed at a single document and transactions that span update of multiple documents are unsupported. To provide necessary isolation for concurrent access, Couchbase provides a CAS (compare and swap) mechanism which works as follows … When the client retrieves a document, a CAS ID (equivalent to a revision number) is attached to it. While the client is manipulating the retrieved document locally, another client may modify this document. When this happens, the CAS ID of the document at the server will be incremented. Now, when the original client submits its modification to the server, it can attach the original CAS ID in its request. The server will verify this ID with the actual ID in the server. If they differ, the document has been updated in between and the server will not apply the update. The original client will re-read the document (which now has a newer ID) and re-submit its modification. Couchbase also provides a locking mechanism for clients to coordinate their access to documents. Clients can request a LOCK on the document it intends to modify, update the documents and then releases the LOCK. To prevent a deadlock situation, each LOCK grant has a timeout so it will automatically be released after a period of time. Deployment Architecture In a typical setting, a Couchbase DB resides in a server clusters involving multiple machines. Client library will connect to the appropriate servers to access the data. Each machine contains a number of daemon processes which provides data access as well as management functions. The data server, written in C/C++, is responsible to handle get/set/delete request from client. The Management server, written in Erlang, is responsible to handle the query traffic from client, as well as manage the configuration and communicate with other member nodes in the cluster. Virtual Buckets The basic unit of data storage in Couchbase DB is a JSON document (or primitive data type such as int and byte array) which is associated with a key. The overall key space is partitioned into 1024 logical storage unit called "virtual buckets" (or vBucket). vBucket are distributed across machines within the cluster via a map that is shared among servers in the cluster as well as the client library. High availability is achieved through data replication at the vBucket level. Currently Couchbase supports one active vBucket zero or more standby replicas hosted in other machines. Curremtly the standby server are idle and not serving any client request. In future version of Couchbase, the standby replica will be able to serve read request. Load balancing in Couchbase is achieved as follows: Keys are uniformly distributed based on the hash function When machines are added and removed in the cluster. The administrator can request a redistribution of vBucket so that data are evenly spread across physical machines. Management Server Management server performs the management function and co-ordinate the other nodes within the cluster. It includes the following monitoring and administration functions Heartbeat: A watchdog process periodically communicates with all member nodes within the same cluster to provide Couchbase Server health updates. Process monitor: This subsystem monitors execution of the local data manager, restarting failed processes as required and provide status information to the heartbeat module. Configuration manager: Each Couchbase Server node shares a cluster-wide configuration which contains the member nodes within the cluster, a vBucket map. The configuration manager pull this config from other member nodes at bootup time. Within a cluster, one node’s Management Server will be elected as the leader which performs the following cluster-wide management function Controls the distribution of vBuckets among other nodes and initiate vBucket migration Orchestrates the failover and update the configuration manager of member nodes If the leader node crashes, a new leader will be elected from surviving members in the cluster. When a machine in the cluster has crashed, the leader will detect that and notify member machines in the cluster that all vBuckets hosted in the crashed machine is dead. After getting this signal, machines hosting the corresponding vBucket replica will set the vBucket status as “active”. The vBucket/server map is updated and eventually propagated to the client lib. Notice that at this moment, the replication level of the vBucket will be reduced. Couchbase doesn’t automatically re-create new replicas which will cause data copying traffic. Administrator can issue a command to explicitly initiate a data rebalancing. The crashed machine, after reboot can rejoin the cluster. At this moment, all the data it stores previously will be completely discard and the machine will be treated as a brand new empty machine. As more machines are put into the cluster (for scaling out), vBucket should be redistributed to achieve a load balance. This is currently triggered by an explicit command from the administrator. Once receive the “rebalance” command, the leader will compute the new provisional map which has the balanced distribution of vBuckets and send this provisional map to all members of the cluster. To compute the vBucket map and migration plan, the leader attempts the following objectives: Evenly distribute the number of active vBuckets and replica vBuckets among member nodes. Place the active copy and each replicas in physically separated nodes. Spread the replica vBucket as wide as possible among other member nodes. Minimize the amount of data migration Orchestrate the steps of replica redistribution so no node or network will be overwhelmed by the replica migration. Once the vBucket maps is determined, the leader will pass the redistribution map to each member in the cluster and coordinate the steps of vBucket migration. The actual data transfer happens directly between the origination node to the destination node. Notice that since we have generally more vBuckets than machines. The workload of migration will be evenly distributed automatically. For example, when new machines are added into the clusters, all existing machines will migrate some portion of its vBucket to the new machines. There is no single bottleneck in the cluster. Throughput the migration and redistribution of vBucket among servers, the life cycle of a vBucket in a server will be in one of the following states “Active”: means the server is hosting the vBucket is ready to handle both read and write request “Replica”: means the server is hosting the a copy of the vBucket that may be slightly out of date but can take read request that can tolerate some degree of outdate. “Pending”: means the server is hosting a copy that is in a critical transitional state. The server cannot take either read or write request at this moment. “Dead”: means the server is no longer responsible for the vBucket and will not take either read or write request anymore. Data Server Data server implements the memcached APIs such as get, set, delete, append, prepend, etc. It contains the following key datastructure: One in-memory hashtable (key by doc id) for the corresponding vBucket hosted. The hashtable acts as both a metadata for all documents as well as a cache for the document content. Maintain the entry gives a quick way to detect whether the document exists on disk. To support async write, there is a checkpoint linkedlist per vBucket holding the doc id of modified documents that hasn't been flushed to disk or replicated to the replica. To handle a "GET" request Data server routes the request to the corresponding ep-engine responsible for the vBucket. The ep-engine will lookup the document id from the in-memory hastable. If the document content is found in cache (stored in the value of the hashtable), it will be returned. Otherwise, a background disk fetch task will be created and queued into the RO dispatcher queue. The RO dispatcher then reads the value from the underlying storage engine and populates the corresponding entry in the vbucket hash table. Finally, the notification thread notifies the disk fetch completion to the memcached pending connection, so that the memcached worker thread can revisit the engine to process a get request. To handle a "SET" request, a success response will be returned to the calling client once the updated document has been put into the in-memory hashtable with a write request put into the checkpoint buffer. Later on the Flusher thread will pickup the outstanding write request from each checkpoint buffer, lookup the corresponding document content from the hashtable and write it out to the storage engine. Of course, data can be lost if the server crashes before the data has been replicated to another server and/or persisted. If the client requires a high data availability across different crashes, it can issue a subsequent observe() call which blocks on the condition that the server persist data on disk, or the server has replicated the data to another server (and get its ACK). Overall speaking, the client has various options to tradeoff data integrity with throughput. Hashtable Management To synchronize accesses to a vbucket hash table, each incoming thread needs to acquire a lock before accessing a key region of the hash table. There are multiple locks per vbucket hash table, each of which is responsible for controlling exclusive accesses to a certain ket region on that hash table. The number of regions of a hash table can grow dynamically as more documents are inserted into the hash table. To control the memory size of the hashtable, Item pager thread will monitor the memory utilization of the hashtable. Once a high watermark is reached, it will initiate an eviction process to remove certain document content from the hashtable. Only entries that is not referenced by entries in the checkpoint buffer can be evicted because otherwise the outstanding update (which only exists in hashtable but not persisted) will be lost. After eviction, the entry of the document still remains in the hashtable; only the document content of the document will be removed from memory but the metadata is still there. The eviction process stops after reaching the low watermark. The high / low water mark is determined by the bucket memory quota. By default, the high water mark is set to 75% of bucket quota, while the low water mark is set to 60% of bucket quota. These water marks can be configurable at runtime. In CouchDb, every document is associated with an expiration time and will be deleted once it is expired. Expiry pager is responsible for tracking and removing expired document from both the hashtable as well as the storage engine (by scheduling a delete operation). Checkpoint Manager Checkpoint manager is responsible to recycle the checkpoint buffer, which holds the outstanding update request, consumed by the two downstream processes, Flusher and TAP replicator. When all the request in the checkpoint buffer has been processed, the checkpoint buffer will be deleted and a new one will be created. TAP Replicator TAP replicator is responsible to handle vBucket migration as well as vBucket replication from active server to replica server. It does this by propagating the latest modified document to the corresponding replica server. At the time a replica vBucket is established, the entire vBucket need to be copied from the active server to the empty destination replica server as follows The in-memory hashtable at the active server will be transferred to the replica server. Notice that during this period, some data may be updated and therefore the data set transfered to the replica can be inconsistent (some are the latest and some are outdated). Nevertheless, all updates happen after the start of transfer is tracked in the checkpoint buffer. Therefore, after the in-memory hashtable transferred is completed, the TAP replicator can pickup those updates from the checkpoint buffer. This ensures the latest versioned of changed documents are sent to the replica, and hence fix the inconsistency. However the hashtable cache doesn’t contain all the document content. Data also need to be read from the vBucket file and send to the replica. Notice that during this period, update of vBucket will happen in active server. However, since the file is appended only, subsequent data update won’t interfere the vBucket copying process. After the replica server has caught up, subsequent update at the active server will be available at its checkpoint buffer which will be pickup by the TAP replicator and send to the replica server. CouchDB Storage Structure Data server defines an interface where different storage structure can be plugged-in. Currently it supports both a SQLite DB as well as CouchDB. Here we describe the details of CouchDb, which provides a super high performance storage mechanism underneath the Couchbase technology. Under the CouchDB structure, there will be one file per vBucket. Data are written to this file in an append-only manner, which enables Couchbase to do mostly sequential writes for update, and provide the most optimized access patterns for disk I/O. This unique storage structure attributes to Couchbase’s fast on-disk performance for write-intensive applications. The following diagram illustrate the storage model and how it is modified by 3 batch updates (notice that since updates are asynchronous, it is perform by "Flusher" thread in batches). The Flusher thread works as follows: 1) Pick up all pending write request from the dirty queue and de-duplicate multiple update request to the same document. 2) Sort each request (by key) into corresponding vBucket and open the corresponding file 3) Append the following into the vBucket file (in the following contiguous sequence) All document contents in such write request batch. Each document will be written as [length, crc, content] one after one sequentially. The index that stores the mapping from document id to the document’s position on disk (called the BTree by-id) The index that stores the mapping from update sequence number to the document’s position on disk. (called the BTree by-seq) The by-id index plays an important role for looking up the document by its id. It is organized as a B-Tree where each node contains a key range. To lookup a document by id, we just need to start from the header (which is the end of the file), transfer to the root BTree node of the by-id index, and then further traverse to the leaf BTree node that contains the pointer to the actual document position on disk. During the write, the similar mechanism is used to trace back to the corresponding BTree node that contains the id of the modified documents. Notice that in the append-only model, update is not happening in-place, instead we located the existing location and copy it over by appending. In other words, the modified BTree node will be need to be copied over and modified and finally paste to the end of file, and then its parent need to be modified to point to the new location, which triggers the parents to be copied over and paste to the end of file. Same happens to its parents’ parent and eventually all the way to the root node of the BTree. The disk seek can be at the O(logN) complexity. The by-seq index is used to keep track of the update sequence of lived documents and is used for asynchronous catchup purposes. When a document is created, modified or deleted, a sequence number is added to the by-seq btree and the previous seq node will be deleted. Therefore, for cross-site replication, view index update and compaction, we can quickly locate all the lived documents in the order of their update sequence. When a vBucket replicator asks for the list of update since a particular time, it provides the last sequence number in previous update, the system will then scan through the by-seq BTree node to locate all the document that has sequence number larger than that, which effectively includes all the document that has been modified since the last replication. As time goes by, certain data becomes garbage (see the grey-out region above) and become unreachable in the file. Therefore, we need a garbage collection mechanism to clean up the garbage. To trigger this process, the by-id and by-seq B-Tree node will keep track of the data size of lived documents (those that is not garbage) under its substree. Therefore, by examining the root BTree node, we can determine the size of all lived documents within the vBucket. When the ratio of actual size and vBucket file size fall below a certain threshold, a compaction process will be triggered whose job is to open the vBucket file and copy the survived data to another file. Technically, the compaction process opens the file and read the by-seq BTree at the end of the file. It traces the Btree all the way to the leaf node and copy the corresponding document content to the new file. The compaction process happens while the vBucket is being updated. However, since the file is appended only, new changes are recorded after the BTree root that the compaction has opened, so subsequent data update won’t interfere with the compaction process. When the compaction is completed, the system need to copy over the data that was appended since the beginning of the compaction to the new file. View Index Structure Unlike most indexing structure which provide a pointer from the search attribute back to the document. The CouchDb index (called View Index) is better perceived as a denormalized table with arbitrary keys and values loosely associated to the document. Such denormalized table is defined by a user-provided map() and reduce() function. map = function(doc) { … emit(k1, v1) … emit(k2, v2) … } reduce = function(keys, values, isRereduce) { if (isRereduce) { // Do the re-reduce only on values (keys will be null) } else { // Do the reduce on keys and values } // result must be ready for input values to re-reduce return result } Whenever a document is created, updated, deleted, the corresponding map(doc) function will be invoked (in an asynchronous manner) to generate a set of key/value pairs. Such key/value will be stored in a B-Tree structure. All the key/values pairs of each B-Tree node will be passed into the reduce() function, which compute an aggregated value within that B-Tree node. Re-reduce also happens in non-leaf B-Tree nodes which further aggregate the aggregated value of child B-Tree nodes. The management server maintains the view index and persisted it to a separate file. Create a view index is perform by broadcast the index creation request to all machines in the cluster. The management process of each machine will read its active vBucket file and feed each surviving document to the Map function. The key/value pairs emitted by the Map function will be stored in a separated BTree index file. When writing out the BTree node, the reduce() function will be called with the list of all values in the tree node. Its return result represent a partially reduced value is attached to the BTree node. The view index will be updated incrementally as documents are subsequently getting into the system. Periodically, the management process will open the vBucket file and scan all documents since the last sequence number. For each changed document since the last sync, it invokes the corresponding map function to determine the corresponding key/value into the BTree node. The BTree node will be split if appropriate. Underlying, Couchbase use a back index to keep track of the document with the keys that it previously emitted. Later when the document is deleted, it can look up the back index to determine what those key are and remove them. In case the document is updated, the back index can also be examined; semantically a modification is equivalent to a delete followed by an insert. The following diagram illustrates how the view index file will be incrementally updated via the append-only mechanism. Query Processing Query in Couchbase is made against the view index. A query is composed of the view name, a start key and end key. If the reduce() function isn’t defined, the query result will be the list of values sorted by the keys within the key range. In case the reduce() function is defined, the query result will be a single aggregated value of all keys within the key range. If the view has no reduce() function defined, the query processing proceeds as follows: Client issue a query (with view, start/end key) to the management process of any server (unlike a key based lookup, there is no need to locate a specific server). The management process will broadcast the request to other management process on all servers (include itself) within the cluster. Each management process (after receiving the broadcast request) do a local search for value within the key range by traversing the BTree node of its view file, and start sending back the result (automatically sorted by the key) to the initial server. The initial server will merge the sorted result and stream them back to the client. However, if the view has reduce() function defined, the query processing will involve computing a single aggregated value as follows: Client issue a query (with view, start/end key) to the management process of any server (unlike a key based lookup, there is no need to locate a specific server). The management process will broadcast the request to other management process on all servers (include itself) within the cluster. Each management process do a local reduce for value within the key range by traversing the BTree node of its view file to compute the reduce value of the key range. If the key range span across a BTree node, the pre-computed of the sub-range can be used. This way, the reduce function can reuse a lot of partially reduced values and doesn’t need to recomputed every value of the key range from scratch. The original server will do a final re-reduce() in all the return value from each other servers, and then passed back the final reduced value to the client. To illustrate the re-reduce concept, lets say the query has its key range from A to F. Instead of calling reduce([A,B,C,D,E,F]), the system recognize the BTree node that contains [B,C,D] has been pre-reduced and the result P is stored in the BTree node, so it only need to call reduce(A,P,E,F). Update View Index as vBucket migrates Since the view index is synchronized with the vBuckets in the same server, when the vBucket has migrated to a different server, the view index is no longer correct; those key/value that belong to a migrated vBucket should be discarded and the reduce value cannot be used anymore. To keep track of the vBucket and key in the view index, each bTree node has a 1024-bitmask indicating all the vBuckets that is covered in the subtree (ie: it contains a key emitted from a document belonging to the vBucket). Such bit-mask is maintained whenever the bTree node is updated. At the server-level, a global bitmask is used to indicate all the vBuckets that this server is responsible for. In processing the query of the map-only view, before the key/value pair is returned, an extra check will be perform for each key/value pair to make sure its associated vBucket is what this server is responsible for. When processing the query of a view that has a reduce() function, we cannot use the pre-computed reduce value if the bTree node contains a vBucket that the server is not responsible for. In this case, the bTree node’s bit mask is compared with the global bit mask. In case if they are not aligned, then the reduce value need to be recomputed. Here is an example to illustrate this process Couchbase is one of the popular NOSQL technology built on a solid technology foundation designed for high performance. In this post, we have examined a number of such key features: Load balancing between servers inside a cluster that can grow and shrink according to workload conditions. Data migration can be used to re-achieve workload balance. Asynchronous write provides lowest possible latency to client as it returns once the data is store in memory. Append-only update model pushes most update transaction into sequential disk access, hence provide extremely high throughput for write intensive applications. Automatic compaction ensures the data lay out on disk are kept optimized all the time. Map function can be used to pre-compute view index to enable query access. Summary data can be pre-aggregated using the reduce function. Overall, this cut down the workload of query processing dramatically. For a review on NOSQL architecture in general and some theoretical foundation, I have wrote a NOSQL design pattern blog, as well as some fundamental difference between SQL and NOSQL. For other NOSQL technologies, please read my other blog on MongoDb, Cassandra and HBase, Memcached Special thanks to Damien Katz and Frank Weigel from Couchbase team who provide a lot of implementation details of Couchbase.

July 7, 2012

by Ricky Ho

· 84,882 Views · 5 Likes

Current Challenges of Moving Apps to the Cloud, and How to Anticpate Them

In my last post, I discussed some of the key considerations when moving an application to the cloud. To provide a better understanding, I’m using a simple scenario-based example to illustrate how an application could be moved to the cloud. This article will explain the challenges a company might face, the current architecture of the example application, and finally what the company should expect when moving an application to the cloud. My next article will discuss the recommended solution in more detail. Disclaimer Company name, logo, business, scenario, and incidents either are used fictitiously. Any resemblance to an actual company is entirely coincidental. Background Idelma is a ticket selling provider that sells tickets to concerts, sports event, and music gigs. Tickets are sold offline through ticket counters and online through a website called TicketOnline. Customers visiting TicketOnline can browse list of available shows, find out more information on each show, and finally purchase tickets online. When a ticket is purchased, it’s reserved but will not be processed immediately. Other processes such as generating ticket and sending the generated ticket along with the receipt will be done asynchronously in a few minutes time. Current Challenges During peak season (typically in July and December), TicketOnline suffered from heavy traffic that caused slow response time. The traffic for off-peak season is normally about 100,000 to 200,000 hits per day, with the average of 8 to 15 on-going shows. In peak season, the traffic may reach five to seven times more than off-peak season. The following diagram illustrates the web server hits counter of TicketOnline over the last three years. Figure 1 – TicketOnline web server hits counter for the last three years Additionally, the current infrastructure setup is not designed to be highly-available. This results in several periods of downtime each year. The options: on-premise vs cloud Idelma’s IT Manager Mr. Anthony recognizes the issues and decides to make some improvement to bring better competitive advantages to the company. When reading an article online, he discovered that cloud computing may be a good solution to address the issues. Another option would be to purchase a more powerful set of hardware that could handle the load. With that, he has done a pros and cons analysis of the two options: On-premise hardware investment There are at least two advantages of investing in more hardware. One, they will have full control over the infrastructure, and can use the server for other purposes when necessary. Second, there might be less or no modification needed on the application at all, depending on how it is architected and designed. If they decide to scale up (vertically), they might not need to make any changes. However, if they decide to scale out (horizontally) to a web farm model, a re-design would be needed. On the other hand, there are also several disadvantages of on-premise hardware investment. For sure, upfront investment in purchasing hardware and software are considered relatively expensive. Next, they would need to be able to answer the following questions: How much hardware and software should be purchased? What are the hardware specifications? If the capacity planning is not properly done, it may lead to either a waste of capacity or insufficient of capacity. Another concern is, when adding more hardware, more manpower might be needed as well. Cloud For cloud computing, there’s almost no upfront investment required for hardware, and in some cases software doesn’t pose a large upfront cost either. Another advantage is the cloud’s elastic nature fits TicketOnline periodic bursting very much. Remember, they face high load only in June and December. Another advantage would be less responsibility. The administrator can have more time to focus on managing the application since the infrastructure is managed by the provider. Though there are a number of advantages, there are also some disadvantages when choosing a cloud platform. For one thing, they might have less control over the infrastructure. As discussed in the previous article, there might also be some architectural changes when moving an application to the cloud. However, these can be dealt with in a one-time effort. The figure below summarizes the considerations between the two options: Figure 2 – Considerations of an On-premise or Cloud solution After looking at his analysis, Mr. Anthony believes that the cloud will bring more competitive advantages to the company. Understanding that Windows Azure offers various services for building internet-scale application, and Idelma is also an existing Microsoft customer, Mr. Anthony decided to explore Windows Azure. After evaluating the pricing, he is even more comfortable to step ahead. Quick preview of the current system Now, let’s take a look of the current architecture of TicketOnline. Figure 3 – TicketOnline Current Architecture TicketOnline web application The web application is hosted on a single instance physical server. It is running on Windows Server 2003 R2 as operating system with Internet Information Services (IIS) 6 as the web server and ASP.NET 2.0 as the web application framework. Database SQL Server 2005 is used as database engine to store mainly relational data for the application. Additionally, it is also used to store logs such as trace logs, performance-counters logs, and IIS logs. File server Unstructured files such as images and documents are stored separately in a file server. Interfacing with another system The application would need to interface with a proprietary CRM system that runs on a dedicated server to retrieve customer profiles through asmx web service. Batch Job As mentioned previously, receipt and ticket generation will happen asynchronously after purchasing is made. A scheduler-based batch job will perform asynchronous tasks every 10 minutes. The tasks include verifying booking details, generating tickets, and sending the ticket along with the receipt as an email to customer. The intention of an asynchronous process is to minimize concurrent access load as much as possible. This batch job is implemented as a Windows Service installed in a separated server. SMTP Server On-premise SMTP Server will be used to send email, initiated either from the batch job engine or the web application. Requirements for migration The application should be migrated to the cloud with the following requirements: The customer expects a cost effective solution in terms of the migration effort as well as the monthly running cost. There aren’t any functional changes on the system. Meaning, the user (especially front-end user) should not see any differences in term of functionality. As per policy, this propriety CRM system will not be moved to the cloud. The web service consumption should be consumed in secured manner. Calling for partners As the in-house IT team does not have competency and experience with Windows Azure, Mr. Anthony contacted Microsoft to suggest a partner who is capable to deliver the migration. Before a formal request for proposal (RFP) is made, he expects partner to provide the following: High-level architecture diagram how the system will look when moving to the cloud. Explanation of each component illustrated on the diagram. The migration processes, effort required, and potential challenges. If Microsoft recommends you as the partner, how will you handle this case? What will the architecture look like in your proposed solution? The most exciting part will come in the next article when I go into more detail on which solution is recommended and how the migration process takes place.

July 5, 2012

by Wely Lau

· 6,872 Views · 1 Like

Top 10 Causes of Java EE Enterprise Performance Problems

Performance problems are one of the biggest challenges to expect when designing and implementing Java EE related technologies.

June 20, 2012

by Pierre - Hugues Charbonneau

· 274,233 Views · 20 Likes

Infographics: Cloud Computing and History

infographic: clouds computing and history i have prepared three new infographics for you;aall of them related with cloud computing. these infographics will tell you about history of cloud computing, its definition, and who needs this cloud. i think that this will be interesting for you. information graphics (known as infographics) are one of the best ways to transfer some information into a reader’s mind. it can be something new, or other useful information gathered in one place. nowadays many people don’t have enough time to read a lot of text on multiple screens. infographics makes the information intuitive and understandable. that’s why we would like to share the best relevant infographics from all over the web. original source: cloud computing by the small business authority original source: a complete history of cloud computing original source: hosting decisions, from the chalkboard

June 6, 2012

by Andrei Prikaznov

· 11,988 Views

The Limited Usefulness of AsyncContext.start()

Some time ago I came across What's the purpose of AsyncContext.start(...) in Servlet 3.0? question. Quoting the Javadoc of aforementioned method: Causes the container to dispatch a thread, possibly from a managed thread pool, to run the specified Runnable. To remind all of you, AsyncContext is a standard way defined in Servlet 3.0 specification to handle HTTP requests asynchronously. Basically HTTP request is no longer tied to an HTTP thread, allowing us to handle it later, possibly using fewer threads. It turned out that the specification provides an API to handle asynchronous threads in a different thread pool out of the box. First we will see how this feature is completely broken and useless in Tomcat and Jetty - and then we will discuss why the usefulness of it is questionable in general. Our test servlet will simply sleep for given amount of time. This is a scalability killer in normal circumstances because even though sleeping servlet is not consuming CPU, but sleeping HTTP thread tied to that particular request consumes memory - and no other incoming request can use that thread. In our test setup I limited the number of HTTP worker threads to 10 which means only 10 concurrent requests are completely blocking the application (it is unresponsive from the outside) even though the application itself is almost completely idle. So clearly sleeping is an enemy of scalability. @WebServlet(urlPatterns = Array("/*")) class SlowServlet extends HttpServlet with Logging { protected override def doGet(req: HttpServletRequest, resp: HttpServletResponse) { logger.info("Request received") val sleepParam = Option(req.getParameter("sleep")) map {_.toLong} TimeUnit.MILLISECONDS.sleep(sleepParam getOrElse 10) logger.info("Request done") } } Benchmarking this code reveals that the average response times are close to sleep parameter as long as the number of concurrent connections is below the number of HTTP threads. Unsurprisingly the response times begin to grow the moment we exceed the HTTP threads count. Eleventh connection has to wait for any other request to finish and release worker thread. When the concurrency level exceeds 100, Tomcat begins to drop connections - too many clients are already queued. So what about the the fancy AsyncContext.start() method (do not confuse with ServletRequest.startAsync())? According to the JavaDoc I can submit any Runnable and the container will use some managed thread pool to handle it. This will help partially as I no longer block HTTP worker threads (but still another thread somewhere in the servlet container is used). Quickly switching to asynchronous servlet: @WebServlet(urlPatterns = Array("/*"), asyncSupported = true) class SlowServlet extends HttpServlet with Logging { protected override def doGet(req: HttpServletRequest, resp: HttpServletResponse) { logger.info("Request received") val asyncContext = req.startAsync() asyncContext.setTimeout(TimeUnit.MINUTES.toMillis(10)) asyncContext.start(new Runnable() { def run() { logger.info("Handling request") val sleepParam = Option(req.getParameter("sleep")) map {_.toLong} TimeUnit.MILLISECONDS.sleep(sleepParam getOrElse 10) logger.info("Request done") asyncContext.complete() } }) } } We are first enabling the asynchronous processing and then simply moving sleep() into a Runnable and hopefully a different thread pool, releasing the HTTP thread pool. Quick stress test reveals slightly unexpected results (here: response times vs. number of concurrent connections): Guess what, the response times are exactly the same as with no asynchronous support at all (!) After closer examination I discovered that when AsyncContext.start() is called Tomcat submits given task back to... HTTP worker thread pool, the same one that is used for all HTTP requests! This basically means that we have released one HTTP thread just to utilize another one milliseconds later (maybe even the same one). There is absolutely no benefit of calling AsyncContext.start() in Tomcat. I have no idea whether this is a bug or a feature. On one hand this is clearly not what the API designers intended. The servlet container was suppose to manage separate, independent thread pool so that HTTP worker thread pool is still usable. I mean, the whole point of asynchronous processing is to escape the HTTP pool. Tomcat pretends to delegate our work to another thread, while it still uses the original worker thread pool. So why I consider this to be a feature? Because Jetty is "broken" in exactly same way... No matter whether this works as designed or is only a poor API implementation, using AsyncContext.start() in Tomcat and Jetty is pointless and only unnecessarily complicates the code. It won't give you anything, the application works exactly the same under high load as if there was no asynchronous logic at all. But what about using this API feature on correct implementations like IBM WAS? It is better, but still the API as is doesn't give us much in terms of scalability. To explain again: the whole point of asynchronous processing is the ability to decouple HTTP request from an underlying thread, preferably by handling several connections using the same thread. AsyncContext.start() will run the provided Runnable in a separate thread pool. Your application is still responsive and can handle ordinary requests while long-running request that you decided to handle asynchronously are processed in a separate thread pool. It is better, unfortunately the thread pool and thread per connection idiom is still a bottle-neck. For the JVM it doesn't matter what type of threads are started - they still occupy memory. So we are no longer blocking HTTP worker threads, but our application is not more scalable in terms of concurrent long-running tasks we can support. In this simple and unrealistic example with sleeping servlet we can actually support thousand of concurrent (waiting) connections using Servlet 3.0 asynchronous support with only one extra thread - and without AsyncContext.start(). Do you know how? Hint: ScheduledExecutorService. Postscriptum: Scala goodness I almost forgot. Even though examples were written in Scala, I haven't used any cool language features yet. Here is one: implicit conversions. Make this available in your scope: implicit def blockToRunnable[T](block: => T) = new Runnable { def run() { block } } And suddenly you can use code block instead of instantiating Runnable manually and explicitly: asyncContext start { logger.info("Handling request") val sleepParam = Option(req.getParameter("sleep")) map { _.toLong} TimeUnit.MILLISECONDS.sleep(sleepParam getOrElse 10) logger.info("Request done") asyncContext.complete() } Sweet!

May 22, 2012

by Tomasz Nurkiewicz

· 17,621 Views · 1 Like

Virtualization in WPF with VirtualizingStackPanel

First blogged about this on my previous blog site here: http://consultingblogs.emc.com/merrickchaffer/archive/2011/02/14/virtualization-in-wpf-with-virtualizingstackpanel.aspx However, having come across this again today on a project, I thought it was important enough to re-blog! Finally managed to figure out how to get virtualization to actually behave itself in a listbox wpf control. Turns out that in order for Virtualization to work, you need three things satisfied. Use a control that supports virtualization (e.g. list box or list view). (see Controls That Implement Performance Features section at bottom of this page for more info http://msdn.microsoft.com/en-us/library/cc716879.aspx#Controls ) Ensure that the ScrollViewer.CanContentScroll attached property is set to True on the containing list box / list view control. Ensure that either the list box has a height set, or that it is contained within a parent Grid row, where that row definition has a height set (Height="*" will do if you want it to occupy the Client window height). Note: Do not use height=”Auto” as this will not work, as this instructs WPF to simply size the row to the height needed to fit all the items of the list box in, hence you do not get the vertical scroll bar appearing. Ensure that there is no wrapping ScrollViewer control around the list box, as this will prevent virtualization from occuring. Ensure that you use a VirtualizingStackPanel in the ItemsPanelTemplate for the ListBox.ItemsPanel Example

May 14, 2012

by Merrick Chaffer

· 28,335 Views

Managing and Monitoring Drupal Sites on Windows Azure

A few weeks ago, I co-authored an article (with my colleague Rama Ramani) about how the Screen Actors Guild Awards website migrated its Drupal deployment from LAMP to Windows Azure: Azure Real World: Migrating a Drupal Site from LAMP to Windows Azure. Since then, Rama and another colleague, Jason Roth, have been working on writing up how the SAG Awards website was managed and monitored in Windows Azure. The article below is the fruit of their work…a very interesting/educational read. Overview Drupal is an open source content management system that runs on PHP. Windows Azure offers a flexible platform for hosting, managing, and scaling Drupal deployments. This paper focuses on an approach to host Drupal sites on Windows Azure, based on learning from a BPD Customer Programs Design Win engagement with the Screen Actors Guild Awards Drupal website. This paper covers guidelines and best practices for managing an existing Drupal web site in Windows Azure. For more information on how to migrate Drupal applications to Windows Azure, see Azure Real World: Migrating a Drupal Site from LAMP to Windows Azure. The target audience for this paper is Drupal administrators who have some exposure to Windows Azure. More detailed pointers to Windows Azure content is provided throughout the paper as links. Drupal Application Architecture on Windows Azure Before reviewing the management and monitoring guidelines, it is important to understand the architecture of a typical Drupal deployment on Windows Azure. First, the following diagram displays the basic architecture of Drupal running on Windows and IIS7. In the Windows Server scenario, you could have one or more machines hosting the web site in a farm. Those machines would either persist the site content to the file system or point to other network shares. For Windows Azure, the basic architecture is the same, but there are some differences. In Windows Azure the site is hosted on a web role. A web role instance is hosted on a Windows Server 2008 virtual machine within the Windows Azure datacenter. Like the web farm, you can have multiple instances running the site. But there is no persistence guarantee for the data on the file system. Because of this, much of the shared site content should be stored in Windows Azure Blob storage. This allows them to be highly available and durable. Usually, a large portion of the site caters to static content which lends well to caching. And caching can be applied in a set of places – browser level caching, CDN to cache content in the edge closer to the browser clients, caching in Azure to reduce the load on backend, etc. Finally, the database can be located in SQL Azure. The following diagram shows these differences. For monitoring and management, we will look at Drupal on Windows Azure from three perspectives: Availability: Ensure the web site does not go down and that all tiers are setup correctly. Apply best practices to ensure that the site is deployed across data centers and perform backup operations regularly. Scalability: Correctly handle changes in user load. Understand the performance characteristics of the site. Manageability: Correctly handle updates. Make code and site changes with no downtime when possible. Although some management tasks span one or more of these categories, it is still helpful to discuss Drupal management on Windows Azure within these focus areas. Availability One main goal is that the Drupal site remains running and accessible to all end-users. This involves monitoring both the site and the SQL Azure database that the site depends on. In this section, we will briefly look at monitoring and backup tasks. Other crossover areas that affect availability will be discussed in the next section on scalability. Monitoring With any application, monitoring plays an important role with managing availability. Monitoring data can reveal whether users are successfully using the site or whether computing resources are meeting the demand. Other data reveals error counts and possibly points to issues in a specific tier of the deployment. There are several monitoring tools that can be used. The Windows Azure Management Portal. Windows Azure diagnostic data. Custom monitoring scripts. System Center Operations Manager. Third party tools such as Azure Diagnostics Manager and Azure Storage Explorer. The Windows Azure Management Portal can be used to ensure that your deployments are successful and running. You can also use the portal to manage features such as Remote Desktop so that you can directly connect to machines that are running the Drupal site. Windows Azure diagnostics allows you to collect performance counters and logs off of the web role instances that are running the Drupal site. Although there are many options for configuring diagnostics in Azure, the best solution with Drupal is to use a diagnostics configuration file. The following configuration file demonstrates some basic performance counters that can monitor resources such as memory, processor utilization, and network bandwidth. For more information about setting up diagnostic configuration files, see How to Use the Windows Azure Diagnostics Configuration File. This information is stored locally on each role instance and then transferred to Windows Azure storage per a defined schedule or on-demand. See Getting Started with Storing and Viewing Diagnostic Data in Windows Azure Storage. Various monitoring tools, such as Azure Diagnostics Manager, help you to more easily analyze diagnostic data. Monitoring the performance of the machines hosting the Drupal site is only part of the story. In order to plan properly for both availability and scalability, you should also monitor site traffic, including user load patterns and trends. Standard and custom diagnostic data could contribute to this, but there are also third-party tools that monitor web traffic. For example, if you know that spikes occur in your application during certain days of the week, you could make changes to the application to handle the additional load and increase the availability of the Drupal solution. Backup Tasks To remain highly available, it is important to backup your data as a defense-in-depth strategy for disaster recovery. This is true even though SQL Azure and Windows Azure Storage both implement redundancy to prevent data loss. One obvious reason is that these services cannot prevent administrator error if data is accidentally deleted or incorrectly changed. SQL Azure does not currently have a formal backup technology, although there are many third-party tools and solutions that provide this capability. Usually the database size for a Drupal site is relatively small. In the case of SAG Awards, it was only ~100-150 MB. So performing an entire backup using any strategy was relatively fast. If your database is much larger, you might have to test various backup strategies to find the one that works best. Apart from third-party SQL Azure backup solutions, there are several strategies for obtaining a backup of your data: · Use the Drush tool and the portabledb-export command. · Periodically copy the database using the CREATE DATABASE Transact-SQL command. · Use Data-tier applications (DAC) to assist with backup and restore of the database. SQL Azure backup and data security techniques are described in more detail in the topic, Business Continuity in SQL Azure. Note that bandwidth costs accrue with any backup operation that transfers information outside of the Windows Azure datacenter. To reduce costs, you can copy the database to a database within the same datacenter. Or you can export the data-tier applications to blob storage in the same datacenter. Another potential backup task involves the files in Blob storage. If you keep a master copy of all media files uploaded to Blob storage, then you already have an on-premises backup of those files. However, if multiple administrators are loading files into Blob storage for use on the Drupal site, it is a good idea to enumerate the storage account and to download any new files to a central location. The following PHP script demonstrates how this can be done by backing up all files in Blob storage after a specified modification date. setProxy(true, 'YOUR_PROXY_IF_NEEDED', 80); $blobs = (array)$blobObj->listBlobs(AZURE_STORAGE_CONTAINER, '', '', 35000); backupBlobs($blobs, $blobObj); function backupBlobs($blobs, $blobObj) { foreach ($blobs as $blob) { if (strtotime($blob->lastmodified) >= DEFAULT_BACKUP_FROM_DATE && strtotime($blob->lastmodified) <= DEFAULT_BACKUP_TO_DATE) { $path = pathinfo($blob->name); if ($path['basename'] != '$$$.$$$') { $dir = $path['dirname']; $oldDir = getcwd(); if (handleDirectory($dir)) { chdir($dir); $blobObj->getBlob( AZURE_STORAGE_CONTAINER, $blob->name, $path['basename'] ); chdir($oldDir); } } } } } function handleDirectory($dir) { if (!checkDirExists($dir)) { return mkdir($dir, 0755, true); } return true; } function checkDirExists($dir) { if(file_exists($dir) && is_dir($dir)) { return true; } return false; } ?> This script has a dependency on the Windows Azure SDK for PHP. Also note there are several parameters that you must modify such as the storage account, secret, and backup location. As with SQL Azure, bandwidth and transaction charges apply to a backup script like this. Scalability Drupal sites on Windows Azure can scale as load increased through typical strategies of scale-up, scale-out, and caching. The following sections describe the specifics of how these strategies are implemented in Windows Azure. Typically you make scalability decisions based on monitoring and capacity planning. Monitoring can be done in staging during testing or in production with real-time load. Capacity planning factors in projections for changes in user demand. Scale Up When you configure your web role prior to deployment, you have the option of specifying the Virtual Machine (VM) size, such as Small or ExtraLarge. Each size tier adds additional memory, processing power, and network bandwidth to each instance of your web role. For cost efficiency and smaller units of scale, you can test your application under expected load to find the smallest virtual machine size that meets your requirements. The workload usually in most popular Drupal websites can be separated out into a limited set of Drupal admins making content changes and a large user base who perform mostly read-only workload. End users can be allowed to make ‘writes’, such as uploading blogs or posting in forums, but those changes are not ‘content changes’. Drupal admins are setup to operate without caching so that the writes are made directly to SQL Azure or the corresponding backend database. This workload performs well with Large or ExtraLarge VM sizes. Also, note that the VM size is closely tied to all hardware resources, so if there are many content-rich pages that are streaming content, then the VM size requirements are higher. To make changes to the Virtual Machine size setting, you must change the vmsize attribute of the WebRole element in the service definition file, ServiceDefinition.csdef. A virtual machine size change requires existing applications to be redeployed. Scale Out In addition to the size of each web role instance, you can increase or decrease the number of instances that are running the Drupal site. This spreads the web requests across more servers, enabling the site to handle more users. To change the number of running instances of your web role, see How to Scale Applications by Increasing or Decreasing the Number of Role Instances. Note that some configuration changes can cause your existing web role instances to recycle. You can choose to handle this situation by applying the configuration change and continue running. This is done by handling the RoleEnvironment.Changing event. For more information see, How to Use the RoleEnvironment.Changing Event. A common question for any Windows Azure solution is whether there is some type of built-in automatic scaling. Windows Azure does not provide a service that provides auto-scaling. However, it is possible to create a custom solution that scales Azure services using the Service Management API. For an example of this approach, see An Auto-Scaling Module for PHP Applications in Windows Azure. Caching Caching is an important strategy for scaling Drupal applications on Windows Azure. One reason for this is that SQL Azure implements throttling mechanisms to regulate the load on any one database in the cloud. Code that uses SQL Azure should have robust error handling and retry logic to account for this. For more information, see Error Messages (SQL Azure Database). Because of the potential for load-related throttling as well as for general performance improvement, it is strongly recommended to use caching. Although Windows Azure provides a Caching service, this service does not currently have interoperability with PHP. Because of this, the best solution for caching in Drupal is to use a module that uses an open-source caching technology, such as Memcached. Outside of a specific Drupal module, you can also configure Memcached to work in PHP for Windows Azure. For more information, see Running Memcached on Windows Azure for PHP. Here is also an example of how to get Memcached working in Windows Azure using a plugin: Windows Azure Memcached plugin. In a future paper, we hope to cover this architecture in more detail. For now, here are several design and management considerations related to caching. Area Consideration Design and Implementation For a technology like Memcached, will the cache be collocated (spread across all web role instances)? Or will you attempt to setup a dedicated cache ring with worker roles that only run Memcached? Configuration What memory is required and how will items in the cache be invalidated? Performance and Monitoring What mechanisms will be used to detect the performance and overall health of the cache? For ease of use and cost savings, collocation of the cache across the web role instances of the Drupal site works best. However, this assumes that there is available reserve memory on each instance to apply toward caching. It is possible to increase the virtual machine size setting to increase the amount of available memory on each machine. It is also possible to add additional web role instances to add to the overall memory of the cache while at the same time improving the ability of the web site to respond to load. It is possible to create a dedicated cache cluster in the cloud, but the steps for this are beyond the scope of this paper[RR1] . For Windows Azure Blob storage, there is also a caching feature built into the service called the Content Delivery Network (CDN). CDN provides high-bandwidth access to files in Blob storage by caching copies of the files in edge nodes around the world. Even within a single geographic region, you could see performance improvements as there are many more edge nodes than Windows Azure datacenters. For more information, see Delivering High-Bandwidth Content with the Windows Azure CDN. Manageability It is important to note that each hosted service has a Staging environment and a Production environment. This can be used to manage deployments, because you can load and test and application in staging before performing a VIP swap with production. From a manageability standpoint, Drupal has an advantage on Windows Azure in the way that site content is stored. Because the data necessary to serve pages is stored in the database and blob storage, there is no need to redeploy the application to change the content of the site. Another best practice is to use a separate storage account for diagnostic data than the one that is used for the application itself. This can improve performance and also helps to separate the cost of diagnostic monitoring from the cost of the running application. As mentioned previously, there are several tools that can assist with managing Windows Azure applications. The following table summarizes a few of these choices. Tool Description Windows Azure Management Portal The web interface of the Windows Azure management portal shows deployments, instance counts and properties, and supports many different common management and monitoring tasks. Azure Diagnostics Managerq[RR2] [JR3] A Red Gate Software product that provides advanced monitoring and management of diagnostic data. This tool can be very useful for easily analyzing the performance of the Drupal site to determine appropriate scaling decisions. Azure Storage Explorer A tool created by Neudesic for viewing Windows Azure storage account. This can be useful for viewing both diagnostic data and the files in Blob storage.

April 25, 2012

by Brian Swan

· 8,793 Views

Amazon EMR Tutorial: Running a Hadoop MapReduce Job Using Custom JAR

See original post at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html Introduction Amazon EMR is a web service which can be used to easily and efficiently process enormous amounts of data. It uses a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon S3. Amazon EMR removes most of the cumbersome details of Hadoop while taking care of provisioning of Hadoop, running the job flow, terminating the job flow, moving the data between Amazon EC2 and Amazon S3, and optimizing Hadoop. In this tutorial, we will use a developed WordCount Java example using Hadoop and thereafter, we execute our program on Amazon Elastic MapReduce. Prerequisites You must have valid AWS account credentials. You should also have a general familiarity with using the Eclipse IDE before you begin. The reader can also use any other IDE of their choice. Step 1 – Develop MapReduce WordCount Java Program In this section, we are first going to develop a WordCount application. A WordCount program will determine how many times different words appear in a set of files. In Eclipse (or whatever the IDE you are using), Create simple Java Project with the name "WordCount". Create a java class name Map and override the map method as follow, public class Map extends Mapper { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); @Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } Create a java class named Reduce and override the reduce method as shown below, public class Reduce extends Reducer { @Override protected void reduce(Text key, java.lang.Iterable values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable value : values) { sum += value.get(); } context.write(key, new IntWritable(sum)); } } Create a java class named WordCount and defined the main method as below, public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf, "wordcount"); job.setJarByClass(WordCount.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true); } Export the WordCount program in a jar using eclipse and save it to some location on disk. Make sure that you have provided the Main Class (WordCount.jar) during extraction ofu8u the jar file as shown below. Our jar is ready!!! Step 2 – Upload the WordCount JAR and Input Files to Amazon S3 Now we are going to upload the WordCount jar to Amazon S3. First, go to the following URL: https://console.aws.amazon.com/s3/home Next, click “Create Bucket”, give your bucket a name, and click the “Create” button. Select your new S3 bucket in the left-hand pane. Upload the WordCount JAR and sample input file for counting the words. Step 3 – Running an Elastic MapReduce job Now that the JAR is uploaded into S3, all we need to do is to create a new Job flow. let's execute the steps below. (I encourage readers to check out the following link for details regarding each step, How to Create a Job Flow Using a Custom JAR ) Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/ Click Create New Job Flow. In the DEFINE JOB FLOW page, enter the following details, a) Job Flow Name = WordCountJob b) Select Run your own applications) Select Custom JAR in the drop-down list) Click Continue In the SPECIFY PARAMETERS page, enter values in the boxes using the following table as a guide, and then click Continue.JAR Location = bucketName/jarFileLocationJAR Arguments =s3n://bucketName/inputFileLocations3n://bucketName/outputpath Please note that the output path must be unique each time we execute the job. The Hadoop always create a folder with the same name specified here. After executing the job, just wait and monitor your job that runs through the Hadoop flow. You can also look for errors by using the Debug button. The job should be complete within 10 to 15 minutes (can also depend on the size of the input). After completing the job, You can view results in the S3 Browser panel. You can also download the files from S3 and can analyze the outcome of the job. Amazon Elastic MapReduce Resources Amazon Elastic MapReduce Documentation,http://aws.amazon.com/documentation/elasticmapreduce/ Amazon Elastic MapReduce Getting Started Guide,http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/ Amazon Elastic MapReduce Developer Guide,http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/ Apache Hadoop,http://hadoop.apache.org/ See more at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html

April 23, 2012

by Muhammad Ali Khojaye

· 59,095 Views

How to Use Sigma.js with Neo4j

i’ve done a few posts recently using d3.js and now i want to show you how to use two other great javascript libraries to visualize your graphs. we’ll start with sigma.js and soon i’ll do another post with three.js . we’re going to create our graph and group our nodes into five clusters. you’ll notice later on that we’re going to give our clustered nodes colors using rgb values so we’ll be able to see them move around until they find their right place in our layout. we’ll be using two sigma.js plugins, the gefx (graph exchange xml format) parser and the forceatlas2 layout. you can see what a gefx file looks like below. notice it comes from gephi which is an interactive visualization and exploration platform, which runs on all major operating systems, is open source, and is free. ... ... in order to build this file, we will need to get the nodes and edges from the graph and create an xml file. get '/graph.xml' do @nodes = nodes @edges = edges builder :graph end we’ll use cypher to get our nodes and edges: def nodes neo = neography::rest.new cypher_query = " start node = node:nodes_index(type='user')" cypher_query << " return id(node), node" neo.execute_query(cypher_query)["data"].collect{|n| {"id" => n[0]}.merge(n[1]["data"])} end we need the node and relationship ids, so notice i’m using the id() function in both cases. def edges neo = neography::rest.new cypher_query = " start source = node:nodes_index(type='user')" cypher_query << " match source -[rel]-> target" cypher_query << " return id(rel), id(source), id(target)" neo.execute_query(cypher_query)["data"].collect{|n| {"id" => n[0], "source" => n[1], "target" => n[2]} } end so far we have seen graphs represented as json, and we’ve built these manually. today we’ll take advantage of the builder ruby gem to build our graph in xml. xml.instruct! :xml xml.gexf 'xmlns' => "http://www.gephi.org/gexf", 'xmlns:viz' => "http://www.gephi.org/gexf/viz" do xml.graph 'defaultedgetype' => "directed", 'idtype' => "string", 'type' => "static" do xml.nodes :count => @nodes.size do @nodes.each do |n| xml.node :id => n["id"], :label => n["name"] do xml.tag!("viz:size", :value => n["size"]) xml.tag!("viz:color", :b => n["b"], :g => n["g"], :r => n["r"]) xml.tag!("viz:position", :x => n["x"], :y => n["y"]) end end end xml.edges :count => @edges.size do @edges.each do |e| xml.edge:id => e["id"], :source => e["source"], :target => e["target"] end end end end you can get the code on github as usual and see it running live on heroku. you will want to see it live on heroku so you can see the nodes in random positions and then move to form clusters. use your mouse wheel to zoom in, and click and drag to move around. credit goes out to alexis jacomy and mathieu jacomy . you’ve seen me create numerous random graphs, but for completeness here is the code for this graph. notice how i create 5 clusters and for each node i assign half its relationships to other nodes in their cluster and half to random nodes? this is so the forceatlas2 layout plugin clusters our nodes neatly. def create_graph neo = neography::rest.new graph_exists = neo.get_node_properties(1) return if graph_exists && graph_exists['name'] names = 500.times.collect{|x| generate_text} clusters = 5.times.collect{|x| {:r => rand(256), :g => rand(256), :b => rand(256)} } commands = [] names.each_index do |n| cluster = clusters[n % clusters.size] commands << [:create_node, {:name => names[n], :size => 5.0 + rand(20.0), :r => cluster[:r], :g => cluster[:g], :b => cluster[:b], :x => rand(600) - 300, :y => rand(150) - 150 }] end names.each_index do |from| commands << [:add_node_to_index, "nodes_index", "type", "user", "{#{from}"] connected = [] # create clustered relationships members = 20.times.collect{|x| x * 10 + (from % clusters.size)} members.delete(from) rels = 3 rels.times do |x| to = members[x] connected << to commands << [:create_relationship, "follows", "{#{from}", "{#{to}"] unless to == from end # create random relationships rels = 3 rels.times do |x| to = rand(names.size) commands << [:create_relationship, "follows", "{#{from}", "{#{to}"] unless (to == from) || connected.include?(to) end end batch_result = neo.batch *commands end

April 12, 2012

by Max De Marzi

· 15,475 Views

Hadoop Basics—Creating a MapReduce Program

The Map Reduce Framework works in two main phases to process the data, which are the "map" phase and the "reduce" phase.

March 18, 2012

by Carlo Scarioni

· 212,854 Views · 4 Likes

Joins with MapReduce

i have been reading up on join implementations available for hadoop for past few days. in this post i recap some techniques i learnt during the process. the joins can be done at both map side and join side according to the nature of data sets of to be joined. reduce side join let’s take the following tables containing employee and department data. let’s see how join query below can be achieved using reduce side join. select employees.name, employees.age, department.name from employees inner join department on employees.dept_id=department.dept_id map side is responsible for emitting the join predicate values along with the corresponding record from each table so that records having same department id in both tables will end up at on same reducer which would then do the joining of records having same department id. however it is also required to tag the each record to indicate from which table the record originated so that joining happens between records of two tables. following diagram illustrates the reduce side join process. here is the pseudo code for map function for this scenario. map (k table, v rec) { dept_id = rec.dept_id tagged_rec.tag = table tagged_rec.rec = rec emit(dept_id, tagged_rec) } at reduce side join happens within records having different tags. reduce (k dept_id, list tagged_recs) { for (tagged_rec : tagged_recs) { for (tagged_rec1 : taagged_recs) { if (tagged_rec.tag != tagged_rec1.tag) { joined_rec = join(tagged_rec, tagged_rec1) } emit (tagged_rec.rec.dept_id, joined_rec) } } map side join (replicated join) using distributed cache on smaller table for this implementation to work one relation has to fit in to memory. the smaller table is replicated to each node and loaded to the memory. the join happens at map side without reducer involvement which significantly speeds up the process since this avoids shuffling all data across the network even-though most of the records not matching are later dropped. smaller table can be populated to a hash-table so look-up by dept_id can be done. the pseudo code is outlined below. map (k table, v rec) { list recs = lookup(rec.dept_id) // get smaller table records having this dept_id for (small_table_rec : recs) { joined_rec = join (small_table_rec, rec) } emit (rec.dept_id, joined_rec) } using distributed cache on filtered table if the smaller table doesn’t fit the memory it may be possible to prune the contents of it if filtering expression has been specified in the query. consider following query. select employees.name, employees.age, department.name from employees inner join department on employees.dept_id=department.dept_id where department.name="eng" here a smaller data set can be derived from department table by filtering out records having department names other than “eng”. now it may be possible to do replicated map side join with this smaller data set. replicated semi-join reduce side join with map side filtering even of the filtered data of small table doesn’t fit in to the memory it may be possible to include just the dept_id s of filtered records in the replicated data set. then at map side this cache can be used to filter out records which would be sent over to reduce side thus reducing the amount of data moved between the mappers and reducers. the map side logic would look as follows. map (k table, v rec) { // check if this record needs to be sent to reducer boolean sendtoreducer = check_cache(rec.dept_id) if (sendtoreducer) { dept_id = rec.dept_id tagged_rec.tag = table tagged_rec.rec = rec emit(dept_id, tagged_rec) } } reducer side logic would be same as the reduce side join case. using a bloom filter a bloom filter is a construct which can be used to test the containment of a given element in a set. a smaller representation of filtered dept_ids can be derived if dept_id values can be augmented in to a bloom filter. then this bloom filter can be replicated to each node. at the map side for each record fetched from the smaller table the bloom filter can be used to check whether the dept_id in the record is present in the bloom filter and only if so to emit that particular record to reduce side. since a bloom filter is guaranteed not to provide false negatives the result would be accurate. references [1] hadoop in action [2] hadoop : the definitive guide

March 12, 2012

by Buddhika Chamith

· 31,087 Views

Computing a disparity map in OpenCV

A disparity map contains information related to the distance of the objects of a scene from a viewpoint. In this example we will see how to compute a disparity map from a stereo pair and how to use the map to cut the objects far from the cameras. The stereo pair is represented by two input images, these images are taken with two cameras separated by a distance and the disparity map is derived from the offset of the objects between them. There are various algorithm to compute a disparity map, the one implemented in OpenCV is the graph cut algorithm. To use it we have to call the function CreateStereoGCState() to initialize the data structure needed by the algorithm and use the function FindStereoCorrespondenceGC() to get the disparity map. Let's see the code: def cut(disparity, image, threshold): for i in range(0, image.height): for j in range(0, image.width): # keep closer object if cv.GetReal2D(disparity,i,j) > threshold: cv.Set2D(disparity,i,j,cv.Get2D(image,i,j)) # loading the stereo pair left = cv.LoadImage('scene_l.bmp',cv.CV_LOAD_IMAGE_GRAYSCALE) right = cv.LoadImage('scene_r.bmp',cv.CV_LOAD_IMAGE_GRAYSCALE) disparity_left = cv.CreateMat(left.height, left.width, cv.CV_16S) disparity_right = cv.CreateMat(left.height, left.width, cv.CV_16S) # data structure initialization state = cv.CreateStereoGCState(16,2) # running the graph-cut algorithm cv.FindStereoCorrespondenceGC(left,right, disparity_left,disparity_right,state) disp_left_visual = cv.CreateMat(left.height, left.width, cv.CV_8U) cv.ConvertScale( disparity_left, disp_left_visual, -20 ); cv.Save( "disparity.pgm", disp_left_visual ); # save the map # cutting the object farthest of a threshold (120) cut(disp_left_visual,left,120) cv.NamedWindow('Disparity map', cv.CV_WINDOW_AUTOSIZE) cv.ShowImage('Disparity map', disp_left_visual) cv.WaitKey() These are the two input image I used to test the program (respectively left and right): Result using threshold = 100 Result using threshold = 120 Result using threshold = 180 Source: http://glowingpython.blogspot.com/2011/11/computing-disparity-map-in-opencv.html

February 21, 2012

by Giuseppe Vettigli

· 26,192 Views

Django: Excluding Some Views from Middleware

In my Django applications, I tend to use custom middleware extensively for common tasks. I have middleware that logs page runtime, middleware that sets context that most views will end up needing anyway, and middleware that copies the HTTP_REFERRER header from an entry page into the session scope for use later in the session. At some point, I inadvertently created a middleware class invalidated the browser cache for certain views. Typically, just wrapping a view in @cache_control(max_age=3600) is enough to have the browser cache that view for an hour. But if you do something innocuous like evaluate request.user.is_authenticated() in a middleware class, then Django will set the Vary: Cookie header, invalidating the cache. In my case, what I really wanted was a decorator that I could attach to a view that would skip my custom middleware, like an exclude list. Of course, you could just attach your middleware explicitly to each view that needs it, but that's needless code repetition if a middleware should wrap almost all views. You could also change each of your middleware classes to exclude particular views by URL, but you might end up having to alter many different middleware classes with that logic. As another option, you can use the following decorator/middleware pair to short-circuit the middleware execution of any view, for any middleware defined in your settings file AFTER this one. """ Allows short-curcuiting of ALL remaining middleware by attaching the @shortcircuitmiddleware decorator as the TOP LEVEL decorator of a view. Example settings.py: MIDDLEWARE_CLASSES = ( 'django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', # THIS MIDDLEWARE 'myapp.middleware.shortcircuit.ShortCircuitMiddleware', # SOME OTHER MIDDLE WARE YOU WANT TO SKIP SOMETIMES 'myapp.middleware.package.MostOfTheTimeMiddleware', # MORE MIDDLEWARE YOU WANT TO SKIP SOMETIMES HERE ) Example view to exclude from MostOfTheTimeMiddleware (and any subsequent): @shortcircuitmiddleware def myview(request): ... """ def shortcircuitmiddleware(f): """ view decorator, the sole purpose to is 'rename' the function '_shortcircuitmiddleware' """ def _shortcircuitmiddleware(*args, **kwargs): return f(*args, **kwargs) return _shortcircuitmiddleware class ShortCircuitMiddleware(object): """ Middleware; looks for a view function named '_shortcircuitmiddleware' and short-circuits. Relies on the fact that if you return an HttpResponse from a view, it will short-circuit other middleware, see: https://docs.djangoproject.com/en/dev/topics/http/middleware/#process-request """ def process_view(self, request, view_func, view_args, view_kwargs): if view_func.func_name == "_shortcircuitmiddleware": return view_func(request, *view_args, **view_kwargs) return None Source: http://bitkickers.blogspot.com/2011/08/django-exclude-some-views-from.html

February 20, 2012

by Chase Seibert

· 13,225 Views

How to deploy a neo4j instance in Amazon EC2 in 10 minutes

Neo4j is a high-performance, NOSQL graph database with all the features of a mature and robust database. In this post I will explain how to deploy a neo4j instance in Amazon EC2 web service. For this tutorial to take you no more than 10 minutes you should be able to execute properly some bash commands like mv, tar, ssh and scp (secure copy). I also assume that you have an account in Amazon Web Services and you are familiar to the process of launching instances. If not, I strongly recommend you to follow this starting guide and complete it till you manage to connect to your instance with ssh. Start downloading the latest stable version of neo4j. Which you can find here. The “Community Edition” fits well for development purposes. Do not forget to select the Unix version of the server. This will download a tar.gz file which you will copy to your EC2 instance later. While you download the neo4j server open the AWS Management Console and launch a Basic 32-bit Amazon Linux AMI. If you want to launch an Ubuntu AMI please notice that it doesn’t ship with Java, which is required for running neo4j. If you are not familiar with key pairs, pem files or security groups I insist you to follow the EC2 starting guide I mentioned above. You can either create a new security group or use the default, but you will need to configure a new security rule for the neo4j server port. After launching the instance, create a TCP rule on port 7474 with source 0.0.0.0/0. Here you are opening port 7474 for anyone. If you are planning to use the neo4j REST API and remotely call it from another server, for example a Rails application hosted in Heroku, for security reasons, you may want to change the source field to the address of your Heroku server. Do not forget to open port 22 (SSH), this is typically the first rule normal people create after launching an instance. You are almost done! You should now install neo4j in your instance. Open a terminal in your localhost and navigate to the path where you downloaded neo4j. Copy the file to your Amazon instance by using the scp command: scp -i your_pem_file.pem neo4j-community-1.6.M01-unix.tar.gz ec2-user@YOUR_PUBLIC_INSTANCE_DNS:/home/ec2-user Please notice that you will need to change the path to your pem file, typically placed in ~/.ssh, the filename of the neo4j server you just downloaded and the plublic DNS of your instance. Now connect to your instance with SSH: ssh -i your_pem_file.pem ec2-user@YOUR_PUBLIC_INSTANCE_DNS Untar the neo4j server: tar xvfz neo4j-community-1.6.M01-unix.tar.gz.tar.gz Move it to /usr/local and rename the folder to neo4j: sudo mv neo4j-community-1.6.M01 /usr/local/neo4j Almost done!!! You should now open neo4j-server.properties under the conf directory and add the following line: org.neo4j.server.webserver.address=0.0.0.0 This lines allows anyone to connect remotely to your neo4j database server. Now run the start script. From the neo4j server folder. sudo ./bin/neo4j start Finally, open a browser and access the webadmin interface of your neo4j database by typing http://YOUR_PUBLIC_INSTANCE_DNS:7474. You should see the Neo4j Monitoring and Management Tool, pretty cool! If not, ask me You can now try using the REST API and the curl bash command to insert nodes and relationships. I hope this post helped you, good luck! Follow me on Twitter @negarnil Source: http://www.cloudtmp.com/java/how-to-deploy-a-neo4j-instance-in-amazon-ec2-in-10-minutes/

December 27, 2011

by Nicolas Garnil

· 27,440 Views · 1 Like

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importants. Any outage can be seen by customers as a frustration, or opportunity to move to a competitor. What's more for a site that also includes e-commerce, it can mean real lost sales. Zero Downtime describes a site without service interruption. To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure. If you're using cloud hosting, are you redundant to alternate availability zones and regions? Are you using geographically distributed load balancing? Do you have multiple clustered databases on the backend, and multiple webservers load balanced. All of these requirements will increase uptime, but may not bring you close to zero downtime. For that you'll need thorough testing. The solution is to pull the trigger on sections of your infrastructure, and prove that it fails over quickly without noticeable outage. The ultimate test is the outage itself. Sean Hull on Quora: What is zero downtime and why is it important? Source: http://www.iheavy.com/2011/06/23/zero-downtime-what-is-it-and-why-is-it-important/

November 23, 2011

by Sean Hull

· 26,149 Views

Handling PHP Sessions in Windows Azure

One of the challenges in building a distributed web application is in handling sessions. When you have multiple instances of an application running and session data is written to local files (as is the default behavior for the session handling functions in PHP) a user session can be lost when a session is started on one instance but subsequent requests are directed (via a load balancer) to other instances. To successfully manage sessions across multiple instances, you need a common data store. In this post I’ll show you how the Windows Azure SDK for PHP makes this easy by storing session data in Windows Azure Table storage. In the 4.0 release of the Windows Azure SDK for PHP, session handling via Windows Azure Table and Blob storage was included in the newly added SessionHandler class. Note: The SessionHandler class supports storing session data in Table storage or Blob storage. I will focus on using Table storage in this post largely because I haven’t been able to come up with a scenario in which using Blob storage would be better (or even necessary). If you have ideas about how/why Blob storage would be better, I’d love to hear them. The SessionHandler class makes it possible to write code for handling sessions in the same way you always have, but the session data is stored on a Windows Azure Table instead of local files. To accomplish this, precede your usual session handling code with these lines: require_once 'Microsoft/WindowsAzure/Storage/Table.php'; require_once 'Microsoft/WindowsAzure/SessionHandler.php'; $storageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', 'your storage account name', 'your storage account key'); $sessionHandler = new Microsoft_WindowsAzure_SessionHandler($storageClient , 'sessionstable'); $sessionHandler->register(); Now you can call session_start() and other session functions as you normally would. Nicely, it just works. Really, that’s all there is to using the SessionHandler, but I found it interesting to take a look at how it works. The first interesting thing to note is that the register method is simply calling the session_set_save_handler function to essentially map the session handling functionality to custom functions. Here’s what the method looks like from the source code: public function register() { return session_set_save_handler(array($this, 'open'), array($this, 'close'), array($this, 'read'), array($this, 'write'), array($this, 'destroy'), array($this, 'gc') ); } The reading, writing, and deleting of session data is only slightly more complicated. When writing session data, the key-value pairs that make up the data are first serialized and then base64 encoded. The serialization of the data allows for lots of flexibility in the data you want to store (i.e. you don’t have to worry about matching some schema in the data store). When storing data in a table, each entry must have a partition key and row key that uniquely identify it. The partition key is a string (“sessions” by default, but this is changeable in the class constructor) and the the row key is the session ID. (For more information about the structure of Tables, see this post.) Finally, the data is either updated (it it already exists in the Table) or a new entry is inserted. Here’s a portion of the write function: $serializedData = base64_encode(serialize($serializedData)); $sessionRecord = new Microsoft_WindowsAzure_Storage_DynamicTableEntity($this->_sessionContainerPartition, $id); $sessionRecord->sessionExpires = time(); $sessionRecord->serializedData = $serializedData; try { $this->_storage->updateEntity($this->_sessionContainer, $sessionRecord); } catch (Microsoft_WindowsAzure_Exception $unknownRecord) { $this->_storage->insertEntity($this->_sessionContainer, $sessionRecord); } Not surprisingly, when session data is read from the table, it is retrieved by session ID, base64 decoded, and unserialized. Again, here’s a snippet that show’s what is happening: $sessionRecord = $this->_storage->retrieveEntityById( $this->_sessionContainer, $this->_sessionContainerPartition, $id ); return unserialize(base64_decode($sessionRecord->serializedData)); As you can see, the SessionHandler class makes good use of the storage APIs in the SDK. To learn more about the SessionHandler class (and the storage APIs), check out the documentation on Codeplex. You can, of course, get the complete source code here: http://phpazure.codeplex.com/SourceControl/list/changesets. As I investigated the session handling in the Windows Azure SDK for PHP, I noticed that the absence of support for SQL Azure as a session store was conspicuous. I’m curious about how many people would prefer to use SQL Azure over Azure Tables as a session store. If you have an opinion on this, please let me know in the comments.

October 19, 2011

by Brian Swan

· 7,927 Views

EC2 Interview – AWS Interview – Cloud Interview – 8 Questions

If you're looking for a cloud expert, specifically someone who knows Amazon Web Services and EC2, you'll want to have a battery of questions to assess their knowledge.

September 15, 2011

by Sean Hull

· 111,875 Views · 1 Like

Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS

The integration framework Apache Camel already supports several important cloud services (see my overview article at http://www.kai-waehner.de/blog/2011/07/09/cloud-computing-heterogeneity-will-require-cloud-integration-apache-camel-is-already-prepared for more details). This article describes the combination of Apache Camel and the Amazon Web Services (AWS) interfaces of Simple Storage Service (S3), Simple Queue Service (SQS) and Simple Notification Service (SNS). Thus, The concept of Infrastructure as a Service (IaaS) is used to access messaging systems and data storage without any need for configuration. Registration to AWS and Setup of Camel First, you have to register to the Amazon Web Services (for free). Most AWS services include a free monthly quota, which is absolutely sufficient to play around and develop some simple applications. As its name states, AWS uses technology-independent web services. Besides, APIs for several different programming languages are available to ease development. By the way, Camel uses the AWS SDK for Java (http://aws.amazon.com/sdkforjava), of course. The documentation is detailed and easy to understand, including tutorials, screenshots and code examples . Hint 1: You should read the introductions to S3, SQS and SNS (go to http://aws.amazon.com and click on „products“) and play around with the AWS Management Console (http://aws.amazon.com/console) before you continue. This step is very easy and takes less than one hour. Then, you will have a much better understanding about AWS and where Camel can help you! Hint 2: It really helps to look at the source code of the camel-aws component, It helps you to understand how Camel uses the AWS Java API internally. If you want to write tests, you can do it the same way. In the past, I was afraid of looking at „complex“ source code of open source frameworks. But there is no need to be scared! The camel-aws component (and most other camel components) contain only of a few classes. Everything is easy to understand. It helps you to understand Camel internals, the AWS API, and to spot and solve errors due to exceptions in your code. In the meanwhile, the current Camel version 2.8 supports three AWS services: S3, SQS and SNS. All of them use similar concepts. Therefore, they are included in one single camel component: „camel-aws“. You have to add the libraries to your existing Camel project. As always, the simplest way is to use Maven and add the following dependency to the pom.xml: org.apache.camel camel-aws ${camel-version} Configuration of the Camel Endpoint The implementation and configuration of all three services is very similar. The URI looks like this (the code shows the SQS service): aws-sqs://queue-name[?options] There are two alternatives to configure your endpoint. Using Parameters The easy way is to use two paramters in the URI of your endpoint: „accessKey“ and „secretKey“ (you receive both after your AWS registration). “aws-sqs://unique-queue-name?accessKey=“INSERT_ME“&secretKey=INSERT_ME” Be aware of the following problem, which can result in a strange, non-speaking exception (thanks to Brendan Long): You’ll need to URL encode any +’s in your secret key (otherwise, they’ll be treated as spaces). + = %2B, so if your secretkey was “my+secret\key”, your Camel URL should have “secretKey=my%2Bsecret\key”. “Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.” Source: WC3 URI Recommendations Adding a configured AmazonClient to the Registry If you need to do more configuration (e.g. because your system is behind a firewall), you have to add an AmazonClient object to your registry. The following code shows an example using SQS, but SNS and S3 use exactly the same concept. @Override protected JndiRegistry createRegistry() throws Exception { JndiRegistry registry = super.createRegistry(); AWSCredentials awsCredentials = new BasicAWSCredentials(“INSERT_ME”, “INSERT_ME”); ClientConfiguration clientConfiguration = new ClientConfiguration(); clientConfiguration.setProxyHost(“http://myProxyHost”); clientConfiguration.setProxyPort(8080); AmazonSQSClient client = new AmazonSQSClient(awsCredentials, clientConfiguration); registry.bind(“amazonSQSClient”, client); return registry; } This example overwrites the createRegistry() method of a JUnit test (extending CamelTestSupport). You can also add this information to your runtime Camel application, of course. Apache Camel and the Simple Storage Service (S3) Simple Storage Service (S3) is a key-value-store. You can store small to very large data. The usage is very easy. You create buckets and put key-value data into these buckets. You can also create folders within buckets to organize your data. That’s it. You can monitor your buckets using the AWS Management Console – an intuitive GUI supporting most AWS services. The following example shows both alternatives for accessing the Amazon services (as described above): Paramenters and the AmazonClient. // Transfer data from your file inbox to the AWS S3 service from(“file:files/inbox”) // This is the key of your key-value data .setHeader(S3Constants.KEY, simple(“This is a static key”)) // Using parameters for accessing the AWS service .to(“aws-s3://camel-integration-bucket-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME&region=eu-west-1″); // Transfer data from the AWS S3 service to your file outbox from(“aws-s3://camel-integration-bucket-mwea-kw?amazonS3Client=#amazonS3Client&region=eu-wes”) .to(“file:files/outbox”); There are some additional parameters, for instance you can submit the desired AWS region or delete data after receiving it (see http://camel.apache.org/aws-s3.html and the corresponding SQS and SNS sites for more details about parameters and message headers). As you see in the code, you can use the AWS-S3 endpoint for producing and for consuming messages. Each bucket must be unique, thus you have to add some specific information such as your company to its name. Hint: If a bucket does not exist, Camel is creating it automatically (as the AWS API does). This concept is also used for SQS queues and SNS topics. Apache Camel and the Simple Queue Service (SQS) The Simple Queue Service (SQS) is similar to a JMS provider such as WebSphere MQ or ActiveMQ (but with some differences). You create queues and send messages to them. Consumers receive the messages. Contrary to most other AWS services, you cannot monitor queues by using the AWS management console directly. You have to use the service „Cloudwatch“ (http://aws.amazon.com/cloudwatch) and start an EC2 instance to monitor queues and its content. As you can see in the following code example, the syntax and concepts are almost the same as for the S3 service: from(“file:inbox”) .to(“aws-sqs://camel-integration-queue-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME”); from(“aws-sqs://camel-integration-queue-mwea-kw?amazonSQSClient=#amazonSQSClient”) .to(“file:outbox?fileName=sqs-${date:now:yyyy.MM.dd-hh:mm:ss:SS}”); Again, you can use the AWS-SQS endpoint for producing and for consuming messages. Each queue name must be unique. There exist two important differences to JMS (copy & paste from the AWS documentation): Q: How many times will I receive each message? Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies. Q: Why are there separate ReceiveMessage and DeleteMessage operations? When Amazon SQS returns a message to you, that message stays in the queue, whether or not you actually received the message. You are responsible for deleting the message; the delete request acknowledges that you’re done processing the message. If you don’t delete the message, Amazon SQS will deliver it again on another receive request. Apache Camel and the Simple Notification Service (SNS) The Simple Notification Service (SNS) acts like JMS topics. You create a topic, consumers subscribe to the topic and then receive notifications. Several transport protocols are supported: HTTP(S), Email and SQS. Further interfaces will be added in the future, e.g. the Short Message Service (SMS) for mobile phones. Contrary to S3 and SQS, Camel only offers a producer endpoint for this AWS service. You can only create topics and send messages via Camel. The reason is simple: Camel already offers endpoints for consuming these messages: HTTP, Email and SQS are already available. There is one tradeoff: A consumer cannot subscribe to topics using Camel – at the moment. The AWS Management Console has to be used. A very interesting discussion can be read on the Camel JIRA issue regarding the following questions: Should Camel be able to subscribe to topics? Should the producer contain this feature or should there be a consumer? In my opinion, there should be a consumer which is able to subscribe to topics, otherwise Camel is missing a key part of the AWS SNS service! Please read the discussion and contribute your opinion: https://issues.apache.org/jira/browse/CAMEL-3476. Apache Camel is already ready for the Cloud Computing Era AWS offers many more services for the cloud. Probably, it does not make sense to integrate everyone into Camel, but more AWS services will be supported in the future. For instance, SimpleDB and the Relational Database Service (RDS) are already planned and make sende, too: http://camel.apache.org/aws.html. The conclusion is easy: Apache Camel is already ready for the cloud computing era. Several important cloud services are already supported. Cloud integration will become very important in the future. Thus, Camel is on a very good way. Hopefully, we will see more cloud components, soon. I will continue to write articles about other Camel cloud components (and new AWS addons, ouf course). For instance, a component for the Platform as a Service (PaaS) product Google App Engine (GAE) is already available. If you have any additional important information, questions or other feedback, please write a comment. Thank you in advance… Best regards, Kai Wähner (Twitter: @KaiWaehner) [Content from my Blog: Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS]

August 30, 2011

by Kai Wähner

CORE

· 26,196 Views