DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest AI/ML Topics

article thumbnail
Create a Java App Server on a Virtual Machine
Curator's note: This tutorial originally appeared at the Windows Azure Java Developer Center. With Windows Azure, you can use a virtual machine to provide server capabilities. As an example, a virtual machine running on Windows Azure can be configured to host a Java application server, such as Apache Tomcat. On completing this guide, you will have an understanding of how to create a virtual machine running on Windows Azure and configure it to run a Java application server. You will learn: How to create a virtual machine. How to remotely log in to your virtual machine. How to install a JDK on your virtual machine. How to install a Java application server on your virtual machine. How to create an endpoint for your virtual machine. How to open a port in the firewall for your application server. For purposes of this tutorial, an Apache Tomcat application server will be installed on a virtual machine. The completed installation will result in a Tomcat installation such as the following. Note To complete this tutorial, you need a Windows Azure account that has the Windows Azure Virtual Machines feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see Create a Windows Azure account and enable preview features. To create a virtual machine Log in to the Windows Azure Preview Management Portal. Click New. Click Virtual machine. Click Quick create. In the Create virtual machine screen, enter a value for DNS name. From the Image dropdown list, select an image, such as Windows Server 2008 R2 SP1. Enter a password in the New password field, and re-enter it in the Confirm field. This is the Administrator account password. Remember this password, you will use it when you remotely log in to the virtual machine. From the Location drop down list, select the data center location for your virtual machine; for example, West US. Your screen will look similar to the following. Click Create virtual machine. Your virtual machine will be created. You can monitor the status in the Virtual machines section of the management portal. To remotely log in to your virtual machine Log in to the Preview Management Portal. Click Virtual Machines, and then select the MyTestVM1 virtual machine that you previously created. On the command bar, click Connect. Click Open to use the remote desktop protocol file that was automatically created for the virtual machine Click Connect to proceed with the connection process. Type the password that you specified as the password of the Administrator account when you created the virtual machine, and then click OK. Click Yes to verify the identity of the virtual machine. To install a JDK on your virtual machine You can copy a Java Developer Kit (JDK) to your virtual machine, or install a JDK through an installer. For purposes of this tutorial, a JDK will be installed from Oracle's site. Log in to your virtual machine. Within your browser, open http://www.oracle.com/technetwork/java/javase/downloads/index.html. Click the Download button for the JDK that you want to download. For purposes of this tutorial, the Download button for the Java SE 6 Update 32 JDK was used. Accept the license agreement. Click the download executable for Windows x64 (64-bit). Follow the prompts and respond as needed to install the JDK to your virtual machine. To install a Java application server on your virtual machine You can copy a Java application server to your virtual machine, or install a Java application server through an installer. For purposes of this tutorial, a Java application server will be installed by copying a zip file from Apache's site. Log in to your virtual machine. Within your browser, open http://tomcat.apache.org/download-70.cgi. Double-click 64-bit Windows zip. (This tutorial used the zip for Tomcat Apache 7.0.27.) When prompted, choose to save the zip. When the zip is saved, open the folder that contains the zip and double-click the zip. Extract the zip. For purposes of this tutorial, the path used was C:\program files\apache-tomcat-7.0.27-windows-x64. To run the Java application server privately on your virtual machine The following steps show you how to run the Java application server and test it within the virtual machine's browser. It won't be usable by external computers until you create an endpoint and open a port (those steps are described later). Log in to your virtual machine. Add the JDK bin folder to the Pathenvironment variable: Click Windows Start. Right-click Computer. Click Properties. Click Advanced system settings. Click Advanced. Click Environment variables. In the System variables section, click the Path variable and then click Edit. Add a trailing ; to the Path variable value (if there is not one already) and then add c:\program files\java\jdk\bin to the end of the Path variable value (adjust the path as needed if you did not use c:\program files\java\jdk as the path for your JDK installation). Press OK on the opened dialogs to save your Path change. Set the JAVA_HOMEenvironment variable: Click Windows Start. Right-click Computer. Click Properties. Click Advanced system settings. Click Advanced. Click Environment variables. In the System variables section, click New. Create a variable named JRE_HOME and set its value to c:\program files\java\jdk\jre (adjust the path as needed if you did not use c:\program files\java\jdk as the path for your JDK installation). Press OK on the open dialogs to save your JRE_HOME environment variable. Start Tomcat: Open a command prompt. Change the current directory to the Apache Tomcat binfolder. For example: cd c:\program files\apache-tomcat-7.0.27-windows-x64\apache-tomcat-7.0.27\bin (Adjust the path as needed if you used a differrent installation path for Tomcat.) Run catalina.bat start. You should now see Tomcat running if you run the virtual machine's browser and open http://localhost:8080. To see Tomcat running from external machines, you'll need to create an endpoint and open a port. To create an endpoint for your virtual machine Log in to the Preview Management Portal. Click Virtual machines. Click the name of the virtual machine that is running your Java application server. Click Endpoints. Click Add endpoint. In the Add endpoint dialog, ensure Add endpoint is checked and click the Next button. In the New endpoint detailsdialog Specify a name for the endpoint; for example, HttpIn. Specify TCP for the protocol. Specify 80 for the public port. Specify 8080for the private port. Your screen should look similar to the following: Click the Check button to close the dialog. Your endpoint will now be created. To open a port in the firewall for your virtual machine Log in to your virtual machine. Click Windows Start. Click Control Panel. Click System and Security, click Windows Firewall, and then click Advanced Settings. Click Inbound Rules and then click New Rule. For the new rule, select Port for the Rule type and click Next. Select TCP for the protocol and specify 8080 for the port, and click Next. Choose Allow the connection and click Next. Ensure Domain, Private, and Public are checked for the profile and click Next. Specify a name for the rule, such as HttpIn (the rule name is not required to match the endpoint name, however), and then click Finish. At this point, your Tomcat web site should now be viewable from an external browser, using a URL of the form http://your_DNS_name.cloudapp.net, where your_DNS_name is the DNS name you specified when you created the virtual machine. Application lifecycle considerations You could create your own application web archive (WAR) and add it to the webapps folder. For example, create a basic Java Service Page (JSP) dynamic web project and export it as a WAR file, copy the WAR to the Apache Tomcat webapps folder on the virtual machine, then run it in a browser. This tutorial runs Tomcat through a command prompt where catalina.bat start was called. You may instead want to run Tomcat as a service, a key benefit being to have it automatically start if the virtual machine is rebooted. To run Tomcat as a service, you can install it as a service via the service.bat file in the Apache Tomcat bin folder, and then you could set it up to run automatically via the Services snap-in. You can start the Services snap-in by clicking Windows Start, Administrative Tools, and then Services. If you run service.bat install MyTomcat in the Apache Tomcat bin folder, then within the Services snap-in, your service name will appear as Apache Tomcat MyTomcat. By default when the service is installed, it will be set to start manually. To set it to start automatically, double-click the service in the Services snap-in and set Startup Type to Automatic, as shown in the following. You'll need to start the service the first time, which you can do through the Services snap-in (alternatively, you can reboot the virtual machine). Close the running occurrence of catalina.bat start if it is still running before starting the service.
October 15, 2012
by Eric Gregory
· 31,192 Views
article thumbnail
Asynchronous WMI Queries: Stay Away From Them
So, it turns out that I have a WMI category on my blog. During the last couple of years I almost forgot about it, but WMI got a chance to wrap its poisonous tentacles around me again yesterday. Here’s another story. WMI is known for requiring lots of attention to security. To establish a WMI connection to a remote machine, you need to muck around with registry settings, DCOM configuration, group policy details, and other infernal things which we developers like to defer to someone else. But at least you know that once a machine has been configured properly to give you access through WMI, you can then access it from any other machine. Right? Right? Not so much. WMI has a concept of asynchronous queries, which are notably used for receiving event notifications. For example, the following code registers for an event notification whenever a process is created on my desktop machine: ManagementScope scope = new ManagementScope(@"\\sasha-desktop\root\cimv2"); WqlEventQuery query = new WqlEventQuery( "SELECT * FROM Win32_ProcessStartTrace"); ManagementEventWatcher watcher = new ManagementEventWatcher(scope, query); watcher.EventArrived += (o, e) => ...; //TODO: process the event watcher.Start(); Indeed, this thing works just fine if you point it to a local machine; but it fails when you call the Start method when you connect it to a remote machine. You could now strip the remote machine bare and have it expose its very innate networking guts to the entire Internet, and it still wouldn’t help you establish the connection. Interesting. When troubleshooting this nasty bug, I looked up a VBScript sample that receives new process creation events on another machine. Here it is: Set wmi = GetObject("winmgmts:\\sasha-desktop\root\cimv2") Set query = wmi.ExecNotificationQuery _ ("SELECT * FROM Win32_ProcessStartTrace'") Set process = query.NextEvent VBScript and all, it worked just fine. I started to suspect something smelly in the kingdom of .NET, so I rewrote the VBScript sample in C#, using the long-forgotten Microsoft.VisualBasic.Interaction class: dynamic wmi = Microsoft.VisualBasic.Interaction.GetObject( "winmgmts:\\sasha-desktop\root\cimv2"); dynamic query = wmi.ExecNotificationQuery( "SELECT * FROM Win32_ProcessStartTrace"); dynamic evt = query.NextEvent; This, too, worked just fine – although it’s not much a surprise, as it’s pretty much equivalent to the VBScript code at this time. Still interesting. This is when it hit me – the asynchronous nature of the ManagementEventWatcher.EventArrived event relies on an asynchronous WMI query, which requires a reverse connection to the client machine! This is configuration inferno, x2, on the client machine now, what with the DCOM security settings and sacrifices to the gods of group policy. Unless, of course, we give away the asynchrony and rely on the ManagementEventWatcher.WaitForNextEvent method. It’s synchronous. It burns a thread that has to sit idly by and wait while its siblings execute useful work. But it doesn’t establish a reverse DCOM connection to the caller. At least that.
September 22, 2012
by Sasha Goldshtein
· 10,995 Views
article thumbnail
New ActiveMQ failover and Clustering Goodies
For the last two weeks I’ve been working on some interesting use cases for the good ol’ failover transport. I finally have some time at my hands, so here’s a brief recap of what’s coming in 5.6 release in this area. First there’s a new feature, called Priority Backup. It’s described in details here, but in a nutshell it provides you with the mechanism of prioritizing your failover urls and keep your clients connected to them as soon as they are available. The most obvious use case for this is to keep your clients connected to the broker in local data center whenever you can. By doing this, you can both have better performances and stability of your clients, but also save on your bandwidth bills. Another improvement is coming for automatic broker cluster feature. Although this feature is not new, I spent some time hardening it and thought to share some more insight in how (and when) to use it in your projects. In search of high availability, people often default to master-slave architecture. This makes sense in most use cases, but if your flow is purely non-persistent you can probably come up with more optimal architecture. Instead of having one broker at the time handling all your load, and other one just waiting for it to fail, you’ll get more efficient system with some kind of active-active configuration where (possibly multiple) brokers share the load all the time. Ideally clients would be evenly distributed and would rebalance if anything changes. Brokers don’t need to share any messages as clients are distributed and messages are non-persistent so they will be lost if broker fails. So can you achieve this kind of architecture with ActiveMQ? Sure you do. That’s where automatic rebalance and clustering shines. First of all, brokers should be networked but only so they can exchange information on their availability. They shouldn’t exchange the messages (but of course can if your use case needs it). In 5.6 you do that with pure static networks, using configuration like So now imagine three brokers A,B and C forming a full mesh. In addition every broker uses rebalance options on their transport connectors All that is left for the client to do is connect to one of the brokers it knows like failover:(brokerA) and the broker will fill it with all information on other brokers in the cluster and whether it should reconnect to one of them or not. So having a large number of clients connecting like this, very soon they’ll rebalance over available brokers. You can stop one of the brokers in the cluster for updates and clients will rebalance over remaining ones. You can even add a new broker to the cluster and everything will get rebalanced without any need for you to touch your clients. So, basically in this way you have both load balancing and high availability for your non-persistent messages. Additionally, your clients are automatically updated with all information they need, and no manual intervention is needed. Although the basic support for clustering was there since 5.4, I did some more hardening and better rebalancing, so it’s coming in the Apache ActiveMQ 5.6 (and the next Fuse 5.5.1) release. Also, there are some more great stuff regarding broker clustering coming soon, so stay tuned and happy messaging.
September 10, 2012
by Dejan Bosanac
· 15,394 Views
article thumbnail
Algorithm of the Week: Graphs and Their Representation
Although this post is supposed to be about algorithms I’ll cover more on graphs and their computer representation.
September 4, 2012
by Stoimen Popov
· 59,269 Views · 8 Likes
article thumbnail
Machine Learning: Measuring Similarity and Distance
Measuring similarity or distance between two data points is fundamental to many Machine Learning algorithms such as K-Nearest-Neighbor, Clustering ... etc.
August 10, 2012
by Ricky Ho
· 54,277 Views · 6 Likes
article thumbnail
Using Multiple Versions of JDK and Eclipse in Single Machine
In my office laptop, I have installed two versions of JDK. For the office work, I need JDK6 because the internal framework needs it. I’m using JDK7 for my personal projects and exploring the latest and greatest in Java. I have two versions of Eclipse too (one for office work and one is the latest Juno). But, the tricky thing is to manage these multiple JDKs and IDEs. It’s a piece of cake if I just use Eclipse for compiling my code, because the IDE allows me to configure multiple versions of Java runtime. Unfortunately (or fortunately), I have to use the command line/shell to build my code. So, it is important that I have the right version of JDK present in the PATH and other related environment variables (such as JAVA_HOME). Manually modifying the environment variables every time I want to switch between JDKs, isn’t a happy task. But, thanks to Windows Powershell, I’m able to write a scriplet that can do the heavy-lifting for me. Basically, what I want to achieve is to set PATH variable to add Java bin folder and set the JAVA_HOME environment variable and then launch the correct Eclipse IDE. And, I want to do this with a single command. Let’s do it. Open a Windows Powershell. I prefer writing custom Windows scripts in my profile file so that it is available to run when ever I open the shell. To edit the profile, run this command: notepad.exe $profile - the $profile is a special variable that points to your profile file. Write the below script in the profile file and save it. function myIDE{ $env:Path += "C:\vraa\java\jdk7\bin;" $env:JAVA_HOME = "C:\vraa\java\jdk7" C:\vraa\ide\eclipse\eclipse set-location C:\vraa\workspace\myproject play } function officeIDE{ $env:Path += "C:\vraa\java\jdk6\bin;" $env:JAVA_HOME = "C:\vraa\java\jdk6" C:\office\eclipse\eclipse } Close and restart the Powershell. Now you can issue the command myIDE which will set the proper PATH and environment variables and then launch the eclipse IDE. As you can see, there are two functions with different configurations. Just call the function name that you want to launch from the Powershell command line (myIDE or officeIDE).
August 4, 2012
by Veera Sundar
· 19,911 Views
article thumbnail
Everything You Need To Know About Couchbase Architecture
After receiving a lot of good feedback and comment on my last blog on MongoDb, I was encouraged to do another deep dive on another popular document oriented db; Couchbase. I have been a long-time fan CouchDb and has wrote a blog on it many years ago. After it merges with Membase, I am very excited to take a deep look into it again. Couchbase is the merge of two popular NOSQL technologies: Membase, which provides persistence, replication, sharding to the high performance memcached technology CouchDB, which pioneers the document oriented model based on JSON Like other NOSQL technologies, both Membase and CouchDB are built from the ground up on a highly distributed architecture, with data shard across machines in a cluster. Built around the Memcached protocol, Membase provides an easy migration to existing Memcached users who want to add persistence, sharding and fault resilience on their familiar Memcached model. On the other hand, CouchDB provides first class support for storing JSON documents as well as a simple RESTful API to access them. Underneath, CouchDB also has a highly tuned storage engine that is optimized for both update transaction as well as query processing. Taking the best of both technologies, Membase is well-positioned in the NOSQL marketplace. Programming model Couchbase provides client libraries for different programming languages such as Java / .NET / PHP / Ruby / C / Python / Node.js For read, Couchbase provides a key-based lookup mechanism where the client is expected to provide the key, and only the server hosting the data (with that key) will be contacted. Couchbase also provides a query mechanism to retrieve data where the client provides a query (for example, range based on some secondary key) as well as the view (basically the index). The query will be broadcasted to all servers in the cluster and the result will be merged and sent back to the client. For write, Couchbase provides a key-based update mechanism where the client sends in an updated document with the key (as doc id). When handling write request, the server will return to client’s write request as soon as the data is stored in RAM on the active server, which offers the lowest latency for write requests. Following is the core API that Couchbase offers. (in an abstract sense) # Get a document by key doc = get(key) # Modify a document, notice the whole document # need to be passed in set(key, doc) # Modify a document when no one has modified it # since my last read casVersion = doc.getCas() cas(key, casVersion, changedDoc) # Create a new document, with an expiration time # after which the document will be deleted addIfNotExist(key, doc, timeToLive) # Delete a document delete(key) # When the value is an integer, increment the integer increment(key) # When the value is an integer, decrement the integer decrement(key) # When the value is an opaque byte array, append more # data into existing value append(key, newData) # Query the data results = query(viewName, queryParameters) In Couchbase, document is the unit of manipulation. Currently Couchbase doesn't support server-side execution of custom logic. Couchbase server is basically a passive store and unlike other document oriented DB, Couchbase doesn't support field-level modification. In case of modifying documents, client need to retrieve documents by its key, do the modification locally and then send back the whole (modified) document back to the server. This design tradeoff network bandwidth (since more data will be transferred across the network) for CPU (now CPU load shift to client). Couchbase currently doesn't support bulk modification based on a condition matching. Modification happens only in a per document basis. (client will save the modified document one at a time). Transaction Model Similar to many NOSQL databases, Couchbase’s transaction model is primitive as compared to RDBMS. Atomicity is guaranteed at a single document and transactions that span update of multiple documents are unsupported. To provide necessary isolation for concurrent access, Couchbase provides a CAS (compare and swap) mechanism which works as follows … When the client retrieves a document, a CAS ID (equivalent to a revision number) is attached to it. While the client is manipulating the retrieved document locally, another client may modify this document. When this happens, the CAS ID of the document at the server will be incremented. Now, when the original client submits its modification to the server, it can attach the original CAS ID in its request. The server will verify this ID with the actual ID in the server. If they differ, the document has been updated in between and the server will not apply the update. The original client will re-read the document (which now has a newer ID) and re-submit its modification. Couchbase also provides a locking mechanism for clients to coordinate their access to documents. Clients can request a LOCK on the document it intends to modify, update the documents and then releases the LOCK. To prevent a deadlock situation, each LOCK grant has a timeout so it will automatically be released after a period of time. Deployment Architecture In a typical setting, a Couchbase DB resides in a server clusters involving multiple machines. Client library will connect to the appropriate servers to access the data. Each machine contains a number of daemon processes which provides data access as well as management functions. The data server, written in C/C++, is responsible to handle get/set/delete request from client. The Management server, written in Erlang, is responsible to handle the query traffic from client, as well as manage the configuration and communicate with other member nodes in the cluster. Virtual Buckets The basic unit of data storage in Couchbase DB is a JSON document (or primitive data type such as int and byte array) which is associated with a key. The overall key space is partitioned into 1024 logical storage unit called "virtual buckets" (or vBucket). vBucket are distributed across machines within the cluster via a map that is shared among servers in the cluster as well as the client library. High availability is achieved through data replication at the vBucket level. Currently Couchbase supports one active vBucket zero or more standby replicas hosted in other machines. Curremtly the standby server are idle and not serving any client request. In future version of Couchbase, the standby replica will be able to serve read request. Load balancing in Couchbase is achieved as follows: Keys are uniformly distributed based on the hash function When machines are added and removed in the cluster. The administrator can request a redistribution of vBucket so that data are evenly spread across physical machines. Management Server Management server performs the management function and co-ordinate the other nodes within the cluster. It includes the following monitoring and administration functions Heartbeat: A watchdog process periodically communicates with all member nodes within the same cluster to provide Couchbase Server health updates. Process monitor: This subsystem monitors execution of the local data manager, restarting failed processes as required and provide status information to the heartbeat module. Configuration manager: Each Couchbase Server node shares a cluster-wide configuration which contains the member nodes within the cluster, a vBucket map. The configuration manager pull this config from other member nodes at bootup time. Within a cluster, one node’s Management Server will be elected as the leader which performs the following cluster-wide management function Controls the distribution of vBuckets among other nodes and initiate vBucket migration Orchestrates the failover and update the configuration manager of member nodes If the leader node crashes, a new leader will be elected from surviving members in the cluster. When a machine in the cluster has crashed, the leader will detect that and notify member machines in the cluster that all vBuckets hosted in the crashed machine is dead. After getting this signal, machines hosting the corresponding vBucket replica will set the vBucket status as “active”. The vBucket/server map is updated and eventually propagated to the client lib. Notice that at this moment, the replication level of the vBucket will be reduced. Couchbase doesn’t automatically re-create new replicas which will cause data copying traffic. Administrator can issue a command to explicitly initiate a data rebalancing. The crashed machine, after reboot can rejoin the cluster. At this moment, all the data it stores previously will be completely discard and the machine will be treated as a brand new empty machine. As more machines are put into the cluster (for scaling out), vBucket should be redistributed to achieve a load balance. This is currently triggered by an explicit command from the administrator. Once receive the “rebalance” command, the leader will compute the new provisional map which has the balanced distribution of vBuckets and send this provisional map to all members of the cluster. To compute the vBucket map and migration plan, the leader attempts the following objectives: Evenly distribute the number of active vBuckets and replica vBuckets among member nodes. Place the active copy and each replicas in physically separated nodes. Spread the replica vBucket as wide as possible among other member nodes. Minimize the amount of data migration Orchestrate the steps of replica redistribution so no node or network will be overwhelmed by the replica migration. Once the vBucket maps is determined, the leader will pass the redistribution map to each member in the cluster and coordinate the steps of vBucket migration. The actual data transfer happens directly between the origination node to the destination node. Notice that since we have generally more vBuckets than machines. The workload of migration will be evenly distributed automatically. For example, when new machines are added into the clusters, all existing machines will migrate some portion of its vBucket to the new machines. There is no single bottleneck in the cluster. Throughput the migration and redistribution of vBucket among servers, the life cycle of a vBucket in a server will be in one of the following states “Active”: means the server is hosting the vBucket is ready to handle both read and write request “Replica”: means the server is hosting the a copy of the vBucket that may be slightly out of date but can take read request that can tolerate some degree of outdate. “Pending”: means the server is hosting a copy that is in a critical transitional state. The server cannot take either read or write request at this moment. “Dead”: means the server is no longer responsible for the vBucket and will not take either read or write request anymore. Data Server Data server implements the memcached APIs such as get, set, delete, append, prepend, etc. It contains the following key datastructure: One in-memory hashtable (key by doc id) for the corresponding vBucket hosted. The hashtable acts as both a metadata for all documents as well as a cache for the document content. Maintain the entry gives a quick way to detect whether the document exists on disk. To support async write, there is a checkpoint linkedlist per vBucket holding the doc id of modified documents that hasn't been flushed to disk or replicated to the replica. To handle a "GET" request Data server routes the request to the corresponding ep-engine responsible for the vBucket. The ep-engine will lookup the document id from the in-memory hastable. If the document content is found in cache (stored in the value of the hashtable), it will be returned. Otherwise, a background disk fetch task will be created and queued into the RO dispatcher queue. The RO dispatcher then reads the value from the underlying storage engine and populates the corresponding entry in the vbucket hash table. Finally, the notification thread notifies the disk fetch completion to the memcached pending connection, so that the memcached worker thread can revisit the engine to process a get request. To handle a "SET" request, a success response will be returned to the calling client once the updated document has been put into the in-memory hashtable with a write request put into the checkpoint buffer. Later on the Flusher thread will pickup the outstanding write request from each checkpoint buffer, lookup the corresponding document content from the hashtable and write it out to the storage engine. Of course, data can be lost if the server crashes before the data has been replicated to another server and/or persisted. If the client requires a high data availability across different crashes, it can issue a subsequent observe() call which blocks on the condition that the server persist data on disk, or the server has replicated the data to another server (and get its ACK). Overall speaking, the client has various options to tradeoff data integrity with throughput. Hashtable Management To synchronize accesses to a vbucket hash table, each incoming thread needs to acquire a lock before accessing a key region of the hash table. There are multiple locks per vbucket hash table, each of which is responsible for controlling exclusive accesses to a certain ket region on that hash table. The number of regions of a hash table can grow dynamically as more documents are inserted into the hash table. To control the memory size of the hashtable, Item pager thread will monitor the memory utilization of the hashtable. Once a high watermark is reached, it will initiate an eviction process to remove certain document content from the hashtable. Only entries that is not referenced by entries in the checkpoint buffer can be evicted because otherwise the outstanding update (which only exists in hashtable but not persisted) will be lost. After eviction, the entry of the document still remains in the hashtable; only the document content of the document will be removed from memory but the metadata is still there. The eviction process stops after reaching the low watermark. The high / low water mark is determined by the bucket memory quota. By default, the high water mark is set to 75% of bucket quota, while the low water mark is set to 60% of bucket quota. These water marks can be configurable at runtime. In CouchDb, every document is associated with an expiration time and will be deleted once it is expired. Expiry pager is responsible for tracking and removing expired document from both the hashtable as well as the storage engine (by scheduling a delete operation). Checkpoint Manager Checkpoint manager is responsible to recycle the checkpoint buffer, which holds the outstanding update request, consumed by the two downstream processes, Flusher and TAP replicator. When all the request in the checkpoint buffer has been processed, the checkpoint buffer will be deleted and a new one will be created. TAP Replicator TAP replicator is responsible to handle vBucket migration as well as vBucket replication from active server to replica server. It does this by propagating the latest modified document to the corresponding replica server. At the time a replica vBucket is established, the entire vBucket need to be copied from the active server to the empty destination replica server as follows The in-memory hashtable at the active server will be transferred to the replica server. Notice that during this period, some data may be updated and therefore the data set transfered to the replica can be inconsistent (some are the latest and some are outdated). Nevertheless, all updates happen after the start of transfer is tracked in the checkpoint buffer. Therefore, after the in-memory hashtable transferred is completed, the TAP replicator can pickup those updates from the checkpoint buffer. This ensures the latest versioned of changed documents are sent to the replica, and hence fix the inconsistency. However the hashtable cache doesn’t contain all the document content. Data also need to be read from the vBucket file and send to the replica. Notice that during this period, update of vBucket will happen in active server. However, since the file is appended only, subsequent data update won’t interfere the vBucket copying process. After the replica server has caught up, subsequent update at the active server will be available at its checkpoint buffer which will be pickup by the TAP replicator and send to the replica server. CouchDB Storage Structure Data server defines an interface where different storage structure can be plugged-in. Currently it supports both a SQLite DB as well as CouchDB. Here we describe the details of CouchDb, which provides a super high performance storage mechanism underneath the Couchbase technology. Under the CouchDB structure, there will be one file per vBucket. Data are written to this file in an append-only manner, which enables Couchbase to do mostly sequential writes for update, and provide the most optimized access patterns for disk I/O. This unique storage structure attributes to Couchbase’s fast on-disk performance for write-intensive applications. The following diagram illustrate the storage model and how it is modified by 3 batch updates (notice that since updates are asynchronous, it is perform by "Flusher" thread in batches). The Flusher thread works as follows: 1) Pick up all pending write request from the dirty queue and de-duplicate multiple update request to the same document. 2) Sort each request (by key) into corresponding vBucket and open the corresponding file 3) Append the following into the vBucket file (in the following contiguous sequence) All document contents in such write request batch. Each document will be written as [length, crc, content] one after one sequentially. The index that stores the mapping from document id to the document’s position on disk (called the BTree by-id) The index that stores the mapping from update sequence number to the document’s position on disk. (called the BTree by-seq) The by-id index plays an important role for looking up the document by its id. It is organized as a B-Tree where each node contains a key range. To lookup a document by id, we just need to start from the header (which is the end of the file), transfer to the root BTree node of the by-id index, and then further traverse to the leaf BTree node that contains the pointer to the actual document position on disk. During the write, the similar mechanism is used to trace back to the corresponding BTree node that contains the id of the modified documents. Notice that in the append-only model, update is not happening in-place, instead we located the existing location and copy it over by appending. In other words, the modified BTree node will be need to be copied over and modified and finally paste to the end of file, and then its parent need to be modified to point to the new location, which triggers the parents to be copied over and paste to the end of file. Same happens to its parents’ parent and eventually all the way to the root node of the BTree. The disk seek can be at the O(logN) complexity. The by-seq index is used to keep track of the update sequence of lived documents and is used for asynchronous catchup purposes. When a document is created, modified or deleted, a sequence number is added to the by-seq btree and the previous seq node will be deleted. Therefore, for cross-site replication, view index update and compaction, we can quickly locate all the lived documents in the order of their update sequence. When a vBucket replicator asks for the list of update since a particular time, it provides the last sequence number in previous update, the system will then scan through the by-seq BTree node to locate all the document that has sequence number larger than that, which effectively includes all the document that has been modified since the last replication. As time goes by, certain data becomes garbage (see the grey-out region above) and become unreachable in the file. Therefore, we need a garbage collection mechanism to clean up the garbage. To trigger this process, the by-id and by-seq B-Tree node will keep track of the data size of lived documents (those that is not garbage) under its substree. Therefore, by examining the root BTree node, we can determine the size of all lived documents within the vBucket. When the ratio of actual size and vBucket file size fall below a certain threshold, a compaction process will be triggered whose job is to open the vBucket file and copy the survived data to another file. Technically, the compaction process opens the file and read the by-seq BTree at the end of the file. It traces the Btree all the way to the leaf node and copy the corresponding document content to the new file. The compaction process happens while the vBucket is being updated. However, since the file is appended only, new changes are recorded after the BTree root that the compaction has opened, so subsequent data update won’t interfere with the compaction process. When the compaction is completed, the system need to copy over the data that was appended since the beginning of the compaction to the new file. View Index Structure Unlike most indexing structure which provide a pointer from the search attribute back to the document. The CouchDb index (called View Index) is better perceived as a denormalized table with arbitrary keys and values loosely associated to the document. Such denormalized table is defined by a user-provided map() and reduce() function. map = function(doc) { … emit(k1, v1) … emit(k2, v2) … } reduce = function(keys, values, isRereduce) { if (isRereduce) { // Do the re-reduce only on values (keys will be null) } else { // Do the reduce on keys and values } // result must be ready for input values to re-reduce return result } Whenever a document is created, updated, deleted, the corresponding map(doc) function will be invoked (in an asynchronous manner) to generate a set of key/value pairs. Such key/value will be stored in a B-Tree structure. All the key/values pairs of each B-Tree node will be passed into the reduce() function, which compute an aggregated value within that B-Tree node. Re-reduce also happens in non-leaf B-Tree nodes which further aggregate the aggregated value of child B-Tree nodes. The management server maintains the view index and persisted it to a separate file. Create a view index is perform by broadcast the index creation request to all machines in the cluster. The management process of each machine will read its active vBucket file and feed each surviving document to the Map function. The key/value pairs emitted by the Map function will be stored in a separated BTree index file. When writing out the BTree node, the reduce() function will be called with the list of all values in the tree node. Its return result represent a partially reduced value is attached to the BTree node. The view index will be updated incrementally as documents are subsequently getting into the system. Periodically, the management process will open the vBucket file and scan all documents since the last sequence number. For each changed document since the last sync, it invokes the corresponding map function to determine the corresponding key/value into the BTree node. The BTree node will be split if appropriate. Underlying, Couchbase use a back index to keep track of the document with the keys that it previously emitted. Later when the document is deleted, it can look up the back index to determine what those key are and remove them. In case the document is updated, the back index can also be examined; semantically a modification is equivalent to a delete followed by an insert. The following diagram illustrates how the view index file will be incrementally updated via the append-only mechanism. Query Processing Query in Couchbase is made against the view index. A query is composed of the view name, a start key and end key. If the reduce() function isn’t defined, the query result will be the list of values sorted by the keys within the key range. In case the reduce() function is defined, the query result will be a single aggregated value of all keys within the key range. If the view has no reduce() function defined, the query processing proceeds as follows: Client issue a query (with view, start/end key) to the management process of any server (unlike a key based lookup, there is no need to locate a specific server). The management process will broadcast the request to other management process on all servers (include itself) within the cluster. Each management process (after receiving the broadcast request) do a local search for value within the key range by traversing the BTree node of its view file, and start sending back the result (automatically sorted by the key) to the initial server. The initial server will merge the sorted result and stream them back to the client. However, if the view has reduce() function defined, the query processing will involve computing a single aggregated value as follows: Client issue a query (with view, start/end key) to the management process of any server (unlike a key based lookup, there is no need to locate a specific server). The management process will broadcast the request to other management process on all servers (include itself) within the cluster. Each management process do a local reduce for value within the key range by traversing the BTree node of its view file to compute the reduce value of the key range. If the key range span across a BTree node, the pre-computed of the sub-range can be used. This way, the reduce function can reuse a lot of partially reduced values and doesn’t need to recomputed every value of the key range from scratch. The original server will do a final re-reduce() in all the return value from each other servers, and then passed back the final reduced value to the client. To illustrate the re-reduce concept, lets say the query has its key range from A to F. Instead of calling reduce([A,B,C,D,E,F]), the system recognize the BTree node that contains [B,C,D] has been pre-reduced and the result P is stored in the BTree node, so it only need to call reduce(A,P,E,F). Update View Index as vBucket migrates Since the view index is synchronized with the vBuckets in the same server, when the vBucket has migrated to a different server, the view index is no longer correct; those key/value that belong to a migrated vBucket should be discarded and the reduce value cannot be used anymore. To keep track of the vBucket and key in the view index, each bTree node has a 1024-bitmask indicating all the vBuckets that is covered in the subtree (ie: it contains a key emitted from a document belonging to the vBucket). Such bit-mask is maintained whenever the bTree node is updated. At the server-level, a global bitmask is used to indicate all the vBuckets that this server is responsible for. In processing the query of the map-only view, before the key/value pair is returned, an extra check will be perform for each key/value pair to make sure its associated vBucket is what this server is responsible for. When processing the query of a view that has a reduce() function, we cannot use the pre-computed reduce value if the bTree node contains a vBucket that the server is not responsible for. In this case, the bTree node’s bit mask is compared with the global bit mask. In case if they are not aligned, then the reduce value need to be recomputed. Here is an example to illustrate this process Couchbase is one of the popular NOSQL technology built on a solid technology foundation designed for high performance. In this post, we have examined a number of such key features: Load balancing between servers inside a cluster that can grow and shrink according to workload conditions. Data migration can be used to re-achieve workload balance. Asynchronous write provides lowest possible latency to client as it returns once the data is store in memory. Append-only update model pushes most update transaction into sequential disk access, hence provide extremely high throughput for write intensive applications. Automatic compaction ensures the data lay out on disk are kept optimized all the time. Map function can be used to pre-compute view index to enable query access. Summary data can be pre-aggregated using the reduce function. Overall, this cut down the workload of query processing dramatically. For a review on NOSQL architecture in general and some theoretical foundation, I have wrote a NOSQL design pattern blog, as well as some fundamental difference between SQL and NOSQL. For other NOSQL technologies, please read my other blog on MongoDb, Cassandra and HBase, Memcached Special thanks to Damien Katz and Frank Weigel from Couchbase team who provide a lot of implementation details of Couchbase.
July 7, 2012
by Ricky Ho
· 84,657 Views · 5 Likes
article thumbnail
Using Cookies to implement a RememberMe functionality
Some web applications may need a "Remember Me" functionality. This means that, after a user login, user will have access from same machine to all its data even after session expired. This access will be possible until user does a logout. If you are using Spring and its login form, then you should use "Remember Me" functionality already implemented inside the framework. Some web frameworks also offer a type of SignIn panel which already has "remember me" built-in. But in case you have to implement "Remember Me" functionality by your own, this can be easily achieved using Cookies. Java has a Cookie class named javax.servlet.http.Cookie. Algorithm is straight-forward: your login panel must contain a "Remember Me" check after a succesfull login with "Remember Me" check selected, you can create two cookies: one to keep the value for rememberMe and one to keep a token which has to identify the logged user. For sake of security, this token must never contain user name or user password. The ideea is to generate a random id as token value. And token value aside with user id must be saved in your storage (database) whenever a login is needed, you have to look if there is any cookie saved by you, and if so and your "rememberMe" value is true, you can take the user from storage based on your token and do an automatic login. when a logout is done, you have to delete the cookie that keeps the token To add a cookie, you have to specify the maximum age of the cookie in seconds : HttpServletResponse servletResponse = ...; Cookie c = new Cookie(COOKIE_NAME, encodeString(uuid)); c.setMaxAge(365 * 24 * 60 * 60); // one year servletResponse.addCookie(c); To delete a cookie, you have to find cookie by name and set its maximum age to 0, before adding it to servlet response: HttpServletRequest servletRequest = ...; HttpServletResponse servletResponse = ... ; Cookie[] cookies = servletRequest.getCookies(); for (int i = 0; i < cookies.length; i++) { Cookie c = cookies[i]; if (c.getName().equals(COOKIE_NAME)) { c.setMaxAge(0); c.setValue(null); servletResponse.addCookie(c); } }
June 26, 2012
by Mihai Dinca - Panaitescu
· 58,945 Views · 1 Like
article thumbnail
Algorithm of the Week: How to Determine the Day of the Week
Do you know what day of the week was the day you were born? Monday or maybe Saturday? Well, perhaps you know that. Everybody knows the day he’s born on, but do you know what day was the 31st of January in 1883? No? Well, there must be some method to determine any day in any century. We know that 2012 started at Sunday. After we know that, it’s easy to determine what day is the 2nd of January. It should be Monday. But things get a little more complex if we try to guess some date distant from January the 1st. Indeed 1st of Jan was on Sunday, but what day is 9th of May the same year. This is far more difficult to say. Of course we can go with a brute force approach and count from 1/Jan till 9/May, but that is quite slow and error prone. So what would we do if we had to code a program that answers this question? The easiest way is to use a library. Almost every major library has built-in functions that can answer what day is on a given date. Such are date() in PHP or getDate() in JavaScript. But the question remains: How these library functions know the answer and how can we code such library functions if our library doesn’t support such functionality? There must be some algorithm to help us. Overview Because months have different number of days, and most of them aren’t divisible by 7 without a remainder, months begin on different days of the week. Thus, if January begins on Sunday, the month of February the same year will begin on Wednesday. Of course, in common years February has 28 days, which fortunately is divisible by 7 and thus February and March both begin on the same day, which is great, but isn’t true for leap years. What Do We Know About the Calendar First thing to know is that each week has exactly 7 days. We also know that a common year has 365 days, while a leap year has one day more – 366. Most of the months have 30 or 31 days, but February has only 28 days in common years and 29 in leap years. Because 365 mod 7 = 1 in a common year each year begins exactly on the next day of the preceding year. Thus if 2011 started on Saturday, 2012 starts on Sunday. And yet again, that is because 2011 is not a leap year. What else do we know? Because a week has exactly seven days only February (with its 28 days in a common year) is divisible by 7 (28 mod 7 = 0) and has exactly four weeks in it. Thus in a common year February and March start on a same day. Unfortunately that is not true about the other months. All these things we know about the calendar are great, so we can make some conclusions. Although eleven of the months have either 30 or 31 days they don’t start on a same day, but some of the months do appear to start on a same day just because the number of days between them is divisible by 7 without a remainder. Let’s take a look on some examples. For instance September has 30 days, as does November, while October, which is in between them has 31 days. Thus 30+30+31 makes 91. Fortunately 91 mod 7 = 0. So for each year September and December start on the same day (as they are after February they don’t depend on leap years). The same thing occurs to April and July and the good news is that in leap years even January starts on the same day as April and July. Now we know that there are some relations between months. Thus, if we know somehow that the 13th of April is Monday, we’ll be sure that 13th of July is also Monday. Let’s see now a summary of these observations. We can also refer to the following diagram. For leap years there are other corresponding months. Let’s take a look at the following image. Another way to get the same information is the following table. We also know that leap years happen to occur once every four years. However, if there is a common year like the year 2001, which will be the next year that is common and starts and corresponds exactly on 2001? Because of leap years we can have a year starting on one of the seven days of the week and to be either leap or common. This means just 14 combinations. Following these observations we can refer to the following table. You can clearly see the pattern “6 4 2 0” Here’s the month table. Columns 2 and 3 differs only for January and February. Clearly the day table is as follows: Now let’s go back to the algorithm. Using these tables and applying a simple formula, we can calculate what day was on some given date. Here are the steps of this algorithm. Get the number for the corresponding century from the centuries table; Get the last two digits from the year; Divide the number from step 2 by 4 and get it without the remainder; Get the month number from the month table; Sum the numbers from steps 1 to 4; Divide it by 7 and take the remainder; Find the result of step 6 in the days table; Implementation First let’s take a look at a simple and practical example of the example above and then the code. Let’s answer the question from the first paragraph of this post. What day was on January 31st, 1883? Take a look at the centuries table: for 1800 – 1899 this is 2. Get the last two digits from the year: 83. Divide 83 by 4 without a remainder: 83/4 = 20 Get the month number from the month table: Jan = 0. Sum the numbers from steps 1 to 4: 2 + 83 + 20 + 0 = 105. Divide it by 7 and take the remainder: 105 mod 7 = 0 Find the result of step 6 in the days table: Sunday = 0. The following code in PHP implements the algorithm above. function get_century_code($century) { // XVIII if (1700 <= $century && $century <= 1799) return 4; // XIX if (1800 <= $century && $century <= 1899) return 2; // XX if (1900 <= $century && $century <= 1999) return 0; // XXI if (2000 <= $century && $century <= 2099) return 6; // XXII if (2100 <= $century && $century <= 2199) return 4; // XXIII if (2200 <= $century && $century <= 2299) return 2; // XXIV if (2300 <= $century && $century <= 2399) return 0; // XXV if (2400 <= $century && $century <= 2499) return 6; // XXVI if (2500 <= $century && $century <= 2599) return 4; // XXVII if (2600 <= $century && $century <= 2699) return 2; } /** * Get the day of a given date * * @param $date */ function get_day_from_date($date) { $months = array( 1 => 0,// January 2 => 3,// February 3 => 3,// March 4 => 6,// April 5 => 1,// May 6 => 4,// June 7 => 6,// July 8 => 2,// August 9 => 5,// September 10 => 0,// October 11 => 3,// November 12 => 5,// December ); $days = array( 0 => 'Sunday', 1 => 'Monday', 2 => 'Tuesday', 3 => 'Wednesday', 4 => 'Thursday', 5 => 'Friday', 6 => 'Saturday', ); // calculate the date $dateParts = explode('-', $date); $century = substr($dateParts[2], 0, 2); $year = substr($dateParts[2], 2); // 1. Get the number for the corresponding century from the centuries table $a = get_century_code($dateParts[2]); // 2. Get the last two digits from the year $b = $year; // 3. Divide the number from step 2 by 4 and get it without the remainder $c = floor($year / 4); // 4. Get the month number from the month table $d = $months[$dateParts[1]]; // 5. Sum the numbers from steps 1 to 4 $e = $a + $b + $c + $d; // 6. Divide it by 7 and take the remainder $f = $e % 7; // 7. Find the result of step 6 in the days table return $days[$f]; } // Sunday echo get_day_from_date('31-1-1883'); Application This algorithm can be applied in many different cases although most of the libraries have built-in functions that can do that. The only problem besides that is that there are much more efficient algorithms that don’t need additional space (tables) of data. However this algorithm isn’t difficult to implement and it gives a good outlook of some facts in the calendar.
April 24, 2012
by Stoimen Popov
· 61,703 Views · 1 Like
article thumbnail
Algorithm of the Week: Rabin-Karp String Searching
Brute force string matching is a very basic sub-string matching algorithm, but it’s good for some reasons. For example it doesn’t require preprocessing of the text or the pattern. The problem is that it’s very slow. That is why in many cases brute force matching can’t be very useful. For pattern matching we need something faster, but to understand other sub-string matching algorithms let’s take a look once again at brute force matching. In brute force sub-string matching we checked every single character from the text with the first character of the pattern. Once we have a match between them we shift the comparison between the second character of the pattern with the next character of the text, as shown on the picture below. This algorithm is slow for mainly two reasons. First, we have to check every single character from the text. On the other hand even if we find a match between a text character and the first character of the pattern we continue to check step by step (character by character) every single symbol of the pattern in order to find whether it is in the text. So is there any other approach to find whether the text contains the pattern? In fact there is a “faster” approach. In this case, in order to avoid the comparison between the pattern and the text character by character, we’ll try to compare them all at once, so we need a good hash function. With its help we can hash the pattern and check against hashed sub-strings of the text. We must be sure that the hash function is returning “small” hash codes for larger sub-strings. Another problem is that for larger patterns we can’t expect to have short hashes. But besides this the approach should be quite effective compared to the brute force string matching. This approach is known as Rabin-Karp algorithm. Overview Michael O. Rabin and Richard M. Karp came up with the idea of hashing the pattern and to check it against a hashed sub-string from the text in 1987. In general the idea seems quite simple, the only thing is that we need a hash function that gives different hashes for different sub-strings. Said hash function, for instance, may use the ASCII codes for every character, but we must be careful for multi-lingual support. The hash function may vary depending on many things, so it may consist of ASCII char to number converting, but it can also be anything else. The only thing we need is to convert a string (pattern) into some hash that is faster to compare. Let’s say we have the string “hello world”, and let’s assume that its hash is hash(‘hello world’) = 12345. So if hash(‘he’) = 1 we can say that the pattern “he” is contained in the text “hello world”. So in every step, we take from the text a sub-string with the length of m, where m is the pattern length. Thus we hash this sub-string and we can directly compare it to the hashed pattern, as in the picture above. Implementation So far we saw some diagrams explaining the Rabin-Karp algorithm, but let’s take a look at its implementation here, in this very basic example where a simple hash table is used in order to convert the characters into integers. The code is PHP and it’s used only to illustrate the principles of this algorithm. function hash_string($str, $len) { $hash = ''; $hash_table = array( 'h' => 1, 'e' => 2, 'l' => 3, 'o' => 4, 'w' => 5, 'r' => 6, 'd' => 7, ); for ($i = 0; $i < $len; $i++) { $hash .= $hash_table[$str{$i}]; } return (int)$hash; } function rabin_karp($text, $pattern) { $n = strlen($text); $m = strlen($pattern); $text_hash = hash_string(substr($text, 0, $m), $m); $pattern_hash = hash_string($pattern, $m); for ($i = 0; $i < $n-$m+1; $i++) { if ($text_hash == $pattern_hash) { return $i; } $text_hash = hash_string(substr($text, $i, $m), $m); } return -1; } // 2 echo rabin_karp('hello world', 'ello'); Multiple Pattern Match It’s great to say that the Rabin-Karp algorithm is great for multiple pattern match. Indeed its nature is supposed to support such functionality, which is its advantage in comparison to other string searching algorithms. Complexity The Rabin-Karp algorithm has the complexity of O(nm) where n, of course, is the length of the text, while m is the length of the pattern. So where is it compared to brute-force matching? Well, brute force matching complexity is O(nm), so as it seems there’s not much of a gain in performance. However, it’s considered that Rabin-Karp’s complexity is O(n+m) in practice, and that makes it a bit faster, as shown on the chart below. Note that the Rabin-Karp algorithm also needs O(m) preprocessing time. Application As we saw Rabin-Karp is not much faster than brute force matching. So where we should use it? 3 Reasons Why Rabin-Karp is Cool 1. Good for plagiarism, because it can deal with multiple pattern matching! 2. Not faster than brute force matching in theory, but in practice its complexity is O(n+m)! 3. With a good hashing function it can be quite effective and it’s easy to implement! 2 Reasons Why Rabin-Karp is Not Cool 1. There are lots of string matching algorithms that are faster than O(n+m) 2. It’s practically as slow as brute force matching and it requires additional space Final Words Rabin-Karp is a great algorithm for one simple reason – it can be used to match against multiple patterns. This makes it perfect to detect plagiarism even for larger phrases.
April 3, 2012
by Stoimen Popov
· 36,695 Views
article thumbnail
Using "Natural": A NLP Module for node.js
Like most node modules "natural" is packaged as an NPM and can be installed from the command line with node.js.
March 27, 2012
by Christopher Umbel
· 63,962 Views · 3 Likes
article thumbnail
Algorithm of the Week: Brute Force String Matching
String matching is something crucial for database development and text processing software. Fortunately, every modern programming language and library is full of functions for string processing that help us in our everyday work. However it's important to understand their principles. String algorithms can typically be divided into several categories. One of these categories is string matching. When it comes to string matching, the most basic approach is what is known as brute force, which simply means to check every single character from the text to match against the pattern. In general we have a text and a pattern (most commonly shorter than the text). What we need to do is to answer the question whether this pattern appears in the text. Overview The principles of brute force string matching are quite simple. We must check for a match between the first characters of the pattern with the first character of the text as on the picture bellow. If they don’t match, we move forward to the second character of the text. Now we compare the first character of the pattern with the second character of the text. If they don’t match again, we move forward until we get a match or until we reach the end of the text. In case they match, we move forward to the second character of the pattern comparing it with the “next” character of the text, as shown in the picture bellow. Just because we have found a match between the first character from the pattern and some character of the text, doesn’t mean that the pattern appears in the text. We must move forward to see whether the full pattern is contained in the text. Implementation Implementation of brute force string matching is easy and here we can see a short PHP example. The bad news is that this algorithm is naturally quite slow. function sub_string($pattern, $subject) { $n = strlen($subject); $m = strlen($pattern); for ($i = 0; i < $n-$m; $i++) { $j = 0; while ($j < $m && $subject[$i+$j] == $pattern[$j]) { $j++; } if ($j == $m) return $i; } return -1; } echo sub_string('o wo', 'hello world!'); Complexity As I said this algorithm is slow. Actually every algorithm that contains “brute force” in its name is slow, but to show how slow string matching is, I can say that its complexity is O(n.m). Here n is the length of the text, while m is the length of the pattern. In case we fix the length of the text and test against variable length of the pattern, again we get a rapidly growing function. Application Brute force string matching can be very ineffective, but it can also be very handy in some cases. Just like the sequential search. It can be very useful… Doesn’t require pre-processing of the text – Indeed if we search the text only once we don’t need to pre-process it. Most of the algorithms for string matching need to build an index of the text in order to search quickly. This is great when you’ve to search more than once into a text, but if you do only once, perhaps (for short texts) brute force matching is great! Doesn’t require additional space – Because brute force matching doesn’t need pre-processing it also doesn’t require more space, which is one cool feature of this algorithm Can be quite effective for short texts and patterns It can be ineffective… If we search the text more than once – As I said in the previous section if you perform the search more than once it’s perhaps better to use another string matching algorithm that builds an index, and it’s faster. It’s slow – In general brute force algorithms are slow and brute force matching isn’t an exception. Final Words String matching is something very special in software development and it is used in various cases, so every developer must be familiar with this topic.
March 27, 2012
by Stoimen Popov
· 61,846 Views · 3 Likes
article thumbnail
Writing a simple named pipes server in C#
I solved a little problem last night when playing with named pipes. I created a named pipe that writes all output to a file. Named pipes are opened for all users on a single machine. In this post I will show you a simple class that works as a pipe server. In .NET-based languages we can use the System.IO.Pipes namespace classes to work with named pipes. Here is my simple pipe server that writes all client output to file. public class MyPipeServer { public void Run() { var sid = new SecurityIdentifier(WellKnownSidType.WorldSid, null); var rule = new PipeAccessRule(sid, PipeAccessRights.ReadWrite, AccessControlType.Allow); var sec = new PipeSecurity(); sec.AddAccessRule(rule); using (NamedPipeServerStream pipeServer = new NamedPipeServerStream ("testpipe",PipeDirection.InOut, 100, PipeTransmissionMode.Byte, PipeOptions.None, 0, 0, sec)) { pipeServer.WaitForConnection(); var read = 0; var bytes = new byte[4096]; using(var file=File.Open(@"c:\tmp\myfile.dat", FileMode.Create)) while ((read = pipeServer.Read(bytes, 0, bytes.Length)) > 0) { file.Write(bytes, 0, read); file.Flush(); } } } } Real-life pipe scenarios are usually more complex but this simple class is good to get things running like they should be.
March 10, 2012
by Gunnar Peipman
· 30,871 Views
article thumbnail
Configuration Drift
In my previous article on the server lifecycle I mentioned ConfigurationDrift, a term that I’ve either coined, or I’ve forgotten where I originally heard, although most likely I got it from the Puppet Labs folks. Configuration Drift is the phenomenon where running servers in an infrastructure become more and more different as time goes on, due to manual ad-hoc changes and updates, and general entropy. A nice automated server provisioning process as I’ve advocated helps ensure machines are consistent when they are created, but during a given machine’s lifetime it will drift from the baseline, and from the other machines. There are two main methods to combat configuration drift. One is to use automated configuration tools such as Puppet or Chef, and run them frequently and repeatedly to keep machines in line. The other is to rebuild machine instances frequently, so that they don’t have much time to drift from the baseline. The challenge with automated configuration tools is that they only manage a subset of a machine’s state. Writing and maintaining manifests/recipes/scripts is time consuming, so most teams tend to focus their efforts on automating the most important areas of the system, leaving fairly large gaps. There are diminishing returns for trying to close these gaps, where you end up spending inordinate amounts of effort to nail down parts of the system that don’t change very often, and don’t matter very much day to day. On the other hand, if you rebuild machines frequently enough, you don’t need to worry about running configuration updates after provisioning happens. However, this may increase the burden of fairly trivial changes, such as tweaking a web server configuration. In practice, most infrastructures are probably best off using a combination of these methods. Use automated configuration, continuously updated, for the areas of machine configuration where it gives the most benefit, and also ensure that machines are rebuilt frequently. The frequency of rebuilds will vary depending on the nature of the services provided and the infrastructure implementation, and may even vary for different types of machines. For example, machines that provide network services such as DNS may be rebuilt weekly, while those which handle batch processing tasks may be rebuilt on demand. Source: http://kief.com/configuration-drift.html
February 27, 2012
by Kief Morris
· 26,417 Views · 3 Likes
article thumbnail
Algorithm of the Week: Data Compression with Prefix Encoding
Prefix encoding, sometimes called front encoding, is yet another algorithm that tries to remove duplicated data in order to reduce its size. Its principles are simple, however this algorithm tends to be difficult to implement. To understand why, first let’s take a look at its nature. Have a look on the following dictionary. use used useful usefulness uselss uselessly uselessness Instead of keeping all these words in plain text or transferring all them over a network, we can compress (encode) them with prefix encoding. It’s clear that each of these words begins with the prefix “use” which is also the first word from the list. So we can easily compress them into the following array. $data = array( 0 => 'use', 1 => '0d', 2 => '0ful', 3 => '0fully', 4 => '0less', 5 => '0lessly', 6 => '0lessness', ); It’s clear that this is not the best compression and we can go even further by using not only the first word as prefix. $data = array( 0 => 'use', 1 => '0d', 2 => '0ful', 3 => '2ly', 4 => '0less', 5 => '4ly', 6 => '4ness', ); Now the compression is better and the good news is that decompression is a fairly simple process. However the tricky part is compression itself. The problem is that it is quite difficult to chose an appropriate prefix. In our first example this is simple, but most of the time in practice we will have more heterogeneous data. Indeed the process of compression can be very difficult for randomly generated data and the algorithm will be not only slow, but difficult to implement. The good thing is that this algorithm can be used in many cases once we know the data format in advance. So let’s see three examples where this algorithm can be very handy. Application Here are three examples of prefix encoding. As I stated above, the process of compression can be very difficult for random data, so it is a good practice to use it only if you know in advance the format of the input data. Date and time prefixes We humans often skip the first two digits of a year, so for instance we don’t always write 1995 or 1996, but we use the shorter – ‘95 and ‘96. Thus years can be encoded with shorter strings. input: (1991, 1992, 1993, 1994, 1995, 1996 output: (91, 92, 93, 94, 95, 96) The problem is that with small changes of the input stream we can confuse the decoder. Thus if we add years from the 21st century we lose the uniqueness of the data. input: (1998, 1992, 1999, 2011, 2012) output: (98, 92, 99, 11, 12) Now the decoder can decode the last two values as (1911, 1912) as “19” is considered to be the prefix. So we must know in advance that our prefix is absolutely equal for each of the values. If not the encoding format must be different. For instance we can also encode the prefix, with some special marker. input: (1998, 1992, 1932, 1924, 2001, 2012) output: (#19, 98, 92, 32, 24, #20, 01, 12) Once the decoder reads the # character it will know to decode the following number as prefix. This can be used in practice for date and time formats. Let’s say we have some datetime values, but we know that all of them are in the same day. 2012-01-31 15:33:45 2012-01-31 16:12:11 2012-01-31 17:32:35 2012-01-31 18:54:34 Obviously we can omit the date part of these strings and send (keep) only the time. Once again, we must be absolutely sure that all these values are in the same day. If not, we can use the encoding strategy of the previous example. Phone numbers Phone numbers are the typical case of prefix encoding. Not only the international code, but also the mobile network operators use prefixes for their phone numbers. Thus if we have to transfer phone numbers from, let’s say the UK, we can replace the leading “+44” with something shorter. If you happen to code a phone book for a mobile device you can save some space by compressing the data using prefix encoding and thus the user will have more space and will store more phone numbers on his or her mobile device. Phone number prefixes can be also used for database normalization. Thus you can store them in a separate db table and leave only the unique numbers from the phonebook. Geo Coordinates Using the same example from my previous post we can send GEO coordinates by removing a common prefix, for large levels of zoom. Indeed when you need to send lots of markers to your map application you can expect all of these markers to be fairly close to each other in large zoom level. On large zoom levels we can expect markers to have the same prefix. Now the coordinates of those points can have a common prefix, like the example bellow with the Subway stations. LatLon(40.762959,-73.985989) LatLon(40.761886,-73.983629) LatLon(40.762861,-73.981612) LatLon(40.764616,-73.98056) We can see that all of these GEO points have the same prefix (40.76x, -73.98x), so we can send the prefix only once. Prefix: (40.76,-73.98) Data: LatLon(2959,5989) LatLon(1886,3629) LatLon(2861,1612) LatLon(4616,056) These are only three examples of prefix encoding and this algorithm is very useful when transferring homogeneous data. Suffix Encoding Suffix encoding is practically the same algorithm as prefix encoding, with the small difference that we use to encode duplicating suffixes. Like the examples below, suffix encoding can be useful in replacing repeating last name suffixes. Johsnon Clark Jackson Or company names. Apple Inc. Google Inc. Yahoo! Inc. Here we can replace “ Inc.” with something else, but shorter. Related posts: Computer Algorithms: Data Compression with Relative Encoding Computer Algorithms: Data Compression with Run-length Encoding Computer Algorithms: Data Compression with Diagram Encoding and Pattern Substitution Source: http://www.stoimen.com/blog/2012/02/06/computer-algorithms-data-compression-with-prefix-encoding/
February 7, 2012
by Stoimen Popov
· 19,428 Views
article thumbnail
Algorithm of the Week: Data Compression with Relative Encoding
Overview Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding and diagram and pattern substitution were trying to reduce repeating data, with relative encoding the goal is a bit different. Indeed run-length encoding was searching for long runs of repeating elements, while pattern substitution and bitmap encoding were trying to “map” where the repetitions happen to occur. The only problem with these algorithms is that the input stream of data is not always constructed out of repeating elements. It is clear that if the input stream contains many repeating elements there must be some way of reducing them. However that doesn’t mean that we cannot compress data if there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress. 1, 2, 3, 4, 5, 6, 7 It's hard to imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. Indeed the letters of the alphabet are the very base of words so it is the minimal part for word construction and therefore hard to compress. Fortunately this isn’t true always. An algorithm that tries to deal with non-repeating data is relative encoding. Let’s see the following input stream – years from a given decade (the 90′s). 1991, 1991, 1999, 1998, 1991, 1993, 1992, 1992 Here we have 39 characters and we can reduce them. A natural approach is to remove the leading “19” as we humans often do. 91, 91, 99, 98, 91, 93, 92, 92 Now we have a shorter string, but we can go even further by keeping only the first year. All other years will as relative to this year. 91, 0, 8, 7, 0, 2, 1, 1 Now the volume of transferred data is reduced a lot (from 39 to 16 – more than 50%). However there are some questions we need to answer first, because the stream wont always be formatted in such a pretty way. How about the next character stream? 91, 94, 95, 95, 98, 100, 101, 102, 105, 110 We see that the value 100 is somehow in the middle of the interval and it is handy to use it as a base value for the relative encoding. Thus the stream above will become: -9, -6, -5, -5, -2, 100, 1, 2, 5, 10 The problem is that we can’t always decide which value will be the base value so easily. What if the data was dispersed in a different way: 96, 97, 98, 99, 100, 101, 102, 103, 999, 1000, 1001, 1002 Now the value of “100” isn’t useful, because compressing the stream will get something like this: -4, -3, -2, -1, 100, 1, 2, 3, 899, 900, 901, 902 To group the relative values around “some” base values will be far more handy. (-4, -3, -2, -1, 100, 1, 2, 3) (-1, 1000, 1, 2) However, to decide which value will be the base value isn’t that easy. Also the encoding format is not so trivial. On the other hand, this type of encoding can be useful in some specific cases as we can see below. Implementation The implementation of this algorithm depends on the specific task and the format of the data stream. Assuming that we have to transfer the stream of years in JSON from a web server to a browser, here’s a short PHP snippet. // JSON: [1991,1991,1999,1998,1999,1998,1995,1997,1994,1993] $years = array(1991,1991,1999,1998,1999,1998,1995,1997,1994,1993); function relative_encoding($input) { $output = array(); $inputLength = count($input); $base = $input[0]; $output[] = $base; for ($i = 1; $i < $inputLength; $i++) { $output[] = $input[$i] - $base; } return $output; } // JSON: [1991,0,8,7,8,7,4,6,3,2] echo json_encode(relative_encoding($years)); Application This algorithm may be very useful in many cases, such as this one: there are plenty of map applications around the web. Some products such as Google Maps, Yahoo! Maps, Bing Maps are quite famous, while there are also very useful open source projects like OpenStreetMap. The web sites using these apps number in the thousands. A typical use case is to transfer lots of Geo coordinates from a web server to a browser using JSON. Indeed any GEO point on Earth is relative to the point (0,0), which is located near the west coast of Africa, however on large zoom levels, when there are tons of markers we can transfer the information with relative encoding. For instance the following diagram shows San Francisco with some markers on it. The coordinates are relative to the point (0,0) on Earth. Map markers can be relative to the (0, 0) point on Earth, which can occasionally be useless. Far more useful may be to encode those markers, relative to the center of the city, thus we can save some space. Relative encoding can be useful for map markers on a large zoom level, however this type of compression can be tricky. For example, when dragging the map and updating the marker array. On the other hand, we must group markers if we have to load more than one city. That’s why we must be careful when implementing it. But it can be very useful – for instance on initial load of the map we can reduce data and speed up the load time. The thing is that with relative encoding we can save only changes to base value (data) – something like version control systems and thus reducing data transfer and load. Here’s a graphical example. In the first case on the diagram below we can see that each item is stored on its own. It doesn’t depend on the adjacent items and it can be completely independent of them. However we can keep full info only for the first item and any other item will be relative to it, like on the diagram bellow. Source: http://www.stoimen.com/blog/2012/01/30/computer-algorithms-data-compression-with-relative-encoding/
January 31, 2012
by Stoimen Popov
· 17,706 Views
article thumbnail
Algorithm of the Week: Data Compression with Diagram Encoding and Pattern Substitution
Two variants of run-length encoding are the diagram encoding and the pattern substitution algorithms. The diagram encoding is actually a very simple algorithm. Unlike run-length encoding, where the input stream must consists of many repeating elements, “aaaaaaaa” for instance, which are very rare in a natural language, there are many so-called “diagrams” in almost any natural language. In plain English there are some diagrams such as “the”, “and”, “ing” (in the word “waiting” for example), “ a”, “ t”, “ e” and many doubled letters. Actually we can extend those diagrams by adding surrounding spaces. Thus we can encode not only “the”, but “ the “, which are 5 characters (2 spaces and 3 letters) with something shorter. On the other hand, as I said, in plain English there are too many doubled letters, which unfortunately aren’t something special for run-length encoding and the compression ratio will be small. Even worse the encoded text may happen to be longer than the input message. Let’s see some examples. Let’s say we’ve to encode the message “successfully accomplished”, which consists of four doubled letters. However to compress it with run-length encoding we’ll need at least 8 characters, which doesn’t help us a lot. // 8 chars replaced by 8 chars!? input: "successfully accomplished" output: "su2ce2sfu2ly a2complished" The problem is that if the input text contains numbers, “2” in particular, we’ve to chose an escape symbol (“@” for example), which we’ll use to mark where the encoded run begins. Thus if the input message is “2 successfully accomplished tasks”, it will be encoded as “2 su@2ce@2sfu@2ly a@2complished tasks”. Now the output message is longer!!! than the input string. // the compressed message is longer!!! input: "2 successfully accomplished" output: "2 su@2ce@2sfu@2ly a@2complished tasks" Again if the input stream contains the escape symbol, we have to find another one, and the problem is that it is often too difficult to find short escape symbol that doesn’t appear in the input text, without a full scan of the text. That is why run-length encoding isn’t a good solution when compressing plain text, where long runs rarely appear. Well, of course, there are exceptions. For example such an exception is the lossy text compression with run-length encoding. It is intuitively clear that compressing text with loss is rarely useful, especially when you’ve to decompress exactly the same text. However there are some cases that lossy compression may be useful. Such case can be removing spaces. Indeed the text “successfully accomplished” brings us exactly the same information as “successfully accomplished”. In this case we can simply remove those spaces. Indeed we can use a marker to indicate the long run of spaces like “successfully@6 accomplished” in order to decompress the input string with absolutely no loss, but we can also throw those symbols away. This desision depends on the goal. Exactly with the same goal in mind we can remove new lines and tabs, only if we’re sure that the sense of the text is preserved. Yet again, a problem is that such long runs don’t happen to occur in random texts. That is why it’s better to use diagram encoding for plain text compression instead of run-length encoding. A Few Questions After understanding the principles of the diagram encoding, let’s see some examples. In the example above it is better to replace doubled letters with something shorter. Let’s say # for “cc”, @ for “ss” and % for “ll”. Thus the input text will be compressed as “su#e@fu%y a#omplished”, which is shorter. But yet again what will happen if the input message contains one of the substitutions? Also we can’t say if there are many doubled letters and enough reasonable substitutions for them. A better approach is to replace patterns. Run-length encoding isn't a good approach for text compression, because long runs rarely appear in a natural language. Pattern Substitution The pattern substitution algorithm is a variant of the diagram encoding. As I said above in plain English a very commonly used pattern can be “ the “, which is five characters long. We can now replace it with something like “$%” for example. In this case the message “I send the message” will become “I send$%message”. However there are some obstacles to overcome. The first problem is that we need to know the language and somehow to define commonly used patterns in a dictionary. What would happen with a message written in some language we don’t know nothing about. Let’s say – Latin like the example bellow. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras venenatis, sapien eget suscipit placerat, justo quam blandit mauris, quis tempor ante sapien sodales augue. Praesent ut mauris quam. Phasellus scelerisque, ante quis consequat tristique, metus turpis consectetur leo, vitae facilisis sapien mi eu sapien. Praesent vitae ligula elit, et faucibus augue. Sed rhoncus sodales dolor ut gravida. In quis augue ac nulla auctor mattis sed sed libero. Donec eget purus eget enim tempor porta vitae eget diam. Mauris aliquet malesuada ipsum, non pulvinar urna vestibulum ac. Donec feugiat velit vitae nunc cursus imperdiet. Donec accumsan faucibus dictum. Phasellus sed mauris sapien. Maecenas mi metus, tincidunt sed rhoncus nec, sodales non sapien. Clearly without knowing Latin it isn’t easy to define which are those commonly used patterns. The thing is that it’s better to use pattern substitution if you know in advance the set of words and characters. The second problem is related to decompression. It is obvious that we need to define a dictionary and this dictionary must be used when decoding the message. It will be great also if we find more patterns longer than three characters. If not, the compression ratio will be low. Unfortunately such patterns aren’t very common in any natural language. Diagram encoding and pattern substitution are far more suitable for text compression than run-length encoding. In fact, pattern substitution is very effective on compressing programming languages. Application It is interesting to answer the question, how to use diagram encoding or patter substitution to compress text in natural language, especially when we don’t know the language in detail? The answer hides in the question. We wont compress natural languages, but machine language. Exactly machine (programming) languages are limited to a smaller sets of words and symbols. Isn’t it true for any programing language? Like PHP, where words like “function”, “while”, “for”, “break”, “switch”, “foreach” happen to be often in use, or HTML with its defined set of tags. Perhaps the best example is CSS, where only the values of the properties can vary. CSS files also tend to have multiple new lines, tabs and spaces, which only humans read. The question here is why should we compress those file types. It’s clear that after the compression they will be completely useless, both for humans and machines. Yes, that is true, but what if we have to store versions of those files into a DB. Kind of a backup. Imagine you’re working for a web hosting company that has to store daily versions of the sites it’s hosting. Thus the volume of stored information even for small companies hosting only few sites can be enormous. The problem is that compressing those files with some conventional compressing tool isn’t a good idea. Thus we’ve to save a copy of the entire site every day, but as we know the difference between daily versions of a site can be small. A version control system is another solution, but then you’ve to store the plain text of the files. Perhaps a better approach is to compress the text using pattern substitution and then saving only differences – kind of version control, which can be done with “relative encoding”. Using the above method we can save lots of disk space and in the same time we can compress/decompress easily. Another good thing is that you can save only changes to the initial files, like version control, which can also be compressed. Implementation The implementation of this algorithm is again on PHP and tries only to describe the main principles of compression. In this case I tried to compress a CSS file using the compression above. Although this example is quite primitive we can see some interesting facts. First of all you only need encoding and decoding dictionaries. Practically the encoding and decoding processes are equal, so you don’t need to implement two different functions. Here in this example a native PHP function is used – str_replace, because the purpose of this algorithm is not to describe pattern substitution techniques, but pattern substitution. It assumes that today’s programming languages have string manipulation functions for the purposes of this task. $str = file_get_contents('large_style_file.css'); $encoding_dict = array( "\n" => '$0', 'text' => '$1', 'color' => '$2', 'display' => '$3', 'font' => '$4', 'width' => '$5', 'height' => '$6', ' ' => '', ); function replace_patterns($input, $dict) { foreach ($dict as $pattern => $replace) { $input = str_replace($pattern, $replace, $input); } return $input; } $result = replace_patterns($str, $encoding_dict); By only replacing few CSS properties I achieved almost 40% of compression ratio (as shown the diagram bellow). The initial file is 202 KB, while compressed it’s only 131 KB. Of course, it all depends on the CSS file, but how about replacing all property names with shorter ones. Perhaps then the compression will be even better. Source: http://www.stoimen.com/blog/2012/01/23/computer-algorithms-data-compression-with-diagram-encoding-and-pattern-substitution/
January 24, 2012
by Stoimen Popov
· 23,813 Views
article thumbnail
Algorithm of the Week: Data Compression with Bitmaps
In my previous post we saw how to compress data consisting of very long runs of repeating elements. This type of compression is known as “run-length encoding” and can be very handy when transferring data with no loss. The problem is that the data must follow a specific format. Thus the string “aaaaaaaabbbbbbbb” can be compressed as “a8b8”. Now a string with length 16 can be compressed as a string with length 4, which is 25% of its initial length without loosing any information. There will be a problem in case the characters (elements) were dispersed in a different way. What would happen if the characters are the same, but they don’t form long runs? What if the string was “abababababababab”? The same length, the same characters, but we cannot use run-length encoding! Indeed using this algorithm we’ll get at best the same string. In this case, however, we can see another fact. The string consists of too many repeating elements, although not arranged one after another. We can compress this string with a bitmap. This means that we can save the positions of the occurrences of a given element with a sequence of bits, which can be easily converted into a decimal value. In the example above the string “abababababababab” can be compressed as “1010101010101010”, which is 43690 in decimals, and even better AAAA in hexadecimal. Thus the long string can be compressed. When decompressing (decoding) the message we can convert again from decimal/hexadecimal into binary and match the occurrences of the characters. Well, the example above is too simple, but let’s say only one of the characters is repeating and the rest of the string consists of different characters like this: “abacadaeafagahai”. Then we can use bitmap only for the character “a” – “1010101010101010” and compress it as “AAAA bcdefghi”. As you can see all the example strings are exactly 16 characters and that is a limitation. To use bitmaps with variable length of the data is a bit tricky and it is not always easy (if possible) to decompress it. Basically bitmap compression saves the positions of an element that is repeated very often in the message! In the other hand bitmap compression is not only applicable on strings. We can compress also arrays, objects or any kind of data. The example from my previous post is very suitable. Then we had to transfer a large array from a server to the client (browser) using JSON. The data then was very suitable for “run-length encoding”. Now let’s assume we have the same data – a set of different years, which this time are dispersed in a different way. $data = array( 0 => 1991, 1 => 1992, 2 => 1993, 3 => 1994, 4 => 1991, 5 => 1992, 6 => 1993, 7 => 1992, 8 => 1991, 9 => 1991, 10 => 1991, 11 => 1992, 12 => 1992, 13 => 1991, 14 => 1991, 15 => 1992, ... ); The JSON will encoded message will be the following (a simple but yet very large javascript array). [1991,1992,1993,1994,1991,1992,1993,1992,1991,1991,1991,1992,1992,1991,1991,1992, ...] However if we use bitmap compression we’ll get a “shorter” array. $data = array( 0 => array(1991, '1000100011100110'), 1 => array(1992, '0100010100011001'), 2 => array(1993, '0010001000000000'), 3 => array(1994, '0001000000000000'), ); Now the JSON is: [[1991,"1000100011100110"],[1992,"0100010100011001"],[1993,"0010001000000000"],[1994,"0001000000000000"]] It is obvious that the compression ratio is getting better and better as the uncompressed data grows. In fact, most of us know bitmap compression from images, because this algorithm is largely used for image compression. We can imagine how successful it can be when compressing black and white images (as black and white can be represented as 0 and 1s). Actually it is used for more than two colors (256 for instance) and again the level of compression is very high. Implementation The following implementation on PHP aims only to illustrate the bitmap compressing algorithm. As we know this algorithm can be applicable for any kind of data structures. // too many repeating "a" characters $msg = 'aazahalavaatalawacamaahakafaaaqaaaiauaacaaxaauaxaaaaaapaayatagaaoafaawayazavaaaazaaabararaaaaakakaaqaarazacajaazavanazaaaeanaaoajauaaaaaxalaraaapabataaavaaab'; function bitmap($message) { $i = 0; $bits = $rest = ''; while ($v = $message[$i]) { if ($v == 'a') { $bits .= '1'; } else { $bits .= '0'; $rest .= $v; } $i++; } return number_format(bindec($bits), 0, '.', '') . $rest;; } echo bitmap($msg); // uncompressed: acaaaaadaaaabalaaeaaaaganaaxakaavawamaasavajawaaaayaauaaadalanagaeaeamaarafalaazaaaiasaanaahaaazaraxaalaahaaawaaajasamahaajaakarapanaakaoakaanawalaacamauaamaal // compressed: 152299251941730035874325065523548237677352452096zhlvtlwcmhkfqiucxuxpytgofwyzvzbrrkkqrzcjzvnzenojuxlrpbtvb Application This algorithm is very useful when there is an element in our data that repeats very often, so you need to investigate the nature of the data you want to compress. Actually because of this fact this algorithm is used for image compression as PNG8 or GIF. Source: http://www.stoimen.com/blog/2012/01/16/computer-algorithms-data-compression-with-bitmaps/
January 17, 2012
by Stoimen Popov
· 20,337 Views
article thumbnail
Smart Batching
How often have we all heard that “batching” will increase latency? As someone with a passion for low-latency systems this surprises me. In my experience when batching is done correctly, not only does it increase throughput, it can also reduce average latency and keep it consistent. Well then, how can batching magically reduce latency? It comes down to what algorithm and data structures are employed. In a distributed environment we are often having to batch up messages/events into network packets to achieve greater throughput. We also employ similar techniques in buffering writes to storage to reduce the number of IOPS. That storage could be a block device backed file-system or a relational database. Most IO devices can only handle a modest number of IO operations per second, so it is best to fill those operations efficiently. Many approaches to batching involve waiting for a timeout to occur and this will by its very nature increase latency. The batch can also get filled before the timeout occurs making the latency even more unpredictable. Figure 1. Figure 1. above depicts decoupling the access to an IO device, and therefore the contention for access to it, by introducing a queue like structure to stage the messages/events to be sent and a thread doing the batching for writing to the device. The Algorithm An approach to batching uses the following algorithm in Java pseudo code: public final class NetworkBatcher implements Runnable { private final NetworkFacade network; private final Queue queue; private final ByteBuffer buffer; public NetworkBatcher(final NetworkFacade network, final int maxPacketSize, final Queue queue) { this.network = network; buffer = ByteBuffer.allocate(maxPacketSize); this.queue = queue; } @Override public void run() { while (!Thread.currentThread().isInterrupted()) { while (null == queue.peek()) { employWaitStrategy(); // block, spin, yield, etc. } Message msg; while (null != (msg = queue.poll())) { if (msg.size() > buffer.remaining()) { sendBuffer(); } buffer.put(msg.getBytes()); } sendBuffer(); } } private void sendBuffer() { buffer.flip(); network.send(buffer); buffer.clear(); } } Basically, wait for data to become available and as soon as it is, send it right away. While sending a previous message or waiting on new messages, a burst of traffic may arrive which can all be sent in a batch, up to the size of the buffer, to the underlying resource. This approach can use ConcurrentLinkedQueue which provides low-latency and avoid locks. However it has an issue in not creating back pressure to stall producing/publishing threads if they are outpacing the batcher whereby the queue could grow out of control because it is unbounded. I’ve often had to wrap ConcurrentLinkedQueue to track its size and thus create back pressure. This size tracking can add 50% to the processing cost of using this queue in my experience. This algorithm respects the single writer principle and can often be employed when writing to a network or storage device, and thus avoid lock contention in third party API libraries. By avoiding the contention we avoid the J-Curve latency profile normally associated with contention on resources, due to the queuing effect on locks. With this algorithm, as load increases, latency stays constant until the underlying device is saturated with traffic resulting in a more "bathtub" profile than the J-Curve. Let’s take a worked example of handling 10 messages that arrive as a burst of traffic. In most systems traffic comes in bursts and is seldom uniformly spaced out in time. One approach will assume no batching and the threads write to device API directly as in Figure 1. above. The other will use a lock free data structure to collect the messages plus a single thread consuming messages in a loop as per the algorithm above. For the example let’s assume it takes 100µs to write a single buffer to the network device as a synchronous operation and have it acknowledged. The buffer will ideally be less than the MTU of the network in size when latency is critical. Many network sub-systems are asynchronous and support pipelining but we will make the above assumption to clarify the example. If the network operation is using a protocol like HTTP under REST or Web Services then this assumption matches the underlying implementation. Best (µs) Average (µs) Worst (µs) Packets Sent Serial 100 500 1,000 10 Smart Batching 100 150 200 1-2 The absolute lowest latency will be achieved if a message is sent from the thread originating the data directly to the resource, if the resource is un-contended. The table above shows what happens when contention occurs and a queuing effect kicks in. With the serial approach 10 individual packets will have to be sent and these typically need to queue on a lock managing access to the resource, therefore they get processed sequentially. The above figures assume the locking strategy works perfectly with no perceivable overhead which is unlikely in a real application. For the batching solution it is likely all 10 packets will be picked up in first batch if the concurrent queue is efficient, thus giving the best case latency scenario. In the worst case only one message is sent in the first batch with the other nine following in the next. Therefore in the worst case scenario one message has a latency of 100µs and the following 9 have a latency of 200µs thus giving a worst case average of 190µs which is significantly better than the serial approach. This is one good example when the simplest solution is just a bit too simple because of the contention. The batching solution helps achieve consistent low-latency under burst conditions and is best for throughput. It also has a nice effect across the network on the receiving end in that the receiver has to process fewer packets and therefore makes the communication more efficient both ends. Most hardware handles data in buffers up to a fixed size for efficiency. For a storage device this will typically be a 4KB block. For networks this will be the MTU and is typically 1500 bytes for Ethernet. When batching, it is best to understand the underlying hardware and write batches down in ideal buffer size to be optimally efficient. However keep in mind that some devices need to envelope the data, e.g. the Ethernet and IP headers for network packets so the buffer needs to allow for this. There will always be an increased latency from a thread switch and the cost of exchange via the data structure. However there are a number of very good non-blocking structures available using lock-free techniques. For the Disruptor this type of exchange can be achieved in as little as 50-100ns thus making the choice of taking the smart batching approach a no brainer for low-latency or high-throughput distributed systems. This technique can be employed for many problems and not just IO. The core of the Disruptor uses this technique to help rebalance the system when the publishers burst and outpace the EventProcessors. The algorithm can be seen inside the BatchEventProcessor. Note: For this algorithm to work the queueing structure must handle the contention better than the underlying resource. Many queue implementations are extremely poor at managing contention. Use science and measure before coming to a conclusion. Batching with the Disruptor The code below shows the same algorithm in action using the Disruptor's EventHandler mechanism. In my experience, this is a very effective technique for handling any IO device efficiently and keeping latency low when dealing with load or burst traffic. public final class NetworkBatchHandler implements EventHander { private final NetworkFacade network; private final ByteBuffer buffer; public NetworkBatchHandler(final NetworkFacade network, final int maxPacketSize) { this.network = network; buffer = ByteBuffer.allocate(maxPacketSize); } public void onEvent(Message msg, long sequence, boolean endOfBatch) throws Exception { if (msg.size() > buffer.remaining()) { sendBuffer(); } buffer.put(msg.getBytes()); if (endOfBatch) { sendBuffer(); } } private void sendBuffer() { buffer.flip(); network.send(buffer); buffer.clear(); } } The endOfBatch parameter greatly simplifies the handling of the batch compared to the double loop in the algorithm above. I have simplified the examples to illustrate the algorithm. Clearly error handling and other edge conditions need to be considered. Separation of IO from Work Processing There is another very good reason to separate the IO from the threads doing the work processing. Handing off the IO to another thread means the worker thread, or threads, can continue processing without blocking in a nice cache friendly manner. I've found this to be critical in achieving high-performance throughput. If the underlying IO device or resource becomes briefly saturated then the messages can be queued for the batcher thread allowing the work processing threads to continue. The batching thread then feeds the messages to the IO device in the most efficient way possible allowing the data structure to handle the burst and if full apply the necessary back pressure, thus providing a good separation of concerns in the workflow. Conclusion So there you have it. Smart Batching can be employed in concert with the appropriate data structures to achieve consistent low-latency and maximum throughput. From http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html
October 26, 2011
by Martin Thompson
· 11,371 Views · 1 Like
article thumbnail
Brute forcing a bin packing problem
Even a basic planning problem, such as bin packing, can be notoriously hard to solve and scale. One might consider the brute force algorithm. Let's take a look at how that algorithm works out on the cloud balance example of Drools Planner: Given a set of servers with different hardware (CPU, memory and network bandwidth) and given a set of processes with different hardware requirements, assign each process to 1 server and minimize the total cost of the active servers. The brute force algorithm is simple: try every combination between processes where each process is assigned to each server. For example, if we have 6 processes (P0, P1, P2, P3, P4, P5) and 2 servers (S0, S1), we'd try these solutions: P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S0 P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S1 P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S0 P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S1 ... P0->S1, P1->S1, P2->S1, P3->S1, P4->S1, P5->S1 On my machine, it takes 15ms to calculate the score of these 2^6 combinations. When I scale out to 9 processes and 3 servers, which are 3^9 combinations, it becomes 1582ms. So it scales like this: Notice that despite that the number of processes has not even doubled, the running time multiplied by 100! For comparison, I 've added the running time of the First Fit algorithm. And it gets worse: for 12 processes and 4 servers, which are 4^12 combinations, it take more than 17 minutes: What if we want to distribute 3000 processes over 1000 servers? With this kind of scalability, it will take too long. In fact, the brute force algorithm is useless. Luckily, Drools Planner implements several other optimization algorithms, which can handle such loads. If you want to know more about them, take a look at the Drools Planner manual or come to my talk at JUDCon London (31 Oct - 1 Nov). This article was originally posted on the Drools & jBPM blog.
September 26, 2011
by Geoffrey De Smet
· 9,929 Views
  • Previous
  • ...
  • 197
  • 198
  • 199
  • 200
  • 201
  • 202
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×