Data Resources

The Latest Data Topics

11 OPEN NoSQL Document-Oriented Databases

A document-oriented database is a designed for storing, retrieving, and managing document-oriented, or semi structured data. Document-oriented databases are one of the main categories of NoSQL databases. The central concept of a document-oriented database is the notion of a Document. While each document-oriented database implementation differs on the details of this definition, in general, they all assume documents encapsulate and encode data (or information) in some standard format(s) (or encoding(s)). Encodings in use include XML, YAML, JSON and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on). MongoDB: MongoDB is a collection-oriented, schema-free document database. Data is grouped into sets that are called ‘collections’. Each collection has a unique name in the database, and can contain an unlimited number of documents. Collections are analogous to tables in a RDBMS, except that they don’t have any defined schema. It store data (which is in BASON – “Binary Serialized dOcument Notation” format) that is a structured collection of key-value pairs, where keys are strings, and values are any of a rich set of data types, including arrays and documents. Home: http://www.mongodb.org/ Quick Start: http://www.mongodb.org/display/DOCS/Quickstart Download: http://www.mongodb.org/downloads CouchDB: CouchDB is a document database server, accessible via a RESTful JSON API. It is Ad-hoc and schema-free with a flat address space. Its Query-able and index-able, featuring a table oriented reporting engine that uses JavaScript as a query language. A CouchDB document is an object that consists of named fields. Field values may be strings, numbers, dates, or even ordered lists and associative maps. Home: http://couchdb.apache.org/ Quick Start: http://couchdb.apache.org/docs/intro.html Download: http://couchdb.apache.org/downloads.html Terrastore: Terrastore is a modern document store which provides advanced scalability and elasticity features without sacrificing consistency. It is based on Terracotta, so it relies on an industry-proven, fast clustering technology. Home: http://code.google.com/p/terrastore/ Quick Start: http://code.google.com/p/terrastore/wiki/Documentation Download: http://code.google.com/p/terrastore/downloads/list RavenDB: Raven is a .NET Linq enabled Document Database, focused on providing high performance, schema-less, flexible and scalable NoSQL data store for the .NET and Windows platforms. Raven store any JSON document inside the database. It is schema-less database where you can define indexes using C#’s Linq syntax. Home: http://ravendb.net/ Quick Start: http://ravendb.net/tutorials Download: http://ravendb.net/download OrientDB: OrientDB is an open source NoSQL database management system written in Java. Even if it is a document-based database, the relationships are managed as in graph databases with direct connections between records. It supports schema-less, schema-full and schema-mixed modes. It has a strong security profiling system based on users and roles and supports SQL as a query languages. Home: http://www.orientechnologies.com/ Quick Start: http://code.google.com/p/orient/wiki/Tutorials Download: http://code.google.com/p/orient/wiki/Download ThruDB: Thrudb is a set of simple services built on top of the Apache Thrift framework that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast and easy-to-use services that can enhance or replace traditional data storage and access layers. It supports multiple storage backends such as BerkeleyDB, Disk, MySQL and also having Memcache and Spread integration. Home: http://code.google.com/p/thrudb/ Quick Start: http://thrudb.googlecode.com/svn/trunk/doc/Thrudb.pdf Download: http://code.google.com/p/thrudb/source/checkout SisoDB: SisoDb is a document-oriented db-provider for Sql-Server written in C#. It lets you store object graphs of POCOs (plain old clr objects) without having to configure any mappings. Each entity is treated as an aggregate root and will get separate tables created on the fly. Home: http://www.sisodb.com Quick Start: http://www.sisodb.com/Wiki Download: https://github.com/danielwertheim/SisoDb-Provider/ RaptorDB: RaptorDB is a extremely small size and fast embedded, noSql, persisted dictionary database using b+tree or MurMur hash indexing. It was primarily designed to store JSON data (see my fastJSON implementation), but can store any type of data that you give it. Home: http://www.codeproject.com/KB/database/RaptorDB.aspx Quick Start: http://www.codeproject.com/KB/database/RaptorDB.aspx Download: http://www.codeproject.com/KB/database/RaptorDB.aspx CloudKit: CloudKit provides schema-free, auto-versioned, RESTful JSON storage with optional OpenID and OAuth support, including OAuth Discovery. Home: http://getcloudkit.com/ Quick Start: http://getcloudkit.com/api/ Download: https://github.com/jcrosby/cloudkit Perservere: Persevere is an open source set of tools for persistence and distributed computing using an intuitive standards-based JSON interfaces of HTTP REST, JSON-RPC, JSONPath, and REST Channels. The core of the Persevere project is the Persevere Server. The Persevere server includes a Persevere JavaScript client, but the standards-based interface is intended to be used with any framework or client. Home: http://code.google.com/p/persevere-framework/ Quick Start: http://code.google.com/p/persevere-framework/w/list Download: http://code.google.com/p/persevere-framework/downloads/list Jackrabbit: The Apache Jackrabbit™ content repository is a fully conforming implementation of the Content Repository for Java Technology API (JCR, specified in JSR 170 and 283). A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more. Home: http://jackrabbit.apache.org Quick Start: http://jackrabbit.apache.org/getting-started-with-apache-jackrabbit.html Download: http://jackrabbit.apache.org/downloads.html Conclusion: Document databases store and retrieve documents and basic atomic stored unit is a document. As always your requirement leads into the decision. You need to think about your data-access patterns / use-cases to create a smart document-model. When your domain model can be split and partitioned across some documents, a document-database will be a suitable one for you. For example for a blog-software, a CMS or a wiki-software a document-db works extremely well. But at the same time a non-relational database is not better than a relational one in some cases where your database have a lot of relations and normalization. Just check the following link from stackoverflow also to cover the pros/cons of Relational Vs Document based databases. http://stackoverflow.com/questions/337344/pros-cons-of-document-based-databases-vs-relational-databases

July 23, 2012

by Lijin Joseji

· 69,299 Views · 2 Likes

WebSockets vs. SignalR: Why You Should Not Have To Care

Introduction When the web was founded, it was designed as a system to distribute and update data. If you look at the HTTP protocol you can clearly see these origins. It has commands to GET, UPDATE, POST or DELETE data, but it is always the client (and if we ignore REST, this means in most cases the browser) who takes the initiative. However, the web is changing. It has evolved from a pure data distribution system to an application distribution system. Today, the magic word of IT vendors for this is the ‘cloud’, but in fact this shift to an application distribution system has already started some years ago. In order to start massively replacing traditional desktop applications, the web needs a new trick: two way communication between browser and server. The back-end must be able to update parts of a web page without user initiative. A definitive technical solution is underway in the form of web sockets, but do we really need to wait until this technology is broadly supported by the browsers of most users? In my opinion, the answer should be a clear NO. Where SignalR comes in When there are limitations, people become creative and are able to circumvent the issues blocking them from developing great products. Many popular web applications/sits are already capable of updating their content dynamically, yet they do not rely on web sockets. How is this possible? Because they rely on patterns such as ‘long polling’. With long polling, the browser sends a request for information to the web server with a huge timeout. The web server does not immediately sends data back to the browser, but waits until it has data to send back. When the client receives back data from the server, it will immediately resend a new request to the server. This long polling pattern and similar patterns give the user the illusion that a persistent two way connection exists between the browser and the web server, but it causes some unnecessary hard work for applicative developers and this is where SignalR comes in for .NET developers. It makes an abstraction of the long polling pattern and gives applicative developers the same illusion as their end users: a persistent two way connection between browser and web server. SignalR takes care of all the details and allows developers to focus on their most important task: building a great application for users. Because SignalR abstracts the underlying communication protocol, it can both support WebSockets and patterns, such as long polling. This makes an upgrade to WebSockets fairly easy when your organization adopts Windows Server 2012 and your users have moved to modern browsers such as Internet Explorer 10 or recent versions of Firefox and Google Chrome. Conclusion As I already wanted to emphasize in the title, comparing WebSockets with SignalR is pointless. Yes, WebSockets is technically superior and will probably give you some extra performance on your server side. But SignalR makes it possible to start developing today the web applications of tomorrow. When WebSockets become broadly available, SignalR will make it possible for you to move away from long polling without a lot of impact on your code.

July 21, 2012

by Pieter De Rycke

· 42,526 Views

Replacing Query String Elements in C# .NET and JavaScript

While writing list navigation and search features in websites today there is a constant need to find/replace and play with query string elements, so that you can easily manipulate these mystical items while you’re carrying them around in your website’s URLs. I have a few little methods I’ve used over the years and carry with me project to project, and this post is putting them on the record for easy access later. I have a secret. This post is actually more aimed at an audience of 'myself', and my ability to have an easy bit of source code to call upon when I’m on the go looking for a quick solution to cut and paste – as most of my blog posts are. But you, dear reader, you get to share in this benefit with me by pulling from the awesomeness within this post as well. Solution: .Net c# When doing this with c# you have a few pretty cool features up your sleeve. One of these is HttpUtility.ParseQueryString(urlPath) framework method. This static method allows you to extract a NameValueCollection that is editable from a given query string. Why is this cool? Because it allows you to very easily play with the query string collection like it is any other NameValueCollection – with Add() and Remove() methods. This makes it incredibly powerful. Quick & Dirty code beware! The code I’m pasting below is far from being the most elegant solution, i seem to have misplaced my nicer piece of code and am in too much of a rush to find it right now (sorry). Until i find my nicer solution, the method below will get you by – whether you have a hatred for ternary’s or not. public static string ReplaceQueryStringParam(string currentPageUrl, string paramToReplace, string newValue) { string urlWithoutQuery = currentPageUrl.IndexOf('?') >= 0 ? currentPageUrl.Substring(0, currentPageUrl.IndexOf('?')) : currentPageUrl; string queryString = currentPageUrl.IndexOf('?') >= 0 ? currentPageUrl.Substring(currentPageUrl.IndexOf('?')) : null; var queryParamList = queryString != null ? HttpUtility.ParseQueryString(queryString) : HttpUtility.ParseQueryString(string.Empty); if (queryParamList[paramToReplace] != null) { queryParamList[paramToReplace] = newValue; } else { queryParamList.Add(paramToReplace, newValue); } return String.Format("{0}?{1}", urlWithoutQuery, queryParamList); } To call this, you can do the following: // var currentUrl = HttpContext.Current.Request.Url; var currentUrl = "http://www.mysite.com/mypage?category=cool-products&sort=price&page=3"; // change the my sort-by param named"sort" to "name" var newUrlWithChangedSort = ReplaceQueryStringParam(currentUrl, "sort", "name"); Solution: JavaScript The second part of this post includes a JavaScript solution, as you never know when you have to do this on the client side. function replaceQueryString(url, param, value) { if (url.lastIndexOf('?') <= 0) url = url + "?"; var re = new RegExp("([?|&])" + param + "=.*?(&|$)", "i"); if (url.match(re)) return url.replace(re, '$1' + param + "=" + value + '$2'); else return url.substring(url.length - 1) == '?' ? url + param + "=" + value : url + '&' + param + "=" + value; } And to use the above code in your client-side javascript simply write something along the lines of: //var currentUrl = self.location; var currentUrl = "http://www.mysite.com/mypage?category=cool-products&sort=price&page=3"; // change the my sort-by param named"sort" to "name" var newUrlWithChangedSort = replaceQueryString(currentUrl, "sort", "name"); Easy – now next time you need to knock something together, instead of writing it yourself, you can simply cut & paste mine!

July 20, 2012

by Douglas Rathbone

· 16,540 Views

Spring Data - Apache Hadoop

Spring for Apache Hadoop is a Spring project to support writing applications that can benefit of the integration of Spring Framework and Hadoop. This post describes how to use Spring Data Apache Hadoop in an Amazon EC2 environment using the “Hello World” equivalent of Hadoop programming – a Wordcount application. 1./ Launch an Amazon Web Services EC2 instance. - Navigate to AWS EC2 Console (“https://console.aws.amazon.com/ec2/home”): - Select Launch Instance then Classic Wizzard and click on Continue. My test environment was a “Basic Amazon Linux AMI 2011.09″ 32-bit., Instant type: Micro (t1.micro , 613 MB), Security group quick-start-1 that enables ssh to be used for login. Select your existing key pair (or create a new one). Obviously you can select another AMI and instance types depending on your favourite flavour. (Should you vote for Windows 2008 based instance, you also need to have cygwin installed as an additional Hadoop prerequisite beside Java JDK and ssh, see “Install Apache Hadoop” section) 2./ Download Apache Hadoop - as of writing this article, 1.0.0 is the latest stable version of Apache Hadoop, that is what was used for testing purposes. I downloaded hadoop-1.0.0.tar.gz and copied it into /home/ec2-user directory using pscp command from my PC running Windows: c:\downloads>pscp -i mykey.ppk hadoop-1.0.0.tar.gz [email protected]:/home/ec2-user (the computer name above – ec2-ipaddress-region-compute.amazonaws.com – can be found on AWS EC2 console, Instance Description, public DNS field) 3./ Install Apache Hadoop: As prerequisites, you need to have Java JDK 1.6 and ssh installed, see Apache Single-Node Setup Guide. (ssh is automatically installed with Basic Amazon AMI). Then install hadoop itself: $ cd ~ # change directory to ec2-user home (/home/ec2-user) $ tar xvzf hadoop-1.0.0.tar.gz $ ln -s hadoop-1.0.0 hadoop $ cd hadoop/conf $ vi hadoop-env.sh # edit as below export JAVA_HOME=/opt/jdk1.6.0_29 $ vi core-site.xml # edit as below – this defines the namenode to be running on localhost and listeing to port 9000. fs.default.name hdfs://localhost:9000 $ vi hdsf-site.xml # edit as below this defines that file system replicate is 1 (in production environment it is supposed to be 3 by default) dfs.replication 1 $ vi mapred-site.xml # edit as below – this defines the jobtracker to be running on localhost and listeing to port 9001. mapred.job.tracker localhost:9001 $ cd ~/hadoop $ bin/hadoop namenode -format $ bin/start-all.sh At this stage all hadoop jobs are running in pseudo distributed mode, you can verify it by running: $ ps -ef | grep java You should see 5 java processes: namenode, secondarynamenode, datanode, jobtracker and tasktracker. 4./ Install Spring Data Hadoop Download Spring Data Hadoop package from SpringSource community download site. As of writing this article, the latest stable version is spring-data-hadoop-1.0.0.M1.zip. $ cd ~ $ tar xzvf spring-data-hadoop-1.0.0.M1.zip $ ln -s spring-data-hadoop-1.0.0.M1 spring-data-hadoop 5./ Build and Run Spring Data Hadoop Wordcount example $ cd spring-data-hadoop/spring-data-hadoop-1.0.0.M1/samples/wordcount Spring Data Hadoop is using gradle as build tool. Check build.grandle build file. The original version packaged in the tar.gz file does not compile, it complains about thrift, version 0.2.0 and jdo2-api, version2.3-ec. Add datanucleus.org maven repository to the build.gradle file to support jdo2-api (http://www.datanucleus.org/downloads/maven2/) . Unfortunatelly, there seems to be no maven repo for thrift 0.2.0 . You should download thrift 0.2.0.jar and thrift.0.2.0.pom file e.g. from this repo: “http://people.apache.org/~rawson/repo“ and then add it to local maven repo. $ mvn install:install-file -DgroupId=org.apache.thrift -DartifactId=thrift -Dversion=0.2.0 -Dfile=thrift-0.2.0.jar -Dpackaging=jar $ vi build.grandle # modify the build file to refer to datanucleus maven repo for jdo2-api and the local repo for thrift repositories { // Public Spring artefacts mavenCentral() maven { url “http://repo.springsource.org/libs-release” } maven { url “http://repo.springsource.org/libs-milestone” } maven { url “http://repo.springsource.org/libs-snapshot” } maven { url “http://www.datanucleus.org/downloads/maven2/” } maven { url “file:///home/ec2-user/.m2/repository” } } I also modified the META-INF/spring/context.xml file in order to run hadoop file system commands manually: $ cd /home/ec2-user/spring-data-hadoop/spring-data-hadoop-1.0.0.M1/samples/wordcount/src/main/resources $vi META-INF/spring/context.xml # remove clean-script and also the dependency on it for JobRunner. xmlns=”http://www.springframework.org/schema/beans” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xmlns:context=”http://www.springframework.org/schema/context” xmlns:hdp=”http://www.springframework.org/schema/hadoop” xmlns:p=”http://www.springframework.org/schema/p” xsi:schemaLocation=”http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd”> fs.default.name=${hd.fs} Copy the sample file – nietzsche-chapter-1.txt – to Hadoop file system (/user/ec2-user-/input directory) $ cd src/main/resources/data $ hadoop fs -mkdir /user/ec2-user/input $ hadoop fs -put nietzsche-chapter-1.txt /user/ec2-user/input/data $ cd ../../../.. # go back to samples/wordcount directory $ ../gradlew Verify the result: $ hadoop fs -cat /user/ec2-user/output/part-r-00000 | more “AWAY 1 “BY 1 “Beyond 1 “By 2 “Cheers 1 “DE 1 “Everywhere 1 “FROM” 1 “Flatterers 1 “Freedom 1

July 19, 2012

by Istvan Szegedi

· 11,930 Views

My Experience Moving Data from MySQL to Cassandra

I had a relational database, that I wanted to migrate to cassandra. Cassandra's sstableloader provides option to load the existing data from flat files to a cassandra ring. Hence this can be used as a way to migrate data in relational databases to cassandra, as most relational databases let us export the data into flat files. sqoop gives the option to do this effectively. Interestingly, DataStax Enterprise provides everything we want in the big data space as a package. This includes, cassandra, hadoop, hive, pig, sqoop, and mahout, which comes handy in this case. Under the resources directory, you may find the cassandra, dse, hadoop, hive, log4j-appender, mahout, pig, solr, sqoop, and tomcat specific configurations. For example, from resources/hadoop/bin, you may format the hadoop name node using ./hadoop namenode -format as usual. * Download and extract DataStax Enterprise binary archive (dse-2.1-bin.tar.gz). * Follow the documentation, which is also available as a PDF. * Migrating a relational database to cassandra is documented and is also blogged. * Before starting DataStax, make sure that the JAVA_HOME is set. This also can be set directly on conf/hadoop-env.sh. * Include the connector to the relational database into a location reachable by sqoop. I put mysql-connector-java-5.1.12-bin.jar under resources/sqoop. * Set the environment $ bin/dse-env.sh * Start DataStax Enterprise, as an Analytics node. $ sudo bin/dse cassandra -t where cassandra starts the Cassandra process plus CassandraFS and the -t option starts the Hadoop JobTracker and TaskTracker processes. if you start without the -t flag, the below exception will be thrown during the further operations that are discussed below. No jobtracker found Unable to run : jobtracker not found Hence do not miss the -t flag. * Start cassandra cli to view the cassandra keyrings and you will be able to view the data in cassandra, once you migrate using sqoop as given below. $ bin/cassandra-cli -host localhost -port 9160 Confirm that it is connected to the test cluster that is created on the port 9160, by the below from the CLI. [default@unknown] describe cluster; Cluster Information: Snitch: com.datastax.bdp.snitch.DseDelegateSnitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: f5a19a50-b616-11e1-0000-45b29245ddff: [127.0.1.1] If you have missed mentioning the host/port (starting the cli by just bin/cassandra-cli) or given it wrong, you will get the response as "Not connected to a cassandra instance." $ bin/dse sqoop import --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --table Category --split-by categoryName --cassandra-keyspace shopping_cart_db --cassandra-column-family Category_cf --cassandra-row-key categoryName --cassandra-thrift-host localhost --cassandra-create-schema Above command will now migrate the table "Category" in the shopping_cart_db with the primary key categoryName, into a cassandra keyspace named shopping_cart, with the cassandra row key categoryName. You may use the --direct mysql specific option, which is faster. In my above command, I have everything runs on localhost. +--------------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------------+-------------+------+-----+---------+-------+ | categoryName | varchar(50) | NO | PRI | NULL | | | description | text | YES | | NULL | | | image | blob | YES | | NULL | | +--------------+-------------+------+-----+---------+-------+ This also creates the respective java class (Category.java), inside the working directory. To import all the tables in the database, instead of a single table. $ bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct Here "-m 1" tag ensures a sequential import. If not specified, the below exception will be thrown. ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table Category. Please specify one with --split-by or perform a sequential import with '-m 1'. To check whether the keyspace is created, [default@unknown] show keyspaces; ................ Keyspace: shopping_cart_db: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: ColumnFamily: Category_cf Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type Default column value validator: org.apache.cassandra.db.marshal.UTF8Type Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds / keys to save : 0.0/0/all Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider Key cache size / save period in seconds: 200000.0/14400 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: true Bloom Filter FP chance: default Built indexes: [] Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy ............. [default@unknown] describe shopping_cart_db; Keyspace: shopping_cart_db: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:1] Column Families: ColumnFamily: Category_cf Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type Default column value validator: org.apache.cassandra.db.marshal.UTF8Type Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds / keys to save : 0.0/0/all Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider Key cache size / save period in seconds: 200000.0/14400 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: true Bloom Filter FP chance: default Built indexes: [] Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy You may also use hive to view the databases created in cassandra, in an sql-like manner. * Start Hive $ bin/dse hive hive> show databases; OK default shopping_cart_db When the entire database is imported as above, separate java classes will be created for each of the tables. $ bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct 12/06/15 15:42:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 12/06/15 15:42:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 12/06/15 15:42:11 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Category` AS t LIMIT 1 12/06/15 15:42:11 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Category.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Category.jar 12/06/15 15:42:13 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:13 INFO mapreduce.ImportJobBase: Beginning import of Category 12/06/15 15:42:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 12/06/15 15:42:15 INFO mapred.JobClient: Running job: job_201206151241_0007 12/06/15 15:42:16 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:25 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:25 INFO mapred.JobClient: Job complete: job_201206151241_0007 12/06/15 15:42:25 INFO mapred.JobClient: Counters: 18 12/06/15 15:42:25 INFO mapred.JobClient: Job Counters 12/06/15 15:42:25 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6480 12/06/15 15:42:25 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:25 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:25 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:25 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:25 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:25 INFO mapred.JobClient: Bytes Written=2848 12/06/15 15:42:25 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21419 12/06/15 15:42:25 INFO mapred.JobClient: CFS_BYTES_WRITTEN=2848 12/06/15 15:42:25 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:25 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:25 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:25 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:25 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:25 INFO mapred.JobClient: Physical memory (bytes) snapshot=119435264 12/06/15 15:42:25 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:25 INFO mapred.JobClient: CPU time spent (ms)=630 12/06/15 15:42:25 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:25 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2085318656 12/06/15 15:42:25 INFO mapred.JobClient: Map output records=36 12/06/15 15:42:25 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:42:25 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 11.4492 seconds (0 bytes/sec) 12/06/15 15:42:25 INFO mapreduce.ImportJobBase: Retrieved 36 records. 12/06/15 15:42:25 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Customer` AS t LIMIT 1 12/06/15 15:42:25 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Customer.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:25 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Customer.jar 12/06/15 15:42:26 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:26 INFO mapreduce.ImportJobBase: Beginning import of Customer 12/06/15 15:42:26 INFO mapred.JobClient: Running job: job_201206151241_0008 12/06/15 15:42:27 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:35 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:35 INFO mapred.JobClient: Job complete: job_201206151241_0008 12/06/15 15:42:35 INFO mapred.JobClient: Counters: 17 12/06/15 15:42:35 INFO mapred.JobClient: Job Counters 12/06/15 15:42:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6009 12/06/15 15:42:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:35 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:35 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:35 INFO mapred.JobClient: Bytes Written=0 12/06/15 15:42:35 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21489 12/06/15 15:42:35 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:35 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:35 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:35 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:35 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=164855808 12/06/15 15:42:35 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:35 INFO mapred.JobClient: CPU time spent (ms)=510 12/06/15 15:42:35 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:35 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2082869248 12/06/15 15:42:35 INFO mapred.JobClient: Map output records=0 12/06/15 15:42:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:42:35 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.3143 seconds (0 bytes/sec) 12/06/15 15:42:35 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:42:35 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `OrderEntry` AS t LIMIT 1 12/06/15 15:42:35 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderEntry.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderEntry.jar 12/06/15 15:42:36 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:36 INFO mapreduce.ImportJobBase: Beginning import of OrderEntry 12/06/15 15:42:36 INFO mapred.JobClient: Running job: job_201206151241_0009 12/06/15 15:42:37 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:45 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:45 INFO mapred.JobClient: Job complete: job_201206151241_0009 12/06/15 15:42:45 INFO mapred.JobClient: Counters: 17 12/06/15 15:42:45 INFO mapred.JobClient: Job Counters 12/06/15 15:42:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6381 12/06/15 15:42:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:45 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:45 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:45 INFO mapred.JobClient: Bytes Written=0 12/06/15 15:42:45 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21569 12/06/15 15:42:45 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:45 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:45 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:45 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:45 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=137252864 12/06/15 15:42:45 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:45 INFO mapred.JobClient: CPU time spent (ms)=520 12/06/15 15:42:45 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2014703616 12/06/15 15:42:45 INFO mapred.JobClient: Map output records=0 12/06/15 15:42:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:42:45 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2859 seconds (0 bytes/sec) 12/06/15 15:42:45 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:42:45 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `OrderItem` AS t LIMIT 1 12/06/15 15:42:45 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderItem.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderItem.jar 12/06/15 15:42:46 WARN manager.CatalogQueryManager: The table OrderItem contains a multi-column primary key. Sqoop will default to the column orderNumber only for this job. 12/06/15 15:42:46 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:46 INFO mapreduce.ImportJobBase: Beginning import of OrderItem 12/06/15 15:42:46 INFO mapred.JobClient: Running job: job_201206151241_0010 12/06/15 15:42:47 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:55 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:55 INFO mapred.JobClient: Job complete: job_201206151241_0010 12/06/15 15:42:55 INFO mapred.JobClient: Counters: 17 12/06/15 15:42:55 INFO mapred.JobClient: Job Counters 12/06/15 15:42:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5949 12/06/15 15:42:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:55 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:55 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:55 INFO mapred.JobClient: Bytes Written=0 12/06/15 15:42:55 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21524 12/06/15 15:42:55 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:55 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:55 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:55 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:55 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:55 INFO mapred.JobClient: Physical memory (bytes) snapshot=116674560 12/06/15 15:42:55 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:55 INFO mapred.JobClient: CPU time spent (ms)=590 12/06/15 15:42:55 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:55 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2014703616 12/06/15 15:42:55 INFO mapred.JobClient: Map output records=0 12/06/15 15:42:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:42:55 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2539 seconds (0 bytes/sec) 12/06/15 15:42:55 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:42:55 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Payment` AS t LIMIT 1 12/06/15 15:42:55 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Payment.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Payment.jar 12/06/15 15:42:56 WARN manager.CatalogQueryManager: The table Payment contains a multi-column primary key. Sqoop will default to the column orderNumber only for this job. 12/06/15 15:42:56 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:56 INFO mapreduce.ImportJobBase: Beginning import of Payment 12/06/15 15:42:56 INFO mapred.JobClient: Running job: job_201206151241_0011 12/06/15 15:42:57 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:43:05 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:43:05 INFO mapred.JobClient: Job complete: job_201206151241_0011 12/06/15 15:43:05 INFO mapred.JobClient: Counters: 17 12/06/15 15:43:05 INFO mapred.JobClient: Job Counters 12/06/15 15:43:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5914 12/06/15 15:43:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:43:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:43:05 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:43:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:43:05 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:43:05 INFO mapred.JobClient: Bytes Written=0 12/06/15 15:43:05 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:43:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21518 12/06/15 15:43:05 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:43:05 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:43:05 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:43:05 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:43:05 INFO mapred.JobClient: Map input records=1 12/06/15 15:43:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=137998336 12/06/15 15:43:05 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:43:05 INFO mapred.JobClient: CPU time spent (ms)=520 12/06/15 15:43:05 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:43:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2082865152 12/06/15 15:43:05 INFO mapred.JobClient: Map output records=0 12/06/15 15:43:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:43:05 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2642 seconds (0 bytes/sec) 12/06/15 15:43:05 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:43:05 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:43:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Product` AS t LIMIT 1 12/06/15 15:43:06 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Product.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:43:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Product.jar 12/06/15 15:43:06 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:43:06 INFO mapreduce.ImportJobBase: Beginning import of Product 12/06/15 15:43:07 INFO mapred.JobClient: Running job: job_201206151241_0012 12/06/15 15:43:08 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:43:16 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:43:16 INFO mapred.JobClient: Job complete: job_201206151241_0012 12/06/15 15:43:16 INFO mapred.JobClient: Counters: 18 12/06/15 15:43:16 INFO mapred.JobClient: Job Counters 12/06/15 15:43:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5961 12/06/15 15:43:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:43:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:43:16 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:43:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:43:16 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:43:16 INFO mapred.JobClient: Bytes Written=248262 12/06/15 15:43:16 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:43:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21527 12/06/15 15:43:16 INFO mapred.JobClient: CFS_BYTES_WRITTEN=248262 12/06/15 15:43:16 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:43:16 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:43:16 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:43:16 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:43:16 INFO mapred.JobClient: Map input records=1 12/06/15 15:43:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=144871424 12/06/15 15:43:16 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:43:16 INFO mapred.JobClient: CPU time spent (ms)=1030 12/06/15 15:43:16 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:43:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2085318656 12/06/15 15:43:16 INFO mapred.JobClient: Map output records=300 12/06/15 15:43:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:43:16 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2613 seconds (0 bytes/sec) 12/06/15 15:43:16 INFO mapreduce.ImportJobBase: Retrieved 300 records. I found DataStax an interesting project to explore more. I have blogged on the issues that I faced on this as a learner, and how easily can they be fixed - Issues that you may encounter during the migration to Cassandra using DataStax/Sqoop and the fixes.

July 16, 2012

by Pradeeban Kathiravelu

· 20,446 Views · 2 Likes

Render Geographic Information in 3D With Three.js and D3.js

The last couple of days I've been playing around with three.js and geo information. I wanted to be able to render map/geo data (e.g. in geojson format) inside the three.js scene. That way I have another dimension I could use to show a specific metric instead of just using the color in a 2D map. In this article I'll show you how you can do this. The example we'll create shows a 3D map of the Netherlands, rendered in Three.js, that uses a color to indicate the population density per municipality and the height of each municipality represents the actual number of residents. Or if you can look at a working example. This information is based on open data available from the Dutch government. If you look at the source from the example, you can see the json we use for this. For more information on geojson and how to parse it see the other articles I did on this subject: Using d3.js to visualize GIS Election site part 1: Basics with Knockout.js, Bootstrap and d3.js To get this working we'll take the following steps: Load the input geo data Setup a three.js scene Convert the input data to a Three.js path using d3.js Set the color and height of the Three.js object Render everything Just a reminder to see everything working, just look at the example. Load the input geo data D3.js has support to load json and directly transform it to an SVG path. Though this is a convenient way, I only needed the path data, not the complete SVG elements. So to load json I just used jquery's json support. // get the data jQuery.getJSON('data/cities.json', function(data, textStatus, jqXHR) { .. }); This will load the data and pass it in the data object to the supplied function. Setup a three.js scene Before we do anything with the data lets first setup a basic Three.js scene. // Set up the three.js scene. This is the most basic setup without // any special stuff function initScene() { // set the scene size var WIDTH = 600, HEIGHT = 600; // set some camera attributes var VIEW_ANGLE = 45, ASPECT = WIDTH / HEIGHT, NEAR = 0.1, FAR = 10000; // create a WebGL renderer, camera, and a scene renderer = new THREE.WebGLRenderer({antialias:true}); camera = new THREE.PerspectiveCamera(VIEW_ANGLE, ASPECT, NEAR, FAR); scene = new THREE.Scene(); // add and position the camera at a fixed position scene.add(camera); camera.position.z = 550; camera.position.x = 0; camera.position.y = 550; camera.lookAt( scene.position ); // start the renderer, and black background renderer.setSize(WIDTH, HEIGHT); renderer.setClearColor(0x000); // add the render target to the page $("#chart").append(renderer.domElement); // add a light at a specific position var pointLight = new THREE.PointLight(0xFFFFFF); scene.add(pointLight); pointLight.position.x = 800; pointLight.position.y = 800; pointLight.position.z = 800; // add a base plane on which we'll render our map var planeGeo = new THREE.PlaneGeometry(10000, 10000, 10, 10); var planeMat = new THREE.MeshLambertMaterial({color: 0x666699}); var plane = new THREE.Mesh(planeGeo, planeMat); // rotate it to correct position plane.rotation.x = -Math.PI/2; scene.add(plane); } Nothing to special, the comments inline should nicely explain what we're doing here. Next it gets more interesting. Convert the input data to a Three.js path using d3.js What we need to do next is convert our geojson input format to a THREE.Path that we can use in our scene. Three.js itself doesn't support geojson or SVG for that matter. Luckily though someone already started work on integrating d3.js with three.js. This project is called "d3-threeD" (sources can be found on github here). With this extension you can automagically render SVG elements in 3D directly from D3.js. Cool stuff, but it didn't allow me any control over how the elements were rendered. It does however contain a function we can use for our scenario. If you look through the source code of this project you'll find a method called "transformSVGPath". This method converts an SVG path string to a Three.Shape element. Unfortunately this method isn't exposed, but that's quickly solved by adding this to the d3-threeD.js file: // at the top var transformSVGPathExposed; ... // within the d3threeD(exports) function transformSVGPathExposed = transformSVGPath; This way we can call this method separately. Now that we have a way to transform an SVG path to a Three.js shape, we only need to convert the geojson to an SVG string and pass it to this function. We can use the geo functionaly from D3.js for this: geons.geoConfig = function() { this.TRANSLATE_0 = appConstants.TRANSLATE_0; this.TRANSLATE_1 = appConstants.TRANSLATE_1; this.SCALE = appConstants.SCALE; this.mercator = d3.geo.mercator(); this.path = d3.geo.path().projection(this.mercator); this.setupGeo = function() { var translate = this.mercator.translate(); translate[0] = this.TRANSLATE_0; translate[1] = this.TRANSLATE_1; this.mercator.translate(translate); this.mercator.scale(this.SCALE); } } The path variable from the previous piece of code can now be used like this: var feature = geo.path(geoFeature); To convert a geojson element to an SVG path. So how does this look combined? // add the loaded gis object (in geojson format) to the map function addGeoObject() { // keep track of rendered objects var meshes = []; ... // convert to mesh and calculate values for (var i = 0 ; i < data.features.length ; i++) { var geoFeature = data.features[i] var feature = geo.path(geoFeature); // we only need to convert it to a three.js path var mesh = transformSVGPathExposed(feature); // add to array meshes.push(mesh); ... } As you can see we iterate over the data.features list (this contains all the geojson representations of the municipalities). Each municipality is converted to an svg string, and each svg string is converted to a mesh. This mesh is a Three.js object that we can render on the scene. Set the color and height of the Three.js object Now we just need to set the height and the color of the Three.js shape and add it to the scene. The extended addGeoObject method now looks like this: // add the loaded gis object (in geojson format) to the map function addGeoObject() { // keep track of rendered objects var meshes = []; var averageValues = []; var totalValues = []; // keep track of min and max, used to color the objects var maxValueAverage = 0; var minValueAverage = -1; // keep track of max and min of total value var maxValueTotal = 0; var minValueTotal = -1; // convert to mesh and calculate values for (var i = 0 ; i < data.features.length ; i++) { var geoFeature = data.features[i] var feature = geo.path(geoFeature); // we only need to convert it to a three.js path var mesh = transformSVGPathExposed(feature); // add to array meshes.push(mesh); // we get a property from the json object and use it // to determine the color later on var value = parseInt(geoFeature.properties.bev_dichth); if (value > maxValueAverage) maxValueAverage = value; if (value < minValueAverage || minValueAverage == -1) minValueAverage = value; averageValues.push(value); // and we get the max values to determine height later on. value = parseInt(geoFeature.properties.aant_inw); if (value > maxValueTotal) maxValueTotal = value; if (value < minValueTotal || minValueTotal == -1) minValueTotal = value; totalValues.push(value); } // we've got our paths now extrude them to a height and add a color for (var i = 0 ; i < averageValues.length ; i++) { // create material color based on average var scale = ((averageValues[i] - minValueAverage) / (maxValueAverage - minValueAverage)) * 255; var mathColor = gradient(Math.round(scale),255); var material = new THREE.MeshLambertMaterial({ color: mathColor }); // create extrude based on total var extrude = ((totalValues[i] - minValueTotal) / (maxValueTotal - minValueTotal)) * 100; var shape3d = meshes[i].extrude({amount: Math.round(extrude), bevelEnabled: false}); // create a mesh based on material and extruded shape var toAdd = new THREE.Mesh(shape3d, material); // rotate and position the elements nicely in the center toAdd.rotation.x = Math.PI/2; toAdd.translateX(-490); toAdd.translateZ(50); toAdd.translateY(extrude/2); // add to scene scene.add(toAdd); } } // simple gradient function function gradient(length, maxLength) { var i = (length * 255 / maxLength); var r = i; var g = 255-(i); var b = 0; var rgb = b | (g << 8) | (r << 16); return rgb; } A big piece of code, but not that complex. What we do here is we keep track of two values for each municipality: the population density and the total population. These values are used to respectively calculate the color (using the gradient function) and the height. The height is used in the Three.js extrude function which converts our 2D Three.Js path to a 3D shape. The color is used to define a material. This shape and material is used to create the Mesh that we add to the scene. Render everything All that is left is to render everything. For this example we're not interested in animations or anything so we can make a single call to the renderer: renderer.render( scene, camera ); And the result is as you saw in the beginning. The following image shows a different example. This time we once again show the population density, but now the height represents the land area of the municipality. I'm currently creating a new set of geojson data, but this time for the whole of Europe. So in the next couple of weeks expect some articles using maps of Europe.

July 16, 2012

by Jos Dirksen

· 41,334 Views

Paging ASP.NET ListView using DataPager without using DataSource Control

If you have already used the ASP.NET ListView and DataPager controls, you know how easily you can display your data using custom templates and provide pagination functionality. You can do that in only few minutes. The ListView and DataPager controls work perfectly fine in combination with DataSource controls (SqlDataSource, LinqDataSource, ObjectDataSource etc.), however if you use them with custom data collection, without using Data Source controls, you may find some unexpected behavior and have to add some little more code to make it work. Note: I saw question related to this issue asked multiple times in different asp.net forums, so I thought it would be nice to document it here. Lets create a working demo together… 1. Create sample ASP.NET Web Application project, add new ASPX page. 2. Add ListView control and modify the markup so that you will end up having this: ( ) No data The ListView control has three templates defined: LayoutTemplate – where we define the way we want to represent our data. We have PlaceHolder where data from ItemTemplate will be placed. ListView recognizes this automatically. ItemTemplate – where the ListView control will show the items from the data source. It makes automatically iterating over each item in the collection (same as any other data source control) EmptyDataTemplate – If the collection has no data, this template will be displayed. 3. Add DataPager control and modify the markup in the following way: We add the DataPager control, associate it with the ListView1 control and add the PageSize property. After that, we need to define where we put the field type and button type. If we were binding from Data Source Control (SqlDataSource, LinqDataSource or any other…) this would be it and the ListView and DataPager would work perfectly fine. However, if we bind custom data collection to the ListView without using Data Source controls, we will have problems with the pagination. Lets add custom data in our C# code and bind it to the ListView. 4. C# code adding custom data collection - Define Product class public class Product { public string Name { get; set; } public decimal Price { get; set; } public string Currency { get; set; } } - Define method that will create List of products (sample data) List SampleData() { List p = new List(); p.Add(new Product() { Name = "Microsoft Windows 7", Price = 70, Currency = "USD" }); p.Add(new Product() { Name = "HP ProBook", Price = 320, Currency = "USD" }); p.Add(new Product() { Name = "Microsoft Office Home", Price = 60, Currency = "USD" }); p.Add(new Product() { Name = "NOKIA N900", Price = 350, Currency = "USD" }); p.Add(new Product() { Name = "BlackBerry Storm", Price = 100, Currency = "USD" }); p.Add(new Product() { Name = "Apple iPhone", Price = 400, Currency = "USD" }); p.Add(new Product() { Name = "HTC myTouch", Price = 200, Currency = "USD" }); return p; } This method should be part of the ASPX page class (e.g. inside _Default page class if your page is Default.aspx) - Bind the sample data to ListView on Page_Load protected void Page_Load(object sender, EventArgs e) { if (!IsPostBack) { BindListView(); } } void BindListView() { ListView1.DataSource = SampleData(); ListView1.DataBind(); } Now, run the project and you should see the data displayed where only the first five items will be shown. The data pager should have two pages (1 2) and you will be able to click the second page to navigate to the last two items in the `collection. Now, if you click on page 2, you will see it won’t display the last two items automatically, instead you will have to click again on page 2 to see the last two items. After, if you click on page 1, you will encounter another problem where the five items are displayed, but the data for the last two items in the ListView are shown (see print screen bellow) If you notice in the previous two pictures, the behavior doesn’t seem to work properly. The problem here is that DataPager doesn’t know about current ListView page changing property. Therefore, we should explicitly set the DataPager Page Properties in the ListView’s PagePropertiesChanging event. Here is what we need to do: 1. Add OnPagePropertiesChanging event to ListView control 2. Implement the ListView1_PagePropertiesChanging method protected void ListView1_PagePropertiesChanging(object sender, PagePropertiesChangingEventArgs e) { //set current page startindex, max rows and rebind to false lvDataPager1.SetPageProperties(e.StartRowIndex, e.MaximumRows, false); //rebind List View BindListView(); } Now, if you test the functionality, it should work properly. Hope this was helpful. Regards, Hajan

July 14, 2012

by Hajan Selmani

· 64,131 Views

JMS With ActiveMQ

Java Message Service is a mechanism for integrating applications in a loosely coupled, flexible manner and delivers data asynchronously across applications.

July 14, 2012

by Pavithra Gunasekara

· 165,483 Views · 13 Likes

The Most Pressed Keys in Various Programming Languages

i switch between programming languages quite a bit; i often wondered what happens when having to deal with the different syntaxes, does the syntax allow you to be more expressive or faster at coding in one language or another. i don't really know about that; but what i do know what keys are pressed when writing with different programming languages. this might be something interesting for people who are deciding to select a programming language might look into, here is a post on the answer to the aged question of: which programming language should i learn? as far as i can tell languages with a wider focused spread across the keyboard are usually syntaxes we usually associate with ugly languages (ugly to read and code). ex. shell and perl. you might argue that the variables names being used will alter the results, but as most languages programming have conventions for naming but we can assume a decent spread for variable names. i don’t offer conclusions, just poorly layout the facts. although the heat map does miss out on things like shift and caps. ex. in perl with the dollar sign. ($) whitespace hasn’t been taken into consideration (tabs and spaces) which would have been a cool thing to see. the data that was used to gather this information was spread amongst various popular github projects. javascript shell java c c++ ruby python php perl objc lisp lisp code here was written by paul graham. references heatmap.js http://www.patrick-wied.at/projects/heatmap-keyboard/

July 12, 2012

by Mahdi Yusuf

· 39,265 Views

Working with MongoDB MultiMaster

Learn all about working with MondoDB multimaster.

July 11, 2012

by Rick Copeland

· 28,252 Views · 2 Likes

5 Things You Should Check Now to Improve PHP Web Performance

We all know how financially important it is for your app’s server architecture to handle peaks of load. This article discusses 5 tips for improving PHP Web performance.

July 11, 2012

by Gonzalo Ayuso

· 263,950 Views · 2 Likes

Testing Zabbix Trigger Expressions

When defining a Zabbix (1.8.2) trigger e.g. to inform you that there are errors in a log file, how do you verify that it is correct? As somebody recommended in a forum, you can use a Calculated Item with a similar expression (the syntax is little different from triggers). Contrary to triggers, the value of a calculated item is easy to see and the historical values are stored so you can check how it evolved. If your trigger expression is complex the you can create multiple calculated items, one for each subexpression. Example If we have a log item that sends us data whenever the text “ERROR” appears in a log line and the corresponding trigger expected to fire if we have got any data from the item in the last 600 sec (nodata() returns 1 if there indeed was no data): {hive.example.com:log["/tmp/ada/hive.log","ERROR",,20].nodata(600)}=0 Then we could test it with a calculated item with the expression nodata("hive.example.com:log[\"/tmp/ada/hive.log\",\"ERROR\",,20]", 600) (Notice that the function comes first, taking the host:item as its 0th parameter and that this is enclosed with “”, escaping any nested ” with \.) The value of the calculated item will be re-checked every (independent on whether the source item changed or not) and stored, in this case it will either thave the value of 0 or 1. We can also construct more complex expressions with &, + etc. similarly to trigger expressions.

July 11, 2012

by Jakub Holý

· 12,086 Views

Everything You Need To Know About Couchbase Architecture

After receiving a lot of good feedback and comment on my last blog on MongoDb, I was encouraged to do another deep dive on another popular document oriented db; Couchbase. I have been a long-time fan CouchDb and has wrote a blog on it many years ago. After it merges with Membase, I am very excited to take a deep look into it again. Couchbase is the merge of two popular NOSQL technologies: Membase, which provides persistence, replication, sharding to the high performance memcached technology CouchDB, which pioneers the document oriented model based on JSON Like other NOSQL technologies, both Membase and CouchDB are built from the ground up on a highly distributed architecture, with data shard across machines in a cluster. Built around the Memcached protocol, Membase provides an easy migration to existing Memcached users who want to add persistence, sharding and fault resilience on their familiar Memcached model. On the other hand, CouchDB provides first class support for storing JSON documents as well as a simple RESTful API to access them. Underneath, CouchDB also has a highly tuned storage engine that is optimized for both update transaction as well as query processing. Taking the best of both technologies, Membase is well-positioned in the NOSQL marketplace. Programming model Couchbase provides client libraries for different programming languages such as Java / .NET / PHP / Ruby / C / Python / Node.js For read, Couchbase provides a key-based lookup mechanism where the client is expected to provide the key, and only the server hosting the data (with that key) will be contacted. Couchbase also provides a query mechanism to retrieve data where the client provides a query (for example, range based on some secondary key) as well as the view (basically the index). The query will be broadcasted to all servers in the cluster and the result will be merged and sent back to the client. For write, Couchbase provides a key-based update mechanism where the client sends in an updated document with the key (as doc id). When handling write request, the server will return to client’s write request as soon as the data is stored in RAM on the active server, which offers the lowest latency for write requests. Following is the core API that Couchbase offers. (in an abstract sense) # Get a document by key doc = get(key) # Modify a document, notice the whole document # need to be passed in set(key, doc) # Modify a document when no one has modified it # since my last read casVersion = doc.getCas() cas(key, casVersion, changedDoc) # Create a new document, with an expiration time # after which the document will be deleted addIfNotExist(key, doc, timeToLive) # Delete a document delete(key) # When the value is an integer, increment the integer increment(key) # When the value is an integer, decrement the integer decrement(key) # When the value is an opaque byte array, append more # data into existing value append(key, newData) # Query the data results = query(viewName, queryParameters) In Couchbase, document is the unit of manipulation. Currently Couchbase doesn't support server-side execution of custom logic. Couchbase server is basically a passive store and unlike other document oriented DB, Couchbase doesn't support field-level modification. In case of modifying documents, client need to retrieve documents by its key, do the modification locally and then send back the whole (modified) document back to the server. This design tradeoff network bandwidth (since more data will be transferred across the network) for CPU (now CPU load shift to client). Couchbase currently doesn't support bulk modification based on a condition matching. Modification happens only in a per document basis. (client will save the modified document one at a time). Transaction Model Similar to many NOSQL databases, Couchbase’s transaction model is primitive as compared to RDBMS. Atomicity is guaranteed at a single document and transactions that span update of multiple documents are unsupported. To provide necessary isolation for concurrent access, Couchbase provides a CAS (compare and swap) mechanism which works as follows … When the client retrieves a document, a CAS ID (equivalent to a revision number) is attached to it. While the client is manipulating the retrieved document locally, another client may modify this document. When this happens, the CAS ID of the document at the server will be incremented. Now, when the original client submits its modification to the server, it can attach the original CAS ID in its request. The server will verify this ID with the actual ID in the server. If they differ, the document has been updated in between and the server will not apply the update. The original client will re-read the document (which now has a newer ID) and re-submit its modification. Couchbase also provides a locking mechanism for clients to coordinate their access to documents. Clients can request a LOCK on the document it intends to modify, update the documents and then releases the LOCK. To prevent a deadlock situation, each LOCK grant has a timeout so it will automatically be released after a period of time. Deployment Architecture In a typical setting, a Couchbase DB resides in a server clusters involving multiple machines. Client library will connect to the appropriate servers to access the data. Each machine contains a number of daemon processes which provides data access as well as management functions. The data server, written in C/C++, is responsible to handle get/set/delete request from client. The Management server, written in Erlang, is responsible to handle the query traffic from client, as well as manage the configuration and communicate with other member nodes in the cluster. Virtual Buckets The basic unit of data storage in Couchbase DB is a JSON document (or primitive data type such as int and byte array) which is associated with a key. The overall key space is partitioned into 1024 logical storage unit called "virtual buckets" (or vBucket). vBucket are distributed across machines within the cluster via a map that is shared among servers in the cluster as well as the client library. High availability is achieved through data replication at the vBucket level. Currently Couchbase supports one active vBucket zero or more standby replicas hosted in other machines. Curremtly the standby server are idle and not serving any client request. In future version of Couchbase, the standby replica will be able to serve read request. Load balancing in Couchbase is achieved as follows: Keys are uniformly distributed based on the hash function When machines are added and removed in the cluster. The administrator can request a redistribution of vBucket so that data are evenly spread across physical machines. Management Server Management server performs the management function and co-ordinate the other nodes within the cluster. It includes the following monitoring and administration functions Heartbeat: A watchdog process periodically communicates with all member nodes within the same cluster to provide Couchbase Server health updates. Process monitor: This subsystem monitors execution of the local data manager, restarting failed processes as required and provide status information to the heartbeat module. Configuration manager: Each Couchbase Server node shares a cluster-wide configuration which contains the member nodes within the cluster, a vBucket map. The configuration manager pull this config from other member nodes at bootup time. Within a cluster, one node’s Management Server will be elected as the leader which performs the following cluster-wide management function Controls the distribution of vBuckets among other nodes and initiate vBucket migration Orchestrates the failover and update the configuration manager of member nodes If the leader node crashes, a new leader will be elected from surviving members in the cluster. When a machine in the cluster has crashed, the leader will detect that and notify member machines in the cluster that all vBuckets hosted in the crashed machine is dead. After getting this signal, machines hosting the corresponding vBucket replica will set the vBucket status as “active”. The vBucket/server map is updated and eventually propagated to the client lib. Notice that at this moment, the replication level of the vBucket will be reduced. Couchbase doesn’t automatically re-create new replicas which will cause data copying traffic. Administrator can issue a command to explicitly initiate a data rebalancing. The crashed machine, after reboot can rejoin the cluster. At this moment, all the data it stores previously will be completely discard and the machine will be treated as a brand new empty machine. As more machines are put into the cluster (for scaling out), vBucket should be redistributed to achieve a load balance. This is currently triggered by an explicit command from the administrator. Once receive the “rebalance” command, the leader will compute the new provisional map which has the balanced distribution of vBuckets and send this provisional map to all members of the cluster. To compute the vBucket map and migration plan, the leader attempts the following objectives: Evenly distribute the number of active vBuckets and replica vBuckets among member nodes. Place the active copy and each replicas in physically separated nodes. Spread the replica vBucket as wide as possible among other member nodes. Minimize the amount of data migration Orchestrate the steps of replica redistribution so no node or network will be overwhelmed by the replica migration. Once the vBucket maps is determined, the leader will pass the redistribution map to each member in the cluster and coordinate the steps of vBucket migration. The actual data transfer happens directly between the origination node to the destination node. Notice that since we have generally more vBuckets than machines. The workload of migration will be evenly distributed automatically. For example, when new machines are added into the clusters, all existing machines will migrate some portion of its vBucket to the new machines. There is no single bottleneck in the cluster. Throughput the migration and redistribution of vBucket among servers, the life cycle of a vBucket in a server will be in one of the following states “Active”: means the server is hosting the vBucket is ready to handle both read and write request “Replica”: means the server is hosting the a copy of the vBucket that may be slightly out of date but can take read request that can tolerate some degree of outdate. “Pending”: means the server is hosting a copy that is in a critical transitional state. The server cannot take either read or write request at this moment. “Dead”: means the server is no longer responsible for the vBucket and will not take either read or write request anymore. Data Server Data server implements the memcached APIs such as get, set, delete, append, prepend, etc. It contains the following key datastructure: One in-memory hashtable (key by doc id) for the corresponding vBucket hosted. The hashtable acts as both a metadata for all documents as well as a cache for the document content. Maintain the entry gives a quick way to detect whether the document exists on disk. To support async write, there is a checkpoint linkedlist per vBucket holding the doc id of modified documents that hasn't been flushed to disk or replicated to the replica. To handle a "GET" request Data server routes the request to the corresponding ep-engine responsible for the vBucket. The ep-engine will lookup the document id from the in-memory hastable. If the document content is found in cache (stored in the value of the hashtable), it will be returned. Otherwise, a background disk fetch task will be created and queued into the RO dispatcher queue. The RO dispatcher then reads the value from the underlying storage engine and populates the corresponding entry in the vbucket hash table. Finally, the notification thread notifies the disk fetch completion to the memcached pending connection, so that the memcached worker thread can revisit the engine to process a get request. To handle a "SET" request, a success response will be returned to the calling client once the updated document has been put into the in-memory hashtable with a write request put into the checkpoint buffer. Later on the Flusher thread will pickup the outstanding write request from each checkpoint buffer, lookup the corresponding document content from the hashtable and write it out to the storage engine. Of course, data can be lost if the server crashes before the data has been replicated to another server and/or persisted. If the client requires a high data availability across different crashes, it can issue a subsequent observe() call which blocks on the condition that the server persist data on disk, or the server has replicated the data to another server (and get its ACK). Overall speaking, the client has various options to tradeoff data integrity with throughput. Hashtable Management To synchronize accesses to a vbucket hash table, each incoming thread needs to acquire a lock before accessing a key region of the hash table. There are multiple locks per vbucket hash table, each of which is responsible for controlling exclusive accesses to a certain ket region on that hash table. The number of regions of a hash table can grow dynamically as more documents are inserted into the hash table. To control the memory size of the hashtable, Item pager thread will monitor the memory utilization of the hashtable. Once a high watermark is reached, it will initiate an eviction process to remove certain document content from the hashtable. Only entries that is not referenced by entries in the checkpoint buffer can be evicted because otherwise the outstanding update (which only exists in hashtable but not persisted) will be lost. After eviction, the entry of the document still remains in the hashtable; only the document content of the document will be removed from memory but the metadata is still there. The eviction process stops after reaching the low watermark. The high / low water mark is determined by the bucket memory quota. By default, the high water mark is set to 75% of bucket quota, while the low water mark is set to 60% of bucket quota. These water marks can be configurable at runtime. In CouchDb, every document is associated with an expiration time and will be deleted once it is expired. Expiry pager is responsible for tracking and removing expired document from both the hashtable as well as the storage engine (by scheduling a delete operation). Checkpoint Manager Checkpoint manager is responsible to recycle the checkpoint buffer, which holds the outstanding update request, consumed by the two downstream processes, Flusher and TAP replicator. When all the request in the checkpoint buffer has been processed, the checkpoint buffer will be deleted and a new one will be created. TAP Replicator TAP replicator is responsible to handle vBucket migration as well as vBucket replication from active server to replica server. It does this by propagating the latest modified document to the corresponding replica server. At the time a replica vBucket is established, the entire vBucket need to be copied from the active server to the empty destination replica server as follows The in-memory hashtable at the active server will be transferred to the replica server. Notice that during this period, some data may be updated and therefore the data set transfered to the replica can be inconsistent (some are the latest and some are outdated). Nevertheless, all updates happen after the start of transfer is tracked in the checkpoint buffer. Therefore, after the in-memory hashtable transferred is completed, the TAP replicator can pickup those updates from the checkpoint buffer. This ensures the latest versioned of changed documents are sent to the replica, and hence fix the inconsistency. However the hashtable cache doesn’t contain all the document content. Data also need to be read from the vBucket file and send to the replica. Notice that during this period, update of vBucket will happen in active server. However, since the file is appended only, subsequent data update won’t interfere the vBucket copying process. After the replica server has caught up, subsequent update at the active server will be available at its checkpoint buffer which will be pickup by the TAP replicator and send to the replica server. CouchDB Storage Structure Data server defines an interface where different storage structure can be plugged-in. Currently it supports both a SQLite DB as well as CouchDB. Here we describe the details of CouchDb, which provides a super high performance storage mechanism underneath the Couchbase technology. Under the CouchDB structure, there will be one file per vBucket. Data are written to this file in an append-only manner, which enables Couchbase to do mostly sequential writes for update, and provide the most optimized access patterns for disk I/O. This unique storage structure attributes to Couchbase’s fast on-disk performance for write-intensive applications. The following diagram illustrate the storage model and how it is modified by 3 batch updates (notice that since updates are asynchronous, it is perform by "Flusher" thread in batches). The Flusher thread works as follows: 1) Pick up all pending write request from the dirty queue and de-duplicate multiple update request to the same document. 2) Sort each request (by key) into corresponding vBucket and open the corresponding file 3) Append the following into the vBucket file (in the following contiguous sequence) All document contents in such write request batch. Each document will be written as [length, crc, content] one after one sequentially. The index that stores the mapping from document id to the document’s position on disk (called the BTree by-id) The index that stores the mapping from update sequence number to the document’s position on disk. (called the BTree by-seq) The by-id index plays an important role for looking up the document by its id. It is organized as a B-Tree where each node contains a key range. To lookup a document by id, we just need to start from the header (which is the end of the file), transfer to the root BTree node of the by-id index, and then further traverse to the leaf BTree node that contains the pointer to the actual document position on disk. During the write, the similar mechanism is used to trace back to the corresponding BTree node that contains the id of the modified documents. Notice that in the append-only model, update is not happening in-place, instead we located the existing location and copy it over by appending. In other words, the modified BTree node will be need to be copied over and modified and finally paste to the end of file, and then its parent need to be modified to point to the new location, which triggers the parents to be copied over and paste to the end of file. Same happens to its parents’ parent and eventually all the way to the root node of the BTree. The disk seek can be at the O(logN) complexity. The by-seq index is used to keep track of the update sequence of lived documents and is used for asynchronous catchup purposes. When a document is created, modified or deleted, a sequence number is added to the by-seq btree and the previous seq node will be deleted. Therefore, for cross-site replication, view index update and compaction, we can quickly locate all the lived documents in the order of their update sequence. When a vBucket replicator asks for the list of update since a particular time, it provides the last sequence number in previous update, the system will then scan through the by-seq BTree node to locate all the document that has sequence number larger than that, which effectively includes all the document that has been modified since the last replication. As time goes by, certain data becomes garbage (see the grey-out region above) and become unreachable in the file. Therefore, we need a garbage collection mechanism to clean up the garbage. To trigger this process, the by-id and by-seq B-Tree node will keep track of the data size of lived documents (those that is not garbage) under its substree. Therefore, by examining the root BTree node, we can determine the size of all lived documents within the vBucket. When the ratio of actual size and vBucket file size fall below a certain threshold, a compaction process will be triggered whose job is to open the vBucket file and copy the survived data to another file. Technically, the compaction process opens the file and read the by-seq BTree at the end of the file. It traces the Btree all the way to the leaf node and copy the corresponding document content to the new file. The compaction process happens while the vBucket is being updated. However, since the file is appended only, new changes are recorded after the BTree root that the compaction has opened, so subsequent data update won’t interfere with the compaction process. When the compaction is completed, the system need to copy over the data that was appended since the beginning of the compaction to the new file. View Index Structure Unlike most indexing structure which provide a pointer from the search attribute back to the document. The CouchDb index (called View Index) is better perceived as a denormalized table with arbitrary keys and values loosely associated to the document. Such denormalized table is defined by a user-provided map() and reduce() function. map = function(doc) { … emit(k1, v1) … emit(k2, v2) … } reduce = function(keys, values, isRereduce) { if (isRereduce) { // Do the re-reduce only on values (keys will be null) } else { // Do the reduce on keys and values } // result must be ready for input values to re-reduce return result } Whenever a document is created, updated, deleted, the corresponding map(doc) function will be invoked (in an asynchronous manner) to generate a set of key/value pairs. Such key/value will be stored in a B-Tree structure. All the key/values pairs of each B-Tree node will be passed into the reduce() function, which compute an aggregated value within that B-Tree node. Re-reduce also happens in non-leaf B-Tree nodes which further aggregate the aggregated value of child B-Tree nodes. The management server maintains the view index and persisted it to a separate file. Create a view index is perform by broadcast the index creation request to all machines in the cluster. The management process of each machine will read its active vBucket file and feed each surviving document to the Map function. The key/value pairs emitted by the Map function will be stored in a separated BTree index file. When writing out the BTree node, the reduce() function will be called with the list of all values in the tree node. Its return result represent a partially reduced value is attached to the BTree node. The view index will be updated incrementally as documents are subsequently getting into the system. Periodically, the management process will open the vBucket file and scan all documents since the last sequence number. For each changed document since the last sync, it invokes the corresponding map function to determine the corresponding key/value into the BTree node. The BTree node will be split if appropriate. Underlying, Couchbase use a back index to keep track of the document with the keys that it previously emitted. Later when the document is deleted, it can look up the back index to determine what those key are and remove them. In case the document is updated, the back index can also be examined; semantically a modification is equivalent to a delete followed by an insert. The following diagram illustrates how the view index file will be incrementally updated via the append-only mechanism. Query Processing Query in Couchbase is made against the view index. A query is composed of the view name, a start key and end key. If the reduce() function isn’t defined, the query result will be the list of values sorted by the keys within the key range. In case the reduce() function is defined, the query result will be a single aggregated value of all keys within the key range. If the view has no reduce() function defined, the query processing proceeds as follows: Client issue a query (with view, start/end key) to the management process of any server (unlike a key based lookup, there is no need to locate a specific server). The management process will broadcast the request to other management process on all servers (include itself) within the cluster. Each management process (after receiving the broadcast request) do a local search for value within the key range by traversing the BTree node of its view file, and start sending back the result (automatically sorted by the key) to the initial server. The initial server will merge the sorted result and stream them back to the client. However, if the view has reduce() function defined, the query processing will involve computing a single aggregated value as follows: Client issue a query (with view, start/end key) to the management process of any server (unlike a key based lookup, there is no need to locate a specific server). The management process will broadcast the request to other management process on all servers (include itself) within the cluster. Each management process do a local reduce for value within the key range by traversing the BTree node of its view file to compute the reduce value of the key range. If the key range span across a BTree node, the pre-computed of the sub-range can be used. This way, the reduce function can reuse a lot of partially reduced values and doesn’t need to recomputed every value of the key range from scratch. The original server will do a final re-reduce() in all the return value from each other servers, and then passed back the final reduced value to the client. To illustrate the re-reduce concept, lets say the query has its key range from A to F. Instead of calling reduce([A,B,C,D,E,F]), the system recognize the BTree node that contains [B,C,D] has been pre-reduced and the result P is stored in the BTree node, so it only need to call reduce(A,P,E,F). Update View Index as vBucket migrates Since the view index is synchronized with the vBuckets in the same server, when the vBucket has migrated to a different server, the view index is no longer correct; those key/value that belong to a migrated vBucket should be discarded and the reduce value cannot be used anymore. To keep track of the vBucket and key in the view index, each bTree node has a 1024-bitmask indicating all the vBuckets that is covered in the subtree (ie: it contains a key emitted from a document belonging to the vBucket). Such bit-mask is maintained whenever the bTree node is updated. At the server-level, a global bitmask is used to indicate all the vBuckets that this server is responsible for. In processing the query of the map-only view, before the key/value pair is returned, an extra check will be perform for each key/value pair to make sure its associated vBucket is what this server is responsible for. When processing the query of a view that has a reduce() function, we cannot use the pre-computed reduce value if the bTree node contains a vBucket that the server is not responsible for. In this case, the bTree node’s bit mask is compared with the global bit mask. In case if they are not aligned, then the reduce value need to be recomputed. Here is an example to illustrate this process Couchbase is one of the popular NOSQL technology built on a solid technology foundation designed for high performance. In this post, we have examined a number of such key features: Load balancing between servers inside a cluster that can grow and shrink according to workload conditions. Data migration can be used to re-achieve workload balance. Asynchronous write provides lowest possible latency to client as it returns once the data is store in memory. Append-only update model pushes most update transaction into sequential disk access, hence provide extremely high throughput for write intensive applications. Automatic compaction ensures the data lay out on disk are kept optimized all the time. Map function can be used to pre-compute view index to enable query access. Summary data can be pre-aggregated using the reduce function. Overall, this cut down the workload of query processing dramatically. For a review on NOSQL architecture in general and some theoretical foundation, I have wrote a NOSQL design pattern blog, as well as some fundamental difference between SQL and NOSQL. For other NOSQL technologies, please read my other blog on MongoDb, Cassandra and HBase, Memcached Special thanks to Damien Katz and Frank Weigel from Couchbase team who provide a lot of implementation details of Couchbase.

July 7, 2012

by Ricky Ho

· 84,819 Views · 5 Likes

Apache Camel Monitoring

I've seen a lot of discussion about how to monitor Camel based applications. Most people are looking for the following features: ability to view services (contexts, endpoints, routes), to view performance statistics (route throughput, etc) and to perform basic operations (start/stop routes, send messages, etc). This post will breakdown the options (that I know of) that are available today (as of Camel 2.8). If you have used other approaches or know of other ongoing development in this area, please let me know. JMX APIs Camel uses JMX to provide a standardized way to access metadata about contexts/routes/endpoints defined in a given application. Also, you can use JMX to interact with these components (start/stop routes, etc) in some interesting ways. I recently had some very specific Camel/ActiveMQ monitoring requests from a client. After looking at the options, we ended up building a standalone Tomcat web app that used JSPs, jQuery, Ajax and JMX APIs to view route/endpoint statistics, manage Camel routes (stop, start, etc) and monitor/manipulate ActiveMQ queues. It provided some much needed visibility and management features for our Camel/ActiveMQ based message processing application... CamelContext If you have a handle to the CamelContext, there are various APIs that can help describe and manage routes and endpoints. These are used by the existing Camel Web Console and can be used to build custom interface to retrieve and use this information in various ways... here are some of the notable APIs... getRouteDefinitions() getEndpoints() getEndpointsMap() getRouteStatus(routeId) startRoute(routeId) stopRoute(routeId) removeRoute(routeId) addRoutes(routeBuilder) suspendRoute(routeId) resumeRoute(routeId) With a little creativity, you can use these APIs to manage/monitor and re-wire a Camel application dynamically. Camel Web Console This console provides web and REST interfaces to Camel contexts/routes/endpoints and allows you to view/manage endpoints/routes, send messages to endpoints, viewing route statistics, etc. That being said, using this web console with an existing Camel application is tricky at the moment. It's currently deployed as a war file that only has access to the CamelContext defined in its embedded spring XML file. Though the entire camel-web project can be embedded and customized in your application if you desire (and know Scalate). Given my recent client requirements, I opted to build my own basic app using JSPs/JMX as described above. There has been some recent support for deploying this console in OSGI, where it should be able to view any CamelContexts deployed in the container, etc. However, I'm yet to see this work...more on this later. Using Camel APIs There are also a number of Camel technologies/patterns that can be used to add monitoring to existing routes. wire tap - can add message logging (to a file or JMS queue/topic, etc) or other inline processing advicewith - can be used to modify existing routes to apply before/after operations or add/remove operations in a route intercept - can be used to intercept Exchanges while they are in route, can apply to all endpoints, certain endpoints or just starting endpoints BrowsableEndpoint - is an interface which Endpoints may implement to support the browsing of the exchanges which are pending or have been sent on it. That being said, it takes some creativity to use these effectively and caution to not adversely affect the routes you are trying to monitor. Hyperic HQ You can use this tool to monitor Servicemix (or any process), but it more geared towards system monitoring and JVM stats. I didn't find it useful for any Camel specific monitoring. jConsole/VisualVM these are standard JMX based consoles. They aren't web based and can't be customized (easily anyways) to provide anything more than a tree-like view of JMX MBeans. If you know where to look though, you can do a lot with it. Summary These are just some quick notes at this point. As I learn about other ways of monitoring Camel, I'll update this list and give some more detailed comparison. Any comments are welcome...

June 27, 2012

by Ben O'Day

· 20,144 Views

Using Cookies to implement a RememberMe functionality

Some web applications may need a "Remember Me" functionality. This means that, after a user login, user will have access from same machine to all its data even after session expired. This access will be possible until user does a logout. If you are using Spring and its login form, then you should use "Remember Me" functionality already implemented inside the framework. Some web frameworks also offer a type of SignIn panel which already has "remember me" built-in. But in case you have to implement "Remember Me" functionality by your own, this can be easily achieved using Cookies. Java has a Cookie class named javax.servlet.http.Cookie. Algorithm is straight-forward: your login panel must contain a "Remember Me" check after a succesfull login with "Remember Me" check selected, you can create two cookies: one to keep the value for rememberMe and one to keep a token which has to identify the logged user. For sake of security, this token must never contain user name or user password. The ideea is to generate a random id as token value. And token value aside with user id must be saved in your storage (database) whenever a login is needed, you have to look if there is any cookie saved by you, and if so and your "rememberMe" value is true, you can take the user from storage based on your token and do an automatic login. when a logout is done, you have to delete the cookie that keeps the token To add a cookie, you have to specify the maximum age of the cookie in seconds : HttpServletResponse servletResponse = ...; Cookie c = new Cookie(COOKIE_NAME, encodeString(uuid)); c.setMaxAge(365 * 24 * 60 * 60); // one year servletResponse.addCookie(c); To delete a cookie, you have to find cookie by name and set its maximum age to 0, before adding it to servlet response: HttpServletRequest servletRequest = ...; HttpServletResponse servletResponse = ... ; Cookie[] cookies = servletRequest.getCookies(); for (int i = 0; i < cookies.length; i++) { Cookie c = cookies[i]; if (c.getName().equals(COOKIE_NAME)) { c.setMaxAge(0); c.setValue(null); servletResponse.addCookie(c); } }

June 26, 2012

by Mihai Dinca - Panaitescu

· 59,009 Views · 1 Like

C# – Generic Serialization Methods

Short blog post today. These are a couple of generic serialize and deserialize methods that can be easily used when needing to serialize and deserialize classes. The methods work with any .Net type. That includes built-in .Net types and custom classes that you might create yourself. These methods will only serialize PUBLIC properties of a class. Also, the XML will be human-readable instead of one long line of text. Serialize Method /// /// Serializes the data in the object to the designated file path /// /// Type of Object to serialize /// Object to serialize /// FilePath for the XML file public static void Serialize(T dataToSerialize, string filePath) { try { using (Stream stream = File.Open(filePath, FileMode.Create, FileAccess.ReadWrite)) { XmlSerializer serializer = new XmlSerializer(typeof(T)); XmlTextWriter writer = new XmlTextWriter(stream, Encoding.Default); writer.Formatting = Formatting.Indented; serializer.Serialize(writer, dataToSerialize); writer.Close(); } } catch { throw; } } Deserialize Method /// /// Deserializes the data in the XML file into an object /// /// Type of object to deserialize /// FilePath to XML file /// Object containing deserialized data public static T Deserialize(string filePath) { try { XmlSerializer serializer = new XmlSerializer(typeof(T)); T serializedData; using (Stream stream = File.Open(filePath, FileMode.Open, FileAccess.Read)) { serializedData = (T)serializer.Deserialize(stream); } return serializedData; } catch { throw; } } Here is some sample code to show the methods in action. Person p = new Person() { Name = "John Doe", Age = 42 }; XmlHelper.Serialize(p, @"D:\text.xml"); Person p2 = new Person(); p2 = XmlHelper.Deserialize(@"D:\text.xml"); Console.WriteLine("Name: {0}", p2.Name); Console.WriteLine("Age: {0}", p2.Age); Console.Read();

June 24, 2012

by Ryan Alford

· 28,694 Views

Managing ActiveMQ with JMX APIs

Here is a quick example of how to programmatically access ActiveMQ MBeans to monitor and manipulate message queues... First, get a connection to a JMX server (assumes localhost, port 1099, no auth) Note, always cache the connection for subsequent requests (can cause memory utilization issues otherwise) JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi"); JMXConnector jmxc = JMXConnectorFactory.connect(url); MBeanServerConnection conn = jmxc.getMBeanServerConnection(); Then, you can execute various operations such as addQueue, removeQueue, etc... String operationName="addQueue"; String parameter="MyNewQueue"; ObjectName activeMQ = new ObjectName("org.apache.activemq:BrokerName=localhost,Type=Broker"); if(parameter != null) { Object[] params = {parameter}; String[] sig = {"java.lang.String"}; conn.invoke(activeMQ, operationName, params, sig); } else { conn.invoke(activeMQ, operationName,null,null); } Also, you can get an ActiveMQ QueueViewMBean instance for a specified queue name... ObjectName activeMQ = new ObjectName("org.apache.activemq:BrokerName=localhost,Type=Broker"); BrokerViewMBean mbean = (BrokerViewMBean) MBeanServerInvocationHandler.newProxyInstance(conn, activeMQ,BrokerViewMBean.class, true); for (ObjectName name : mbean.getQueues()) { QueueViewMBean queueMbean = (QueueViewMBean) MBeanServerInvocationHandler.newProxyInstance(mbsc, name, QueueViewMBean.class, true); if (queueMbean.getName().equals(queueName)) { queueViewBeanCache.put(cacheKey, queueMbean); return queueMbean; } } Then, execute one of several APIs against the QueueViewMBean instance... Queue monitoring - getEnqueueCount(), getDequeueCount(), getConsumerCount(), etc... Queue manipulation - purge(), getMessage(String messageId), removeMessage(String messageId), moveMessageTo(String messageId, String destinationName), copyMessageTo(String messageId, String destinationName), etc... Summary The APIs can easily be used to build a web or command line based tool to support remote ActiveMQ management features. That being said, all of these features are available via the JMX console itself and ActiveMQ does provide a web console to support some management/monitoring tasks. See these pages for more information... http://activemq.apache.org/jmx-support.html http://activemq.apache.org/web-console.html

June 22, 2012

by Ben O'Day

· 32,223 Views · 1 Like

Top 10 Causes of Java EE Enterprise Performance Problems

Performance problems are one of the biggest challenges to expect when designing and implementing Java EE related technologies.

June 20, 2012

by Pierre - Hugues Charbonneau

· 274,189 Views · 20 Likes

Every Programmer Should Know These Latency Numbers

This is interesting stuff; Jonas Bonér organized some general some latency data by Peter Norvig as a Gist, and others expanded on it. What's interesting is how, scaling time up by a billion, converts a CPU instruction cycle into approximately one heartbeat, and yields a disk seek time of "a semester in university". ### Latency numbers every programmer should know L1 cache reference ......................... 0.5 ns Branch mispredict ............................ 5 ns L2 cache reference ........................... 7 ns Mutex lock/unlock ........................... 25 ns Main memory reference ...................... 100 ns Compress 1K bytes with Zippy ............. 3,000 ns = 3 µs Send 2K bytes over 1 Gbps network ....... 20,000 ns = 20 µs SSD random read ........................ 150,000 ns = 150 µs Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs Round trip within same datacenter ...... 500,000 ns = 0.5 ms Read 1 MB sequentially from SSD* ..... 1,000,000 ns = 1 ms Disk seek ........................... 10,000,000 ns = 10 ms Read 1 MB sequentially from disk .... 20,000,000 ns = 20 ms Send packet CA->Netherlands->CA .... 150,000,000 ns = 150 ms Assuming ~1GB/sec SSD ![Visual representation of latencies](http://i.imgur.com/k0t1e.png) Visual chart provided by [ayshen](https://gist.github.com/ayshen) Data by [Jeff Dean](http://research.google.com/people/jeff/) Originally by [Peter Norvig](http://norvig.com/21-days.html#answers) Lets multiply all these durations by a billion: Magnitudes: ### Minute: L1 cache reference 0.5 s One heart beat (0.5 s) Branch mispredict 5 s Yawn L2 cache reference 7 s Long yawn Mutex lock/unlock 25 s Making a coffee ### Hour: Main memory reference 100 s Brushing your teeth Compress 1K bytes with Zippy 50 min One episode of a TV show (including ad breaks) ### Day: Send 2K bytes over 1 Gbps network 5.5 hr From lunch to end of work day ### Week SSD random read 1.7 days A normal weekend Read 1 MB sequentially from memory 2.9 days A long weekend Round trip within same datacenter 5.8 days A medium vacation Read 1 MB sequentially from SSD 11.6 days Waiting for almost 2 weeks for a delivery ### Year Disk seek 16.5 weeks A semester in university Read 1 MB sequentially from disk 7.8 months Almost producing a new human being The above 2 together 1 year ### Decade Send packet CA->Netherlands->CA 4.8 years Average time it takes to complete a bachelor's degree

June 12, 2012

by Howard Lewis Ship

· 137,615 Views

How to Get the JPQL/SQL String From a CriteriaQuery in JPA ?

I.T. is full of complex things that should (and sometimes could) be simple. Getting the JQPL/SQL String representation for a JPA 2.0 CriteriaQuery is one of them. By now you all know the JPA 2.0 Criteria API : a type safe way to write a JQPL query. This API is clever in the way that you don’t use Strings to build your query, but is quite verbose… and sometimes you get lost in dozens of lines of Java code, just to write a simple query. You get lost in your CriteriaQuery, you don’t know why your query doesn’t work, and you would love to debug it. But how do you debug it ? Well, one way would be by just displaying the JPQL and/or SQL representation. Simple, isn’t it ? Yes, but JPA 2.0 javax.persistence.Query doesn’t have an API to do this. You then need to rely on the implementation… meaning, the code is different if you use EclipseLink, Hibernate or OpenJPA. The CriteriaQuery we want to debug Let’s say you have a simple Book entity and you want to retrieve all the books sorted by their id. Something like SELECT b FROM Book b ORDER BY b.id DESC. How would you write this with the CriteriaQuery ? Well, something like these 5 lines of Java code : CriteriaBuilder cb = em.getCriteriaBuilder(); CriteriaQuery q = cb.createQuery(Book.class); Root b = q.from(Book.class); q.select(b).orderBy(cb.desc(b.get("id"))); TypedQuery findAllBooks = em.createQuery(q); So imagine when you have more complex ones. Sometimes, you just get lost, it gets buggy and you would appreciate to have the JPQL and/or SQL String representation to find out what’s happening. You could then even unit test it. Getting the JPQL/SQL String Representations for a Criteria Query So let’s use an API to get the JPQL/SQL String representations of a CriteriaQuery (to be more precise, the TypedQuery created from a CriteriaQuery). The bad news is that there is no standard JPA 2.0 API to do this. You need to use the implementation API hoping the implementation allows it (thank god that’s (nearly) the case for the 3 main JPA ORM frameworks). The good news is that the Query interface (and therefore TypedQuery) has an unwrap method. This method returns the provider’s query API implementation. Let’s see how you can use it with EclipseLink, Hibernate and OpenJPA. EclipseLink EclipseLink‘s Query representation is the org.eclipse.persistence.jpa.JpaQuery interface and the org.eclipse.persistence.internal.jpa.EJBQueryImpl implementation. This interface gives you the wrapped native query (org.eclipse.persistence.queries.DatabaseQuery) with two very handy methods : getJPQLString() and getSQLString(). Unfortunatelly the getJPQLString() method will not translate a CriteriaQuery into JPQL, it only works for queries originally written in JPQL (dynamic or named query). The getSQLString() method relies on the query being “prepared”, meaning you have to run the query once before getting the SQL String representation. findAllBooks.unwrap(JpaQuery.class).getDatabaseQuery().getJPQLString(); // doesn't work for CriteriaQuery findAllBooks.unwrap(JpaQuery.class).getDatabaseQuery().getSQLString(); Hibernate Hibernate‘s Query representation is org.hibernate.Query. This interface has several implementations and the very useful method that returns the SQL query string : getQueryString(). I couldn’t find a method that returns the JPQL representation, if I’ve missed something, please let me know. findAllBooks.unwrap(org.hibernate.Query.class).getQueryString() OpenJPA OpenJPA‘s Query representation is org.apache.openjpa.persistence.QueryImpl and also has a getQueryString() method that returns the SQL (not the JPQL). It delegates the call to the internal org.apache.openjpa.kernel.Query interface. I couldn’t find a method that returns the JPQL representation, if I’ve missed something, please let me know. findAllBooks.unwrap(org.apache.openjpa.persistence.QueryImpl.class).getQueryString() Unit testing Once you get your SQL String, why not unit test it ? Hey, but I don’t want to test my ORM, why would I do that ? Well, it happens that I’ve discovered a but in the new releases of OpenJPA by unit testing a query… so, there is a use case for that. Anyway, this is how you could do it : assertEquals("SELECT b FROM Book b ORDER BY b.id DESC", findAllBooksCriteriaQuery.unwrap(org.apache.openjpa.persistence.QueryImpl.class).getQueryString()); Conclusion As you can see, it’s not that simple to get a String representation for a TypedQuery. Here is a digest of the three main ORMs : ORM Framework Query implementation How to get the JPQL String How to get the SPQL String EclipseLink JpaQuery getDatabaseQuery().getJPQLString()* getDatabaseQuery().getSQLString()** Hibernate Query N/A getQueryString() OpenJPA QueryImpl getQueryString() N/A (*) Only possible on a dynamic or named query. Not possible on a CriteriaQuery (**) You need to execute the query first, if not, the value is null To illustrate all that I’ve written simple test cases using EclipseLink, Hibernate and OpenJPA that you can download from GitHub. Give it a try and let me know. And what about having an API in JPA 2.1 ? For a developers’ point of view it would be great to have two methods in the javax.persistence.Query (and therefore javax.persistence.TypedQuery) interface that would be able to easily return the JPQL and SQL String representations, e.g : Query.getJPQLString() and Query.getSQLString(). Hey, that would be the perfect time to have it in JPA 2.1 that will be shipped in less than a year. Now, as an implementer, this might be tricky to do, I would love to ear your point of view on this. Anyway, I’m going to post an email to the JPA 2.1 Expert Group… just in case we can have this in the next version of JPA ;o) References http://efreedom.com/Question/1-6412774/Get-SQL-String-JPQLQuery http://old.nabble.com/Cannot-get-the-JPQL—SQL-String-of-a-CriteriaQuery-td33882629.html http://paddyweblog.blogspot.fr/2010/04/some-examples-of-criteria-api-jpa-20.html http://www.altuure.com/2010/09/23/jpa-criteria-api-by-samples-part-i/ http://www.altuure.com/2010/09/23/jpa-criteria-api-by-samples-%E2%80%93-part-ii/ http://www.jumpingbean.co.za/blogs/jpa2-criteria-api http://wiki.eclipse.org/EclipseLink/FAQ/JPA#How_to_get_the_SQL_for_a_Query.3F

June 5, 2012

by Antonio Goncalves

· 61,019 Views · 1 Like