Data Engineering Resources

The Latest Data Engineering Topics

With the web being used on so many different devices now it's very important that you can change your design to fit on different screen sizes. The best way of changing your display depending on screen size is to use media queries to find out the size viewport of the screen and allowing you to change the design depending on what screen size is on. You will mainly make these changes in the CSS file as you can define the media query break points to change the design on different devices like this. /* Smartphones (portrait) ----------- */ @media only screen and (max-width : 320px) { } /* Desktops and laptops ----------- */ @media only screen and (min-width : 1224px) { } /* Large screens ----------- */ @media only screen and (min-width : 1824px) { } The above code will allow you to make styling completely different on different devices, but what if you wanted to change the functionality of the site depending on the screen size? What if you needed to use some Javascript code on different screen sizes, for example to create a slide down navigation button. How Do You Use Media Queries With jQuery Media queries will be checking the width of the window to see what the size of the device is so you would think that you can use a method like .width() on the window object like this. if($(window).width() < 767) { // change functionality for smaller screens } else { // change functionality for larger screens } But this will not return the true value of the window as it takes into effect things like body padding and scroll bars on the window. The other option you have when checking the media size is to use a Javascript method of .matchMedia() on the window object. var window_size = window.matchMedia('(max-width: 768px)')); This works the same way as media queries and is supported on many browsers apart from IE9 and lower. Can I Use window.matchMedia To use matchMedia you need to pass in the min or max values you want to check (like media queries) and see if the viewport matches this. if (window.matchMedia('(max-width: 768px)').matches) { // do functionality on screens smaller than 768px } Now you can use this to add a click event on to a sub-menu for screens smaller than 768px. The below code is an example of how you can add some Javascript code which will only be run on screens smaller than 768px. if (window.matchMedia('(max-width: 768px)').matches) { $('.sub-menu-button').on('click', function(e) { var subMenu = $(this).next('.sub-navigation'); if(subMenu.is(':visible')) { subMenu.slideUp(); } else { subMenu.slideDown(); } return false; }); }

June 6, 2014

by Paul Underwood

· 135,720 Views · 3 Likes

How Does Spring @Transactional Really Work?

Wondering how Spring @Transactional works? Learn about what's really going on.

June 5, 2014

by Vasco Cavalheiro

· 882,582 Views · 47 Likes

MapDB: The Agile Java Data Engine

MapDB is a pure Java database, specifically designed for the Java developer. The fundamental concept for MapDB is very clever yet natural to use: provide a reliable, full-featured and “tune-able” database engine using the Java Collections API. MapDB 1.0 has just been released, this is the culmination of years of research and development to get the project to this point. Jan Kotek, the primary developer for MapDB, also worked on predecessor projects (JDBM), starting MapDB as an entire from-scratch rewrite. Jan’s expertise and dedication to low-level debugging has yielded excellent results, producting an easy-to-use database for Java with comparable performance to many C-based engines. What sets MapDB apart is the “map” concept. The idea is to leverage the totally natural Java Collections API – so familiar to Java developers that most of them literally use it daily in their work. For most database interactions with a Java application, some sort of translator is required. There are many Object-Relational Mapping (ORM) tools to name just one category of such components. The goal has always been in the direction of making it natural to code in objects in the Java language, and translate them to a specific database syntax (such as SQL). However, such efforts have always come up short, adding complexity for both the application developer and the data architect. When using MapDB there is no object “translation layer” – developers just access data in familiar structures like Maps, Sets, Queues, etc. There is no change in syntax from typical Java coding, other than a brief initialization syntax and transaction management. A developer can literally transform memory-limited maps into a high-speed persistent store in seconds (typically changing just one line of code). A MapDB Example Here is a simple MapDB example, showing how easy and intuitive it is to use in a Java application: // Initialize a MapDB database DB db = DBMaker.newFileDB(new File("testdb")) .closeOnJvmShutdown() .make(); // Create a Map: Map myMap = db.getTreeMap(“testmap”); // Work with the Map using the normal Map API. myMap.put(“key1”, “value1”); myMap.put(“key2”, “value2”); String value = myMap.get(“key1”); ... That’s all you need to do, now you have a file-backed Map of virtually any size. Note the “builder-style” initialization syntax, enabling MapDB as the agile database choice for Java. There are many builder options that let you tune your database for the specific requirements at hand. Just a small subset of options include: In-memory implementation Enable transactions Configurable caching This means that you can configure your database just for what you need, effectively making MapDB serve the job of many other databases. MapDB comes with a set of powerful configuration options, and you can even extend the product to make your own data implementations if necessary. Another very powerful feature is that MapDB utilizes some of the advanced Java Collections variants, such as ConcurrentNavigableMap. With this type of Map you can go beyond simple key-value semantics, as it is also a sorted Map allowing you to access data in order, and find values near a key. Not many people are aware of this extension to the Collections API, but it is extremely powerful and allows you to do a lot with your MapDB database (I will cover more of these capabilities in a future article). The Agile Aspect of MapDB When I first met Jan and started talking with him about MapDB he said something that made a very important impression: If you know what data structure you want, MapDB allows you to tailor the structure and database characteristics to your exact application needs. In other words, the schema and ways you can structure your data is very flexible. The configuration of the physical data store is just as flexible, making a perfect combination for meeting almost any database need. They key to this capability is inherent in MapDB’s architecture, and how it translates to the MapDB API itself. Here is a simple diagram of the MapDB architecture: As you can see from the diagram, there are 3 tiers in MapDB: Collections API: This is the familiar Java Collections API that every Java developer uses for maintaining application state. It has a simple builder-style extension to allow you to control the exact characteristics of a given database (including its internal format or record structure). Engine: The Engine is the real key to MapDB, this is where the records for a database – including their internal structure, concurrency control, transactional semantics – are controlled. MapDB ships with several engines already, and it is straightforward to add your own Engine if needed for specialized data handling. Volume: This is the physical storage layer (e.g., on-disk or in-memory). MapDB has a few standard Volume implementations, and they should suffice for most projects. The main point is that the development API is completely distinct from the Engine implementation (the heart of MapDB), and both are separate from the actual physical storage layer. This offers a very agile approach, allowing developers to exactly control what type of internal structure is needed for a given database, and what the actual data structure looks like from the top-level Collections API. To make things even more extensible and agile, MapDB uses a concept of Engine Wrappers. An Engine Wrapper allows adding additional features and options on top of a specific engine layer. For example, if the standard Map engine is utilized for creating a B-Tree backed Map, it is feasible to enable (or disable) caching support. This caching feature is done through an Engine Wrapper, and that is what shows up in the builder-style API used to configure a given database. While a whole article could be written just about this, the point here is that this adds to MapDB’s inherent agile nature. By way of example, here is how you configure a pure in-memory database, without transactional capabilities: // Initialize an in-memory MapDB database // without transactions DB db = DBMaker.newMemoryDB() .transactionDisable() .closeOnJvmShutdown() .make(); // Create a Map: Map myMap = db.getTreeMap(“testmap”); // Work with the Map using the normal Map API. myMap.put(“key1”, “value1”); myMap.put(“key2”, “value2”); String value = myMap.get(“key1”); ... That’s it! All that was needed was to change the DBMaker call to add the new options, everything else works exactly the same as in the example shown earlier. Agile Data Model In addition to customizing the features and performance characteristics of a given database instance, MapDB allows you to create an agile data model, with a schema exactly matching your application requirements. This is probably similar to how you write your code when creating standard Java in-memory structures. For example, let’s say you need to lookup a Person object by username, or by personID. Simply create a Person object and two Maps to meet your needs: public class Person { private Integer personID; private String username; ... // Setters and getters go here ... } // Create a Map of Person by username. Map personByUsernameMap = ... // Create a Map of Person by personID. Map personByPersonIDMap = ... This is a very trivial example, but now you can easily write to both maps for each new Person instance, and subsequently retrieve a Person by either key. Another interesting concept with MapDB data structures are some key extensions to the normal Java Collections API. A common requirement in applications is to have a Map with a key/value, and in addition to finding the value for a key to be able to perform the inverse: lookup the key for a given value. We can easily do this using the MapDB extension for bi-directional maps: // Create a primary map HTreeMap map = DBMaker.newTempHashMap(); // Create the inverse mapping for primary map NavigableSet> inverseMapping = new TreeSet>(); // Bind the inverse mapping to primary map, so it is auto-updated each time the primary map gets a new key/value Bind.mapInverse(map, inverseMapping); map.put(10L,"value2"); map.put(1111L,"value"); map.put(1112L,"value"); map.put(11L,"val"); // Now find a key by a given value. Long keyValue = Fun.filter(inverseMapping.get(“value2”); MapDB supports many constructs for the interaction of Maps or other collections, allowing you to create a schema of related structures that can automatically be kept in sync. This avoids a lot of scanning of structures, makes coding fast and convenient, and can keep things very fast. Wrapping it up I have shown a very brief introduction on MapDB and how the product works. As you can see its strengths are its use of the natural Java Collections API, the agile nature of the engine itself, and the support for virtually any type of data model or schema that your application needs. MapDB is freely available for any use under the Apache 2.0 license. To learn more, check out: www.mapdb.org.

June 5, 2014

by Cory Isaacson

· 28,534 Views · 3 Likes

Are There Any Differences Between Table Scan and Index Scan

Introduction As we all know, developers prefer Index Seek in the case of performance of a query, and we all know what Index Seek is and how it improves performance. In this article we are not going to discuss Index Seek. A table scan is performed on a table which does not have an Index upon it (a heap) – it looks at the rows in the table and an Index Scan is performed on an indexed table – the index itself. Here we are trying discuss a specified scenario related to Table Scan and Index Scan. Scenario Description Suppose we have a table object without any Index on it. The name of the Table Object is tbl_WithoutIndex and another Table Object called tbl_WuthIndex -- Heap Table Defination IF OBJECT_ID(N'dbo.tbl_WithoutIndex', N'U') IS NOT NULL BEGIN DROP TABLE [dbo].[tbl_WithoutIndex]; END GO CREATE TABLE [dbo].[tbl_WithoutIndex] ( EMPID INT NOT NULL, EMPNAME VARCHAR(50) NOT NULL ); GO -- Insert Some Records INSERT INTO [dbo].[tbl_WithoutIndex] (EMPID, EMPNAME) VALUES (101, 'Joydeep Das'), (102, 'Sukamal Jana'), (103, 'Ranajit Shinah'); GO -- Table with Index (Clustered Index for primary Key) IF OBJECT_ID(N'dbo.tbl_WithIndex', N'U') IS NOT NULL BEGIN DROP TABLE [dbo].[tbl_WithIndex]; END GO CREATE TABLE [dbo].[tbl_WithIndex] ( EMPID INT NOT NULL PRIMARY KEY, EMPNAME VARCHAR(50) NOT NULL ); GO -- Finding Index Name sp_helpindex tbl_WithIndex; index_name index_description index_keys PK__tbl_With__14CCD97D3AD6B8E2 clustered, unique, primary key located on PRIMARY EMPID -- Insert Some Records INSERT INTO [dbo].[tbl_WithIndex] (EMPID, EMPNAME) VALUES (101, 'Joydeep Das'), (102, 'Sukamal Jana'), (103, 'Ranajit Shinah'); GO Now we are going to compare the execution plan output SELECT * FROM [dbo].[tbl_WithoutIndex]; has no Index (Here I mean the clustered Index) the table is a Heap. So when we put the SELECT statement the entire table scanned. SELECT * FROM [dbo].[tbl_WithIndex]; Here he Table has a PRIMARY KEY, so it has a Clustered Index on it. But here we are not putting the Index columns on the WHERE clause, so the Clustered Index Scan Occurs. Close Observation of Execution Plan Please remember that the table has small number of records. Table Name Estimated IO Cost tbl_WithoutIndex 0.0032035 Tbl_WithIndex 0.003125 If we see the Estimated Operation Cost, it would be same for both the Query (0.0032853). Question in Mind Here we can see the Estimated Operation cost for both the Query is same. So Question is in the mind that, if a Index Scan occur we can drop the index and use the heap (in our case). So is there any other difference between them. How they are Differences Here we understand what the internal difference between Table Scan and Index Scan. When the table scan occurs MS SQL server reads all the Rows and Columns into memory. When the Index Scan occurs, it's going to read all the Rows and only the Columns in the index. Effects in Performance In case of performance the Table Scan and Index Scan both have the same output, if we have use the single table SELECT statement. But it differs in the case of Index Scan when we are going to JOIN table. Hope you like it.

June 5, 2014

by Joydeep Das

· 24,240 Views

You Never Really Learn Something Until You Teach It

as software developers we spend a large amount of time learning. there is always a new framework or technology to learn. it can seem impossible to keep up with everything when there is something new every day. so, it should be no surprise to you that learning quickly and gaining a deeper understanding of what you learn is very important. and that is exactly why–if you are not doing so already– you need to incorporate teaching into your learning. why teaching is such an effective learning tool when we learn something, most of us learn it in bits and pieces. typically, if you read a book, you’ll find the material in that book organized in a sensible way. the same goes for others mediums like video or online courses. but, unfortunately, the material doesn’t go into your head in the same way. what happens instead is that you absorb information in jumbled bits and pieces. you learn something, but don’t completely “get it” until you learn something else later on. the earlier topic becomes more clear, but the way that data is structured in your mind is not very well organized–regardless of how organized the source of that information was. even now, as i write this blog post, i am struggling with taking the jumbled mess of information i have in my head about how teaching helps you learn and figuring out how to present it in an organized way. i know what i want to say, but i don’t yet know how to say it. only the process of putting my thoughts on paper will force me to reorganize them; to sort them out and make sense of them. when you try to teach something to someone else, you have to go through this process in your own mind. you have to take that mess of data, sort it out, repackage it and organize it in a way that someone else can understand. this process forces you to reorganize the way that data is stored in your own head. also, as part of this process, you’ll inevitably find gaps in your own understanding of whatever subject you are trying to teach. when we learn something we have a tendency to gloss over many things we think we understand. you might be able to solve a math problem in a mechanical way, and the steps you use to solve the math problem might be sufficient for what you are trying to do, but just knowing how to solve a problem doesn’t mean you understand how to solve a problem. knowledge is temporary. it is easily lost. understanding is much more permanent. it is rare that we forget something we understand thoroughly. when we are trying to explain something to someone else, we are forced to ask ourselves the most important question in leaning… in gaining true understanding… “why.” when we have to answer the question “why,” superficial understanding won’t do. we have to know something deeply in order to not just say how, but why. that means we have to explore a subject deeply ourselves. sometimes this involves just sitting and thinking about it clearly before you try to explain it to someone else. sometimes just the act of writing, speaking or drawing something causes you to make connections you didn’t make before, instantly deepening your knowledge. (ever had one of those moments when you explained something to someone else and you suddenly realized that before you tried to explain it you didn’t really understand it yourself, but now you magically do?) and, sometimes, you have to go back to the drawing board and do more research to fill in those gaps in your own knowledge you just uncovered when you tried to explain it to someone else. becoming a teacher so, you perhaps you agree with me so far, but you’ve got one problem–you’re not a teacher. well, i have good news for you. we are all teachers. teaching doesn’t have to be some formal thing where you have books and a classroom. teaching is simply repackaging information in a way that someone else can understand. the most effective teaching takes place when you can explain something to someone in terms of something else they already understand. (want a great book on the subject that might make your brain hurt? surfaces and essences: analogy as the fuel and fire of thinking. an excellent book by douglas hofstadter, author of godel, escher, bach: an eternal golden braid. both books are extremely difficult reads, but very rewarding.) as human beings, we do this all the time. whenever we communicate with someone else and tell them about something we learned or explain how to do something, we are teaching. of course, the more you do it, the better you get at it, and adding a little more formalization to your practice doesn’t hurt, but at heart, you–yes, you–are a teacher. one of the best ways to start teaching–that may even benefit your career–is to start a blog. many developers i talk to assume that they have to already be experts at something in order to blog about it. the truth is, you only have to be one step ahead of someone for them to learn from you. so, don’t be afraid to blog about what you are learning as you are learning it. there will always be someone out there who could benefit from your knowledge–even if you consider yourself a beginner. and don’t worry about blogging for someone else–at least not at first. just blog for yourself. the act of taking your thoughts and putting them into words will gain you the benefits of increasing your own understanding and reorganizing thoughts in your mind. i won’t pretend the process isn’t painful. when you first start writing, it doesn’t usually come easily. but, don’t worry too much about quality. worry about communicating your ideas. with time, you’ll eventually get better and you’ll find the process of converting the ideas in your head to words becomes easier and easier. of course, creating a blog isn’t the only way to start teaching. you can simply have a conversation with a friend, a coworker, or even your spouse about what you are learning. just make sure you express what you are learning externally in one form or another if you really want to gain a deep understanding of the subject. you can also record videos or screencasts, speak on a subject or even give a presentation at work. whatever you do, make sure that teaching is part of your learning process. for more posts like this and some content i only deliver via email, sign up here.

June 5, 2014

by John Sonmez

· 19,550 Views · 1 Like

Introducing Partitioned Collections for MongoDB Applications

TokuMX 1.5 is around the corner. The big feature will be something we discussed briefly when talking about replication changes in 1.4: partitioned collections. Before introducing the feature, I wanted to mention the following. Although TokuMX 1.5 is not available as of this writing, we would love to hear feedback on partitioned collections, which we think are wonderful for time-series data, as I describe below. If you are interested in trying out the feature, email [email protected] for a pre-release version of TokuMX 1.5. What is a partitioned collection? A partitioned collection is analogous to a partitioned table in relational databases. Oracle, MySQL, SQL Server, and Postgres all support partitioned tables. We are happy to bring this functionality to TokuMX. So, if the remainder of this blog is unclear, and you have friends in the office who are familiar with relational databases, you may want to ask them for more information :). Nevertheless, a partitioned collection is a collection that underneath the covers is broken into (or partitioned into) several individual collections, based on ranges of a “partition key”. From the application developer’s point of view, the collection is just another collection. Queries, inserts, updates, and deletes just work with no syntactical changes. Secondary indexes and replication work as well. But underneath the covers, the data will be broken into several collections, with each collection responsible for all data for a range of the partition key. If you are running TokuMX 1.4, a simple example is the oplog, which is a partitioned collection. Any normal query works just fine on the oplog. However, if you look in your data directory, you will see several .tokumx files named “local_oplog_rs_p…”. These files are the individual partitions that break up the data. Each partition stores a range of _id fields in the oplog. Why should I bother using a partitioned collection? This will be its own post with longer examples, but here is a summary. Partitioned collections have two big advantages: Large chunks of data can be deleted very efficiently by dropping partitions. The cost is that of performing an “rm” on some files in the filesystem. This is really fast and efficient. Queries that include the partition key may be isolated to individual partitions, and therefore run faster. This is similar to “query isolation” for shard keys. So, one scenario you may want a partitioned collection for is where the oldest data gets dropped periodically, and many queries benefit from a time based key. That will be a good fit. In short: time series data. If you have a time-series application where you want to keep a rolling period of data (e.g. the last 6 months worth), then using a partitioned collection will be great, and is preferable to using a TTL index or a capped collection. In a future blog post, I will expand on this. How do I use a partitioned collection in TokuMX 1.5? Basically, just like a normal collection, except with some commands added to create a partitioned collection, add partitions, and drop partitions. Below, I explain the shell commands added for this functionality. Our documentation contains the full commands so that they may be called by any driver’s runCommand method. Ok, so how do I create a partitioned collection? The first thing to consider is what your partition key should be. That is, what key do you want to use ranges of to partition your data? This key has similarities with a shard key. It should be a key that can be used to isolate partitions, the way a shard key is used to isolate shards (as explained here). Also, it should be a key that contains a range of data you would like to delete all at once. With time series data, that key will likely be a timestamp. In TokuMX, the partition key is always the primary key. To create a partitioned collection, “foo”, with a timestamp field, “ts”, used for the partitioning run the following: > db.createCollection("foo", { partitioned : 1 , primaryKey : { ts : 1 , _id : 1 } }) { "ok" : 1 } Note that in TokuMX, the primary key must have the _id field appended to it to ensure uniqueness. As a side note, we do not support hash based partitioning, only range based partitioning. Adding partitions? In TokuMX, partitions can only be appended to the end. Individual partitions cannot be split. So, say we have a collection that partitions on the _id field, where all _id’s happen to be integers. Suppose we have three partitions with the following ranges: _id <= 0 0 < _id <= 1000 _id > 1000 With this collection we cannot create a partition with the range 500 < _id <= 1000, because that would split the second partition. All we can do is add a new partition to the end, and “cap” the current last partition with a new maximum value. This new maximum value must be greater than or equal to the primary key (or in this case, _id) of the last partition’s last document. So, if the last partition’s last document has an _id of 2500, we can only partitions that create a range whose maximum is at least 2500. There are two ways to add a partition. The first method peeks at the last document in the current last partition, caps the partition with the primary key of that last document, and creates a new partition. To do so, one does: > db.foo.addPartition() { "ok" : 1 } In the above example, the partitioned collection would now have partitions with the following ranges: _id <= 0 0 < _id <= 1000 1000 < _id <= 2500 _id > 2500 Alternatively, we can specify what the new maximum of the existing last partition may be, provided it is greater than the last document in last partition (which in this example is 2500). To do so, we simply pass in the new maximum as a parameter: > db.foo.addPartition({ _id : 3000 }); { "ok" : 1 } This would make the collection have partitions with the following ranges: _id <= 0 0 < _id <= 1000 1000 < _id <= 3000 _id > 3000 Dropping partitions? Dropping partitions is simple. First, see what the partitions are with the following shell command: > db.foo.getPartitionInfo() { "numPartitions" : NumberLong(4), "partitions" : [ { "_id" : NumberLong(0), "max" : { "_id" : 0 }, "createTime" : ISODate("2014-05-29T01:50:15.839Z") }, { "_id" : NumberLong(1), "max" : { "_id" : 1000 }, "createTime" : ISODate("2014-05-29T01:50:27.049Z") }, { "_id" : NumberLong(2), "max" : { "_id" : 2500 }, "createTime" : ISODate("2014-05-29T01:50:30.549Z") }, { "_id" : NumberLong(3), "max" : { "_id" : { "$maxKey" : 1 } }, "createTime" : ISODate("2014-05-29T01:50:35.903Z") } ], "ok" : 1 } This lists each partition, what the maximum value that each partition may hold (thus defining the range of the partition), and the id of the partition (in the _id field). So, in the example we used for adding partitions, we have four partitions with _ids 0 through 3. To drop a partition, we run the following command and pass the _id of the partition we want to drop. To drop partition 0, we run: > db.foo.dropPartition(0) { "ok" : 1 } Looking at the list of partitions after this operation, we see the partition is dropped: > db.foo.getPartitionInfo() { "numPartitions" : NumberLong(3), "partitions" : [ { "_id" : NumberLong(1), "max" : { "_id" : 1000 }, "createTime" : ISODate("2014-05-29T01:50:27.049Z") }, { "_id" : NumberLong(2), "max" : { "_id" : 2500 }, "createTime" : ISODate("2014-05-29T01:50:30.549Z") }, { "_id" : NumberLong(3), "max" : { "_id" : { "$maxKey" : 1 } }, "createTime" : ISODate("2014-05-29T01:50:35.903Z") } ], "ok" : 1 } This covers how to use partitioned collections. We hope users in the MongoDB ecosystem find this feature as useful as relational database users do. In the comments section below, feel free to leave questions and/or feedback.

June 4, 2014

by Zardosht Kasheff

· 12,470 Views

Exploring Message Brokers: RabbitMQ, Kafka, ActiveMQ, and Kestrel

Explore different message brokers, and discover how these important web technologies impact a customer's backlog of messages, and cluster/data performance.

June 3, 2014

by Yves Trudeau

· 460,598 Views · 86 Likes

MySQL Transaction Isolation Levels and Locks

Originally written by Lim Han Recently, an application that my team was working on encountered problems with a MySQL deadlock situation and it took us some time to figure out the reasons behind it. This application that we deployed was running on a 2-node cluster and they both are connected to an AWS MySQL database. The MySQL db tables are mostly based on InnoDB which supports transaction (meaning all the usual commit and rollback semantics) as well as row-level locking that MyISAM engine does not provide. So the problem arose when our users, due to some poorly designed user interface, was able to execute the same long running operation twice on the database. As it turned out, due to the fact that we have a dual node cluster, each of the user operation originated from a different web application (which in turn meant 2 different transaction running the same queries). The deadlock query happened to be a “INSERT INTO T… SELECT FROM S WHERE” query that introduced shared locks on the records that were used in the SELECT query. It didn’t help that both T and S in this case happened to be the same table. In effect, both the shared locks and exclusive locks were applied on the same table. An attempt to explain the possible cause of the deadlock on the queries could be explained by the following table. This is based on the assumption that we are using a default REPEATABLE_READ transaction isolation level (I will explain the concept of transaction isolation later) Assuming that we have a table as such RowId Value 1 Collection 1 2 Collection 2 … Collection N 450000 Collection 450000 The following is a sample sequence that could possibly cause a deadlock based on the 2 transactions running an SQL query like “INSERT INTO T SELECT FROM T WHERE … “ : Time Transaction 1 Transaction 2 Comment T1 Statement executed Statement executed. A shared lock is applied to records that are read by selection T2 Read lock s1 on Row 10-20 The lock on the index across a range. InnoDB has a concept of gap locks. T3 Statement executed Transaction 2 statement executed. Similar shared lock to s1 applied by selection T4 Read lock s2 on Row 10-20 Shared read locks allow both transaction to read the records only T5 Insert lock x1 into Row 13 in index wanted Transaction 1 attempts to get exclusive lock on Row 13 for insertion but Transaction 2 is holding a shared lock T6 Insert lock x2 into Row 13 in index wanted Transaction 2 attempts to get exclusive lock on Row 13 for insertion but Transaction 1 is holding a shared lock T7 Deadlock! The above scenario occurs only when we use REPEATABLE_READ (which introduces shared read locks). If we were to lower the transaction isolation level to READ_COMMITTED, we would reduce the chances of a deadlock happening. Of course, this would mean relaxing the consistency of the database records. In the case of our data requirements, we do not have such strict requirements for strong consistency. Thus, it is acceptable for one transaction to read records that are committed by other transactions. So, to delve deeper into the idea of Transaction Isolation, this concept has been defined by ANSI/ISO SQL as the following from highest isolation levels to lowest Serializable This is the highest isolation level and usually requires the use of shared read locks and exclusive write locks (as in the case of MySQL). What this means in essence that any query made will require access to a shared read lock on the records which prevents another transaction’s query to modify these records. Every update statement will require access to an exclusive write lock Also, range-locks must be acquired when a select statement with a WHERE condition is used. This is implemented as a gap lock in MySQL. Repeatable Reads This is the default level used in MySQL. This is mainly similar to Serializable beside the fact that a range lock is not used. However, the way that MySQL implements this level seemed to me a little different. Based on Wikipedia’s article on Transaction Isolation, a range lock is not implemented and so phantom reads can still occur. Phantom reads refer to a possibility that select queries will have additional records when the same query is made within a transaction. However, what I understand from MySQL’s document is that range locks are still used and the same select queries made in the same transaction will always return the same records. Maybe I’m mistaken in my understanding and if there’s any mistakes in my intepretations, I stand ready to be corrected. Read Committed This is an isolation level that will maintain a write lock until the end of the transaction but read locks will be released at the end of the SELECT statement. It does not promise that a SELECT statement will find the same data if it is re-run again in the same transaction. It will, however, guarantee that the data that is read are not “dirty” and has been committed. Read Uncommitted This is an isolation level that I doubt would be useful for most use cases. Basically, it allows a transaction to see all data that has been modified, including “dirty” or uncommitted data. This is the lowest isolation level Having gone through the different transaction isolation levels, we could see how the selection of the Transaction Isolation level determines the kind of database locking mechanism. From a practical standpoint, the default MySQL isolation level (REPEATABLE_READ) might not always be a good choice when you are dealing with a scenario like ours where there is really no need for such strong consistency in the data reads. I believe that by lowering the isolation level, it is likely to reduce chances that your database queries meet with a deadlock. Also, it might even allow a higher concurrent access to your database which improve the performance level of your queries. Of course, this comes with the caveat that you need to understand how important consistent reads are for your application. If you are dealing with data where precision is paramount (e.g. your bank accounts), then it is definitely necessary to impose as much isolation as possible so that you would not read inconsistent information within your transaction.

June 2, 2014

by Anh Tuan Nguyen

· 18,220 Views · 1 Like

Neo4j & Cypher: Rounding a Float Value to Decimal Places

About 6 months ago, Jacqui Read created a github issue explaining how she wanted to round a float value to a number of decimal places but was unable to do so due to the round function not taking the appropriate parameter. I found myself wanting to do the same thing last week where I initially had the following value: RETURN toFloat("12.336666") AS value I wanted to round that to 2 decimal places and Wes suggested multiplying the value before using ROUND and then dividing afterwards to achieve that. For two decimal places we need to multiply and divide by 100: WITH toFloat("12.336666") AS value RETURN round(100 * value) / 100 AS value 12.34 Michael suggested abstracting the number of decimal places like so: WITH 2 as precision WITH toFloat("12.336666") AS value, 10^precision AS factor RETURN round(factor * value)/factor AS value If we want to round to 4 decimal places we can easily change that: WITH 4 as precision WITH toFloat("12.336666") AS value, 10^precision AS factor RETURN round(factor * value)/factor AS value 12.3367 The code is available as a graph gist too if you want to play around with it. And you may as well check out the other gists while you’re at it – enjoy!

June 2, 2014

by Mark Needham

· 6,988 Views

NetBeans in the Classroom: Console Input Using Scanners

ken fogel is the program coordinator and chairperson of the computer science technology program at dawson college in montreal, canada. he is also a program consultant to and part-time instructor in the computer institute of concordia university 's school of extended learning. he blogs at omniprogrammer.com and tweets @omniprof . his regular columns about netbeans in education are listed here . i am old enough to remember commercial applications that had a console interface. my students have no experience with the console. when java was introduced the era of such applications was over. instead you used awt, then swing and now javafx. for the student starting out in programming these gui frameworks add complexity they may not be ready for. therefore i start my classes using console input and output. input can be divided into two categories. the first is values. the primitive types such as int and double are value types. as i explain it, a value is a something you can perform both mathematical and logical operations on. the second is strings. a string does not have a value in the mathematical sense. instead it has a state. therefore it cannot be used in a mathematical expression. state can be compared and so strings can be used is logical expressions such as determining if two strings are the same. if you are ordering strings then you can also determine if one string is greater or less than the other. java does not have any console input routines for values. to java everything you type is a string. this has meant that for many years every java textbook author provided a console input routine that could convert strings to values. java 1.5 made all these routines redundant with the introduction of the scanner class. i could now teach input without needing to go into detail about wrapper classes or the numberformatexception in a standard approach. /** * ask the user for their weight */ public void inputweight() { scanner sc = new scanner(system.in); system.out.print("please enter your weight: "); int weight = sc.nextint(); system.out.println("weight = " + weight); } there is a problem with this code. if the user enters a string that cannot be converted we get the numberformatexception we want to avoid. scanner provides a solution with its ‘hasnext. . .’ methods. these methods determine if the user input can be converted to the target value. you can reject input and avoid the exception. /** * ask the user for their weight */ public void inputweight() { scanner sc = new scanner(system.in); system.out.print("please enter your weight: "); int weight; // accept input and test to see if it’s an integer if (sc.hasnextint()) { // success, it was an integer so get it weight = sc.nextint(); system.out.println("weight = " + weight); } else { // failure so retrieve the input as a string string bad = sc.next(); // inform the user what went wrong system.out.println(bad + " cannot be converted to an integer"); } } now when you run the program you get the following for a good input. this is what you get for a bad input. i will return to numeric input in another article as you do not want a program to end when the user makes one mistake. character input to make my assignments more interesting i have the students create a menu from which they must choose an action. i like to use letters as the menu choice. the problem is that there is no nextchar() method in scanner. instead you need to use next() which will accept anything that you enter. public void displaymenu() { scanner sc = new scanner(system.in); system.out.println("select an account"); system.out.println("a: savings"); system.out.println("b: checking"); system.out.println("c: exit"); system.out.print("please choose a, b or c: "); string choice = sc.next(); system.out.println("you chose: " + choice); } which could result in the following: the first improvement is to convert the string into a character. to do this the string must only be one character long. rather than checking the length you can just extract the first character in the string using the string class charat method. string choice = sc.next(); char letter = choice.charat(0); system.out.println("you chose: " + letter); this will work and if you enter “a” you will get ‘a’. the problem now is that the case of the character you entered should not matter. this is easy to solve by adding one more method from string into the mix. char letter = choice.touppercase().charat(0); almost perfect but since we are using string input the user can enter any string they want. so “bacon” will result in ‘b’. we also need some way to control the allowable letters. in this menu the choices are a, b and c. the solution is to use the ‘hasnext’ method for ‘next’. not the normal, no parameter version, but the version that takes a pattern. pattern is a polite way of saying regular expression. from https://xkcd.com/208/ just as we did with integer input we first test the string that the user has entered. the regular expression will be the pattern that will be used to determine if the string is correct. this regular expression is probably the simplest one you can write. we want a single character, either upper or lower case and is one of the three allowable characters. here it is: [abcabc] the square brackets indicate that this is a character class expression that can match only a single character. what goes inside the brackets are the allowable characters, in this case a, b, c, a, b or c. since these characters are adjacent in the ascii/unicode table they can also be written as a range. [a-ca-c] putting this all together we get a routine that will only accept a single letter. public void displaymenu() { scanner sc = new scanner(system.in); system.out.println("select an account"); system.out.println("a: savings"); system.out.println("b: checking"); system.out.println("c: exit"); system.out.print("please choose a, b or c: "); char letter; if (sc.hasnext("[a-ca-c]")) { string choice = sc.next(); letter = choice.touppercase().charat(0); system.out.println("you chose: " + letter); } else { string choice = sc.next(); system.out.println(choice + " is not valid input"); } } now if you enter ‘a’: but if you enter ‘aardvark’: in my next article i will look at console input a bit more and i will introduce you to the beethoven test.

May 30, 2014

by Ken Fogel

· 43,629 Views

Connecting to Cassandra from Java

In this post, I look at the basics of connecting to a Cassandra database from a Java client. I will use the DataStax Java Client JAR in order to do so.

May 30, 2014

by Dustin Marx

· 121,293 Views · 3 Likes

Performance Tuning of Spring/Hibernate Applications

For most typical Spring/Hibernate enterprise applications, the application performance depends almost entirely on the performance of it's persistence layer. This post will go over how to confirm that we are in presence of a 'database-bound' application, and then walk through 7 frequently used 'quick-win' tips that can help improve application performance. How to confirm that an application is 'database-bound' To confirm that an application is 'database-bound', start by doing a typical run in some development environment, using VisualVM for monitoring. VisualVM is a Java profiler shipped with the JDK and launchable via the command line by calling jvisualvm. After launching Visual VM, try the following steps: double click on your running application Select Sampler click on Settings checkbox Choose Profile only packages, and type in the following packages: your.application.packages.* org.hibernate.* org.springframework.* your.database.driver.package, for example oracle.* Click Sample CPU The CPU profiling of a typical 'database-bound' application should look something like this: We can see that the client Java process spends 56% of it's time waiting for the database to return results over the network. This is a good sign that the queries on the database are what's keeping the application slow. The 32.7% in Hibernate reflection calls is normal and nothing much can be done about it. First step for tuning - obtaining a baseline run The first step to do tuning is to define a baseline run for the program. We need to identify a set of functionally valid input data that makes the program go through a typical execution similar to the production run. The main difference is that the baseline run should run in a much shorter period of time, as a guideline an execution time of around 5 to 10 minutes is a good target. What makes a good baseline? A good baseline should have the following characteristics: it's functionally correct the input data is similar to production in it's variety it completes in a short amount of time optimizations in the baseline run can be extrapolated to a full run Getting a good baseline is solving half of the problem. What makes a bad baseline? For example, in a batch run for processing call data records in a telecommunications system, taking the first 10 000 records could be the wrong approach. The reason being, the first 10 000 might be mostly voice calls, but the unknown performance problem is in the processing of SMS traffic. Taking the first records of a large run would lead us to a bad baseline, from which wrong conclusions would be taken. Collecting SQL logs and query timings The SQL queries executed with their execution time can be collected using for example log4jdbc. See this blog post for how to collect SQL queries using log4jdbc - Spring/Hibernate improved SQL logging with log4jdbc. The query execution time is measured from the Java client side, and it includes the network round-trip to the database. The SQL query logs look like this: 16 avr. 2014 11:13:48 | SQL_QUERY /* insert your.package.YourEntity */ insert into YOUR_TABLE (...) values (...) {executed in 13 msec} The prepared statements themselves are also a good source of information - they allow to easily identify frequent query types. They can be logged by following this blog post - Why and where is Hibernate doing this SQL query? What metrics can be extracted from SQL logs The SQL logs can give the answer these questions: What are slowest queries being executed? What are the most frequent queries? What is the amount of time spent generating primary keys? Is there some data that could benefit from caching ? How to parse the SQL logs Probably the only viable option for large log volumes is to use command line tools. This approach has the advantage of being very flexible. At the expense of writing a small script or command, we can extract mostly any metric needed. Any command line tool will work as long as you are comfortable with it. If you are used to the Unix command line, bash might be a good option. Bash can be used also in Windows workstations, using for example Cygwin, or Git that includes a bash command line. Frequently applied Quick-Wins The quick-wins bellow identify common performance problems in Spring/Hibernate applications, and their corresponding solutions. Quick-win Tip 1 - Reduce primary key generation overhead In processes that are 'insert-intensive', the choice of a primary key generation strategy can matter a lot. One common way to generate id's is to use database sequences, usually one per table to avoid contention between inserts on different tables. The problem is that if 50 records are inserted, we want to avoid that 50 network round-trips are made to the database in order to obtain 50 id's, leaving the Java process hanging most of the time. How does Hibernate usually handle this? Hibernate provides new optimized ID generators that avoid this problem. Namely for sequences, a HiLo id generator is used by default. This is how the HiLo sequence generator it works: call a sequence once and get 1000 (the High value) calculate 50 id's like this: 1000 * 50 + 0 = 50000 1000 * 50 + 1 = 50001 ... 1000 * 50 + 49 = 50049, Low value (50) reached call sequence for new High value 1001 ... etc ... So from a single sequence call, 50 keys where generated, reducing the overhead caused my inumerous network round-trips. These new optimized key generators are on by default in Hibernate 4, and can even be turned off if needed by setting hibernate.id.new_generator_mappings to false. Why can primary key generation still be a problem? The problem is, if you declared the key generation strategy as AUTO, the optimized generators are still off, and your application will end up with a huge amount of sequence calls. In order to make sure the new optimized generators are on, make sure to use the SEQUENCE strategy instead of AUTO: @Id @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "your_key_generator") private Long id; With this simple change, an improvement in the range of 10%-20% can be measured in 'insert-intensive' applications, with basically no code changes. Quick-win Tip 2 - Use JDBC batch inserts/updates For batch programs, JDBC drivers usually provide an optimization for reducing network round-trips named 'JDBC batch inserts/updates'. When these are used, inserts/updates are queued at the driver level before being sent to the database. When a threshold is reached, then the whole batch of queued statements is sent to the database in one go. This prevents the driver from sending the statements one by one, which would waist multiple network round-trips. This is the entity manager factory configuration needed to active batch inserts/updates: 100 true true Setting only the JDBC batch size won't work. This is because the JDBC driver will batch the inserts only when receiving insert/updates for the exact same table. If an insert to a new table is received, then the JDBC driver will first flush the batched statements on the previous table, before starting to batch statements on the new table. A similar functionality is implicitly used if using Spring Batch. This optimization can easily buy you 30% to 40% to 'insert intensive' programs, without changing a single line of code. Quick-win Tip 3 - Periodically flush and clear the Hibernate session When adding/modifying data in the database, Hibernate keeps in the session a version of the entities already persisted, just in case they are modified again before the session is closed. But many times we can safely discard entities once the corresponding inserts where done in the database. This releases memory in the Java client process, preventing performance problems caused by long running Hibernate sessions. Such long-running sessions should be avoided as much as possible, but if by some reason they are needed, this is how to contain memory consumption: entityManager.flush(); entityManager.clear(); The flush will trigger the inserts from new entities to be sent to the database. The clear releases the new entities from the session. Quick-win Tip 4 - Reduce Hibernate dirty-checking overhead Hibernate uses internally a mechanism to keep track of modified entities called dirty-checking. This mechanism is not based on the equals and hashcode methods of the entity classes. Hibernate does it's most to keep the performance cost of dirty-checking to a minimum, and to dirty-check only when it needs to, but the mechanism does have a cost, which is more noticeable in tables with a large number of columns. Before applying any optimization, the most important is to measure the cost of dirty-checking using VisualVM. How to avoid dirty-checking? In Spring business methods that we know are read-only, dirty-checking can be turned off like this: @Transactional(readOnly=true) public void someBusinessMethod() { .... } An alternative to avoid dirty-checking is to use the Hibernate Stateless Session, which is detailed in the documentation. Quick-win Tip 5 - Search for 'bad' query plans Check the queries in the slowest queries list to see if they have good query plans. The most usual 'bad' query plans are: Full table scans: they happen when the table is being fully scanned due to usually a missing index or outdated table statistics. Full cartesian joins: This means that the full cartesian product of several tables is being computed. Check for missing join conditions, or if this can be avoided by splitting a step into several. Quick-win Tip 6 - check for wrong commit intervals If you are doing batch processing, the commit interval can make a large difference in the performance results, as in 10 to 100 times faster. Confirm that the commit interval is the one expected (usually around 100-1000 for Spring Batch jobs). It happens often that this parameter is not correctly configured. Quick-win Tip 7 - Use the second-level and query caches If some data is identified as being eligible for caching, then have a look at this blog post for how to setup the Hibernate caching: Pitfalls of the Hibernate Second-Level / Query Caches Conclusions To solve application performance problems, the most important action to take is to collect some metrics that allow to find what the current bottleneck is. Without some metrics it is often not possible to guess in useful time what the correct problem cause is. Also, many but not all of the typical performance pitfalls of a 'database-driven' application can be avoided in the first place by using the Spring Batch framework.

May 29, 2014

by Vasco Cavalheiro

· 38,323 Views · 3 Likes

Implementing Correlation ids in Spring Boot (for Distributed Tracing in SOA/Microservices)

After attending Sam Newman’s microservice talks at Geecon last week I started to think more about what is most likely an essential feature of service-oriented/microservice platforms for monitoring, reporting and diagnostics: correlation ids. Correlation ids allow distributed tracing within complex service oriented platforms, where a single request into the application can often be dealt with by multiple downstream service. Without the ability to correlate downstream service requests it can be very difficult to understand how requests are being handled within your platform. I’ve seen the benefit of correlation ids in several recent SOA projects I have worked on, but as Sam mentioned in his talks, it’s often very easy to think this type of tracing won’t be needed when building the initial version of the application, but then very difficult to retrofit into the application when you do realise the benefits (and the need for!). I’ve not yet found the perfect way to implement correlation ids within a Java/Spring-based application, but after chatting to Sam via email he made several suggestions which I have now turned into a simple project using Spring Boot to demonstrate how this could be implemented. Why? During both of Sam’s Geecon talks he mentioned that in his experience correlation ids were very useful for diagnostic purposes. Correlation ids are essentially an id that is generated and associated with a single (typically user-driven) request into the application that is passed down through the stack and onto dependent services. In SOA or microservice platforms this type of id is very useful, as requests into the application typically are ‘fanned out’ or handled by multiple downstream services, and a correlation id allows all of the downstream requests (from the initial point of request) to be correlated or grouped based on the id. So called ‘distributed tracing’ can then be performed using the correlation ids by combining all the downstream service logs and matching the required id to see the trace of the request throughout your entire application stack (which is very easy if you are using a centralised logging framework such as logstash) The big players in the service-oriented field have been talking about the need for distributed tracing and correlating requests for quite some time, and as such Twitter have created their open source Zipkin framework (which often plugs into their RPC framework Finagle), and Netflix has open-sourced their Karyon web/microservice framework, both of which provide distributed tracing. There are of course commercial offering in this area, one such product being AppDynamics, which is very cool, but has a rather hefty price tag. Creating a proof-of-concept in Spring Boot As great as Zipkin and Karyon are, they are both relatively invasive, in that you have to build your services on top of the (often opinionated) frameworks. This might be fine for some use cases, but no so much for others, especially when you are building microservices. I’ve been enjoying experimenting with Spring Boot of late, and this framework builds on the much known and loved (at least by me :-) ) Spring framework by providing lots of preconfigured sensible defaults. This allows you to build microservices (especially ones that communicate via RESTful interfaces) very rapidly. The remainder of this blog pos explains how I implemented a (hopefully) non-invasive way of implementing correlation ids. Goals Allow a correlation id to be generated for a initial request into the application Enable the correlation id to be passed to downstream services, using as method that is as non-invasive into the code as possible Implementation I have created two projects on GitHub, one containing an implementation where all requests are being handled in a synchronous style (i.e. the traditional Spring approach of handling all request processing on a single thread), and also one for when an asynchronous (non-blocking) style of communication is being used (i.e., using the Servlet 3 asynchronous support combined with Spring’s DeferredResult and Java’s Futures/Callables). The majority of this article describes the asynchronous implementation, as this is more interesting: Spring Boot asynchronous (DeferredResult + Futures) communication correlation id Github repo The main work in both code bases is undertaken by the CorrelationHeaderFilter, which is a standard Java EE Filter that inspects the HttpServletRequest header for the presence of a correlationId. If one is found then we set a ThreadLocal variable in the RequestCorrelation Class (discussed later). If a correlation id is not found then one is generated and added to the RequestCorrelation Class: public class CorrelationHeaderFilter implements Filter { //... @Override public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException { final HttpServletRequest httpServletRequest = (HttpServletRequest) servletRequest; String currentCorrId = httpServletRequest.getHeader(RequestCorrelation.CORRELATION_ID_HEADER); if (!currentRequestIsAsyncDispatcher(httpServletRequest)) { if (currentCorrId == null) { currentCorrId = UUID.randomUUID().toString(); LOGGER.info("No correlationId found in Header. Generated : " + currentCorrId); } else { LOGGER.info("Found correlationId in Header : " + currentCorrId); } RequestCorrelation.setId(currentCorrId); } filterChain.doFilter(httpServletRequest, servletResponse); } //... private boolean currentRequestIsAsyncDispatcher(HttpServletRequest httpServletRequest) { return httpServletRequest.getDispatcherType().equals(DispatcherType.ASYNC); } The only thing is this code that may not instantly be obvious is the conditional checkcurrentRequestIsAsyncDispatcher(httpServletRequest), but this is here to guard against the correlation id code being executed when the Async Dispatcher thread is running to return the results (this is interesting to note, as I initially didn’t expect the Async Dispatcher to trigger the execution of the filter again?) Here is the RequestCorrelation Class, which contains a simple ThreadLocal static variable to hold the correlation id for the current Thread of execution (set via the CorrelationHeaderFilter above) public class RequestCorrelation { public static final String CORRELATION_ID = "correlationId"; private static final ThreadLocal id = new ThreadLocal(); public static String getId() { return id.get(); } public static void setId(String correlationId) { id.set(correlationId); } } Once the correlation id is stored in the RequestCorrelation Class it can be retrieved and added to downstream service requests (or data store access etc) as required by calling the static getId() method within RequestCorrelation. It is probably a good idea to encapsulate this behaviour away from your application services, and you can see an example of how to do this in a RestClient Class I have created, which composes Spring’s RestTemplate and handles the setting of the correlation id within the header transparently from the calling Class. @Component public class CorrelatingRestClient implements RestClient { private RestTemplate restTemplate = new RestTemplate(); @Override public String getForString(String uri) { String correlationId = RequestCorrelation.getId(); HttpHeaders httpHeaders = new HttpHeaders(); httpHeaders.set(RequestCorrelation.CORRELATION_ID, correlationId); LOGGER.info("start REST request to {} with correlationId {}", uri, correlationId); //TODO: error-handling and fault-tolerance in production ResponseEntity response = restTemplate.exchange(uri, HttpMethod.GET, new HttpEntity(httpHeaders), String.class); LOGGER.info("completed REST request to {} with correlationId {}", uri, correlationId); return response.getBody(); } } //... calling Class public String exampleMethod() { RestClient restClient = new CorrelatingRestClient(); return restClient.getForString(URI_LOCATION); //correlation id handling completely abstracted to RestClient impl } Making this work for asynchronous requests… The code included above works fine when you are handling all of your requests synchronously, but it is often a good idea in a SOA/microservice platform to handle requests in a non-blocking asynchronous manner. In Spring this can be achieved by using the DeferredResult Class in combination with the Servlet 3 asynchronous support. The problem with using ThreadLocal variables within the asynchronous approach is that the Thread that initially handles the request (and creates the DeferredResult/Future) will not be the Thread doing the actual processing. Accordingly, a bit of glue code is needed to ensure that the correlation id is propagated across the Threads. This can be achieved by extending Callable with the required functionality: (don’t worry if example Calling Class code doesn’t look intuitive – this adaption between DeferredResults and Futures is a necessary evil within Spring, and the full code including the boilerplate ListenableFutureAdapter is in my GitHub repo): public class CorrelationCallable implements Callable { private String correlationId; private Callable callable; public CorrelationCallable(Callable targetCallable) { correlationId = RequestCorrelation.getId(); callable = targetCallable; } @Override public V call() throws Exception { RequestCorrelation.setId(correlationId); return callable.call(); } } //... Calling Class @RequestMapping("externalNews") public DeferredResult externalNews() { return new ListenableFutureAdapter<>(service.submit(new CorrelationCallable<>(externalNewsService::getNews))); } And there we have it – the propagation of correlation id regardless of the synchronous/asynchronous nature of processing! You can clone the Github report containing my asynchronous example, and execute the application by running mvn spring-boot:run at the command line. If you access http://localhost:8080/externalNewsin your browser (or via curl) you will see something similar to the following in your Spring Boot console, which clearly demonstrates a correlation id being generated on the initial request, and then this being propagated through to a simulated external call (have a look in the ExternalNewsServiceRest Class to see how this has been implemented): [nio-8080-exec-1] u.c.t.e.c.w.f.CorrelationHeaderFilter : No correlationId found in Header. Generated : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : start REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 [nio-8080-exec-2] u.c.t.e.c.w.f.CorrelationHeaderFilter : Found correlationId in Header : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : completed REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 Conclusion I’m quite happy with this simple prototype, and it does meet the two goals I listed above. Future work will include writing some tests for this code (shame on me for not TDDing!), and also extend this functionality to a more realistic example. I would like to say a massive thanks to Sam, not only for sharing his knowledge at the great talks at Geecon, but also for taking time to respond to my emails. If you’re interested in microservices and related work I can highly recommend Sam’s Microservice book which is available in Early Access at O’Reilly. I’ve enjoyed reading the currently available chapters, and having implemented quite a few SOA projects recently I can relate to a lot of the good advice contained within. I’ll be following the development of this book with keen interest! If you have any comments or thoughts then please do share them via the comment below, or feel free to get in touch via the usual mechanisms! References I used Tomasz Nurkiewicz’s excellent blog several times for learning how best to wire up all of the DeferredResult/Future code in Spring: http://www.nurkiewicz.com/2013/03/deferredresult-asynchronous-processing.html

May 29, 2014

by Daniel Bryant

· 24,511 Views · 1 Like

Visualizing XML as a Graph Using Neo4j

neo4j is quite awesome for pretty much everything . here is an extract i found from this blog whenever someone gives you a problem, think graphs . they are the most fundamental and flexible way of representing any kind of a relationship, so it’s about a 50-50 shot that any interesting design problem has a graph involved in it. make absolutely sure you can’t think of a way to solve it using graphs before moving on to other solution types. as a baby step, let us try to visualize an xml as a graph. we can start with a simple xml here, belgian waffles $5.95 two of our famous belgian waffles with plenty of real maple syrup 650 it is a simple ‘breakfast menu’ with just one item, ‘ belgian waffles ‘. neo4j has a neat interface which lets us visualize the graph we created. the graph for the above xml looks slick.. the above xml has 6 tags and we have 6 nodes in our graph. the nested tags/nodes has a ‘child-of’ relationship with the parent tag/node. in the graph above, node numbered 0 is the root node – corresponds to the tag node numbered 1 is the immediate child of the root – corresponds to the tag nodes numbered from 2-4 are the tags which come under tag, viz , , and transforming xml.. let us parse the xml into a data structure which can be easily persisted using neo4j . personally, i would prefer sax parser as there are serious memory constraints when creating a dom object (really painful if you talking big data as xml). additionally sax parsing gives you unlimited freedom to do whatever you want to do with the xml. this is how i represent a node, using the object xmlelement . each xmlelement is identified by an id. this is basically an integer or long. only thing to make sure is that the number has to be unique across all xmlelements. publicclassxmlelement { privatestring tagname; privatestring tagvalue; privatemap attributes = newhashmap(); privatehierarchyidentifier hierarchyidentifier; privateintparentid; privatebooleanpersisted; //setters and getters for the members publicstring getatrributestring(){ objectmapper jsonmapper = newobjectmapper(); try{ returnjsonmapper.writevalueasstring(this.attributes); } catch(jsonprocessingexception e) { logger.severe(e.getmessage()); e.printstacktrace(); } returnnull; } } the xml tag name and value are stored as string and the attributes are stored as a map. also, there is another member object, hierarchyidentifier publicclasshierarchyidentifier { privateintdepth; privateintwidth; privateintid; //getters and setters for member element } the hierarchyidentifier class contains the id class which is used to identify an xmlelement , which translates to identifying an xml tag. each xmlelement has the id of it’s parent stored. and after the parse, the xml should be represented as a map of and each xmlelement identified by the corresponding integer id. you can see the sax parser here .. so we pass the map to the graphwriter object. we create a node with the following properties value – the value of the xml tag id – the id of the current node parent – the id of the parent tag attributes – the string representation of the tag attributes also, the node has the following labels node – for all nodes parent – for the seed ( highest parent ) node xml tag name – say, the node for the tag will have label food currently, there is only a single type of relationship, ‘child_of’ , between immediate nested tags. see the below the example for a bigger xml belgian waffles $5.95 two of our famous belgian waffles with plenty of real maple syrup 650 masala dosa $10.95 south india's famous slim pancake with mashed potatoes 650 25 i have added a second item to the ‘breakfast_menu’.. the legendary masala dosa . also, the second food item has a new child tag, . and the resultant graph looks prettier the sets of tags,relationships and properties can be seen in the neo4j local server interface. the entire codebase can be found here in github .. visualizing as a graphgist an easy representation can be done in graphgist. i have created the graph of this xml. here is the link to the gist . also, the gist script can be found in the github gist here ...

May 29, 2014

by Nikhil Kuriakose

· 25,869 Views

Debunking Myths About the VoltDB In-Memory Database

Originally written by John Hugg I’d like to take a quick moment to address some myths and misconceptions about VoltDB. Many people selling products who view VoltDB as competition seem to be repeating them. As you’ll read, much of what’s said is just plain FUD. VoltDB is an in-memory database that has benchmarked at over 3 million transactions a second on bare metal, and recently crushed previous performance records in the cloud, posting eye-popping YCSB (Yahoo! Cloud Service Benchmark) numbers on AWS, Amazon’s cloud platform. It’s fully transactional, supports full disk durability, and has very low latency and fully native high availability. It’s not surprising competing interests look for FUD. Note the comments below apply to the current shipping version of VoltDB at the time of this posting (4.2), unless otherwise noted. Myth #1: “VoltDB requires stored procedures.” This was true for 1.0, but no one seems to notice it’s been false since we shipped 1.1 in 2010. VoltDB supports unforeseen SQL without any stored procedure use. We have users in production who have never used a single stored procedure. SQL queries can be run through: JDBC Our command line SQL console A web portal built into every server The HTTP/JSON interface Native client drivers written in C++, C#, PHP, Python, Node.js and even Java. Community-provided language drivers ODBC drivers (in development to ship in Q2) There is one restriction: VoltDB doesn’t support external transaction control; VoltDB is permanently in auto-commit mode. Note this offers more transactionality than most NoSQL systems, and is a shared restriction of several NewSQL or “In-Memory” SQL systems. For those systems that do offer external transaction control, I’m not aware of any that publish benchmarks where transactions require multiple round trips to the client. This is because it’s generally understood to be a bad thing in a performance-critical system. In addition to ad-hoc SQL, VoltDB auto-generates optimized procedures for various CRUD operations (create, read, update, delete). For example, to fetch a row by primary key from table FOO, I can call a procedure named “FOO.select” with the key value as a parameter. Myth #2: “VoltDB doesn’t support ad-hoc SQL.” This is just a rephrasing of Myth #1 and is still false. Myth #3: “VoltDB is slow unless I use stored procedures.” Well, no. VoltDB can run faster with stored procedures, but it’s still fast if they are not used. In our internal benchmarks on pretty cheap single-socket hardware, we can run about 50k write statements per second, per host with full durability. That’s a bit lower than we can achieve with stored procedures, so we’re doing some engineering work to close the gap. Still, on a 3-node cluster costing about $3,000 for hardware, we can run well over 100k SQL updates per second with durability and double redundancy. If your current system can run that fast, let us know. Also, the auto-generated CRUD procedures mentioned above run at the full speed of pre-optimized procedures. Using just CRUD, VoltDB is the fastest durable Key-Value store I’m aware of. Myth #4: “I have to know Java to use VoltDB.” As of VoltDB 3.0, released over a year ago, (we’re on V4.2 today), a user can build VoltDB apps and run the server without ever directly interacting with the Java CLI tools or any Java code. Client apps can also be written without Java. As mentioned previously, VoltDB supports client drivers in many languages, with community-contributed drivers in others. If you do want to leverage VoltDB’s ability to perform multi-statement, transactional logic, only the most basic understanding of Java is needed to build stored procedures. We provide example applications in our kit that can be copied from and/or modified. We even support declaring procedures in Groovy right in your DDL. Myth #5: “VoltDB has garbage collection problems because it is written in Java.” The short answer is this is just not true. First, VoltDB uses native C++ code for all of its data storage and SQL execution paths. This is done to avoid putting long-lived data on the Java heap, but also to allow very fine control of memory use. VoltDB’s special native data structures use less memory than competing systems and actively fight allocation fragmentation. What’s left on the Java heap is either near-immortal configuration and setup data, or very short-lived per-transaction data. This is almost the perfect workload for the Java garbage collector, so there is minimal impact to the user’s workload. For more detail, read Ariel Weisberg’s excellent blog post on the topic. Myth #6: “VoltDB’s SQL is very limited.” This misconception is based on historical truth, but is way out of date in 2014. The original H-Store project, upon which VoltDB is based, started with very limited, OLTP-focused SQL, but much has changed in the past years. VoltDB is rapidly approaching SQL-92 functionality for DML and DQL SQL. One of the final holdouts, subselects, shipped in part in 4.2; more functionality will ship in upcoming versions. In addition to this ANSI SQL baseline, we’ve added materialized view support, function-based index support, native JSON functionality and more. We have a lot more coming in 2014. We’re proud to announce all of this is fully Unicode-compliant and is in production in several uses in countries with multi-byte character sets. Myth #7: “Yes, VoltDB supports cross-partition transactions, but they are too slow to use.” Yes we do, and no they aren’t too slow to use. Keep in mind that NoSQL systems often don’t offer cross-partition transactions and many other NewSQL systems have functionality or performance limitations. Few benchmarks stress this functionality, so I’ll be as clear as possible here. Traditionally, VoltDB’s cross-partition transactions have been slower than our partitioned operations. Additional coordination needs to happen to make these transactions ACID at the “serializable” isolation level at which all VoltDB transactions run. Furthermore, these operations don’t scale with cluster size and sometimes can vary in performance, depending on your networking performance. But are they too slow to use? I’ll let you be the judge. Cross-partition writes run on the order of 1,000 per second,regardless of cluster size. This number can vary by an order of magnitude depending on hardware. Cross partition reads run on the order of 50,000 per second, regardless of cluster size. Again, there is some variability, but less than with writes. It’s possible to send writes to each partition individually using the “Send Everywhere” pattern. These writes are transactional at each partition, but individual partitions might fail and rollback independently of others. A common pattern we see is a mix of single-partition writes and cross-partition reads. Cross-partition writes are often used to update infrequently changing lookup tables. Of course at 1000 per second, they can still change fairly frequently. Note that if your read operation is scanning millions of tuples, you won’t see the same throughput, but if it’s populating a schema supported leaderboard or computing a time window statistic using a materialized view, you will. The bottom line is that most of our customers not only use cross-partition transactions, but also these kinds of operations are one of the key benefits VoltDB offers. We are always working to improve the performance of cross-partition operations, and in recent releases we have made a lot of progress. In 4.0 we introduced read optimizations that got us to the 50k/second number. We also have write optimizations in the pipeline. In Conclusion Many of these myths and misconceptions are firmly rooted in the world of traditional RDBMSs. VoltDB is a NewSQL database. When evaluating VoltDB, consider the non-traditional features that set us head and shoulders above traditional RDBMSs, NoSQL and other NewSQL options: Intelligent, asynchronous clients allow tremendous throughput per-client, as well as terrific management. Native export connectors push tuples into downstream systems like message queues, analytic data-warehouses and Hadoop Active-Active intra-cluster replication with sub-millisecond latency ensures perfectly consistent high availability and redundant durability. Seamless node failure and replacement without impacting serializable ACID semantics. Transactional cluster expansion with low performance impact and without impacting serializable ACID semantics provides near-limitless scalability. 1500 active, connected clients per server node with minimal latency impact meets the needs of most high-velocity, high transaction volume businesses. VoltDB is open source (https://github.com/VoltDB/voltdb) If you want to hear more, or if you have a use case you want to talk to us about, contact us. We’d love to hear from you. If you’d like to try VoltDB out yourself, you can download a free trial here.

May 28, 2014

by John Piekos

· 9,848 Views · 1 Like

Implementing Correlation IDs in Spring Boot (for Distributed Tracing in SOA/Microservices)

After attending Sam Newman’s microservice talks at Geecon last week I started to think more about what is most likely an essential feature of service-oriented/microservice platforms for monitoring, reporting and diagnostics: correlation ids. Correlation ids allow distributed tracing within complex service oriented platforms, where a single request into the application can often be dealt with by multiple downstream service. Without the ability to correlate downstream service requests it can be very difficult to understand how requests are being handled within your platform. I’ve seen the benefit of correlation ids in several recent SOA projects I have worked on, but as Sam mentioned in his talks, it’s often very easy to think this type of tracing won’t be needed when building the initial version of the application, but then very difficult to retrofit into the application when you do realise the benefits (and the need for!). I’ve not yet found the perfect way to implement correlation ids within a Java/Spring-based application, but after chatting to Sam via email he made several suggestions which I have now turned into a simple project using Spring Boot to demonstrate how this could be implemented. Why? During both of Sam’s Geecon talks he mentioned that in his experience correlation ids were very useful for diagnostic purposes. Correlation ids are essentially an id that is generated and associated with a single (typically user-driven) request into the application that is passed down through the stack and onto dependent services. In SOA or microservice platforms this type of id is very useful, as requests into the application typically are ‘fanned out’ or handled by multiple downstream services, and a correlation id allows all of the downstream requests (from the initial point of request) to be correlated or grouped based on the id. So called ‘distributed tracing’ can then be performed using the correlation ids by combining all the downstream service logs and matching the required id to see the trace of the request throughout your entire application stack (which is very easy if you are using a centralised logging framework such as logstash) The big players in the service-oriented field have been talking about the need for distributed tracing and correlating requests for quite some time, and as such Twitter have created their open source Zipkin framework (which often plugs into their RPC framework Finagle), and Netflix has open-sourced their Karyon web/microservice framework, both of which provide distributed tracing. There are of course commercial offering in this area, one such product being AppDynamics, which is very cool, but has a rather hefty price tag. Creating a proof-of-concept in Spring Boot As great as Zipkin and Karyon are, they are both relatively invasive, in that you have to build your services on top of the (often opinionated) frameworks. This might be fine for some use cases, but no so much for others, especially when you are building microservices. I’ve been enjoying experimenting with Spring Boot of late, and this framework builds on the much known and loved (at least by me :-) ) Spring framework by providing lots of preconfigured sensible defaults. This allows you to build microservices (especially ones that communicate via RESTful interfaces) very rapidly. The remainder of this blog pos explains how I implemented a (hopefully) non-invasive way of implementing correlation ids. Goals Allow a correlation id to be generated for a initial request into the application Enable the correlation id to be passed to downstream services, using as method that is as non-invasive into the code as possible Implementation I have created two projects on GitHub, one containing an implementation where all requests are being handled in a synchronous style (i.e. the traditional Spring approach of handling all request processing on a single thread), and also one for when an asynchronous (non-blocking) style of communication is being used (i.e., using the Servlet 3 asynchronous support combined with Spring’s DeferredResult and Java’s Futures/Callables). The majority of this article describes the asynchronous implementation, as this is more interesting: Spring Boot asynchronous (DeferredResult + Futures) communication correlation id Github repo The main work in both code bases is undertaken by the CorrelationHeaderFilter, which is a standard Java EE Filter that inspects the HttpServletRequest header for the presence of a correlationId. If one is found then we set a ThreadLocal variable in the RequestCorrelation Class (discussed later). If a correlation id is not found then one is generated and added to the RequestCorrelation Class: public class CorrelationHeaderFilter implements Filter { //... @Override public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException { final HttpServletRequest httpServletRequest = (HttpServletRequest) servletRequest; String currentCorrId = httpServletRequest.getHeader(RequestCorrelation.CORRELATION_ID_HEADER); if (!currentRequestIsAsyncDispatcher(httpServletRequest)) { if (currentCorrId == null) { currentCorrId = UUID.randomUUID().toString(); LOGGER.info("No correlationId found in Header. Generated : " + currentCorrId); } else { LOGGER.info("Found correlationId in Header : " + currentCorrId); } RequestCorrelation.setId(currentCorrId); } filterChain.doFilter(httpServletRequest, servletResponse); } //... private boolean currentRequestIsAsyncDispatcher(HttpServletRequest httpServletRequest) { return httpServletRequest.getDispatcherType().equals(DispatcherType.ASYNC); } The only thing is this code that may not instantly be obvious is the conditional check currentRequestIsAsyncDispatcher(httpServletRequest), but this is here to guard against the correlation id code being executed when the Async Dispatcher thread is running to return the results (this is interesting to note, as I initially didn’t expect the Async Dispatcher to trigger the execution of the filter again?) Here is the RequestCorrelation Class, which contains a simple ThreadLocal static variable to hold the correlation id for the current Thread of execution (set via the CorrelationHeaderFilter above) public class RequestCorrelation { public static final String CORRELATION_ID = "correlationId"; private static final ThreadLocal id = new ThreadLocal(); public static String getId() { return id.get(); } public static void setId(String correlationId) { id.set(correlationId); } } Once the correlation id is stored in the RequestCorrelation Class it can be retrieved and added to downstream service requests (or data store access etc) as required by calling the static getId() method within RequestCorrelation. It is probably a good idea to encapsulate this behaviour away from your application services, and you can see an example of how to do this in a RestClient Class I have created, which composes Spring’s RestTemplate and handles the setting of the correlation id within the header transparently from the calling Class. @Component public class CorrelatingRestClient implements RestClient { private RestTemplate restTemplate = new RestTemplate(); @Override public String getForString(String uri) { String correlationId = RequestCorrelation.getId(); HttpHeaders httpHeaders = new HttpHeaders(); httpHeaders.set(RequestCorrelation.CORRELATION_ID, correlationId); LOGGER.info("start REST request to {} with correlationId {}", uri, correlationId); //TODO: error-handling and fault-tolerance in production ResponseEntity response = restTemplate.exchange(uri, HttpMethod.GET, new HttpEntity(httpHeaders), String.class); LOGGER.info("completed REST request to {} with correlationId {}", uri, correlationId); return response.getBody(); } } //... calling Class public String exampleMethod() { RestClient restClient = new CorrelatingRestClient(); return restClient.getForString(URI_LOCATION); //correlation id handling completely abstracted to RestClient impl } Making this work for asynchronous requests… The code included above works fine when you are handling all of your requests synchronously, but it is often a good idea in a SOA/microservice platform to handle requests in a non-blocking asynchronous manner. In Spring this can be achieved by using the DeferredResult Class in combination with the Servlet 3 asynchronous support. The problem with using ThreadLocal variables within the asynchronous approach is that the Thread that initially handles the request (and creates the DeferredResult/Future) will not be the Thread doing the actual processing. Accordingly, a bit of glue code is needed to ensure that the correlation id is propagated across the Threads. This can be achieved by extending Callable with the required functionality: (don’t worry if example Calling Class code doesn’t look intuitive – this adaption between DeferredResults and Futures is a necessary evil within Spring, and the full code including the boilerplate ListenableFutureAdapter is in my GitHub repo): public class CorrelationCallable implements Callable { private String correlationId; private Callable callable; public CorrelationCallable(Callable targetCallable) { correlationId = RequestCorrelation.getId(); callable = targetCallable; } @Override public V call() throws Exception { RequestCorrelation.setId(correlationId); return callable.call(); } } //... Calling Class @RequestMapping("externalNews") public DeferredResult externalNews() { return new ListenableFutureAdapter<>(service.submit(new CorrelationCallable<>(externalNewsService::getNews))); } And there we have it – the propagation of correlation id regardless of the synchronous/asynchronous nature of processing! You can clone the Github report containing my asynchronous example, and execute the application by running mvn spring-boot:run at the command line. If you access http://localhost:8080/externalNews in your browser (or via curl) you will see something similar to the following in your Spring Boot console, which clearly demonstrates a correlation id being generated on the initial request, and then this being propagated through to a simulated external call (have a look in the ExternalNewsServiceRest Class to see how this has been implemented): [nio-8080-exec-1] u.c.t.e.c.w.f.CorrelationHeaderFilter : No correlationId found in Header. Generated : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : start REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 [nio-8080-exec-2] u.c.t.e.c.w.f.CorrelationHeaderFilter : Found correlationId in Header : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : completed REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 Conclusion I’m quite happy with this simple prototype, and it does meet the two goals I listed above. Future work will include writing some tests for this code (shame on me for not TDDing!), and also extend this functionality to a more realistic example. I would like to say a massive thanks to Sam, not only for sharing his knowledge at the great talks at Geecon, but also for taking time to respond to my emails. If you’re interested in microservices and related work I can highly recommend Sam’s Microservice book which is available in Early Access at O’Reilly. I’ve enjoyed reading the currently available chapters, and having implemented quite a few SOA projects recently I can relate to a lot of the good advice contained within. I’ll be following the development of this book with keen interest! If you have any comments or thoughts then please do share them via the comment below, or feel free to get in touch via the usual mechanisms! References I used Tomasz Nurkiewicz’s excellent blog several times for learning how best to wire up all of the DeferredResult/Future code in Spring: http://www.nurkiewicz.com/2013/03/deferredresult-asynchronous-processing.html

May 28, 2014

by Daniel Bryant

· 73,844 Views · 2 Likes

Spring/Hibernate Improved SQL Logging with log4jdbc

Hibernate provides SQL logging out of the box, but such logging only shows prepared statements, and not the actual SQL queries sent to the database. It also does not log the execution time of each query, which is useful for performance troubleshooting. This blog post will go over how to setup Hibernate query logging, and then compare it to the logging that can be obtained with log4jdbc . The Hibernate query logging functionality Hibernate does not log the real SQL queries sent to the database. This is because Hibernate interacts with the database via the JDBC driver, to which it sends prepared statements but not the actual queries. So Hibernate can only log the prepared statements and the values of their binding parameters, but not the actual SQL queries themselves. This is how a query looks like when logged by Hibernate: select /* load your.package.Employee */ this_.code, ... from employee this_ where this_.employee_id=? TRACE 12-04-2014@16:06:02 BasicBinder - binding parameter [1] as [NUMBER] - 1000 See this post Hibernate Debugging - Finding the origin of a Query for how to setup this type of logging. Using log4jdbc For a developer it's useful to be able to copy paste a query from the log and be able to execute the query directly in an SQL client, but the variable placeholders ? make that unfeasible. Log4jdbc in an open source tool that allows to do just that, and more. Log4jdbc is a spy driver that will wrap itself around the real JDBC driver, logging queries as they go through it. The version linked from this post provides Spring integration, unlike several other log4jdbc forks. Setting up log4jdbc First include the log4jdbc-remix library in your pom.xml. This library is a fork of the original log4jdbc: org.lazyluke log4jdbc-remix0.2.7 Next, find in the Spring configuration the definition of the data source. As an example, when using the JNDI lookup element this is how the data source looks like: After finding the data source definition, rename it to the following name: Then define a new log4jdbc data source that wraps the real data source, and give it the original name: With this configuration, the query logging should already be working. It's possible to customize the logging level of the several log4jdbc loggers available. The original log4jdbc documentation provides more information on the available loggers: jdbc.sqlonly: Logs only SQL jdbc.sqltiming: Logs the SQL, post-execution, including timing execution statistics jdbc.audit: Logs ALL JDBC calls except for ResultSets jdbc.resultset: all calls to ResultSet objects are logged jdbc.connection: Logs connection open and close events The jdbc.audit logger is especially useful to validate the scope of transactions, as it logs the begin/commit/rollback events of a database transaction. This is the proposed log4j configuration that will print only the SQL queries together with their execution time: Conclusion Using log4jdbc does imply some initial setup, but once it's in place it's really convenient to have. Having a true query log is also useful for performance troubleshooting (to be described in a future post).

May 27, 2014

by Vasco Cavalheiro

· 20,248 Views · 1 Like

DO... WHILE and REPEAT... UNTIL Loops in MS SQL

Introduction When I am looking for a forum post related to SQL Server, one of the junior professional is asking how to use a DO…WHILE loop is MS SQL Server. Several people wrote their opinion related to it. Everyone is saying to use WHILE loop and some of them suggesting with T-SQL structure of CURSOR with WHILE LOOP. Obviously, when a junior professional is learning MS SQL server, the question in mind arises: is there DO… WHILE, REPEAT … UNTIL loop present in MS SQL Server as there is in C or C++ etc? No one is answering directly on the forum whether we can use DO… WHILE or REPEAT … UNTIL in MS SQL Server or NOT. If yes, how can we implement them? DO… WHILE in MS SQL Sever First we look at the algorithm of DO… WHILE. SET X = 1 DO PRINT X SET X = X + 1 WHILE X <= 10 Now we try to implement it in MS SQL Server. DECLARE @X INT=1; WAY: --> Here the DO statement PRINT @X; SET @X += 1; IF @X<=10 GOTO WAY; --> Here the WHILE @X<=1 REPEAT… UNTIL First we look at the algorithm of REPEAT... UNTIL SET X = 1 REPEAT PRINT X SET X = X + 1 UNTIL X > 10 Now we try to implement it in MS SQL Server DECLARE @X INT = 1; WAY: -- Here the REPEAT statement PRINT @X; SET @X += 1; IFNOT(@X >1 0) GOTO WAY; -- Here the UNTIL @X>10 So we see that it is possible, but a little complicated… So most developers prefer the WHILE loop in MS SQL Server.

May 26, 2014

by Joydeep Das

· 104,725 Views

String Interning — What, Why, and When?

Learn about string interning, a method of storing only one copy of each distinct string value, which must be immutable.

May 23, 2014

by Saurabh Chhajed

· 146,392 Views · 13 Likes

New report looks at the role of Chambers of Commerce

The business world is an increasingly complex one. In the past few IBM CEO surveys, they have highlighted the growing importance of both being able to manage this complexity, and to do so in a collaborative way. This shifting zeitgeist was reflected in a series of seminars hosted by Xincus, and prompted the launch of a two phase study into how Chambers of Commerce can evolve within this new landscape. The study, consisting of in depth one on one interviews and a nationwide online survey, aimed to better understand both how Chambers can adapt, and what changes would be required to do so. The findings from this research are now available in a new paper called Chamber 2.0: Digital – Connected – Global. The paper outlines both the main challenges currently facing Chambers, and the steps they can take to thrive in such an environment. Amongst the main challenges identified by the research was a fundamental desire to change and modernize, with a strategic positioning and business model that would allow Chambers to flourish. There was also a strong desire to work more effectively with partners, both inside and outside of the Chamber network, sharing both resources and insights. The report then concluded with a road map derived by molding these findings from within the network with best practice from the wider business world. The road-map consists of five broad stages, with each one containing more detailed steps Chambers can take to prepare for the modern world. Become a one stop shop for members, including positioning the Chamber brand for the modern world as centers for Business, Innovation, and Economic Development with a new and modernized approach to business that sees an adaptive and responsive leadership style essential to a revitalize business model. Offer new value, with a new emphasis on virtual services to reflect modern ways of working. Chambers will become a solution hub that connects and match makes members, with co-working spaces connecting the physical and virtual worlds. Collaborate beyond borders, by building an extensive Chamber alliance network, allowing Chambers to become specialized regional hubs, whilst tapping into the collective wisdom of the entire network as well as offering “health-club” type e-memberships to professionals, academics, entrepreneurs and “free agent” millennials alike. Nurture new economic development, by facilitating entrepreneurial collaboration between members and stakeholders, connecting the right people with the right resources, helping to forge an innovation economy and a thriving business community and jobs. Foster global innovation ecosystems, by tying all of these communities together to form a hyperconnected ecosystem, with Chambers at its heart, thus empowering the next wave of new economic development around the world. The report makes clear that whilst change is desired, the network remains positive that the right developments will occur. With Chambers striving to maintain their position at the heart of the business community, this report will go some way towards helping them achieve that goal. You can get your copy of the report here. Original post

May 22, 2014

by Adi Gaskell

· 3,429 Views