Data Resources

The Latest Data Topics

This is taken from a popular answer to the question "Moving decimal places over in a double".

Question

I have a double set to 1234, and I want to move the decimal point over to make it 12.34. To do this I multiply 1234 by .1 two times, like this:

    double x = 1234;
    for (int i = 1; i <= 2; i++) {
        x = x * .1;
    }
    System.out.println(x);

This prints "12.340000000000002". Is there a way, without simply formatting it to two decimal places, to have the double store 12.34 correctly?

Answer

If you use double or float, you should use rounding or expect to see some rounding errors. If you can't do this, use BigDecimal.

The problem is that 0.1 cannot be represented exactly as a double, and by performing the calculation twice you are compounding that error. However, 100 can be represented exactly, so try:

    double x = 1234;
    x /= 100;
    System.out.println(x);

which prints:

    12.34

This works because Double.toString(d) performs a small amount of rounding on your behalf, but it is not much. If you are wondering what the values look like without rounding:

    System.out.println(new BigDecimal(0.1));
    System.out.println(new BigDecimal(x));

prints

    0.1000000000000000055511151231257827021181583404541015625
    12.339999999999999857891452847979962825775146484375

In short, rounding is unavoidable for sensible answers in floating point, whether you do it explicitly or not.

Note: x / 100 and x * 0.01 are not exactly the same when it comes to rounding error. Dividing by 100 uses an exactly representable divisor and is rounded once, so its error depends only on the value of x. Multiplying by 0.01 starts from a constant that already carries a fixed representation error, and the product is then rounded again.

    for (int i = 0; i < 200; i++) {
        double d1 = (double) i / 100;
        double d2 = i * 0.01;
        if (d1 != d2)
            System.out.println(d1 + " != " + d2);
    }

prints

    0.35 != 0.35000000000000003
    0.41 != 0.41000000000000003
    0.47 != 0.47000000000000003
    0.57 != 0.5700000000000001
    0.69 != 0.6900000000000001
    0.7 != 0.7000000000000001
    0.82 != 0.8200000000000001
    0.83 != 0.8300000000000001
    0.94 != 0.9400000000000001
    0.95 != 0.9500000000000001
    1.13 != 1.1300000000000001
    1.14 != 1.1400000000000001
    1.15 != 1.1500000000000001
    1.38 != 1.3800000000000001
    1.39 != 1.3900000000000001
    1.4 != 1.4000000000000001
    1.63 != 1.6300000000000001
    1.64 != 1.6400000000000001
    1.65 != 1.6500000000000001
    1.66 != 1.6600000000000001
    1.88 != 1.8800000000000001
    1.89 != 1.8900000000000001
    1.9 != 1.9000000000000001
    1.91 != 1.9100000000000001
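The answer's two options, using BigDecimal or rounding explicitly, can be sketched as follows. This is only an illustration under assumptions: the class name DecimalShift and the helper names shiftExact and roundTo are made up for this sketch and do not come from the original answer.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalShift {

    /** Works in decimal throughout, so 1234 shifted by 2 places is exactly 12.34. */
    static BigDecimal shiftExact(long value, int places) {
        return BigDecimal.valueOf(value).movePointLeft(places);
    }

    /** Rounds a double explicitly to a fixed number of decimal places. */
    static double roundTo(double value, int places) {
        // BigDecimal.valueOf(double) goes through Double.toString, so it starts
        // from the short printed form rather than the full binary expansion.
        return BigDecimal.valueOf(value)
                .setScale(places, RoundingMode.HALF_UP)
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(shiftExact(1234, 2));            // prints 12.34
        System.out.println(roundTo(12.340000000000002, 2)); // prints 12.34
    }
}
```

The BigDecimal variant keeps the value exact for as long as it stays a BigDecimal. The rounding variant still returns a double, so it only yields the double nearest to 12.34.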

November 25, 2012

by Peter Lawrey


A Node.js speed dilemma: AJAX or Socket.IO?

Originally posted by Daniel Chirca

One of the first things I stumbled upon when I started my first Node.js project was how to handle the communication between the browser (the client) and my middleware (a Node.js application using the CUBRID Node.js driver, node-cubrid, to exchange information with a CUBRID 8.4.1 database).

I am already familiar with AJAX (btw, thank God for jQuery!) but, while studying Node.js, I found out about the Socket.IO module and even found some pretty nice code examples on the internet, examples which were very easy to (re)use. So this quickly became a dilemma: what to choose, AJAX or Socket.IO?

Obviously, as my experience was quite limited, I first needed more information. In other words, it was time to do some quality Google searching :) There's a lot of information available and, obviously, one needs to filter out all the "noise" and keep what is really useful. Let me share some of the good links I found on the topic:

- http://stackoverflow.com/questions/7193033/nodejs-ajax-vs-socket-io-pros-and-cons
- http://podefr.tumblr.com/post/22553968711/an-innovative-way-to-replace-ajax-and-jsonp-using
- http://stackoverflow.com/questions/4848642/what-is-the-disadvantage-of-using-websocket-socket-io-where-ajax-will-do?rq=1
- http://howtonode.org/websockets-socketio

To summarize, here's what I quickly found:

- Socket.IO (usually) uses a persistent connection between the client and the server (the middleware), so you can reach a maximum limit of concurrent connections depending on the resources you have on the server side (while more AJAX async requests can be served with the same resources).
- With AJAX you can do RESTful requests. This means that you can take advantage of existing HTTP infrastructure, e.g. proxies to cache requests, and use conditional GET requests.
- There is more (communication) data overhead in AJAX when compared to Socket.IO (HTTP headers, cookies etc.).
- AJAX is usually faster than Socket.IO to "code".
- When using Socket.IO, it is possible to have two-way communication where each side, client or server, can initiate a request. In AJAX, only the client can initiate a request!
- Socket.IO has more transport options, including Adobe Flash.

Now, for my own application, what I was most interested in was the speed of making requests and getting data from the (Node.js) server! Regarding the middleware data communication with the CUBRID database, as ~90% of my data access was read-only, a good data caching mechanism is obviously a great way to go! But I'll talk about this next time.

So I decided to put their speed (AJAX and Socket.IO) to the test, to see which one is faster (at least in my hardware and software environment). My middleware was set up to run on an i5 processor with 8GB of RAM and an Intel X25 SSD drive.

But seriously, every speed test and, generally speaking, any performance test depends so much on your hardware and software configuration that it is always a great idea to try things in your own environment, and to rely less on information you find on the internet and more on your own findings!
The tests I decided to do had to meet the following requirements:

- Test:
    - AJAX
    - Socket.IO persistent connection
    - Socket.IO non-persistent connections
- Test 10, 100, 250 and 500 data exchanges between the client and the server
- Each data exchange between the middleware server (a Node.js web server) and the client (a browser) is a 4 KB random data string
- Run the server in release (not debug) mode
- Use Firefox as the client
- Minimize the console message output, for both server and client
- Do each test after a full client page reload
- Repeat each test at least 3 times, to make sure the results are consistent

Testing Socket.IO, using a persistent connection

I created a small Node.js server to handle the client requests:

    io.sockets.on('connection', function (client) {
        client.on('send_me_data', function (idx) {
            client.emit('you_have_data', idx, random_string(4096));
        });
    });

And this is the JS client script I used for the test:

    var socket = io.connect(document.location.href);

    socket.on('you_have_data', function (idx, data) {
        var end_time = new Date();
        total_time += end_time - start_time;
        logMsg(total_time + '(ms.) [' + idx + '] - Received ' + data.length + ' bytes.');
        if (idx++ < countMax) {
            setTimeout(function () {
                start_time = new Date();
                socket.emit('send_me_data', idx);
            }, 500);
        }
    });

Testing Socket.IO, using a NON-persistent connection

This time, for each data exchange, I opened a new Socket.IO connection. The Node.js server code was similar to the previous one, but I decided to send the data back to the client immediately after connect, as a new connection was initiated for each data exchange:

    io.sockets.on('connection', function (client) {
        client.emit('you_have_data', random_string(4096));
    });

The client test code was:

    function exchange(idx) {
        var start_time = new Date();
        var socket = io.connect(document.location.href, {'force new connection' : true});

        socket.on('you_have_data', function (data) {
            var end_time = new Date();
            total_time += end_time - start_time;
            socket.removeAllListeners();
            socket.disconnect();
            logMsg(total_time + '(ms.) [' + idx + '] - Received ' + data.length + ' bytes.');
            if (idx++ < countMax) {
                setTimeout(function () {
                    exchange(idx);
                }, 500);
            }
        });
    }

Testing AJAX

Finally, I put AJAX to the test. The Node.js server code was, again, not that different from the previous ones:

    res.writeHead(200, {'Content-Type' : 'text/plain'});
    res.end('_testcb(\'{"message": "' + random_string(4096) + '"}\')');

As for the client code, this is what I used to test:

    function exchange(idx) {
        var start_time = new Date();

        $.ajax({
            url : 'http://localhost:8080/',
            dataType : "jsonp",
            jsonpCallback : "_testcb",
            timeout : 300,
            success : function (data) {
                var end_time = new Date();
                total_time += end_time - start_time;
                logMsg(total_time + '(ms.) [' + idx + '] - Received ' + data.length + ' bytes.');
                if (idx++ < countMax) {
                    setTimeout(function () {
                        exchange(idx);
                    }, 500);
                }
            },
            error : function (jqXHR, textStatus, errorThrown) {
                alert('Error: ' + textStatus + " " + errorThrown);
            }
        });
    }

Remember, when using AJAX and Node.js together, you need to take into account that you might be doing cross-domain requests and violating the same-origin policy, so you should use the JSONP-based format!

Btw, as you can see, I quoted only the most significant parts of the test code, to save space.
If anyone needs the full code, server and client, please let me know; I'll be happy to share it.

OK, it's time now to see what we got after all this work! I ran each test for 10, 100, 250 and 500 data exchanges and this is what I got in the end:

    Data exchanges   Socket.IO NON-persistent (ms.)   AJAX (ms.)   Socket.IO persistent (ms.)
    10               90                               40           32
    100              900                              320          340
    250              2,400                            800          830
    500              4,900                            1,500        1,600

Looking at the results, we can notice a few things right away:

- For each type of test, the results scale roughly linearly with the number of exchanges; this is good, as it shows that the results are consistent.
- The results clearly show that with Socket.IO non-persistent connections, the performance numbers are significantly worse than the others.
- There doesn't seem to be a big difference between AJAX and persistent Socket.IO connections; we are talking about differences of only some milliseconds. This means that if you can live with fewer than 10,000 data exchanges per day, for example, chances are high that the user won't notice a speed difference.

The graph below illustrates the numbers I obtained in the test:

So what's next? Well, I have to figure out what kind of traffic I need to support and then I will re-run the tests for those numbers, but this time excluding Socket.IO non-persistent connections. That's because it is obvious that I need to choose between AJAX and persistent Socket.IO connections.

I also learned that, most probably, the difference in speed will not be as large as one would expect, at least not for a "small-traffic" web site, so I need to start looking into the other advantages and disadvantages of each approach when choosing my solution!

That's pretty much it for this post. See you next time with a post about Node.js and caching!

P.S. Here are a few more nice resources for finding interesting stuff about Node.js, Socket.IO and AJAX:

- http://socket.io/#how-to-use
- http://www.hacksparrow.com/jquery-with-node-js.html
- http://www.slideshare.net/toddeichel/nodejs-talk-at-jquery-pittsburgh
- http://tech.burningbird.net/article/node-references-and-resources
- http://davidwalsh.name/websocket
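To make the "only some milliseconds" observation concrete, the totals for the 500-exchange run in the results table can be divided by the number of exchanges. The short Java sketch below does just that arithmetic. The class and method names are made up for illustration, and the input numbers are the ones reported in the post.

```java
public class ExchangeAverages {

    /** Average time per exchange, given the total time for a run. */
    static double perExchangeMs(double totalMs, int exchanges) {
        return totalMs / exchanges;
    }

    public static void main(String[] args) {
        int exchanges = 500; // largest run reported in the post

        // Totals in milliseconds, taken from the 500-exchange row of the table.
        System.out.println("Socket.IO non-persistent: " + perExchangeMs(4900, exchanges) + " ms");
        System.out.println("AJAX:                     " + perExchangeMs(1500, exchanges) + " ms");
        System.out.println("Socket.IO persistent:     " + perExchangeMs(1600, exchanges) + " ms");
    }
}
```

Per exchange, that is roughly 9.8 ms for non-persistent Socket.IO against about 3 ms for both AJAX and persistent Socket.IO, which is why the last two are hard to tell apart at low traffic.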

November 22, 2012

by Esen Sagynov


API Server Design - Making De-Normalization the Norm

In database design classes in Computer Science, we learn that normalization is a good thing. And it certainly is a good thing, for databases. In the case of APIs, it is a different story. If a client must do multiple GETs to obtain the data it needs, or multiple PUTs or POSTs to send up data, just because your database happens to be normalized, then something is wrong. One of the functions of an API Server is to de-normalize your data so that clients are spared from making extra REST API calls, with all of the overhead that goes with them.

Mugunth Kumar explains this very well in this excellent presentation, using Twitter as an example. When you do a GET on a tweet, it returns not only the Tweet itself but also other information (e.g. a description of the Twitter user who sent the tweet). This saves the API client (often a mobile app) from making another request for that data. Effectively, the API Server has gathered up that data, which may come from different database tables, and de-normalized it for the response. You can try it out yourself here, by looking at the JSON which comes back from this Twitter API GET of the most recent Tweet from my timeline.

Many Vordel customers are using the API Server to gather together the data which is returned to the API clients, often taking this data from multiple sources (not only databases, but also message queues and even other APIs). This data is then amalgamated into single JSON or XML structures. It is then often cached at the API Server, in this structure. In this way, clients are spared from doing multiple calls, and instead (like the Twitter API example above) get the data they need in one request, or can PUT or POST up data in one action, rather than piecemeal. De-normalization is key to this process, and is one of the great benefits of an API Server.
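As a rough illustration of the idea, the sketch below keeps tweets and users in separate, normalized in-memory maps and assembles a single de-normalized response for a tweet. Everything in it is hypothetical: the class name, the maps standing in for database tables, and the sample data are assumptions made for this example, not Twitter's or Vordel's actual API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class DenormalizedTweet {

    // Hypothetical normalized "tables": a tweet refers to its author by id only.
    static final Map<Long, String> USER_NAMES = new HashMap<>();
    static final Map<Long, Long> TWEET_AUTHOR = new HashMap<>();
    static final Map<Long, String> TWEET_TEXT = new HashMap<>();

    static {
        USER_NAMES.put(7L, "example_user");
        TWEET_AUTHOR.put(100L, 7L);
        TWEET_TEXT.put(100L, "Hello, API");
    }

    /** Builds one response that embeds the author, so the client needs a single GET. */
    static Map<String, Object> getTweet(long tweetId) {
        long userId = TWEET_AUTHOR.get(tweetId);

        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", userId);
        user.put("name", USER_NAMES.get(userId));

        Map<String, Object> tweet = new LinkedHashMap<>();
        tweet.put("id", tweetId);
        tweet.put("text", TWEET_TEXT.get(tweetId));
        tweet.put("user", user); // de-normalized: nested instead of referenced by id
        return tweet;
    }

    public static void main(String[] args) {
        // prints {id=100, text=Hello, API, user={id=7, name=example_user}}
        System.out.println(getTweet(100L));
    }
}
```

A normalized API would return only the user id here and leave the client to issue a second request for the user record.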

November 21, 2012

by Oren Eini


Overflow And Underflow of Data Types in Java

Overflow and underflow of values of various data types is a very common occurrence in Java programs. This is usually because beginners don't pay proper attention to the limits of the various data types. If we create a byte variable and assign it a value, we should be aware that the value is treated as an int, which creates a potential overflow condition.

In Java, overflow and underflow are more serious because no warning or exception is raised by the JVM when such a condition occurs. Some developers argue that the program should either crash or raise an exception in such a case, but the decision to add such behavior is in the hands of the creators of the programming language. By looking at a problem in your program, you can't straight away tell that an overflow or underflow condition has occurred. It is only after debugging that we come to know the real cause.

Overflow in int

As the int data type is 32 bits wide in Java, any value that surpasses 32 bits rolls over. In numerical terms, this means that after adding 1 to Integer.MAX_VALUE (2147483647), the returned value will be -2147483648. In fact, you don't need to remember these values; the constants Integer.MIN_VALUE and Integer.MAX_VALUE can be used.

Underflow of int

Underflow is the opposite of overflow. While we reach the upper limit in the case of overflow, we reach the lower limit in the case of underflow. Thus after subtracting 1 from Integer.MIN_VALUE, we reach Integer.MAX_VALUE. Here we have rolled over from the lowest value of int to the maximum value.

For non-integer data types, overflow and underflow result in INFINITY and ZERO values respectively. You may try the following lines to verify this:

    float f = 3.4028235E38f * 20f;
    System.out.println(f);

Note: As with the int data type, we have wrappers for all primitive data types. So we can easily see the upper and lower limit of each data type by looking at the MAX_VALUE and MIN_VALUE constants in these wrapper classes. Be aware that for Float and Double, MIN_VALUE is the smallest positive value, not the most negative one.
Read more: http://extreme-java.blogspot.com/2012/11/overflow-and-underflow-of-data-types-in.html#ixzz2BvqFu7fk
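The behaviour described above can be checked with a small program. The sketch below is an illustration only: the class and method names are made up, and Math.addExact assumes Java 8 or later, which is newer than the Java version this post was written against. addExact throws an ArithmeticException instead of silently wrapping, which is one way to get the "raise an exception" behaviour some developers ask for.

```java
public class OverflowDemo {

    /** Plain int addition: silently wraps around on overflow or underflow. */
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    /** Reports whether a + b leaves the int range, using Math.addExact (Java 8+). */
    static boolean addOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1));  // prints -2147483648
        System.out.println(wrappingAdd(Integer.MIN_VALUE, -1)); // prints 2147483647
        System.out.println(addOverflows(Integer.MAX_VALUE, 1)); // prints true

        // Narrowing an int that does not fit into a byte also wraps.
        System.out.println((byte) 130);                         // prints -126

        // Floating point saturates instead of wrapping.
        System.out.println(Float.MAX_VALUE * 20f);              // prints Infinity
        System.out.println(Float.MIN_VALUE / 4f);               // prints 0.0
    }
}
```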

November 15, 2012

by Sandeep Bhandari


Integration Testing with MongoDB & Spring Data

Integration Testing is an often overlooked area in enterprise development. This is primarily due to the complexity of setting up the necessary infrastructure for an integration test. For applications backed by databases, it's fairly complicated and time-consuming to set up databases for integration tests, and also to clean them up once a test is complete (e.g. data files, schemas etc.), to ensure repeatability of tests. While there have been many tools (e.g. DBUnit) and mechanisms (e.g. rollback after test) to assist with this, the inherent complexity and issues have always been there.

But if you are working with MongoDB, there's a cool and easy way to do your integration tests, with almost the simplicity of writing a unit test with mocks. With 'EmbedMongo', we can easily set up an embedded MongoDB instance for testing, with built-in clean-up support once tests are complete. In this article, we will walk through an example where EmbedMongo is used with JUnit for integration testing a Repository implementation.

Here's the technology stack that we will be using:

- MongoDB 2.2.0
- EmbedMongo 1.26
- Spring Data – Mongo 1.0.3
- Spring Framework 3.1

The Maven POM for the above setup looks like this.

    <project>
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.yohanliyanage.blog.mongoit</groupId>
        <artifactId>mongo-it</artifactId>
        <version>1.0</version>

        <dependencies>
            <dependency>
                <groupId>org.springframework.data</groupId>
                <artifactId>spring-data-mongodb</artifactId>
                <version>1.0.3.RELEASE</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>4.10</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>org.springframework</groupId>
                <artifactId>spring-context</artifactId>
                <version>3.1.3.RELEASE</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>de.flapdoodle.embed</groupId>
                <artifactId>de.flapdoodle.embed.mongo</artifactId>
                <version>1.26</version>
                <scope>test</scope>
            </dependency>
        </dependencies>
    </project>

Or if you prefer Gradle (by the way, Gradle is an awesome build tool which you should check out if you haven't done so already):
    apply plugin: 'java'
    apply plugin: 'eclipse'

    sourceCompatibility = 1.6

    group = "com.yohanliyanage.blog.mongoit"
    version = '1.0'

    ext.springVersion = '3.1.3.RELEASE'
    ext.junitVersion = '4.10'
    ext.springMongoVersion = '1.0.3.RELEASE'
    ext.embedMongoVersion = '1.26'

    repositories {
        mavenCentral()
        maven { url 'http://repo.springsource.org/release' }
    }

    dependencies {
        compile "org.springframework:spring-context:${springVersion}"
        compile "org.springframework.data:spring-data-mongodb:${springMongoVersion}"
        testCompile "junit:junit:${junitVersion}"
        testCompile "de.flapdoodle.embed:de.flapdoodle.embed.mongo:${embedMongoVersion}"
    }

To begin with, here's the document that we will be storing in Mongo.

    package com.yohanliyanage.blog.mongoit.model;

    import org.springframework.data.mongodb.core.index.Indexed;
    import org.springframework.data.mongodb.core.mapping.Document;

    /**
     * A Sample Document.
     *
     * @author Yohan Liyanage
     *
     */
    @Document
    public class Sample {

        @Indexed
        private String key;

        private String value;

        public Sample(String key, String value) {
            super();
            this.key = key;
            this.value = value;
        }

        public String getKey() {
            return key;
        }

        public void setKey(String key) {
            this.key = key;
        }

        public String getValue() {
            return value;
        }

        public void setValue(String value) {
            this.value = value;
        }
    }

To assist with storing and managing this document, let's write up a simple Repository implementation. The Repository interface is as follows.

    package com.yohanliyanage.blog.mongoit.repository;

    import java.util.List;

    import com.yohanliyanage.blog.mongoit.model.Sample;

    /**
     * Sample Repository API.
     *
     * @author Yohan Liyanage
     *
     */
    public interface SampleRepository {

        /**
         * Persists the given Sample.
         * @param sample
         */
        void save(Sample sample);

        /**
         * Returns the list of samples with given key.
         * @param key
         * @return
         */
        List<Sample> findByKey(String key);
    }

And the implementation:

    package com.yohanliyanage.blog.mongoit.repository;

    import java.util.List;

    import static org.springframework.data.mongodb.core.query.Query.query;
    import static org.springframework.data.mongodb.core.query.Criteria.*;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.mongodb.core.MongoOperations;
    import org.springframework.stereotype.Repository;

    import com.yohanliyanage.blog.mongoit.model.Sample;

    /**
     * Sample Repository MongoDB Implementation.
     *
     * @author Yohan Liyanage
     *
     */
    @Repository
    public class SampleRepositoryMongoImpl implements SampleRepository {

        @Autowired
        private MongoOperations mongoOps;

        /**
         * {@inheritDoc}
         */
        public void save(Sample sample) {
            mongoOps.save(sample);
        }

        /**
         * {@inheritDoc}
         */
        public List<Sample> findByKey(String key) {
            return mongoOps.find(query(where("key").is(key)), Sample.class);
        }

        /**
         * Sets the MongoOps implementation.
         *
         * @param mongoOps the mongoOps to set
         */
        public void setMongoOps(MongoOperations mongoOps) {
            this.mongoOps = mongoOps;
        }
    }

To wire this up, we need a Spring Bean Configuration. Note that we do not need this for testing, but I have included it for the sake of completeness. The XML configuration is as follows.

And now we are ready to write the Integration Test for our Repository implementation using EmbedMongo.

Ideally, integration tests should be placed in a separate source directory, just as we do with our unit tests (e.g. src/test/java => src/integration-test/java). However, neither Maven nor Gradle supports this out of the box (yet, as of v1.2; for Gradle, there's an ongoing discussion about this facility). Nevertheless, both Maven and Gradle are flexible, so you can configure the POM / build.gradle to handle this. To keep this discussion simple and focused, I will be placing the Integration Tests in 'src/test/java', but I do not recommend this for a real application.
Let's start writing up the Integration Test. First, let's begin with a simple JUnit-based test for the methods.

    package com.yohanliyanage.blog.mongoit.repository;

    import static org.junit.Assert.fail;

    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    /**
     * Integration Test for {@link SampleRepositoryMongoImpl}.
     *
     * @author Yohan Liyanage
     */
    public class SampleRepositoryMongoImplIntegrationTest {

        private SampleRepositoryMongoImpl repoImpl;

        @Before
        public void setUp() throws Exception {
            repoImpl = new SampleRepositoryMongoImpl();
        }

        @After
        public void tearDown() throws Exception {
        }

        @Test
        public void testSave() {
            fail("Not yet implemented");
        }

        @Test
        public void testFindByKey() {
            fail("Not yet implemented");
        }
    }

When this JUnit Test Case initializes, we need to fire up EmbedMongo to start an embedded Mongo server. Also, when the Test Case ends, we need to clean up the DB. The code snippet below does this.

    package com.yohanliyanage.blog.mongoit.repository;

    import static org.junit.Assert.fail;

    import java.io.IOException;

    import org.junit.*;
    import org.springframework.data.mongodb.core.MongoTemplate;

    import com.mongodb.Mongo;
    import com.yohanliyanage.blog.mongoit.model.Sample;

    import de.flapdoodle.embed.mongo.MongodExecutable;
    import de.flapdoodle.embed.mongo.MongodProcess;
    import de.flapdoodle.embed.mongo.MongodStarter;
    import de.flapdoodle.embed.mongo.config.MongodConfig;
    import de.flapdoodle.embed.mongo.config.RuntimeConfig;
    import de.flapdoodle.embed.mongo.distribution.Version;
    import de.flapdoodle.embed.process.extract.UserTempNaming;

    /**
     * Integration Test for {@link SampleRepositoryMongoImpl}.
     *
     * @author Yohan Liyanage
     */
    public class SampleRepositoryMongoImplIntegrationTest {

        private static final String LOCALHOST = "127.0.0.1";
        private static final String DB_NAME = "itest";
        private static final int MONGO_TEST_PORT = 27028;

        private SampleRepositoryMongoImpl repoImpl;

        private static MongodProcess mongoProcess;
        private static Mongo mongo;

        private MongoTemplate template;

        @BeforeClass
        public static void initializeDB() throws IOException {
            // Extract the mongod executable into the user's temp directory
            RuntimeConfig config = new RuntimeConfig();
            config.setExecutableNaming(new UserTempNaming());

            MongodStarter starter = MongodStarter.getInstance(config);

            // Start an embedded MongoDB 2.2.0 bound to the test port
            MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false));
            mongoProcess = mongoExecutable.start();

            mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT);
            mongo.getDB(DB_NAME);
        }

        @AfterClass
        public static void shutdownDB() throws InterruptedException {
            mongo.close();
            mongoProcess.stop();
        }

        @Before
        public void setUp() throws Exception {
            repoImpl = new SampleRepositoryMongoImpl();
            template = new MongoTemplate(mongo, DB_NAME);
            repoImpl.setMongoOps(template);
        }

        @After
        public void tearDown() throws Exception {
            // Drop the collection so each test starts from a clean state
            template.dropCollection(Sample.class);
        }

        @Test
        public void testSave() {
            fail("Not yet implemented");
        }

        @Test
        public void testFindByKey() {
            fail("Not yet implemented");
        }
    }

The initializeDB() method is annotated with @BeforeClass so that it runs before the test case begins. This method fires up an embedded MongoDB instance which is bound to the given port, and exposes a Mongo object which is set to use the given database. Internally, EmbedMongo creates the necessary data files in temporary directories.

When this method executes for the first time, EmbedMongo will download the necessary Mongo implementation (denoted by Version.V2_2_0 in the code above) if it does not exist already. This is a nice facility, especially when it comes to Continuous Integration servers. You don't have to manually set up Mongo on each of the CI servers. That's one less external dependency for the tests.

In the shutdownDB() method, which is annotated with @AfterClass, we stop the EmbedMongo process. This triggers the necessary clean-up in EmbedMongo to remove the temporary data files, restoring the state to where it was before the Test Case was executed.

We have now updated the setUp() method to build a Spring MongoTemplate object which is backed by the Mongo instance exposed by EmbedMongo, and to set up our repoImpl with that template. The tearDown() method is updated to drop the 'Sample' collection to ensure that each of our test methods starts with a clean state.

Now it's just a matter of writing the actual test methods. Let's start with the save method test.

    @Test
    public void testSave() {
        Sample sample = new Sample("TEST", "2");
        repoImpl.save(sample);

        int samplesInCollection = template.findAll(Sample.class).size();

        assertEquals("Only 1 Sample should exist in collection, but there are "
                + samplesInCollection, 1, samplesInCollection);
    }

We create a Sample object, pass it to repoImpl.save(), and assert that there's only one Sample in the Sample collection. Simple, straightforward stuff.

And here's the test method for the findByKey method.

    @Test
    public void testFindByKey() {

        // Setup Test Data
        List<Sample> samples = Arrays.asList(
                new Sample("TEST", "1"),
                new Sample("TEST", "25"),
                new Sample("TEST2", "66"),
                new Sample("TEST2", "99"));

        for (Sample sample : samples) {
            template.save(sample);
        }

        // Execute Test
        List<Sample> matches = repoImpl.findByKey("TEST");

        // Note: Since our test data (populateDummies) have only 2
        // records with key "TEST", this should be 2
        assertEquals("Expected only two samples with key TEST, but there are "
                + matches.size(), 2, matches.size());
    }

Initially, we set up the data by adding a set of Sample objects to the data store. It's important that we use template.save() directly here, because repoImpl.save() is itself a method under test.
We are not testing that here, so we use the underlying "trusted" template.save() during data setup. This is a basic concept in Unit / Integration testing. Then we execute the method under test, 'findByKey', and assert that only two Samples matched our query.

Likewise, we can continue to write more tests for each of the repository methods, including negative tests. And here's the final Integration Test file.

    package com.yohanliyanage.blog.mongoit.repository;

    import static org.junit.Assert.*;

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.List;

    import org.junit.*;
    import org.springframework.data.mongodb.core.MongoTemplate;

    import com.mongodb.Mongo;
    import com.yohanliyanage.blog.mongoit.model.Sample;

    import de.flapdoodle.embed.mongo.MongodExecutable;
    import de.flapdoodle.embed.mongo.MongodProcess;
    import de.flapdoodle.embed.mongo.MongodStarter;
    import de.flapdoodle.embed.mongo.config.MongodConfig;
    import de.flapdoodle.embed.mongo.config.RuntimeConfig;
    import de.flapdoodle.embed.mongo.distribution.Version;
    import de.flapdoodle.embed.process.extract.UserTempNaming;

    /**
     * Integration Test for {@link SampleRepositoryMongoImpl}.
     *
     * @author Yohan Liyanage
     */
    public class SampleRepositoryMongoImplIntegrationTest {

        private static final String LOCALHOST = "127.0.0.1";
        private static final String DB_NAME = "itest";
        private static final int MONGO_TEST_PORT = 27028;

        private SampleRepositoryMongoImpl repoImpl;

        private static MongodProcess mongoProcess;
        private static Mongo mongo;

        private MongoTemplate template;

        @BeforeClass
        public static void initializeDB() throws IOException {
            RuntimeConfig config = new RuntimeConfig();
            config.setExecutableNaming(new UserTempNaming());

            MongodStarter starter = MongodStarter.getInstance(config);

            MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false));
            mongoProcess = mongoExecutable.start();

            mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT);
            mongo.getDB(DB_NAME);
        }

        @AfterClass
        public static void shutdownDB() throws InterruptedException {
            mongo.close();
            mongoProcess.stop();
        }

        @Before
        public void setUp() throws Exception {
            repoImpl = new SampleRepositoryMongoImpl();
            template = new MongoTemplate(mongo, DB_NAME);
            repoImpl.setMongoOps(template);
        }

        @After
        public void tearDown() throws Exception {
            template.dropCollection(Sample.class);
        }

        @Test
        public void testSave() {
            Sample sample = new Sample("TEST", "2");
            repoImpl.save(sample);

            int samplesInCollection = template.findAll(Sample.class).size();

            assertEquals("Only 1 Sample should exist in collection, but there are "
                    + samplesInCollection, 1, samplesInCollection);
        }

        @Test
        public void testFindByKey() {

            // Setup Test Data
            List<Sample> samples = Arrays.asList(
                    new Sample("TEST", "1"),
                    new Sample("TEST", "25"),
                    new Sample("TEST2", "66"),
                    new Sample("TEST2", "99"));

            for (Sample sample : samples) {
                template.save(sample);
            }

            // Execute Test
            List<Sample> matches = repoImpl.findByKey("TEST");

            // Note: Since our test data (populateDummies) have only 2
            // records with key "TEST", this should be 2
            assertEquals("Expected only two samples with key TEST, but there are "
                    + matches.size(), 2, matches.size());
        }
    }

On a side note, one of the key concerns with Integration Tests is execution time. We all want to keep our test execution times as low as possible, ideally a couple of seconds, so that we can run all the tests during CI with minimal build and verification times. However, since Integration Tests rely on underlying infrastructure, they usually take time to run. With EmbedMongo, this is not the case. On my machine, the above test suite runs in 1.8 seconds, and each test method takes only .166 seconds at most. See the screenshot below.

I have uploaded the code for the above project to GitHub. You can download / clone it from here: https://github.com/yohanliyanage/blog-mongo-integration-tests. For more information regarding EmbedMongo, refer to their site at GitHub: https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de.

November 11, 2012

by Yohan Liyanage


How to do a presentation in China? Some of my experiences

So the culture is different from Western culture we all know that! I am certainly not an expert on China but after living in China for almost 2 years knowing some language and working in a chinese company seeing presentations every week and also visiting over 30 western and chinese companies placed in China I think I have some insights about how you should organize your presentation in China. Since I recently went to Shanghai in order to to research exchange with Jiaotong University I was about to give a presentation to introduce my institute and me. So here you can find my rather uncommon presentation and some remarks, why some slides where designed in the way they are. http://www.rene-pickhardt.de/wp-content/uploads/2012/11/ApexLabIntroductionOfWeST.pdf Guanxi – your relations First of all I think it is really important to understand that in China everything is related to your relations (http://en.wikipedia.org/wiki/Guanxi). A chinese business card will always name a view of your best and strongest contacts. This is more important than your adress for example. If a conference starts people exchange namecards before they sit down and discuss. This principle of Guanxi is also reflected in the style presentations are made. Here are some basic rules: Show pictures of people you worked together with Show pictures of groups while you organized events Show pictures of the panels that run events Show your partners (for business not only clients but also people you are buying from or working together with in general) My way of respecting these principles: I first showed a group picture of our institute! I also showed for almost every project where I could get hold of it pictures of the people that are responsible for the project I did not only show the European research projects our university is in but listed all the different partners and showed logos of them Family The second thing is that in China the concept of family is very important. 
As a rule of thumb, I would say that if you want to do business with someone in China and you haven't been introduced to their family, things are not going the way you might expect. For this reason I included some slides with a world map that zooms in on the place where I was born, where I studied, and where my parents still live.
Localizing
When I chose a world map, I not only took one labeled in Chinese but also one with China at the center. In my contact data I also listed Chinese social networks. Remember that Twitter, Facebook and many other sites are blocked in China. So if you really want to communicate with Chinese people, why not get a QQ number or a Weibo account?
Design of the slides
You have seen this at conferences many times: Chinese presenters put a heck of a lot of stuff on a slide. I strongly believe this is because reading and recognizing Chinese characters is much faster than reading Western characters. So if your presentation is in Chinese, don't be afraid to stuff your slides with information. I have seen many talks by Chinese speakers who were literally reading out, word for word, what was written on the slides. Whereas in Western countries this is considered bad practice, in China it is all right.
Language
Speaking of language: of course, if you know some Chinese, it shows respect if you at least try to include some. I split my presentation into two parts, one in Chinese and one in English.
Have an interesting take-away message
In my case I included the fact that we have open PhD positions and scholarships, and that our institute is really international and its working language is English. Of course I also included some slides about my past and current research, such as Graphity and Typology.
During the presentation
In China it is not rude at all if someone's cellphone rings and they have more important things to do.
As the presenter you should switch off your phone, but you should not be disturbed or annoyed if people in the audience take phone calls and leave the room to deal with them. This is very common in China. I am sure there are many more rules on how to give a presentation in China, and maybe I even made some mistakes in mine, but at least I have the feeling that the reaction was quite positive. So if you have questions, suggestions or feedback, feel free to drop me a line. I am more than happy to discuss cultural topics!

November 11, 2012

by René Pickhardt

· 17,362 Views

Applying a Namespace During JAXB Unmarshal

For some, an XML schema is a strict set of rules for how an XML document must be structured. For others it is a general guideline indicating what the XML should look like. This means that people sometimes want to accept input that doesn't conform to the XML schema, for one reason or another. In this example I will demonstrate how this can be done by leveraging a SAX XMLFilter.
Java Model
Below is the Java model that will be used for this example.
Customer
    package blog.namespace.sax;
    import javax.xml.bind.annotation.XmlRootElement;
    @XmlRootElement
    public class Customer {
        private String name;
        public String getName() {
            return name;
        }
        public void setName(String name) {
            this.name = name;
        }
    }
package-info
We will use the package level @XmlSchema annotation to specify the namespace qualification for our model.
    @XmlSchema(
        namespace="http://www.example.com/customer",
        elementFormDefault=XmlNsForm.QUALIFIED)
    package blog.namespace.sax;
    import javax.xml.bind.annotation.*;
XML Input (input.xml)
Even though our metadata specifies that all the elements should be qualified with a namespace (http://www.example.com/customer), our input document is not namespace qualified. An XMLFilter will be used to add the namespace during the unmarshal operation.
    <customer>
        <name>Jane Doe</name>
    </customer>
XMLFilter (NamespaceFilter)
The easiest way to create an XMLFilter is to extend XMLFilterImpl. For our use case we will override the startElement and endElement methods. In each of these methods we will call the corresponding super method, passing in the default namespace as the URI parameter.
package blog.namespace.sax; import org.xml.sax.*; import org.xml.sax.helpers.XMLFilterImpl; public class NamespaceFilter extends XMLFilterImpl { private static final String NAMESPACE = "http://www.example.com/customer"; @Override public void endElement(String uri, String localName, String qName) throws SAXException { super.endElement(NAMESPACE, localName, qName); } @Override public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException { super.startElement(NAMESPACE, localName, qName, atts); } } Demo In the demo code below we will do a SAX parse of the XML document. The XMLReader will be wrapped in our XMLFilter. We will leverage JAXB's UnmarshallerHandler as the ContentHandler. Once the parse has been done we can ask the UnmarshallerHandler for the resulting Customer object. package blog.namespace.sax; import javax.xml.bind.*; import javax.xml.parsers.*; import org.xml.sax.*; public class Demo { public static void main(String[] args) throws Exception { // Create the JAXBContext JAXBContext jc = JAXBContext.newInstance(Customer.class); // Create the XMLFilter XMLFilter filter = new NamespaceFilter(); // Set the parent XMLReader on the XMLFilter SAXParserFactory spf = SAXParserFactory.newInstance(); SAXParser sp = spf.newSAXParser(); XMLReader xr = sp.getXMLReader(); filter.setParent(xr); // Set UnmarshallerHandler as ContentHandler on XMLFilter Unmarshaller unmarshaller = jc.createUnmarshaller(); UnmarshallerHandler unmarshallerHandler = unmarshaller .getUnmarshallerHandler(); filter.setContentHandler(unmarshallerHandler); // Parse the XML InputSource xml = new InputSource("src/blog/namespace/sax/input.xml"); filter.parse(xml); Customer customer = (Customer) unmarshallerHandler.getResult(); // Marshal the Customer object back to XML Marshaller marshaller = jc.createMarshaller(); marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); marshaller.marshal(customer, System.out); } } Output Below is the output from 
running the demo code. Note how the output contains the namespace qualification based on the metadata. Jane Doe Further Reading If you enjoyed this post then you may also be interested in: JAXB & Namespaces Preventing Entity Expansion Attacks in JAXB
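The filter's effect can also be checked without JAXB by putting a recording handler behind it. The following is only a sketch: the class name NamespaceFilterCheck and the helper seenElements are made up for illustration, and the call to setNamespaceAware(true) is an addition (not in the demo above) so that the parser reliably reports local names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical check class; only JDK SAX classes are used.
final class NamespaceFilterCheck {
    static final String NAMESPACE = "http://www.example.com/customer";

    // Same idea as the article's filter: replace the URI on every element event.
    static class NamespaceFilter extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(NAMESPACE, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(NAMESPACE, localName, qName);
        }
    }

    // Parses the XML through the filter and records "{uri}localName" for each start tag.
    static List<String> seenElements(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // assumption: added so localName is populated
        XMLReader reader = spf.newSAXParser().getXMLReader();

        NamespaceFilter filter = new NamespaceFilter();
        filter.setParent(reader);

        final List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add("{" + uri + "}" + localName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // The input carries no namespace, yet the handler sees qualified names.
        System.out.println(seenElements("<customer><name>Jane Doe</name></customer>"));
    }
}
```

The handler downstream of the filter plays the role that the UnmarshallerHandler plays in the demo: it only ever sees events that already carry the namespace.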

November 10, 2012

by Blaise Doughan

· 64,695 Views · 1 Like

Bluetooth Data Transfer with Android

To develop an Android application that makes use of data transfers via Bluetooth (BT), one would logically start at the Android developer's Bluetooth page, where all the required steps are described in detail: device discovery, pairing, client/server sockets, RFCOMM channels, etc. But before jumping into socket and thread programming just to perform a basic BT operation, let's consider a simpler alternative, based on one of Android's most important features: the ability of a given application to send the user to another one, which in this case would be the device's default BT application. Doing so will have the Android OS itself do all the low-level work for us. First things first, a bit of defensive programming:
    import android.bluetooth.BluetoothAdapter;
    //...
    // inside method
    // check if Bluetooth is supported
    BluetoothAdapter btAdapter = BluetoothAdapter.getDefaultAdapter();
    if (btAdapter == null) {
        // device does not support Bluetooth
        // inform user that we're done.
    }
The above is the first check we need to perform. That done, let's see how we can start BT from within our own application. In a previous post on SMS programming, we talked about implicit intents, which basically allow us to specify the action we would like the system to handle for us. Android will then display all the activities that are able to complete the action we want, in a chooser list. Here's an example:
    // bring up Android chooser
    Intent intent = new Intent();
    intent.setAction(Intent.ACTION_SEND);
    intent.setType("text/plain");
    intent.putExtra(Intent.EXTRA_STREAM, Uri.fromFile(file_to_transfer));
    //...
    startActivity(intent);
In the code snippet above, we are letting the Android system know that we intend to send a text file. The system then displays all installed applications capable of handling that action, and the BT application is among those handlers. We could of course let the user pick that application from the list and be done with it.
But if we feel we should be a tad more user-friendly, we need to go further and start the application ourselves, instead of simply displaying it in the midst of other unnecessary options. But how? One way to do that would be to use Android's PackageManager this way:
    // list of apps that can handle our intent
    PackageManager pm = getPackageManager();
    List<ResolveInfo> appsList = pm.queryIntentActivities(intent, 0);
    if (appsList.size() > 0) {
        // proceed
    }
The above PackageManager method returns the list we saw earlier of all activities able to handle our file transfer intent, in the form of a list of ResolveInfo objects that encapsulate the information we need:
    // select Bluetooth
    String packageName = null;
    String className = null;
    boolean found = false;
    for (ResolveInfo info : appsList) {
        packageName = info.activityInfo.packageName;
        if (packageName.equals("com.android.bluetooth")) {
            className = info.activityInfo.name;
            found = true;
            break; // found
        }
    }
    if (!found) {
        Toast.makeText(this, R.string.blu_notfound_inlist, Toast.LENGTH_SHORT).show();
        // exit
    }
We now have the necessary information to start BT ourselves:
    // set our intent to launch Bluetooth
    intent.setClassName(packageName, className);
    startActivity(intent);
What we did was to use the package and its corresponding class retrieved earlier. Since we are a curious bunch, we may wonder what the class name for the "com.android.bluetooth" package is. This is what we would get if we were to print it out: com.broadcom.bt.app.opp.OppLauncherActivity. OPP stands for Object Push Profile, and is the Android component that allows files to be shared wirelessly. All fine and dandy, but in order for all the above code to be of any use, BT doesn't simply need to be supported by the device, but also enabled by the user. So one of the first things we want to do is to ask the user to enable BT for the time we deem necessary (here, 300 seconds):
    import android.bluetooth.BluetoothAdapter;
    //...
    // duration that the device is discoverable
    private static final int DISCOVER_DURATION = 300;
    // our request code (must be greater than zero)
    private static final int REQUEST_BLU = 1;
    //...
    public void enableBlu() {
        // enable device discovery - this will automatically enable Bluetooth
        Intent discoveryIntent = new Intent(BluetoothAdapter.ACTION_REQUEST_DISCOVERABLE);
        discoveryIntent.putExtra(BluetoothAdapter.EXTRA_DISCOVERABLE_DURATION, DISCOVER_DURATION);
        startActivityForResult(discoveryIntent, REQUEST_BLU);
    }
Once we specify that we want to get a result back from our activity with startActivityForResult, an enabling dialog is presented to the user. Whenever that activity finishes, it will return the request code we sent (REQUEST_BLU), along with the data and a result code, to our main activity through the onActivityResult callback method. We know which request code we have to check against, but how about the result code? Simple: if the user responds "no" to the permission request (or if an error occurs), the result code will be RESULT_CANCELED. On the other hand, if the user accepts, the BT documentation specifies that the result code will be equal to the duration that the device is discoverable (i.e. DISCOVER_DURATION, i.e. 300). So the way to process the BT dialog would be:
    // when startActivityForResult completes...
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (resultCode == DISCOVER_DURATION && requestCode == REQUEST_BLU) {
            // processing code goes here
        } else {
            // cancelled or error
            Toast.makeText(this, R.string.blu_cancelled, Toast.LENGTH_SHORT).show();
        }
    }
Are we done yet? Almost. Last but not least, we need to ask for the BT permissions in the Android manifest. We're ready to deploy now.
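The result-code rule described above can be isolated into a small helper that runs without the Android SDK, which makes it easy to test. This is only a sketch: the class and method names are invented, and RESULT_CANCELED is redeclared here with the value 0 that android.app.Activity uses.

```java
// Hypothetical, Android-free model of the onActivityResult decision.
final class DiscoverableResultCheck {
    // Mirrors android.app.Activity.RESULT_CANCELED (value 0).
    static final int RESULT_CANCELED = 0;
    // Same values as the constants used in the article.
    static final int DISCOVER_DURATION = 300;
    static final int REQUEST_BLU = 1;

    // True only when the reply belongs to our request and the user accepted,
    // in which case the result code equals the requested duration.
    static boolean discoverableGranted(int requestCode, int resultCode) {
        return requestCode == REQUEST_BLU && resultCode == DISCOVER_DURATION;
    }

    public static void main(String[] args) {
        System.out.println(discoverableGranted(REQUEST_BLU, DISCOVER_DURATION)); // user accepted
        System.out.println(discoverableGranted(REQUEST_BLU, RESULT_CANCELED));   // refused or error
        System.out.println(discoverableGranted(2, DISCOVER_DURATION));           // some other request
    }
}
```

Keeping the comparison in one place means the activity callback only has to branch on a boolean.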
To test all this, we need to use at least two Android devices, one being the file sender (where our application is installed) and the other any receiving device supporting BT. Note that once the receiver accepts the connection, the received file (kmemo.dat) is saved inside the BT folder on the SD card. All the lower-level data transfer has been handled by the Android OS. Source: Tony's blog.

November 6, 2012

by Tony Siciliani

· 78,105 Views

Exploring the HTML5 Web Audio: Visualizing Sound

If you've read some of my other articles on this blog you probably know I'm a fan of HTML5. With HTML5 we get all this interesting functionality, directly in the browser, in a way that, eventually, is standard across browsers. One of the new HTML5 APIs that is slowly moving through the standardization process is the Web Audio API. With this API, currently only supported in Chrome, we get access to all kinds of interesting audio components you can use to create, modify and visualize sounds (for example as a spectrogram). So why do I start with visualizations? It looks nice, that's one reason, but not the important one. This API provides a number of more complex components, whose behavior is much easier to explain when you can see what happens. With a filter you can instantly see whether some frequencies are filtered, instead of trying to listen to the resulting audio for these changes. There are many interesting examples that use this API. The problem is, though, that getting started with this API and with digital signal processing (DSP) usually isn't explained. In this article I'll walk you through a couple of steps that show how to do the following:
- Create a signal volume meter
- Visualize the frequencies using a spectrum analyzer
- Show a time-based spectrogram
We start with the basic setup that we can use as the basis for the components we'll create.
Setting up the basics
If we want to experiment with sound, we need some sound source. We could use the microphone (as we'll do later in this series), but to keep it simple, for now we'll just use an audio file as our input. To get this working using Web Audio we have to take the following steps:
- Load the data
- Read it into a buffer node and play the sound
Load the data
With Web Audio we can use different types of audio sources. We've got a MediaElementAudioSourceNode that can be used to play the audio provided by a media element. There's also a MediaStreamAudioSourceNode.
With this audio source node we can use the microphone as input (see my previous article on sound recognition). Finally there is the AudioBufferSourceNode. With this node we can load the data from an existing audio file (e.g mp3) and use that as input. For this example we'll use this last approach. // create the audio context (chrome only for now) var context = new webkitAudioContext(); var audioBuffer; var sourceNode; // load the sound setupAudioNodes(); loadSound("wagner-short.ogg"); function setupAudioNodes() { // create a buffer source node sourceNode = context.createBufferSource(); // and connect to destination sourceNode.connect(context.destination); } // load the specified sound function loadSound(url) { var request = new XMLHttpRequest(); request.open('GET', url, true); request.responseType = 'arraybuffer'; // When loaded decode the data request.onload = function() { // decode the data context.decodeAudioData(request.response, function(buffer) { // when the audio is decoded play the sound playSound(buffer); }, onError); } request.send(); } function playSound(buffer) { sourceNode.buffer = buffer; sourceNode.noteOn(0); } // log if an error occurs function onError(e) { console.log(e); } In this example you can see a couple of functions. The setupAudioNodes function creates a BufferSource audio node and connects it to the destination. The loadSound function shows how you can load an audio file. The buffer which is passed into the playSound function contains decoded audio that can be used by the web audio API. In this example I use an .ogg file, for a complete overview of the formats supported look at: https://sites.google.com/a/chromium.org/dev/audio-video Play the sound To play this audio file, all we have to do is turn the source node on, this is done in the playSound function: function playSound(buffer) { sourceNode.buffer = buffer; sourceNode.noteOn(0); } You can test this out at the following page: Example 1: Loading and playing a sound with Web Audio API. 
When you open that page, you'll hear some music. Nothing too spectacular for now, but nevertheless an easy way to load audio that we'll use for the rest of this article. The first item on our list was the volume meter.
Create a volume meter
One of the basic scenarios, and often one of the first things someone new to this API tries to create, is a simple signal volume meter (or VU meter). I expected this to be a standard component in this API, where I could just read off the signal strength as a property. But no such node exists. Not to worry: with the components that are available, it's pretty easy (not straightforward, but easy nevertheless) to get an indication of the signal strength of your audio file. In this section we'll create a simple volume meter that measures the signal strength for the left and the right audio channel. It is drawn on a canvas, but you could also have used divs or SVG to visualize this. Let's start with a single volume meter, instead of one for each channel. For this we need to do the following:
- Create an analyser node: with this node we get realtime information about the data that is processed. We use this data to determine the signal strength
- Create a javascript node: we use this node as a timer to update the volume meters with new information
- Connect everything together
Analyser node
With the analyser node we can perform real-time frequency and time domain analysis. From the specification: a node which is able to provide real-time frequency and time-domain analysis information. The audio stream will be passed un-processed from input to output. I won't go into the mathematical details behind this node, since there are many articles out there that explain how this works (a good one is the chapter on Fourier transformation linked from the original post).
What you should know about this node is that it splits up the signal into frequency buckets and we get the amplitude (the signal strength) for each set of frequencies (the bucket). The best way to understand this is to skip a bit ahead in this article and look at the frequency distribution we'll create later on. That image plots the result from the analyser node. The frequencies increase from left to right, and the height of the bar shows the strength of that specific frequency bucket. More on this later on in the article. For now we don't want to see the strength of the separate frequency buckets, but the strength of the total signal. For this we'll just add up the strengths from each bucket and divide the sum by the number of buckets. First we need to create an analyser node:
    // setup a analyzer
    analyser = context.createAnalyser();
    analyser.smoothingTimeConstant = 0.3;
    analyser.fftSize = 1024;
This creates an analyser node whose result will be used to create the volume meter. We use a smoothingTimeConstant to make the meter less jittery. With this variable we use input from a longer time period to calculate the amplitudes, which results in a smoother meter. The fftSize determines how many buckets we get containing frequency information. If we have an fftSize of 1024 we get 512 buckets (more info on this in the book on DSP and Fourier transformations). When this node receives a stream of data, it analyzes this stream and provides us with information about the frequencies in that signal and their strengths. We now need a timer to update the meter at regular intervals. We could use the standard javascript setInterval function, but since we're looking at the Web Audio API let's use one of its nodes: the JavaScriptNode.
The javascript node
With the javascript node we can process the raw audio data directly from javascript. We can use this to write our own analysers or complex components. We're not going to do that, though.
When creating the javascript node, you can specify the interval at which it is called. We'll use that feature to update the meter at regular intervals. Creating a javascript node is very easy.
    // setup a javascript node
    javascriptNode = context.createJavaScriptNode(2048, 1, 1);
This will create a javascript node that is called whenever 2048 frames have been sampled. Since our data is sampled at 44.1k, this function will be called approximately 21 times a second. Now what happens when this function is called:
    // when the javascript node is called
    // we use information from the analyzer node
    // to draw the volume
    javascriptNode.onaudioprocess = function() {
        // get the average, bincount is fftsize / 2
        var array = new Uint8Array(analyser.frequencyBinCount);
        analyser.getByteFrequencyData(array);
        var average = getAverageVolume(array);
        // clear the current state
        ctx.clearRect(0, 0, 60, 130);
        // set the fill style
        ctx.fillStyle=gradient;
        // create the meters
        ctx.fillRect(0,130-average,25,130);
    }
    function getAverageVolume(array) {
        var values = 0;
        var average;
        var length = array.length;
        // get all the frequency amplitudes
        for (var i = 0; i < length; i++) {
            values += array[i];
        }
        average = values / length;
        return average;
    }
In these two functions we calculate the average and draw the meter directly on the canvas (using a gradient so we have nice colors). Now all we have to do is connect the output from the audio source to the analyser, and the analyser to the javascript node (and if we want to hear the audio, we also need to connect something to the destination).
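The numbers quoted so far (512 buckets for an fftSize of 1024, roughly 21 callbacks a second for a 2048-frame buffer at 44.1 kHz) follow from simple arithmetic. The sketch below restates it in plain Java rather than JavaScript so it can be run outside a browser; the class and method names are invented for illustration, and the bucket width assumes the buckets evenly cover the range from 0 Hz to half the sample rate.

```java
// Hypothetical helper; plain arithmetic, no Web Audio involved.
final class AudioMath {
    // An analyser reports half as many frequency buckets as its FFT size.
    static int bucketCount(int fftSize) {
        return fftSize / 2;
    }

    // Width of one bucket in Hz, assuming buckets evenly span 0 .. sampleRate / 2.
    static double bucketWidthHz(double sampleRate, int fftSize) {
        return sampleRate / fftSize;
    }

    // How many times per second a node with the given buffer size is called.
    static double callbacksPerSecond(double sampleRate, int bufferSize) {
        return sampleRate / bufferSize;
    }

    // Mean of the byte-valued (0..255) bucket amplitudes; 0 for an empty array.
    static double averageVolume(int[] buckets) {
        if (buckets.length == 0) {
            return 0;
        }
        long sum = 0;
        for (int value : buckets) {
            sum += value;
        }
        return (double) sum / buckets.length;
    }

    public static void main(String[] args) {
        System.out.println(bucketCount(1024));               // 512
        System.out.println(bucketWidthHz(44100, 1024));      // about 43.07 Hz per bucket
        System.out.println(callbacksPerSecond(44100, 2048)); // about 21.53 calls per second
        System.out.println(averageVolume(new int[] {0, 255, 255, 0})); // 127.5
    }
}
```

Because the averaged amplitudes are bytes, the result never exceeds 255, which is why it can be used more or less directly as a bar height on a small canvas.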
Connect everything together Connecting everything together is easy: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; // create a buffer source node sourceNode = context.createBufferSource(); // connect the source to the analyser sourceNode.connect(analyser); // we use the javascript node to draw at a specific interval. analyser.connect(javascriptNode); // and connect to destination, if you want audio sourceNode.connect(context.destination); } And that's it. This will draw a single volume meter, for the complete signal. Now what do we do when we want to have a volume meter for each channel. For this we use a ChannelSplitter. Let's dive right into the code to connect everything: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; analyser2 = context.createAnalyser(); analyser2.smoothingTimeConstant = 0.0; analyser2.fftSize = 1024; // create a buffer source node sourceNode = context.createBufferSource(); splitter = context.createChannelSplitter(); // connect the source to the analyser and the splitter sourceNode.connect(splitter); // connect one of the outputs from the splitter to // the analyser splitter.connect(analyser,0,0); splitter.connect(analyser2,1,0); // we use the javascript node to draw at a // specific interval. analyser.connect(javascriptNode); // and connect to destination sourceNode.connect(context.destination); } As you can see we don't really change much. We introduce a new node, the splitter node. 
This node splits the sound into a left and a right channel. These channels can be processed separately. With this layout the following happens:
1. The audio source creates a signal based on the buffered audio.
2. This signal is sent to the splitter, which splits the signal into a left and a right stream.
3. Each of these two streams is processed by its own realtime analyser.
4. From the javascript node, we now get the information from both analysers and plot both meters.
I've shown steps 1 through 3, so let's quickly move on to step 4. For this we simply change the onaudioprocess handler as follows:
    javascriptNode.onaudioprocess = function() {
        // get the average for the first channel
        var array = new Uint8Array(analyser.frequencyBinCount);
        analyser.getByteFrequencyData(array);
        var average = getAverageVolume(array);
        // get the average for the second channel
        var array2 = new Uint8Array(analyser2.frequencyBinCount);
        analyser2.getByteFrequencyData(array2);
        var average2 = getAverageVolume(array2);
        // clear the current state
        ctx.clearRect(0, 0, 60, 130);
        // set the fill style
        ctx.fillStyle=gradient;
        // create the meters
        ctx.fillRect(0,130-average,25,130);
        ctx.fillRect(30,130-average2,25,130);
    }
And now we've got two signal meters, one for each channel. Example 2: Visualize the signal strength with a volume meter. Now let's see how we can get the view of the frequencies I mentioned earlier.
Create a frequency spectrum
With all the work we already did in the previous section, creating a frequency spectrum overview is now very easy.
We're going to aim for this: We set up the nodes just like we did in the first example: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 512; // create a buffer source node sourceNode = context.createBufferSource(); sourceNode.connect(analyser); analyser.connect(javascriptNode); // sourceNode.connect(context.destination); } So this time we don't split the channels and we set the fftSize to 512. This means we get 256 bars that represent our frequency. We now just need to alter the onaudioprocess method and the gradient we use: var gradient = ctx.createLinearGradient(0,0,0,300); gradient.addColorStop(1,'#000000'); gradient.addColorStop(0.75,'#ff0000'); gradient.addColorStop(0.25,'#ffff00'); gradient.addColorStop(0,'#ffffff'); // when the javascript node is called // we use information from the analyzer node // to draw the volume javascriptNode.onaudioprocess = function() { // get the average for the first channel var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); // clear the current state ctx.clearRect(0, 0, 1000, 325); // set the fill style ctx.fillStyle=gradient; drawSpectrum(array); } function drawSpectrum(array) { for ( var i = 0; i < (array.length); i++ ){ var value = array[i]; ctx.fillRect(i*5,325-value,3,325); } }; In the drawSpectrum function we iterate over the array, and draw a vertical bar based on the value. That's it. For a live example, click on the following link: Example 3: Visualize the frequency spectrum. Or view it on youtube: And then the final one. The spectrogram. Time based spectrogram When you run the previous demo you see the strength of the various frequency buckets in real time. 
While this is a nice visualization, it doesn't allow you to analyze information over a period of time. If you want to do that you can create a spectrogram. With a spectrogram we plot a single line for each measurement. The y-axis represents the frequency, the x-axis the time, and the color of a pixel the strength of that frequency. It can be used to analyze the received audio, and it also creates nice looking images. The good thing is that to output this data we don't have to change much from what we've already got in place. The only function that'll change is the onaudioprocess handler, and we'll create a slightly different analyser.
    analyser = context.createAnalyser();
    analyser.smoothingTimeConstant = 0;
    analyser.fftSize = 1024;
The analyser we create here has an fftSize of 1024, which means we get 512 frequency buckets with strengths. So we can draw a spectrogram that has a height of 512 pixels. Also note that the smoothingTimeConstant is set to 0. This means we don't use any of the previous results in the analysis. We want to show the real information, not provide a smooth volume meter or frequency spectrum analysis. The easiest way to draw a spectrogram is to start drawing the line at the left, and for each new set of frequencies increase the x-coordinate by one. The problem is that this will quickly fill up our canvas, and we'll only be able to see the first half a minute of the audio. To fix this, we need some creative canvas copying. The complete code for drawing the spectrogram is shown here:
    // create a temp canvas we use for copying and scrolling
    var tempCanvas = document.createElement("canvas"),
        tempCtx = tempCanvas.getContext("2d");
    tempCanvas.width=800;
    tempCanvas.height=512;
    // used for color distribution
    var hot = new chroma.ColorScale({
        colors:['#000000', '#ff0000', '#ffff00', '#ffffff'],
        positions:[0, .25, .75, 1],
        mode:'rgb',
        limits:[0, 300]
    });
    ...
    // when the javascript node is called
    // we use information from the analyzer node
    // to draw the spectrogram
    javascriptNode.onaudioprocess = function () {
        // get the frequency data from the analyser
        var array = new Uint8Array(analyser.frequencyBinCount);
        analyser.getByteFrequencyData(array);
        // draw the spectrogram
        if (sourceNode.playbackState == sourceNode.PLAYING_STATE) {
            drawSpectrogram(array);
        }
    }
    function drawSpectrogram(array) {
        // copy the current canvas onto the temp canvas
        var canvas = document.getElementById("canvas");
        tempCtx.drawImage(canvas, 0, 0, 800, 512);
        // iterate over the elements from the array
        for (var i = 0; i < array.length; i++) {
            // draw each pixel with the specific color
            var value = array[i];
            ctx.fillStyle = hot.getColor(value).hex();
            // draw the line at the right side of the canvas
            ctx.fillRect(800 - 1, 512 - i, 1, 1);
        }
        // set translate on the canvas
        ctx.translate(-1, 0);
        // draw the copied image
        ctx.drawImage(tempCanvas, 0, 0, 800, 512, 0, 0, 800, 512);
        // reset the transformation matrix
        ctx.setTransform(1, 0, 0, 1, 0, 0);
    }
To draw the spectrogram we do the following:
- We copy what is currently drawn to a hidden canvas
- Next we draw a line of the current values at the far right of the canvas
- We set the translate on the canvas to -1
- We copy the copied information back to the original canvas (where it is now drawn 1 pixel to the left)
- And reset the transformation matrix
See a running example here: Example 4: Create a spectrogram. The last thing I'd like to mention regarding the code is the chroma.js library I used for the colors. If you ever need to draw something color or gradient related (e.g. maps, strengths, levels) you can easily create color scales with this library. Two final pointers that I know I'll get questions about:
- Volume could be represented as a magnitude; I just didn't want to complicate matters for this.
- The spectrogram doesn't use logarithmic scales. Once again, I didn't want to complicate things.
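The scrolling idea behind the spectrogram (shift everything one pixel to the left, then draw the newest measurement in the rightmost column) does not depend on the canvas. The sketch below shows the same idea on a plain in-memory grid in Java. It is not the article's canvas code: ScrollingSpectrogram and its methods are invented names, and it shifts the rows directly instead of copying through a temporary image.

```java
// Hypothetical in-memory model of a scrolling spectrogram.
final class ScrollingSpectrogram {
    private final int width;
    private final int height;
    // cells[row][column]; row 0 is the top of the picture.
    private final int[][] cells;

    ScrollingSpectrogram(int width, int height) {
        this.width = width;
        this.height = height;
        this.cells = new int[height][width];
    }

    // Scroll one column to the left and write the new measurement at the right edge.
    void push(int[] buckets) {
        for (int row = 0; row < height; row++) {
            // System.arraycopy handles the overlapping source and destination correctly.
            System.arraycopy(cells[row], 1, cells[row], 0, width - 1);
            // Lowest frequency bucket goes to the bottom row.
            int bucket = height - 1 - row;
            cells[row][width - 1] = bucket < buckets.length ? buckets[bucket] : 0;
        }
    }

    int get(int row, int column) {
        return cells[row][column];
    }

    public static void main(String[] args) {
        ScrollingSpectrogram s = new ScrollingSpectrogram(3, 2);
        s.push(new int[] {1, 2}); // first measurement lands in the last column
        s.push(new int[] {3, 4}); // first measurement moves one column left
        System.out.println(s.get(1, 1) + " " + s.get(1, 2)); // 1 3
        System.out.println(s.get(0, 1) + " " + s.get(0, 2)); // 2 4
    }
}
```

The oldest column simply falls off the left edge on each push, which is what keeps the picture from filling up.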

October 23, 2012

by Jos Dirksen

· 70,091 Views · 1 Like

Understanding JVM Internals, from Basic Structure to Java SE 7 Features

Learn about the structure of the JVM, how it works and executes Java bytecode, the order of execution, examples of common mistakes and their solutions, and the new Java SE 7 features.

October 19, 2012

by Esen Sagynov

· 180,109 Views · 20 Likes

Debugging Hibernate Envers - Historical Data

Recently in our project we reported a strange bug. In one report where we display historical data provided by Hibernate Envers, users encountered duplicated records in the dropdown used for filtering. We tried to find the source of this bug, but after spending a few hours looking at the code responsible for this functionality we had to give up and ask for a dump of the production database to check what was actually stored in one table. When we got it and started investigating, it turned out that a bug in Hibernate Envers 3.6 was the cause of our problems. Luckily, after some investigation and invaluable help from Adam Warski (author of Envers), we were able to fix this issue.
The bug itself
Let's consider the following scenario:
1. A transaction is started. We insert some audited entities during it, and then it is rolled back.
2. The same EntityManager is reused to start another transaction.
3. The second transaction is committed.
But when we check the audit tables for the entities that were created and then rolled back in step one, we notice that they are still there and were not rolled back as we expected. We were able to reproduce this in a failing test in our project, so the next step was to prepare a failing test in Envers so we could verify that our fix works.
failing test the simplest test cases already present in envers are located in simple.java class and they look quite straightforward: public class simple extends abstractentitytest { private integer id1; public void configure(ejb3configuration cfg) { cfg.addannotatedclass(inttestentity.class); } @test public void initdata() { entitymanager em = getentitymanager(); em.gettransaction().begin(); inttestentity ite = new inttestentity(10); em.persist(ite); id1 = ite.getid(); em.gettransaction().commit(); em.gettransaction().begin(); ite = em.find(inttestentity.class, id1); ite.setnumber(20); em.gettransaction().commit(); } @test(dependsonmethods = "initdata") public void testrevisionscounts() { assert arrays.aslist(1, 2).equals(getauditreader().getrevisions(inttestentity.class, id1)); } @test(dependsonmethods = "initdata") public void testhistoryofid1() { inttestentity ver1 = new inttestentity(10, id1); inttestentity ver2 = new inttestentity(20, id1); assert getauditreader().find(inttestentity.class, id1, 1).equals(ver1); assert getauditreader().find(inttestentity.class, id1, 2).equals(ver2); } } so preparing my failing test executing scenario described above wasn’t a rocket science: /** * @author tomasz dziurko (tdziurko at gmail dot com) */ public class transactionrollbackbehaviour extends abstractentitytest { public void configure(ejb3configuration cfg) { cfg.addannotatedclass(inttestentity.class); } @test public void testauditrecordsrollback() { // given entitymanager em = getentitymanager(); em.gettransaction().begin(); inttestentity itetorollback = new inttestentity(30); em.persist(itetorollback); integer rollbackediteid = itetorollback.getid(); em.gettransaction().rollback(); // when em.gettransaction().begin(); inttestentity ite2 = new inttestentity(50); em.persist(ite2); integer ite2id = ite2.getid(); em.gettransaction().commit(); // then list revisionsforsavedclass = getauditreader().getrevisions(inttestentity.class, ite2id); 
assertequals(revisionsforsavedclass.size(), 1, "there should be one revision for inserted entity"); list revisionsforrolledbackclass = getauditreader().getrevisions(inttestentity.class, rollbackediteid); assertequals(revisionsforrolledbackclass.size(), 0, "there should be no revisions for insert that was rolled back"); } } now i could verify that tests are failing on the forked 3.6 branch and check if the fix that we had is making this test green. the fix after writing a failing test in our project, i placed several breakpoints in envers code to understand better what is wrong there. but imagine being thrown in a project developed for a few years by many programmers smarter than you. i felt overwhelmed and had no idea where the fix should be applied and what exactly is not working as expected. luckily in my company we have adam warski on board. he is the initial author of envers and actually he pointed us the solution. the fix itself contains only one check that registers audit processes that will be executed on transaction completion only when such processes iare still in the map for the given transaction. it sounds complicated, but if you look at the class auditprocessmanager in this commit it should be more clear what is happening there. official path besides locating a problem and fixing it, there are some more official steps that must be performed to have fix included in envers. step 1. create jira issue with bug - https://hibernate.onjira.com/browse/hhh-7682 step 2: create local branch envers-bugfix-hhh-7682 of forked hibernate 3.6 step 3: commit and push failing test and fix to your local and remote repository on github step 4: create pull request - https://github.com/hibernate/hibernate-orm/pull/393 step 5: wait for merge and that’s all. now fix is merged into main repository and we have one bug less in the world of open source
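To make the idea of the fix easier to picture, here is a toy model in plain Java. It is not Envers code: the class ToyAuditManager, its methods and the string transaction ids are all invented for this sketch. It only illustrates the general shape of the check described above, namely that audit work is written on completion only if it is still queued in the map for that transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these names are Envers classes. */
class ToyAuditManager {
    private final Map<String, List<String>> pendingByTx = new HashMap<>();
    private final List<String> auditTable = new ArrayList<>();

    /** Queues an audit entry for the given (hypothetical) transaction id. */
    void recordChange(String txId, String change) {
        pendingByTx.computeIfAbsent(txId, id -> new ArrayList<>()).add(change);
    }

    /** A rollback discards whatever was queued for that transaction. */
    void rollback(String txId) {
        pendingByTx.remove(txId);
    }

    /** On commit, audit rows are written only if work is still queued for this transaction. */
    void commit(String txId) {
        List<String> pending = pendingByTx.remove(txId);
        if (pending != null) { // the "still in the map" check
            auditTable.addAll(pending);
        }
    }

    List<String> auditTable() {
        return auditTable;
    }

    public static void main(String[] args) {
        ToyAuditManager manager = new ToyAuditManager();
        manager.recordChange("tx1", "insert 30");
        manager.rollback("tx1");
        manager.recordChange("tx2", "insert 50");
        manager.commit("tx2");
        System.out.println(manager.auditTable());
    }
}
```

The main method replays the scenario from the failing test: the rolled-back insert leaves nothing behind, and only the committed one reaches the toy audit table.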

October 17, 2012

by Tomasz Dziurko

· 7,813 Views

Bug Fixing: To Estimate, or Not to Estimate: That is The Question

According to Steve McConnell in Code Complete (data from 1975-1992) most bugs don't take long to fix. About 85% of errors can be fixed in less than a few hours. Some more can be fixed in a few hours to a few days. But the rest take longer, sometimes much longer, as I talked about in an earlier post. Given all of these factors and uncertainty, how do you estimate a bug fix? Or should you bother?

Block out some time for bug fixing

Some teams don't estimate bug fixes upfront. Instead they allocate a block of time, some kind of buffer for bug fixing as a regular part of the team's work, especially if they are working in time boxes. Developers come back with an estimate only if it looks like the fix will require a substantial change: after they've dug into the code and found out that the fix isn't going to be easy, that it may require a redesign or require changes to complex or critical code that needs careful review and testing.

Use a rule of thumb placeholder for each bug fix

Another approach is to use a rough rule of thumb, a standard placeholder for every bug fix. Estimate ½ day of development work for each bug, for example. According to this post on Stack Overflow the ½ day suggestion comes from Jeff Sutherland, one of the inventors of Scrum. This placeholder should work for most bugs. If it takes a developer more than ½ day to come up with a fix, then they probably need help and people need to know anyway. Pick a placeholder and use it for a while. If it seems too small or too big, change it. Iterate. You will always have bugs to fix. You might get better at fixing them over time, or they might get harder to find and fix once you've got past the obvious ones. Or you could use the earlier data from Capers Jones on how long it takes to fix a bug by the type of bug. A day or half day works well on average, especially since most bugs are coding bugs (on average 3 hours) or data bugs (6.5 hours). Even design bugs on average only take a little more than a day to resolve.

Collect some data, and use it

Steve McConnell, in Software Estimation: Demystifying the Black Art, says that it's always better to use data than to guess. He suggests collecting time data for as little as a few weeks or maybe a couple of months on how long on average it takes to fix a bug, and using this as a guide for estimating bug fixes going forward. If you have enough defect data, you can be smarter about how to use it. If you are tracking bugs in a bug database like Jira, and if programmers are tracking how much time they spend on fixing each bug for billing or time accounting purposes (which you can also do in Jira), then you can mine the bug database for similar bugs and see how long they took to fix, and maybe get some ideas on how to fix the bug that you are working on by reviewing what other people did on other bugs before you. You can group different bugs into buckets (by size: small, medium, large, x-large, or by type) and then come up with an average estimate, and maybe even a best case, worst case and most likely for each type.

Use Benchmarks

For a maintenance team (a sustaining engineering or break/fix team responsible for software repairs only), you could use industry productivity benchmarks to project how many bugs your team can handle. Capers Jones in Estimating Software Costs says that the average programmer (in the US, in 2009) can fix 8-10 bugs per month (of course, if you're an above-average programmer working in Canada in 2012, you'll have to set these numbers much higher). Inexperienced programmers can be expected to fix 6 a month, while experienced developers using good tools can fix up to 20 per month. If you're focusing on fixing security vulnerabilities reported by a pen tester or a scan, check out the remediation statistical data that Denim Group has started to collect, to get an idea of how long it might take to fix a SQL injection bug or an XSS vulnerability.

So, do you estimate bug fixes, or not?

Because you can't estimate how long it will take to fix a bug until you've figured out what's wrong, and most of the work in fixing a bug involves figuring out what's wrong, it doesn't make sense to try to do an in-depth estimate of how long it will take to fix each bug as it comes up. Using simple historical data, a benchmark, or even a rough-guess placeholder as a rule of thumb all seem to work just as well. Whatever you do, do it in the simplest and most efficient way possible, don't waste time trying to get it perfect, and realize that you won't always be able to depend on it. Remember the 10x rule: some outlier bugs can take up to 10x as long to find and fix as an average bug. And some bugs can't be found or fixed at all, or at least not with the information that you have today. When you're wrong (and sometimes you're going to be wrong), you can be really wrong, and even careful estimating isn't going to help. So stick with a simple, efficient approach, and be prepared when you hit a hard problem, because it's gonna happen.
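The bucketing approach (group past bugs by size or type, then read off an average, best case and worst case per bucket) is simple to sketch in code. The example below is only an illustration in Java: the class BugFixEstimates, the FixedBug type and the hours in main are invented sample values, not data from McConnell, Jones or any real tracker.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BugFixEstimates {
    /** One historical bug: which bucket it fell into and how long the fix took. */
    static final class FixedBug {
        final String bucket;
        final double hours;

        FixedBug(String bucket, double hours) {
            this.bucket = bucket;
            this.hours = hours;
        }
    }

    /** Groups past fixes by bucket and collects min, average and max hours for each bucket. */
    static Map<String, DoubleSummaryStatistics> summarize(List<FixedBug> history) {
        return history.stream().collect(Collectors.groupingBy(
                b -> b.bucket, TreeMap::new, Collectors.summarizingDouble(b -> b.hours)));
    }

    public static void main(String[] args) {
        // made-up sample history, only to exercise the code
        List<FixedBug> history = Arrays.asList(
                new FixedBug("small", 1.0), new FixedBug("small", 3.0),
                new FixedBug("large", 8.0), new FixedBug("large", 40.0));
        summarize(history).forEach((bucket, stats) -> System.out.printf(
                "%s: best %.1f h, average %.1f h, worst %.1f h%n",
                bucket, stats.getMin(), stats.getAverage(), stats.getMax()));
    }
}
```

In practice the history would come from time-tracking data in the bug database, and a wide gap between best and worst case in a bucket is a hint that the bucket mixes easy fixes with outliers.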

October 12, 2012

by Jim Bird

· 23,089 Views

Spring 3.1: Caching and EhCache

If you look around the web for examples of using Spring 3.1's built-in caching then you'll usually bump into Spring's SimpleCacheManager, which the Guys at Spring say is "Useful for testing or simple caching declarations". I actually prefer to think of SimpleCacheManager as lightweight rather than simple; useful in those situations where you want a small in-memory cache on a per-JVM basis. If the Guys at Spring were running a supermarket then SimpleCacheManager would be in their own-brand 'basics' product range.

If, on the other hand, you need a heavy-duty cache, one that's scalable, persistent and distributed, then Spring also comes with a built-in ehCache wrapper. The good news is that swapping between Spring's caching implementations is easy. In theory it's all a matter of configuration and, to prove the theory correct, I took the sample code from my Caching and @Cacheable blog and ran it using an EhCache implementation.

The configuration steps are similar to those described in my last blog Caching and Config in that you still need to specify: ...in your Spring config file to switch caching on. You also need to define a bean with an id of cacheManager, only this time you reference Spring's EhCacheCacheManager class instead of SimpleCacheManager. The example above demonstrates an EhCacheCacheManager configuration. Notice that it references a second bean with an id of 'ehcache'. This is configured as follows: "ehcache" has two properties: configLocation and shared.

'configLocation' is an optional attribute that's used to specify the location of an ehcache configuration file. In my test code I used the following example file: ...which creates two caches: a default cache and one named "employee". If this file is missing then the EhCacheManagerFactoryBean simply picks up a default ehcache config file, ehcache-failsafe.xml, which is located in ehcache's ehcache-core jar file.

The other EhCacheManagerFactoryBean attribute is 'shared'.
This is supposed to be optional as the documentation states that it defines "whether the EHCache CacheManager should be shared (as a singleton at the VM level) or independent (typically local within the application). Default is 'false', creating an independent instance.” However, if this is set to false then you’ll get the following exception: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.cache.interceptor.CacheInterceptor#0': Cannot resolve reference to bean 'cacheManager' while setting bean property 'cacheManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cacheManager' defined in class path resource [ehcache-example.xml]: Cannot resolve reference to bean 'ehcache' while setting bean property 'cacheManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ehcache' defined in class path resource [ehcache-example.xml]: Invocation of init method failed; nested exception is net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following: 1. Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary 2. Shutdown the earlier cacheManager before creating new one with same name. 
The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414] at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328) at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:106) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1360) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1118) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:517) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456) ... stack trace shortened for clarity at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cacheManager' defined in class path resource [ehcache-example.xml]: Cannot resolve reference to bean 'ehcache' while setting bean property 'cacheManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ehcache' defined in class path resource [ehcache-example.xml]: Invocation of init method failed; nested exception is net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following: 1. 
Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary 2. Shutdown the earlier cacheManager before creating new one with same name. The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414] at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328) at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:106) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1360) ... stack trace shortened for clarity at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193) at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:322) ... 38 more Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ehcache' defined in class path resource [ehcache-example.xml]: Invocation of init method failed; nested exception is net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following: 1. Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary 2. Shutdown the earlier cacheManager before creating new one with same name. 
The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414] at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1455) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456) at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:294) at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:225) at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:291) at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193) at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:322) ... 48 more Caused by: net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following: 1. Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary 2. Shutdown the earlier cacheManager before creating new one with same name. 
The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414] at net.sf.ehcache.CacheManager.assertNoCacheManagerExistsWithSameName(CacheManager.java:521) at net.sf.ehcache.CacheManager.init(CacheManager.java:371) at net.sf.ehcache.CacheManager.<init>(CacheManager.java:339) at org.springframework.cache.ehcache.EhCacheManagerFactoryBean.afterPropertiesSet(EhCacheManagerFactoryBean.java:104) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1514) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1452) ... 55 more

...when you try to run a bunch of unit tests. I think that this comes down to a simple bug in Spring's ehcache manager factory, as it's trying to create multiple cache instances using new() rather than using, as the exception states, "one of the CacheManager.create() static factory methods", which allows it to reuse the same CacheManager with the same name. Hence, my first JUnit test works okay, but all others fail. The offending line of code is:

    this.cacheManager = (this.shared ? CacheManager.create() : new CacheManager());

My full XML config file is listed below for completeness:

In using ehcache, the only other configuration details to consider are the Maven dependencies. These are pretty straightforward as the Guys at Ehcache have combined all the various ehcache jars into one Maven POM module. This POM module can be added to your project's POM file using the XML below:

    <dependency>
        <groupId>net.sf.ehcache</groupId>
        <artifactId>ehcache</artifactId>
        <version>2.6.0</version>
        <type>pom</type>
        <scope>test</scope>
    </dependency>

Finally, the ehcache Jar files are available from both the Maven Central and Sourceforge repositories:

    <repositories>
        <repository>
            <id>sourceforge</id>
            <url>http://oss.sonatype.org/content/groups/sourceforge/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>
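The difference between the two branches of the offending line can be shown with a toy class. What follows is a sketch only, under the assumption (taken from the exception text) that at most one unnamed manager may exist per VM; ToyCacheManager and its methods are invented for illustration and are not the Ehcache or Spring API.

```java
/** Toy stand-in for a cache manager that allows only one unnamed instance per VM. Not the Ehcache API. */
class ToyCacheManager {
    private static ToyCacheManager singleton; // the one shared instance handed out by create()
    private static boolean unnamedExists;     // set once any instance has been constructed

    /** Mimics constructing a fresh manager: fails if an unnamed one already exists. */
    ToyCacheManager() {
        synchronized (ToyCacheManager.class) {
            if (unnamedExists) {
                throw new IllegalStateException("Another unnamed manager already exists in the same VM");
            }
            unnamedExists = true;
        }
    }

    /** Mimics a static factory: reuses the existing instance or creates it on first use. */
    static synchronized ToyCacheManager create() {
        if (singleton == null) {
            singleton = new ToyCacheManager();
        }
        return singleton;
    }

    /** Same shape as the line quoted in the article, applied to the toy class. */
    static ToyCacheManager obtain(boolean shared) {
        return shared ? create() : new ToyCacheManager();
    }

    public static void main(String[] args) {
        System.out.println(obtain(true) == obtain(true)); // the shared branch can be called repeatedly
        try {
            obtain(false); // the non-shared branch constructs a second unnamed instance
        } catch (IllegalStateException e) {
            System.out.println("second construction failed: " + e.getMessage());
        }
    }
}
```

With the toy class, repeated calls through the factory keep returning the same object, while a second direct construction fails, which matches the pattern of one passing test followed by failures.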

October 5, 2012

by Roger Hughes

· 106,179 Views

Record Audio Using webrtc in Chrome and Speech Recognition With Websockets

There are many different web API standards that are turning the web browser into a complete application platform. With WebSockets we get nice asynchronous communication, various standards give us access to sensors in laptops and mobile devices, and we can even determine how full the battery is. One of the standards I'm really interested in is WebRTC. With WebRTC we can get real-time audio and video communication between browsers without needing plugins or additional tools. A couple of months ago I wrote about how you can use WebRTC to access the webcam and use it for face recognition. At that time, none of the browsers allowed you to access the microphone. A couple of months later, though, both the developer version of Firefox and the developer version of Chrome allow you to access the microphone! So let's see what we can do with this.

Most of the examples I've seen so far focus on processing the input directly, within the browser, using the Web Audio API. You get synthesizers, audio visualizations, spectrometers etc. What was missing, however, was a means of recording the audio data and storing it for further processing on the server side. In this article I'll show you just that. The screencast that accompanies this article shows the following:

- A simple HTML page that accesses your microphone.
- The speech is recorded and sent to a backend using WebSockets.
- The backend combines the audio data and sends it to Google's speech to text API.
- The result from this API call is returned to the browser.

And all this is done without any plugins in the browser! So what's involved in accomplishing all this?

Allowing access to your microphone

The first thing you need to do is make sure you've got an up-to-date version of Chrome. I use the dev build. Since this is still an experimental feature we need to enable it using the Chrome flags. Make sure the "Web Audio Input" flag is enabled. With this configuration out of the way we can start to access our microphone.

Access the audio stream from the microphone

This is actually very easy:

    function callback(stream) {
        var context = new webkitAudioContext();
        var mediaStreamSource = context.createMediaStreamSource(stream);
        ...
    }

    $(document).ready(function() {
        navigator.webkitGetUserMedia({audio:true}, callback);
        ...
    });

As you can see I use the webkit-prefixed functions directly; you could, of course, also use a shim so it is browser independent. What happens in the code above is rather straightforward. We ask, using getUserMedia, for access to the microphone. If this is successful our callback gets called with the audio stream as its parameter. In this callback we use the Web Audio specification to create a mediaStreamSource from our microphone. With this mediaStreamSource we can do all the nice Web Audio tricks you can see here. But we don't want that; we want to record the stream and send it to a backend server for further processing. In future versions this will probably be possible directly from the WebRTC API. At this time, however, it isn't possible yet. Luckily, though, we can use a feature from the Web Audio API to get access to the raw data. With the JavaScriptAudioNode we can create a custom node, which we can use to access the raw data (which is PCM encoded). Before I started my own work on this I searched around a bit and came across the Recorder.js project from here: https://github.com/mattdiamond/recorderjs. Matt created a recorder that can record the output from Web Audio nodes, and that's exactly what I needed. All I needed to do now was connect the stream we just created to the recorder library:

    function callback(stream) {
        var context = new webkitAudioContext();
        var mediaStreamSource = context.createMediaStreamSource(stream);
        rec = new Recorder(mediaStreamSource);
    }

With this code, we create a recorder from our stream. This recorder provides the following functions:

- record: start recording from the input
- stop: stop recording
- clear: clear the current recording
- exportWAV: export the data as a wav file

Connect the recorder to the buttons

I've created a simple webpage with an output for the text and two buttons to control the recording: the 'record' button starts the recording, and once you hit the 'export' button the recording stops and is sent to the backend for processing.

Record button:

    $('#record').click(function() {
        rec.record();
        ws.send("start");
        $("#message").text("Click export to stop recording and analyze the input");

        // export a wav every second, so we can send it using websockets
        intervalKey = setInterval(function() {
            rec.exportWAV(function(blob) {
                rec.clear();
                ws.send(blob);
            });
        }, 1000);
    });

This function (using jQuery to connect it to the button) starts the recording when clicked. It also uses a websocket (ws; see further down on how to set up the websocket) to tell the backend server to expect a new recording (more on this later). Finally, when the button is clicked an interval is created that passes the data to the backend, encoded as a wav file, every second. We do this to avoid sending chunks of data that are too large to the backend and to improve performance.

Export button:

    $('#export').click(function() {
        // first send the stop command
        rec.stop();
        ws.send("stop");
        clearInterval(intervalKey);
        ws.send("analyze");
        $("#message").text("");
    });

The export button (bad naming, I think, now that I'm writing this) stops the recording and the interval, and informs the backend server that it can send the received data to the Google API for further processing.

Connecting the frontend to the backend

To connect the web application to the backend server we use websockets. In the previous code fragments you've already seen how they are used. We create them with the following:

    var ws = new WebSocket("ws://127.0.0.1:9999");
    ws.onopen = function () {
        console.log("Opened connection to websocket");
    };

    ws.onmessage = function(e) {
        var jsonResponse = jQuery.parseJSON(e.data);
        console.log(jsonResponse);
        if (jsonResponse.hypotheses.length > 0) {
            var bestMatch = jsonResponse.hypotheses[0].utterance;
            $("#outputtext").text(bestMatch);
        }
    }

We create a connection, and when we receive a message from the backend we just assume it contains the response to our speech analysis. And that's it for the complete front end of the application. We use getUserMedia to access the microphone, use the Web Audio API to get access to the raw data, and communicate with the backend server using websockets.

The backend server

Our backend server needs to do a couple of things. It first needs to combine the incoming chunks into a single audio file; next it needs to convert this to the format the Google API expects, which is flac. Finally we make a call to the Google API and return the response. I've used Jetty as the websocket server for this example. If you want to know the details about setting this up, look at the face detection example. In this article I'll only show the code to process the incoming messages.

First step, combine the incoming data

The data we receive is encoded as wav (thanks to the Recorder.js library we don't have to do this ourselves). In our backend we thus receive sound fragments with a length of one second. We can't just concatenate these together, since wav files have a header that tells how long the fragment is (amongst other things), so we have to combine them and rewrite the header. Let's first look at the code (ugly code, but it works well enough for now :)

    public void onMessage(byte[] data, int offset, int length) {

        if (currentCommand.equals("start")) {
            try {
                // the temporary file that contains our captured audio stream
                File f = new File("out.wav");

                // if the file already exists we append to it
                if (f.exists()) {
                    log.info("adding received block to existing file.");

                    // two clips are used to concat the data
                    AudioInputStream clip1 = AudioSystem.getAudioInputStream(f);
                    AudioInputStream clip2 = AudioSystem.getAudioInputStream(new ByteArrayInputStream(data));

                    // use a SequenceInputStream to cat them together
                    AudioInputStream appendedFiles = new AudioInputStream(
                            new SequenceInputStream(clip1, clip2),
                            clip1.getFormat(),
                            clip1.getFrameLength() + clip2.getFrameLength());

                    // write out the output to a temporary file
                    AudioSystem.write(appendedFiles, AudioFileFormat.Type.WAVE, new File("out2.wav"));

                    // rename the files and delete the old one
                    File f1 = new File("out.wav");
                    File f2 = new File("out2.wav");
                    f1.delete();
                    f2.renameTo(new File("out.wav"));
                } else {
                    log.info("starting new recording.");
                    FileOutputStream fOut = new FileOutputStream("out.wav", true);
                    fOut.write(data);
                    fOut.close();
                }
            } catch (Exception e) { ... }
        }
    }

This method gets called for each chunk of audio we receive from the browser. What we do here is the following:

- First, we check whether we have a temp audio file; if not, we create it.
- If the file exists we use Java's AudioSystem to create an audio sequence.
- This sequence is then written to another file.
- The original is deleted and the new one is renamed.
- We repeat this for each chunk.

So at this point we have a wav file that keeps on growing with each added chunk. Now, before we convert this, let's look at the code we use to control the backend.

    public void onMessage(String data) {
        if (data.startsWith("start")) {
            // before we start we cleanup anything left over
            cleanup();
            currentCommand = "start";
        } else if (data.startsWith("stop")) {
            currentCommand = "stop";
        } else if (data.startsWith("clear")) {
            // just remove the current recording
            cleanup();
        } else if (data.startsWith("analyze")) {
            // convert to flac
            ...
            // send the request to the google speech to text service
            ...
        }
    }

The previous method responded to binary websocket messages. The one shown above responds to string messages. We use this to control, from the browser, what the backend should do. Let's look at the analyze command, since that is the interesting one. When this command is issued from the frontend the backend needs to convert the wav file to flac and send it to the Google service.

Convert to flac

For the conversion to flac we need an external library, since standard Java has no support for this. I used the javaFlacEncoder library for this.

    // get an encoder
    FLAC_FileEncoder flacEncoder = new FLAC_FileEncoder();

    // point to the input file
    File inputFile = new File("out.wav");
    File outputFile = new File("out2.flac");

    // encode the file
    log.info("start encoding wav file to flac.");
    flacEncoder.encode(inputFile, outputFile);
    log.info("finished encoding wav file to flac.");

Easy as that. Now we've got a flac file that we can send to Google for analysis.

Send to Google for analysis

A couple of weeks ago I ran across an article that explained how someone analyzed Chrome and found out about an undocumented Google API you can use for speech to text. If you post a flac file to this url: https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&l... you receive a response like this:

    {
        "status": 0,
        "id": "ae466ffa24a1213f5611f32a17d5a42b-1",
        "hypotheses": [
            {
                "utterance": "the quick brown fox",
                "confidence": 0.857393
            }]
    }

To do this from Java code, using HttpClient, you do the following:

    // send the request to the google speech to text service
    log.info("sending file to google for speech2text");
    HttpClient client = new DefaultHttpClient();
    HttpPost p = new HttpPost(url);
    p.addHeader("Content-Type", "audio/x-flac; rate=44100");
    p.setEntity(new FileEntity(outputFile, "audio/x-flac; rate=44100"));

    HttpResponse response = client.execute(p);
    if (response.getStatusLine().getStatusCode() == 200) {
        log.info("received valid response, sending back to browser.");
        String result = new String(IOUtils.toByteArray(response.getEntity().getContent()));
        this.connection.sendMessage(result);
    }

And those are all the steps that are needed.
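The "combine them and rewrite the header" step can also be done at the byte level, which makes it visible what actually changes in the header. This is a sketch of a swapped-in technique, not the javax.sound approach used in the article. It assumes every chunk is plain PCM with the canonical 44-byte WAV header and the same format; the class WavConcat and its helper methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class WavConcat {
    static final int HEADER_SIZE = 44; // canonical PCM WAV header; an assumption about the input

    /** Appends the sample data of second to first and patches the two size fields in the header. */
    static byte[] concat(byte[] first, byte[] second) {
        int dataLength = (first.length - HEADER_SIZE) + (second.length - HEADER_SIZE);
        ByteBuffer out = ByteBuffer.allocate(HEADER_SIZE + dataLength).order(ByteOrder.LITTLE_ENDIAN);
        out.put(first);                                            // header and data of the first chunk
        out.put(second, HEADER_SIZE, second.length - HEADER_SIZE); // data only of the second chunk
        out.putInt(4, 36 + dataLength); // RIFF chunk size = total file size minus 8
        out.putInt(40, dataLength);     // size of the "data" sub-chunk
        return out.array();
    }

    /** Builds a minimal 16-bit mono PCM WAV around the given sample bytes (for the demo only). */
    static byte[] wav(byte[] samples, int sampleRate) {
        ByteBuffer b = ByteBuffer.allocate(HEADER_SIZE + samples.length).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII)).putInt(36 + samples.length);
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII)).putInt(16);
        b.putShort((short) 1).putShort((short) 1);   // PCM, one channel
        b.putInt(sampleRate).putInt(sampleRate * 2); // sample rate, byte rate
        b.putShort((short) 2).putShort((short) 16);  // block align, bits per sample
        b.put("data".getBytes(StandardCharsets.US_ASCII)).putInt(samples.length).put(samples);
        return b.array();
    }

    public static void main(String[] args) {
        byte[] joined = concat(wav(new byte[] {1, 2, 3, 4}, 44100), wav(new byte[] {5, 6}, 44100));
        int dataSize = ByteBuffer.wrap(joined).order(ByteOrder.LITTLE_ENDIAN).getInt(40);
        System.out.println("bytes: " + joined.length + ", data size field: " + dataSize);
    }
}
```

Simply appending the second file would leave its header in the middle of the audio and the size fields pointing at the first chunk only, which is why the two length fields are rewritten.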

October 5, 2012

by Jos Dirksen

· 20,567 Views · 1 Like

SQL Query Optimization and Normalization

Explore SQL query optimization and normalization.

October 4, 2012

by Michael Georgiou

· 37,802 Views · 2 Likes

Parsing a Connection String With 'Sprache' C# Parser

Sprache is a very cool lightweight parser library for C#. Today I was experimenting with parsing EasyNetQ connection strings, so I thought I’d have a go at getting Sprache to do it. An EasyNetQ connection string is a list of key-value pairs like this: key1=value1;key2=value2;key3=value3 The motivation for looking at something more sophisticated than simply chopping strings based on delimiters, is that I’m thinking of having more complex values that would themselves need parsing. But that’s for the future, today I’m just going to parse a simple connection string where the values can be strings or numbers (ushort to be exact). So, I want to parse a connection string that looks like this: virtualHost=Copa;username=Copa;host=192.168.1.1;password=abc_xyz;port=12345;requestedHeartbeat=3 … into a strongly typed structure like this: public class ConnectionConfiguration : IConnectionConfiguration { public string Host { get; set; } public ushort Port { get; set; } public string VirtualHost { get; set; } public string UserName { get; set; } public string Password { get; set; } public ushort RequestedHeartbeat { get; set; } } I want it to be as easy as possible to add new connection string items. First let’s define a name for a function that updates a ConnectionConfiguration. A uncommonly used version of the ‘using’ statement allows us to give a short name to a complex type: using UpdateConfiguration = Func; Now lets define a little function that creates a Sprache parser for a key value pair. We supply the key and a parser for the value and get back a parser that can update the ConnectionConfiguration. 
public static Parser BuildKeyValueParser( string keyName, Parser valueParser, Expression> getter) { return from key in Parse.String(keyName).Token() from separator in Parse.Char('=') from value in valueParser select (Func)(c => { CreateSetter(getter)(c, value); return c; }); } The CreateSetter is a little function that turns a property expression (like x => x.Name) into an Action. Next let’s define parsers for string and number values: public static Parser Text = Parse.CharExcept(';').Many().Text(); public static Parser Number = Parse.Number.Select(ushort.Parse); Now we can chain a series of BuildKeyValueParser invocations and Or them together so that we can parse any of our expected key-values: public static Parser Part = new List> { BuildKeyValueParser("host", Text, c => c.Host), BuildKeyValueParser("port", Number, c => c.Port), BuildKeyValueParser("virtualHost", Text, c => c.VirtualHost), BuildKeyValueParser("requestedHeartbeat", Number, c => c.RequestedHeartbeat), BuildKeyValueParser("username", Text, c => c.UserName), BuildKeyValueParser("password", Text, c => c.Password), }.Aggregate((a, b) => a.Or(b)); Each invocation of BuildKeyValueParser defines an expected key-value pair of our connection string. We just give the key name, the parser that understands the value, and the property on ConnectionConfiguration that we want to update. In effect we’ve defined a little DSL for connection strings. If I want to add a new connection string value, I simply add a new property to ConnectionConfiguration and a single line to the above code. 
Now let’s define a parser for the entire string, by saying that we’ll parse any number of key-value parts: public static Parser> ConnectionStringBuilder = from first in Part from rest in Parse.Char(';').Then(_ => Part).Many() select Cons(first, rest); All we have to do now is parse the connection string and apply the chain of update functions to a ConnectionConfiguration instance: public IConnectionConfiguration Parse(string connectionString) { var updater = ConnectionStringGrammar.ConnectionStringBuilder.Parse(connectionString); return updater.Aggregate(new ConnectionConfiguration(), (current, updateFunction) => updateFunction(current)); } We get lots of nice things out of the box with Sprache; one of the best is its excellent error messages:

    Parsing failure: unexpected 'x'; expected host or port or virtualHost or requestedHeartbeat or username or password (Line 1, Column 1).

Sprache is really nice for this kind of task. I’d recommend checking it out.
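The overall design (one small table of expected keys, each producing an update function that is folded over a configuration object) can be illustrated outside C#. The Python sketch below is only an analogy: it uses plain string splitting rather than a parser combinator library, so it does not reproduce Sprache's behaviour or error reporting. All names in it (`PARTS`, `build_updater`, `parse_connection_string`, `parse_ushort`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConnectionConfiguration:
    host: str = ""
    port: int = 0
    virtual_host: str = ""
    user_name: str = ""
    password: str = ""
    requested_heartbeat: int = 0


def parse_ushort(text):
    """Parse an unsigned 16-bit number, mirroring the article's ushort values."""
    value = int(text)
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{text!r} is out of range for a ushort")
    return value


# One entry per expected key: (attribute to update, parser for the value).
# Adding a new connection string item means adding one field and one entry here.
PARTS = {
    "host": ("host", str),
    "port": ("port", parse_ushort),
    "virtualHost": ("virtual_host", str),
    "requestedHeartbeat": ("requested_heartbeat", parse_ushort),
    "username": ("user_name", str),
    "password": ("password", str),
}


def build_updater(part):
    """Turn one 'key=value' part into a function that updates a configuration."""
    key, separator, raw_value = part.partition("=")
    key = key.strip()
    if not separator or key not in PARTS:
        raise ValueError(f"unexpected {key!r}; expected " + " or ".join(PARTS))
    attribute, parse_value = PARTS[key]
    value = parse_value(raw_value)

    def update(config):
        setattr(config, attribute, value)
        return config

    return update


def parse_connection_string(connection_string):
    """Apply the chain of update functions to a fresh configuration."""
    config = ConnectionConfiguration()
    for update in map(build_updater, connection_string.split(";")):
        config = update(config)
    return config


print(parse_connection_string(
    "virtualHost=Copa;username=Copa;host=192.168.1.1;"
    "password=abc_xyz;port=12345;requestedHeartbeat=3"))
```

Splitting on `;` is exactly the "chopping strings based on delimiters" the article wants to move beyond; it is used here only to keep the sketch short and dependency-free.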

October 3, 2012

by Mike Hadlow

· 7,554 Views

Customizing Spring Data JPA Repository

Spring Data is a very convenient library. However, as the project is quite new, it is not yet rich in features. By default, Spring Data JPA will provide an implementation of the DAO based on SimpleJpaRepository. In a recent project, I developed a customized repository base class so that I could add more features to it. You could add vendor-specific features to this repository base class as you like. Configuration You have to add the following configuration to your Spring beans configuration file. You have to specify a new repository factory class. We will develop the class later. extends SimpleJpaRepository implements GenericRepository , Serializable{ private static final long serialVersionUID = 1L; static Logger logger = Logger.getLogger(GenericRepositoryImpl.class); private final JpaEntityInformation entityInformation; private final EntityManager em; private final DefaultPersistenceProvider provider; private Class springDataRepositoryInterface; public Class getSpringDataRepositoryInterface() { return springDataRepositoryInterface; } public void setSpringDataRepositoryInterface( Class springDataRepositoryInterface) { this.springDataRepositoryInterface = springDataRepositoryInterface; } /** * Creates a new {@link SimpleJpaRepository} to manage objects of the given * {@link JpaEntityInformation}. * * @param entityInformation * @param entityManager */ public GenericRepositoryImpl (JpaEntityInformation entityInformation, EntityManager entityManager , Class springDataRepositoryInterface) { super(entityInformation, entityManager); this.entityInformation = entityInformation; this.em = entityManager; this.provider = DefaultPersistenceProvider.fromEntityManager(entityManager); this.springDataRepositoryInterface = springDataRepositoryInterface; } /** * Creates a new {@link SimpleJpaRepository} to manage objects of the given * domain type. 
* * @param domainClass * @param em */ public GenericRepositoryImpl(Class domainClass, EntityManager em) { this(JpaEntityInformationSupport.getMetadata(domainClass, em), em, null); } public S save(S entity) { if (this.entityInformation.isNew(entity)) { this.em.persist(entity); flush(); return entity; } entity = this.em.merge(entity); flush(); return entity; } public T saveWithoutFlush(T entity) { return super.save(entity); } public List saveWithoutFlush(Iterable entities) { List result = new ArrayList(); if (entities == null) { return result; } for (T entity : entities) { result.add(saveWithoutFlush(entity)); } return result; } } As a simple example here, I just override the default save method of SimpleJpaRepository. By default, the save method does not flush after persisting. I modified it to flush after persisting. In addition, I added another method called saveWithoutFlush() to allow developers to save the entity without flushing. Define Custom repository factory bean The last step is to create a factory bean class and a factory class to produce repositories based on your customized base repository class. public class DefaultRepositoryFactoryBean , S, ID extends Serializable> extends JpaRepositoryFactoryBean { /** * Returns a {@link RepositoryFactorySupport}. * * @param entityManager * @return */ protected RepositoryFactorySupport createRepositoryFactory( EntityManager entityManager) { return new DefaultRepositoryFactory(entityManager); } } /** * * The purpose of this class is to override the default behaviour of the spring JpaRepositoryFactory class. * It will produce a GenericRepositoryImpl object instead of SimpleJpaRepository. 
* */ public class DefaultRepositoryFactory extends JpaRepositoryFactory{ private final EntityManager entityManager; private final QueryExtractor extractor; public DefaultRepositoryFactory(EntityManager entityManager) { super(entityManager); Assert.notNull(entityManager); this.entityManager = entityManager; this.extractor = DefaultPersistenceProvider.fromEntityManager(entityManager); } @SuppressWarnings({ "unchecked", "rawtypes" }) protected JpaRepository getTargetRepository( RepositoryMetadata metadata, EntityManager entityManager) { Class repositoryInterface = metadata.getRepositoryInterface(); JpaEntityInformation entityInformation = getEntityInformation(metadata.getDomainType()); if (isQueryDslExecutor(repositoryInterface)) { return new QueryDslJpaRepository(entityInformation, entityManager); } else { return new GenericRepositoryImpl(entityInformation, entityManager, repositoryInterface); //custom implementation } } @Override protected Class getRepositoryBaseClass(RepositoryMetadata metadata) { if (isQueryDslExecutor(metadata.getRepositoryInterface())) { return QueryDslJpaRepository.class; } else { return GenericRepositoryImpl.class; } } /** * Returns whether the given repository interface requires a QueryDsl * specific implementation to be chosen. * * @param repositoryInterface * @return */ private boolean isQueryDslExecutor(Class repositoryInterface) { return QUERY_DSL_PRESENT && QueryDslPredicateExecutor.class .isAssignableFrom(repositoryInterface); } } Conclusion You can now add more features to the base repository class. In your program, you can now create your own repository interface extending GenericRepository instead of JpaRepository. public interface MyRepository extends GenericRepository { void someCustomMethod(ID id); } In the next post, I will show you how to add Hibernate filter features to this GenericRepository.

September 27, 2012

by Boris Lam

· 98,135 Views · 4 Likes

Choosing Static vs. Dynamic Languages for Your Startup

Everyone is thinking: why in the world would anyone pick static when you can be dynamic? Usually the thought process is, "what language am I most proficient in that can do the job?" Totally not a bad way to go about it. Now, does this choice affect anything else? Testing? Speed of development? Robustness?

Dynamic vs. Static

Dynamic languages are languages that don’t necessarily need variables to be declared before they are used. Examples of dynamic languages are Python, Ruby, and PHP. So in dynamic languages the following is possible:

    num = 10

We have successfully assigned a value to a variable without declaring it beforehand. Simple enough; try doing this in Java (you can’t). This can *increase* development speed, since you don’t have to write boilerplate code. It can be something of a double-edged sword: because types in dynamic languages are checked at runtime, there is no way to tell if there is a bug in the code until it is run. I know you can test, but you can’t test for everything. Here is an example, albeit a trivial one.

    def get_first_problem(problems): for problem in problems: problam = problem + 1 return problam

Now if you are raging to some serious dubstep, it’s easy enough to miss that small typo; you say screw it, do it live, and deploy to production. Python will simply create the new variable and not a single thing will be said. Only you can stop bugs in production!

Static languages are languages in which variables need to be declared before use and type checking is done at compile time. Examples of static languages include Java, C, and C++. So in static languages the following is enforced:

    static int awesomeNumber;
    awesomeNumber = 10;

Many argue this increases robustness and decreases the chance of runtime errors, since the compiler will catch those horrible, horrible mistakes you made throughout your code. Your method contracts are tighter; the downside is a crap ton of boilerplate code.
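To make the typo problem concrete, here is a small Python sketch of my own (the function `sum_scores` is hypothetical and not from the example above). A misspelled assignment target silently binds a brand-new name, so the function runs without any error and simply returns the wrong answer.

```python
def sum_scores(scores):
    total = 0
    for score in scores:
        # Typo: 'totl' is a new local name, so 'total' is never updated.
        totl = total + score
    return total


# No exception and no warning from the interpreter, just a wrong result.
print(sum_scores([1, 2, 3]))  # prints 0, not 6
```

In a language that requires declarations, the undeclared `totl` would be rejected at compile time. In Python, only a test or a linter that flags unused variables would catch it.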
Weak and strong typing can often be confused with dynamic and static languages. Weakly typed languages can lead to philosophical questions like: what does the number 2 added to the word ‘two’ give you? Things like this are possible with a weakly typed language.

    a = 2
    b = "2"
    concatenate(a, b) // Returns "22"
    add(a, b) // Returns 4

Traditionally, languages may place restrictions on which operations may occur. For example, in a strongly typed language, adding a string and an integer will result in a type error, as shown below.

    >>> a = 10
    >>> b = 'ten'
    >>> a + b
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'int' and 'str'
    >>>

Conclusion

Regardless of where you land on this discussion, claiming one is better than the other would lead to a flame war, but there are places where each is strong. Dynamic languages are good for fast development cycles and prototyping, while static languages are better suited to longer development cycles where trivial bugs could be extremely costly (telecommunication systems, air traffic control). For example, if some giant company called Moo Corp. spent millions of dollars on QA and testing and a bug somehow got into the field, fixing it would mean another round of testing. When sitting in that chair the choice is clear: static languages FTW. It’s a hard job, but someone has to milk the cows. Test, test, and test. Just a little food for thought for when you are starting your next project. You never know what limitations you may be placing on yourself and your team. What do you consider when selecting a programming language for a project?

September 25, 2012

by Mahdi Yusuf

· 24,951 Views

Nested Data Structures, and non-1NF design in PostgreSQL

This has been adapted from an ongoing series currently running on my blog. It has been adapted to be more self-contained, and rely less on other blog entries. For more see http://ledgersmbdev.blogspot.com PostgreSQL provides a very advanced set of tools for doing data modelling in ways which drift back and forth across a relational and non-relational divide. While it is generally a good idea to make the database relational first, and add objects later, the principles of object-relational database design allow you to do a lot more with PostgreSQL than you can on many other database platforms. This article will discuss the use of non-first-normal-form designs, in particular the storage of arrays of tuples in columns to simulate a nested table. The possible uses and problems of such a design will be discussed in detail. One of the promises of object-relational modelling is the ability to address information modelling on complex and nested data structures. Nested data structures bring considerable richness to the database, which is lost in a pure, flat, relational model. Nested data structures can be used to model tuple constraints in ways that are impossible to do when looking at flat data structures, at least as long as those constraints are limited to the information in a single tuple. At the same time there are cases where they simplify things and cases where they complicate things. This is true both in the case of using these for storage and for interfacing with stored procedures. PostgreSQL allows for nested tuples to be stored in a database, and for arrays of tuples. Other ORDBMS's allow something similar (Informix, DB2, and Oracle all support nested tables). Nested tables in PostgreSQL provide a number of gotchas, and additionally exposing the data in them to relational queries takes some extra work. In this post we will look at modelling general ledger transactions using a nested table approach, and both the benefits and limitations of this approach. 
In general this trades one set of problems for another and it is important to recognize the problems going in. The storage example came out of a brainstorming session I had with Marc Balmer of Micro Systems, though it is worth noting that this is not the solution they use in their products, nor is it the approach currently used by LedgerSMB.

Basic Table Structure: The basic data schema will end up looking like this:

    CREATE TABLE journal_type (
        id serial not null unique,
        label text primary key
    );

    CREATE TABLE account (
        id serial not null unique,
        control_code text primary key, -- account number
        description text
    );

    CREATE TYPE journal_line_type AS (
        account_id int,
        amount numeric
    );

    CREATE TABLE journal_entry (
        id serial not null unique,
        journal_type int references journal_type(id),
        source_document_id text, -- for example invoice number
        date_posted date not null,
        description text,
        line_items journal_line_type[],
        PRIMARY KEY (journal_type, source_document_id)
    );

This schema has a number of obvious gotchas and cannot, by itself, guarantee the sorts of things we want to do. However, using object-relational modelling we can fix these in ways that we cannot in a purely relational schema. The main problems are: First, since this is a double entry model, we need a constraint that says that the sum of the amounts of the lines must always equal zero. However, if we just add a sum() aggregate, we will end up with it summing every record in the db every time we do an insert, which is not what we want. We also want to make sure that no account_ids are null and no amounts are null. Additionally, it is not possible in the schema above to easily expose the journal line information to purely relational tools. However, we can use a VIEW to do this, though this produces yet more problems. Finally, referential integrity enforcement between the account lines and accounts cannot be done declaratively. We will have to create TRIGGERs to enforce this manually.
These problems are traded off against the fact that the relational model does not allow the first problem to be solved at all: we accept solutions that are a bit of a pain in exchange for having solutions at all.

Nested Table Constraints

If we simply had a tuple as a column, we could look inside the tuple with check constraints. Something like check((column).subcolumn is not null). However, in this case we cannot do that because we need to aggregate on a set of tuples attached to the row. To do this, we instead create a set of table methods for managing the constraints:

    CREATE OR REPLACE FUNCTION is_balanced(journal_entry) RETURNS BOOL LANGUAGE SQL AS $$
        SELECT sum(amount) = 0 FROM unnest($1.line_items);
    $$;

    CREATE OR REPLACE FUNCTION has_no_null_account_ids(journal_entry) RETURNS BOOL LANGUAGE SQL AS $$
        SELECT bool_and(account_id is not null) FROM unnest($1.line_items);
    $$;

    CREATE OR REPLACE FUNCTION has_no_null_amounts(journal_entry) RETURNS BOOL LANGUAGE SQL AS $$
        select bool_and(amount is not null) from unnest($1.line_items);
    $$;

We can then create our constraints. Note that because we have to create the methods first, we have to add our constraints after the functions are defined, and these are added after the table is constructed. I have gone ahead and given these friendly names so that errors are easier for people (and machines) to process and handle.

    ALTER TABLE journal_entry ADD CONSTRAINT is_balanced CHECK ((journal_entry).is_balanced);
    ALTER TABLE journal_entry ADD CONSTRAINT has_no_null_account_ids CHECK ((journal_entry).has_no_null_account_ids);
    ALTER TABLE journal_entry ADD CONSTRAINT has_no_null_amounts CHECK ((journal_entry).has_no_null_amounts);

Now we have integrity constraints reaching into our nested data. So let's test this out.
insert into journal_type (label) values ('General'); We will re-use the account data from the previous post: or_examples=# select * from account; id | control_code | description ----+--------------+------------- 1 | 1500 | Inventory 2 | 4500 | Sales 3 | 5500 | Purchase (3 rows) Let's try inserting a few meaningless transactions, some of which violate our constraints: insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type]); ERROR: new row for relation "journal_entry" violates check constraint "is_balanced" So far so good. insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type, row(null, -100)::journal_line_type]); ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_account_ids" Still good. insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type, row(2, -100)::journal_line_type, row(3, NULL)::journal_line_type]) ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_amounts" Great. All constraints working properly. Let's try inserting a valid row: insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type, row(2, -100)::journal_line_type]); And it works! 
    or_examples=# select * from journal_entry;
     id | journal_type | source_document_id | date_posted |  description   |       line_items
    ----+--------------+--------------------+-------------+----------------+------------------------
      5 |            1 | ref-10001          | 2012-08-23  | This is a test | {"(1,100)","(2,-100)"}
    (1 row)

Break-Out Views

A second major problem that we will be facing with this schema is that if someone wants to create a report using a reporting tool that only really supports relational data very well, then the financial data will be opaque and not available. This scenario is one of the reasons why I think it is important generally to push the relational model to its breaking point before looking at object-relational functions. Consequently I think when doing nested tables it is important to ensure that the data in them is available through a relational interface, in this case, a view. In this case, we may want to model debits and credits in a way which is re-usable, so we will start by creating two type methods:

    CREATE OR REPLACE FUNCTION debits(journal_line_type) RETURNS NUMERIC LANGUAGE SQL AS $$
        SELECT CASE WHEN $1.amount < 0 THEN $1.amount * -1 ELSE NULL END
    $$;

    CREATE OR REPLACE FUNCTION credits(journal_line_type) RETURNS NUMERIC LANGUAGE SQL AS $$
        SELECT CASE WHEN $1.amount > 0 THEN $1.amount ELSE NULL END
    $$;

Now we can use these as virtual columns anywhere a journal_line_type is used. The view definition itself is rather convoluted and this may impact performance. I am waiting for the LATERAL construct to become available, which will make this easier.

    CREATE VIEW journal_line_items AS
    SELECT id AS journal_entry_id, (li).*, (li).debits, (li).credits
    FROM (SELECT je.*, unnest(line_items) li FROM journal_entry je) j;

Remember that li.debits and li.credits get turned by the parser into debits(li) and credits(li), allowing for class.method notation here.
Testing this out: SELECT * FROM journal_line_items; gives us

     journal_entry_id | account_id | amount | debits | credits
    ------------------+------------+--------+--------+---------
                    5 |          1 |    100 |        |     100
                    5 |          2 |   -100 |    100 |
                    6 |          1 |    200 |        |     200
                    6 |          3 |   -200 |    200 |

As you can see, this works. Now people with purely relational tools can access the information in the nested table. In general it is almost always worth creating break-out views of this sort where nested data is stored. However, it is important to note that with larger data sets this is insufficient because indexing considerations make it hard to look up specific information on a row level. This may or may not be the end of the world depending on data set size.

Referential Integrity Controls

The final problem is that referential integrity is not a well-defined concept for nested data. For this reason, if we value referential integrity and foreign keys are involved, we must find ways of enforcing these. The simplest solution is a trigger which runs on insert, update, or delete, and manages another relation which can be used as a proxy for referential integrity checks. For example, we could:

    CREATE TABLE je_account (
        je_id int references journal_entry (id),
        account_id int references account(id),
        primary key (je_id, account_id)
    );

This will be a very narrow table and so should be quick to search. It may also be useful in determining which accounts to look at for transactions if we need to do that. This table could then be used to optimize queries. To maintain the table we need to recognize that never ever will a journal entry's line items be updated or deleted. This is due to the need to maintain clear audit controls and trails.
We may add other flags to the table to indicate transactions but we can handle insert, update, and delete conditions with a trigger, namely:

    CREATE FUNCTION je_ri_management() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$
    DECLARE accounts int[];
    BEGIN
        IF TG_OP ILIKE 'INSERT' THEN
            INSERT INTO je_account (je_id, account_id)
            SELECT NEW.id, account_id FROM unnest(NEW.line_items) GROUP BY account_id;
            RETURN NEW;
        ELSIF TG_OP ILIKE 'UPDATE' THEN
            IF NEW.line_items <> OLD.line_items THEN
                RAISE EXCEPTION 'Cannot change journal entry line items!';
            ELSE
                RETURN NEW;
            END IF;
        ELSIF TG_OP ILIKE 'DELETE' THEN
            RAISE EXCEPTION 'Cannot delete journal entries!';
        ELSE
            RAISE EXCEPTION 'Invalid TG_OP in trigger';
        END IF;
    END;
    $$;

Then we add the trigger with:

    CREATE TRIGGER je_breakout_for_ri
    AFTER INSERT OR UPDATE OR DELETE ON journal_entry
    FOR EACH ROW EXECUTE PROCEDURE je_ri_management();

The final invalid TG_OP check could be omitted but it is not a bad check to have. Let's try this out:

    insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items)
    values (1, 'ref-10003', now()::date, 'This is a test',
            ARRAY[row(1, 200)::journal_line_type, row(3, -200)::journal_line_type]);

    or_examples=# select * from je_account;
     je_id | account_id
    -------+------------
        10 |          3
        10 |          1
    (2 rows)

In this way referential integrity can be enforced.

Solution 2.0: Refactoring the above to eliminate the view.

The above solution will work great for small businesses but for larger businesses, querying this data will become slow for certain kinds of reports. Storage here is tied to a specific criterion, and indexing is somewhat problematic. There are ways we can address this, but they are not always optimal. At the same time our work is simplified because the actual accounting details are append-only. One solution to this is to refactor the above solution.
Instead of:

    Main table
    Relational view
    Materialized view for referential integrity checking

we can have:

    Main table, with tweaked storage for line items
    Materialized view for RI checking and relational access

Unfortunately this sort of refactoring after the fact isn't simple. Typically you want to convert the journal_line_type type to a journal_line_type table, and inherit this in your materialized view table. You cannot simply drop and recreate since the column you are storing the data in is dependent on the structure. The solution is to rename the type and create a new one in its place. This must be done manually and there is no current capability to copy a composite type's structure into a table. You will then need to create a cast and a cast function. Then, when you can afford the downtime, you will want to convert the table to the new type. It is quite possible that the downtime will be delayed and you will have an extended time period where you are half-way through migrating the structure of your database. You can, however, decide to create a cast between the table and the type, perhaps an implicit one (though this is not inherited) and use this to centralize your logic. Unfortunately this leads to duplication-related complexity and in an ideal world would be avoided. However, assuming that the downtime ends up being tolerable, the resulting structures will end up such that they can be more readily optimized for a variety of workloads. In this regard you would have a main table, most likely with line_items moved to extended storage, whose function is to model journal entries as journal entries and apply relevant constraints, and a second table which models journal entry lines as independent lines. This also simplifies some of the constraint issues on the first table, and makes the modelling easier because we only have to look into the nested storage where we are looking at subset constraints.
This section then provides a warning regarding the use of advanced ORDBMS functionality, namely that it is easy to get tunnel vision and create problems for the future. The complexity cost here is so high that the primary model should generally remain relational, with things like nested storage primarily used to create constraints that cannot be effectively modelled otherwise. However, this becomes a great deal more complicated where values may be updated or deleted. Here, however, we have a relatively simple case regarding data writes combined with complex constraints that cannot be effectively expressed in normalized, relational SQL. Therefore the standard maintenance concerns that counsel against duplicating information may give way to the fact that such duplication allows for richer constraints. Now, if we had been aware of the problems going in we would have chosen this structure all along. Our design would have been:

    CREATE TABLE journal_line (
        entry_id bigserial primary key, -- only possible key
        je_id int not null,
        account_id int,
        amount numeric
    );

After creating the journal entry table we'd:

    ALTER TABLE journal_line ADD FOREIGN KEY (je_id) REFERENCES journal_entry(id);

If we have to handle purging old data we can make that key ON DELETE CASCADE. And the lines would have been of this type instead. We can then get rid of all constraints and their supporting functions other than the is_balanced one. Our debit and credit functions then also reference this type.
Our trigger then looks like:

    CREATE FUNCTION je_ri_management() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$
    DECLARE accounts int[];
    BEGIN
        IF TG_OP ILIKE 'INSERT' THEN
            INSERT INTO journal_line (je_id, account_id, amount)
            SELECT NEW.id, account_id, amount FROM unnest(NEW.line_items);
            RETURN NEW;
        ELSIF TG_OP ILIKE 'UPDATE' THEN
            RAISE EXCEPTION 'Cannot change journal entry line items!';
        ELSIF TG_OP ILIKE 'DELETE' THEN
            RAISE EXCEPTION 'Cannot delete journal entries!';
        ELSE
            RAISE EXCEPTION 'Invalid TG_OP in trigger';
        END IF;
    END;
    $$;

Approval workflows can be handled with a separate status table with its own constraints. Deletions of old information (up to a specific snapshot) can be handled by a stored procedure which is unit tested and disables this trigger before purging data. This system has the advantage of having several small components which are all complete and easily understood, and it is made possible because the data is exclusively append-only.

As you can see from the above examples, nested data structures greatly complicate the data model and create problems with relational math that must be addressed if data logic is to remain meaningful. This is a complex field, and it adds a lot of complexity to storage. In general, these are best avoided in actual data storage except where this approach makes formerly insurmountable problems manageable. Moreover, they add complexity to optimization once data gets large. Thus while non-atomic fields in this regard make sense as an initial point of entry in some narrow cases, as a point of actual query, they are very rarely the right approach. It is possible that, at some point, nested storage will be able to have its own indexes, foreign keys, etc. but I cannot imagine this being a high priority and so it isn't clear that this will ever happen. In general, it usually makes the most sense to simply store the data in a pseudo-normalized way, with any non-1NF designs being the initial point of entry in a linear write model.
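For readers who want to reason about the rules without a database at hand, the three check constraints and the debit/credit break-out can be modelled in a few lines of Python. This is only an illustrative sketch: every name in it (`JournalLine`, `violated_constraints`, `break_out`, and so on) is hypothetical, and it simplifies SQL's NULL semantics by skipping `None` amounts when summing.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class JournalLine(NamedTuple):
    account_id: Optional[int]
    amount: Optional[Decimal]


def is_balanced(lines):
    # Like SQL sum(), ignore missing amounts; the lines must net to zero.
    amounts = (line.amount for line in lines if line.amount is not None)
    return sum(amounts, Decimal(0)) == 0


def violated_constraints(lines):
    """Return the names of the checks that a set of line items fails."""
    checks = {
        "is_balanced": is_balanced(lines),
        "has_no_null_account_ids": all(l.account_id is not None for l in lines),
        "has_no_null_amounts": all(l.amount is not None for l in lines),
    }
    return [name for name, passed in checks.items() if not passed]


def debit_of(line):
    # Negative amounts are debits, reported as positive numbers.
    return -line.amount if line.amount < 0 else None


def credit_of(line):
    return line.amount if line.amount > 0 else None


def break_out(entry_id, lines):
    """Validate an entry, then emit one flat row per line (append-only)."""
    problems = violated_constraints(lines)
    if problems:
        raise ValueError("violates check constraint(s): " + ", ".join(problems))
    return [(entry_id, l.account_id, l.amount, debit_of(l), credit_of(l))
            for l in lines]


rows = break_out(5, [JournalLine(1, Decimal(100)), JournalLine(2, Decimal(-100))])
for row in rows:
    print(row)
```

Validation happens on the whole entry before any row is emitted, which mirrors the idea that the nested array is the point of entry and the flat rows are derived from it.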
Nested Data Structures as Interfaces

Nested data structures as interfaces to stored procedures are a little more manageable. The main difficulties are in application-side data construction and output parsing. Some languages handle this more easily than others. Upper-level construction and handling of these structures is relatively straightforward on the database side and poses none of these problems. However, they do cause additional complexity and this must be managed carefully. The biggest issue when interfacing with an application is that ROW types are not usually automatically constructed by application-level frameworks even if they have arrays. This leaves the programmer to choose between unstructured text arrays which are fundamentally non-discoverable (and thus brittle), and arrays of tuples which are discoverable but require a lot of additional application code to handle. At the same time, as a chicken-and-egg problem, frameworks will not add handling for this sort of problem unless people are already trying to do it. So my general recommendation is to use nested data types everywhere in the database sparingly, only where the benefits clearly outweigh the complexity costs. Complexity costs are certainly lower at the interface level and there are many more cases where these techniques are net wins there, but that does not mean that they should be routinely used even there.

September 25, 2012

by Chris Travers

· 20,873 Views

IndexedDB: MultiEntry Explained

For a long time I was not sure what the purpose of the multiEntry attribute was, since none of the browsers supported it. Now that Firefox and even the latest builds of Chrome support it, it has all become clear to me. The multiEntry attribute enables you to filter on the individual values of an array. For this reason, the multiEntry attribute is only useful when the index is put on a property that contains an array as its value. When the multiEntry attribute is set to true, a record is added for every value in the array. The key of this record will be the value from the array, and the value will be the object holding the array. Because the values in the array are used as keys, they need to be valid keys. This means they can only be of the following types: Array, DOMString, float, Date.

So much for the theory; an example will make everything clear. In the example below I will use a Blog object. A blog consists of the following properties:

    var blog = { Id: 1, Title: "Blog post", content: "content", tags: ["html5", "indexeddb", "linq2indexeddb"] };

In the IndexedDB database we have an object store called blog which has an index on the tags property. The index has the multiEntry attribute turned on. If we insert the object above, we will see the following records in the index:

    key                 value
    "html5"             { Id: 1, Title: "Blog post", content: "content", tags: ["html5", "indexeddb", "linq2indexeddb"] }
    "indexeddb"         { Id: 1, Title: "Blog post", content: "content", tags: ["html5", "indexeddb", "linq2indexeddb"] }
    "linq2indexeddb"    { Id: 1, Title: "Blog post", content: "content", tags: ["html5", "indexeddb", "linq2indexeddb"] }

So for every value in the array of the tags attribute, a record is added to the index. This means that when you start filtering, the same object can be added to the result multiple times. For example, if you filter on all tags greater than "i", the result will contain the blog object from this example twice (once for "indexeddb" and once for "linq2indexeddb").
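The behaviour described above can be simulated without a browser. The Python sketch below is a simplified model, not the IndexedDB API: the helper names `build_multi_entry_index` and `filter_greater_than` are hypothetical, and details such as key validation and duplicate handling are ignored.

```python
blog = {
    "Id": 1,
    "Title": "Blog post",
    "content": "content",
    "tags": ["html5", "indexeddb", "linq2indexeddb"],
}


def build_multi_entry_index(objects, key_path):
    """Create one (key, object) record per array element, sorted by key."""
    entries = []
    for obj in objects:
        for key in obj[key_path]:
            entries.append((key, obj))
    entries.sort(key=lambda entry: entry[0])
    return entries


def filter_greater_than(index, lower_bound):
    """Return the object of every index record whose key exceeds the bound."""
    return [obj for key, obj in index if key > lower_bound]


index = build_multi_entry_index([blog], "tags")
matches = filter_greater_than(index, "i")

print([key for key, _ in index])  # one record per tag
print(len(matches))               # prints 2: the same blog object, twice
```

Because both "indexeddb" and "linq2indexeddb" sort after "i", the single blog object is returned twice, which is why results from a multiEntry index may need de-duplicating by primary key.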

September 24, 2012

by Kristof Degrave

· 6,741 Views