The Latest Data Engineering Topics

Composite Keys in Cassandra
Introduction

A composite key consists of one or more primary key fields, each of which must be of a data type supported by the underlying data store. In JPA (Java Persistence API), there are two ways of specifying composite keys:

1. Composite primary key:

@Entity
@IdClass(TimelineId.class)
public class Timeline {
    @Id int userId;
    @Id long tweetId;
    // Other non-primary-key fields
}

class TimelineId {
    int userId;
    long tweetId;
}

2. Embedded primary key:

@Entity
public class Timeline {
    @EmbeddedId TimelineId id;
    // Other non-primary-key fields
}

@Embeddable
class TimelineId {
    int userId;
    long tweetId;
}

The Timeline entity above is inspired by the well-known Twissandra example. Starting with the 1.1 release, Cassandra supports composite keys.

Cassandra Composite Keys in Action

Visit this page to understand the Cassandra schema in general. In this section I will give you a feel for how composite keys are stored in Cassandra. Let's start a Cassandra 1.1.x server and run the following commands from the Cassandra bin directory.

CQL:

./cqlsh -3 localhost 9160

CREATE KEYSPACE twissandra with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;
use twissandra;
CREATE TABLE timeline(
    user_id varchar,
    tweet_id varchar,
    tweet_device varchar,
    author varchar,
    body varchar,
    PRIMARY KEY(user_id, tweet_id, tweet_device));

INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't1', 'web', 'Amresh', 'Here is my first tweet');
INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't2', 'sms', 'Saurabh', 'Howz life Xamry');
INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't1', 'iPad', 'Kuldeep', 'You der?');
INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't2', 'mobile', 'Vivek', 'Yep, I suppose');

cqlsh:twissandra> select * from timeline;
 user_id | tweet_id | author  | body
---------+----------+---------+------------------------
 xamry   | t1       | Amresh  | Here is my first tweet
 xamry   | t2       | Saurabh | Howz life Xamry
 mevivs  | t1       | Kuldeep | You der?
 mevivs  | t2       | Vivek   | Yep, I suppose

cqlsh:twissandra> SELECT * FROM timeline WHERE user_id='xamry';
 user_id | tweet_id | tweet_device | author  | body
---------+----------+--------------+---------+------------------------
 xamry   | t1       | web          | Amresh  | Here is my first tweet
 xamry   | t2       | sms          | Saurabh | Howz life Xamry

cqlsh:twissandra> select * from timeline where tweet_id = 't1';
 user_id | tweet_id | tweet_device | author  | body
---------+----------+--------------+---------+------------------------
 xamry   | t1       | web          | Amresh  | Here is my first tweet
 mevivs  | t1       | iPad         | Kuldeep | You der?
cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id='t1';
 user_id | tweet_id | tweet_device | author | body
---------+----------+--------------+--------+------------------------
 xamry   | t1       | web          | Amresh | Here is my first tweet

cqlsh:twissandra> select * from timeline where user_id = 'xamry' and author='Amresh';
Bad Request: No indexed columns present in by-columns clause with Equal operator

cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_device='web';
Bad Request: PRIMARY KEY part tweet_device cannot be restricted (preceding part tweet_id is either not restricted or by a non-EQ relation)

cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id = 't1' and tweet_device='web';
 user_id | tweet_id | tweet_device | author | body
---------+----------+--------------+--------+------------------------
 xamry   | t1       | web          | Amresh | Here is my first tweet

Cassandra-cli:

impadmin@impetus-ubuntu:/usr/local/apache-cassandra-1.1.2/bin$ ./cassandra-cli -h localhost -p 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to Cassandra CLI version 1.1.2

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use twissandra;
Authenticated to keyspace: twissandra
[default@twissandra] list timeline;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: xamry
=> (column=t1:web:author, value=Amresh, timestamp=1343729388951000)
=> (column=t1:web:body, value=Here is my first tweet, timestamp=1343729388951001)
=> (column=t2:sms:author, value=Saurabh, timestamp=1343729388973000)
=> (column=t2:sms:body, value=Howz life Xamry, timestamp=1343729388973001)
-------------------
RowKey: mevivs
=> (column=t1:iPad:author, value=Kuldeep, timestamp=1343729388991000)
=> (column=t1:iPad:body, value=You der?, timestamp=1343729388991001)
=> (column=t2:mobile:author, value=Vivek, timestamp=1343729389941000)
=> (column=t2:mobile:body, value=Yep, I suppose, timestamp=1343729389941001)

Observations

- The first part of the composite key (user_id) is called the "partition key"; the rest (tweet_id, tweet_device) are the remaining key columns.
- Cassandra stores columns differently when composite keys are used. The partition key becomes the row key. The remaining key components are concatenated with each column name (using ":" as the separator) to form the column names. Column values remain unchanged.
- The remaining key components (other than the partition key) are ordered, and you are not allowed to search on an arbitrary column: you have to restrict the first one before you can restrict the second one, and so on. This is evident from the "Bad Request" errors above.
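To connect this back to the JPA mapping shown in the introduction, here is a rough sketch (not from the original article) of how the three-part primary key of the timeline table could be modeled with an embedded id; the class and field names are illustrative only, and whether such a mapping can actually be executed against Cassandra depends on the JPA provider in use.

import java.io.Serializable;
import javax.persistence.Column;
import javax.persistence.Embeddable;
import javax.persistence.EmbeddedId;
import javax.persistence.Entity;
import javax.persistence.Table;

@Embeddable
class TimelineKey implements Serializable {
    @Column(name = "user_id")      String userId;
    @Column(name = "tweet_id")     String tweetId;
    @Column(name = "tweet_device") String tweetDevice;
    // equals() and hashCode() should be implemented over all three fields
}

@Entity
@Table(name = "timeline")
public class TimelineEntry {
    @EmbeddedId TimelineKey key;
    @Column(name = "author") String author;
    @Column(name = "body")   String body;
}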
November 14, 2012
by Amresh Singh
· 20,014 Views · 1 Like
Integration Testing with MongoDB & Spring Data
Integration testing is an often overlooked area in enterprise development, primarily due to the complexities involved in setting up the necessary infrastructure for an integration test. For applications backed by databases, it's fairly complicated and time-consuming to set up databases for integration tests, and also to clean them up (data files, schemas, etc.) once a test is complete, to ensure repeatability of tests. While there have been many tools (e.g. DBUnit) and mechanisms (e.g. rollback after test) to assist with this, the inherent complexity and issues have always been there.

But if you are working with MongoDB, there's a cool and easy way to do your integration tests, with almost the simplicity of writing a unit test with mocks. With EmbedMongo, we can easily set up an embedded MongoDB instance for testing, with built-in cleanup support once tests are complete. In this article, we will walk through an example where EmbedMongo is used with JUnit for integration testing a repository implementation.

Here's the technology stack that we will be using:

- MongoDB 2.2.0
- EmbedMongo 1.26
- Spring Data Mongo 1.0.3
- Spring Framework 3.1

The Maven POM for the above setup looks like this:

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.yohanliyanage.blog.mongoit</groupId>
    <artifactId>mongo-it</artifactId>
    <version>1.0</version>
    <dependencies>
        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-mongodb</artifactId>
            <version>1.0.3.RELEASE</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context</artifactId>
            <version>3.1.3.RELEASE</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>de.flapdoodle.embed</groupId>
            <artifactId>de.flapdoodle.embed.mongo</artifactId>
            <version>1.26</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

Or if you prefer Gradle (by the way, Gradle is an awesome build tool which you should check out if you haven't done so already):

apply plugin: 'java'
apply plugin: 'eclipse'

sourceCompatibility = 1.6
group = "com.yohanliyanage.blog.mongoit"
version = '1.0'

ext.springVersion = '3.1.3.RELEASE'
ext.junitVersion = '4.10'
ext.springMongoVersion = '1.0.3.RELEASE'
ext.embedMongoVersion = '1.26'

repositories {
    mavenCentral()
    maven { url 'http://repo.springsource.org/release' }
}

dependencies {
    compile "org.springframework:spring-context:${springVersion}"
    compile "org.springframework.data:spring-data-mongodb:${springMongoVersion}"
    testCompile "junit:junit:${junitVersion}"
    testCompile "de.flapdoodle.embed:de.flapdoodle.embed.mongo:${embedMongoVersion}"
}

To begin with, here's the document that we will be storing in Mongo:

package com.yohanliyanage.blog.mongoit.model;

import org.springframework.data.mongodb.core.index.Indexed;
import org.springframework.data.mongodb.core.mapping.Document;

/**
 * A Sample Document.
 *
 * @author Yohan Liyanage
 */
@Document
public class Sample {

    @Indexed
    private String key;
    private String value;

    public Sample(String key, String value) {
        super();
        this.key = key;
        this.value = value;
    }

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
}

To assist with storing and managing this document, let's write up a simple repository implementation. The repository interface is as follows:

package com.yohanliyanage.blog.mongoit.repository;

import java.util.List;

import com.yohanliyanage.blog.mongoit.model.Sample;

/**
 * Sample Repository API.
 *
 * @author Yohan Liyanage
 */
public interface SampleRepository {

    /**
     * Persists the given Sample.
     * @param sample
     */
    void save(Sample sample);

    /**
     * Returns the list of samples with the given key.
     * @param key
     * @return
     */
    List<Sample> findByKey(String key);
}

And the implementation:

package com.yohanliyanage.blog.mongoit.repository;

import static org.springframework.data.mongodb.core.query.Criteria.*;
import static org.springframework.data.mongodb.core.query.Query.query;

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.stereotype.Repository;

import com.yohanliyanage.blog.mongoit.model.Sample;

/**
 * Sample Repository MongoDB Implementation.
 *
 * @author Yohan Liyanage
 */
@Repository
public class SampleRepositoryMongoImpl implements SampleRepository {

    @Autowired
    private MongoOperations mongoOps;

    /**
     * {@inheritDoc}
     */
    public void save(Sample sample) {
        mongoOps.save(sample);
    }

    /**
     * {@inheritDoc}
     */
    public List<Sample> findByKey(String key) {
        return mongoOps.find(query(where("key").is(key)), Sample.class);
    }

    /**
     * Sets the MongoOps implementation.
     *
     * @param mongoOps the mongoOps to set
     */
    public void setMongoOps(MongoOperations mongoOps) {
        this.mongoOps = mongoOps;
    }
}

To wire this up, we need a Spring bean configuration. Note that we do not need this for testing, but I have included it in the project for the sake of completeness.
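The bean XML itself is not reproduced here; purely as an illustration (not from the original article), roughly equivalent wiring in Java configuration might look like the sketch below, where the host, port, and database name are assumptions.

import com.mongodb.Mongo;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.data.mongodb.core.MongoTemplate;

@Configuration
@ComponentScan("com.yohanliyanage.blog.mongoit.repository")
public class MongoConfig {

    // Host, port, and database name are assumptions for local development.
    @Bean
    public MongoOperations mongoOps() throws Exception {
        return new MongoTemplate(new Mongo("127.0.0.1", 27017), "mongoit");
    }
}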
And now we are ready to write the integration test for our repository implementation using EmbedMongo. Ideally, the integration tests should be placed in a separate source directory, just like we place our unit tests (e.g. src/test/java => src/integration-test/java). However, neither Maven nor Gradle supports this out of the box yet (as of Gradle 1.2 there is an ongoing discussion about adding this facility). Nevertheless, both Maven and Gradle are flexible, so you can configure the POM / build.gradle to handle this. However, to keep this discussion simple and focused, I will be placing the integration tests in src/test/java, but I do not recommend this for a real application.

Let's start writing up the integration test. First, let's begin with a simple JUnit-based test skeleton for the methods.

package com.yohanliyanage.blog.mongoit.repository;

import static org.junit.Assert.fail;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

/**
 * Integration Test for {@link SampleRepositoryMongoImpl}.
 *
 * @author Yohan Liyanage
 */
public class SampleRepositoryMongoImplIntegrationTest {

    private SampleRepositoryMongoImpl repoImpl;

    @Before
    public void setUp() throws Exception {
        repoImpl = new SampleRepositoryMongoImpl();
    }

    @After
    public void tearDown() throws Exception {
    }

    @Test
    public void testSave() {
        fail("Not yet implemented");
    }

    @Test
    public void testFindByKey() {
        fail("Not yet implemented");
    }
}

When this JUnit test case initializes, we need to fire up EmbedMongo to start an embedded Mongo server. Also, when the test case ends, we need to clean up the DB. The code below does this.

package com.yohanliyanage.blog.mongoit.repository;

import static org.junit.Assert.fail;

import java.io.IOException;

import org.junit.*;
import org.springframework.data.mongodb.core.MongoTemplate;

import com.mongodb.Mongo;
import com.yohanliyanage.blog.mongoit.model.Sample;

import de.flapdoodle.embed.mongo.MongodExecutable;
import de.flapdoodle.embed.mongo.MongodProcess;
import de.flapdoodle.embed.mongo.MongodStarter;
import de.flapdoodle.embed.mongo.config.MongodConfig;
import de.flapdoodle.embed.mongo.config.RuntimeConfig;
import de.flapdoodle.embed.mongo.distribution.Version;
import de.flapdoodle.embed.process.extract.UserTempNaming;

/**
 * Integration Test for {@link SampleRepositoryMongoImpl}.
 *
 * @author Yohan Liyanage
 */
public class SampleRepositoryMongoImplIntegrationTest {

    private static final String LOCALHOST = "127.0.0.1";
    private static final String DB_NAME = "itest";
    private static final int MONGO_TEST_PORT = 27028;

    private SampleRepositoryMongoImpl repoImpl;

    private static MongodProcess mongoProcess;
    private static Mongo mongo;

    private MongoTemplate template;

    @BeforeClass
    public static void initializeDB() throws IOException {
        RuntimeConfig config = new RuntimeConfig();
        config.setExecutableNaming(new UserTempNaming());

        MongodStarter starter = MongodStarter.getInstance(config);
        MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false));
        mongoProcess = mongoExecutable.start();

        mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT);
        mongo.getDB(DB_NAME);
    }

    @AfterClass
    public static void shutdownDB() throws InterruptedException {
        mongo.close();
        mongoProcess.stop();
    }

    @Before
    public void setUp() throws Exception {
        repoImpl = new SampleRepositoryMongoImpl();
        template = new MongoTemplate(mongo, DB_NAME);
        repoImpl.setMongoOps(template);
    }

    @After
    public void tearDown() throws Exception {
        template.dropCollection(Sample.class);
    }

    @Test
    public void testSave() {
        fail("Not yet implemented");
    }

    @Test
    public void testFindByKey() {
        fail("Not yet implemented");
    }
}

The initializeDB() method is annotated with @BeforeClass so that it runs before the test case begins. This method fires up an embedded MongoDB instance bound to the given port, and exposes a Mongo object set to use the given database. Internally, EmbedMongo creates the necessary data files in temporary directories. When this method executes for the first time, EmbedMongo will download the necessary Mongo distribution (denoted by Version.V2_2_0 in the above code) if it does not exist already. This is a nice facility, especially when it comes to continuous integration servers: you don't have to manually set up Mongo on each of the CI servers. That's one less external dependency for the tests.

In the shutdownDB() method, which is annotated with @AfterClass, we stop the EmbedMongo process. This triggers the necessary cleanup in EmbedMongo to remove the temporary data files, restoring the state to where it was before the test case was executed.

We have now updated the setUp() method to build a Spring MongoTemplate object backed by the Mongo instance exposed by EmbedMongo, and to set up our repository implementation with that template. The tearDown() method is updated to drop the Sample collection to ensure that each of our test methods starts with a clean state. Now it's just a matter of writing the actual test methods. Let's start with the save method test.

@Test
public void testSave() {
    Sample sample = new Sample("TEST", "2");

    repoImpl.save(sample);

    int samplesInCollection = template.findAll(Sample.class).size();
    assertEquals("Only 1 Sample should exist in collection, but there are " + samplesInCollection, 1, samplesInCollection);
}

We create a Sample object, pass it to repoImpl.save(), and assert to make sure that there's only one Sample in the Sample collection. Simple, straightforward stuff. And here's the test method for the findByKey method.

@Test
public void testFindByKey() {
    // Setup Test Data
    List<Sample> samples = Arrays.asList(
            new Sample("TEST", "1"),
            new Sample("TEST", "25"),
            new Sample("TEST2", "66"),
            new Sample("TEST2", "99"));

    for (Sample sample : samples) {
        template.save(sample);
    }

    // Execute Test
    List<Sample> matches = repoImpl.findByKey("TEST");

    // Note: Since our test data (populateDummies) have only 2
    // records with key "TEST", this should be 2
    assertEquals("Expected only two samples with key TEST, but there are " + matches.size(), 2, matches.size());
}

Initially, we set up the data by adding a set of Sample objects to the data store. It's important that we directly use template.save() here, because repoImpl.save() is the method under test. We are not testing it here, so we use the underlying "trusted" template.save() during data setup. This is a basic concept in unit / integration testing. Then we execute the method under test, findByKey, and assert to ensure that only two Samples matched our query. Likewise, we can continue to write more tests for each of the repository methods, including negative tests. And here's the final integration test file.

package com.yohanliyanage.blog.mongoit.repository;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.junit.*;
import org.springframework.data.mongodb.core.MongoTemplate;

import com.mongodb.Mongo;
import com.yohanliyanage.blog.mongoit.model.Sample;

import de.flapdoodle.embed.mongo.MongodExecutable;
import de.flapdoodle.embed.mongo.MongodProcess;
import de.flapdoodle.embed.mongo.MongodStarter;
import de.flapdoodle.embed.mongo.config.MongodConfig;
import de.flapdoodle.embed.mongo.config.RuntimeConfig;
import de.flapdoodle.embed.mongo.distribution.Version;
import de.flapdoodle.embed.process.extract.UserTempNaming;

/**
 * Integration Test for {@link SampleRepositoryMongoImpl}.
 *
 * @author Yohan Liyanage
 */
public class SampleRepositoryMongoImplIntegrationTest {

    private static final String LOCALHOST = "127.0.0.1";
    private static final String DB_NAME = "itest";
    private static final int MONGO_TEST_PORT = 27028;

    private SampleRepositoryMongoImpl repoImpl;

    private static MongodProcess mongoProcess;
    private static Mongo mongo;

    private MongoTemplate template;

    @BeforeClass
    public static void initializeDB() throws IOException {
        RuntimeConfig config = new RuntimeConfig();
        config.setExecutableNaming(new UserTempNaming());

        MongodStarter starter = MongodStarter.getInstance(config);
        MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false));
        mongoProcess = mongoExecutable.start();

        mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT);
        mongo.getDB(DB_NAME);
    }

    @AfterClass
    public static void shutdownDB() throws InterruptedException {
        mongo.close();
        mongoProcess.stop();
    }

    @Before
    public void setUp() throws Exception {
        repoImpl = new SampleRepositoryMongoImpl();
        template = new MongoTemplate(mongo, DB_NAME);
        repoImpl.setMongoOps(template);
    }

    @After
    public void tearDown() throws Exception {
        template.dropCollection(Sample.class);
    }

    @Test
    public void testSave() {
        Sample sample = new Sample("TEST", "2");

        repoImpl.save(sample);

        int samplesInCollection = template.findAll(Sample.class).size();
        assertEquals("Only 1 Sample should exist in collection, but there are " + samplesInCollection, 1, samplesInCollection);
    }

    @Test
    public void testFindByKey() {
        // Setup Test Data
        List<Sample> samples = Arrays.asList(
                new Sample("TEST", "1"),
                new Sample("TEST", "25"),
                new Sample("TEST2", "66"),
                new Sample("TEST2", "99"));

        for (Sample sample : samples) {
            template.save(sample);
        }

        // Execute Test
        List<Sample> matches = repoImpl.findByKey("TEST");

        // Note: Since our test data (populateDummies) have only 2
        // records with key "TEST", this should be 2
        assertEquals("Expected only two samples with key TEST, but there are " + matches.size(), 2, matches.size());
    }
}

On a side note, one of the key concerns with integration tests is execution time. We all want to keep our test execution times as low as possible, ideally a couple of seconds, so that we can run all the tests during CI with minimal build and verification times. However, since integration tests rely on underlying infrastructure, they usually take time to run. With EmbedMongo, this is not the case: on my machine, the above test suite runs in 1.8 seconds, and each test method takes at most 0.166 seconds.

I have uploaded the code for the above project to GitHub. You can download / clone it from here: https://github.com/yohanliyanage/blog-mongo-integration-tests. For more information regarding EmbedMongo, refer to their site on GitHub: https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de.
November 11, 2012
by Yohan Liyanage
· 26,105 Views
How to do a presentation in China? Some of my experiences
So the culture is different from Western culture; we all know that! I am certainly not an expert on China, but after living in China for almost two years, knowing some of the language, working in a Chinese company, seeing presentations every week, and visiting over 30 Western and Chinese companies based in China, I think I have some insights into how you should organize a presentation in China. Since I recently went to Shanghai for a research exchange with Jiaotong University, I had to give a presentation to introduce my institute and myself. So here you can find my rather uncommon presentation and some remarks on why some slides were designed the way they are.

http://www.rene-pickhardt.de/wp-content/uploads/2012/11/ApexLabIntroductionOfWeST.pdf

Guanxi: Your Relations

First of all, I think it is really important to understand that in China everything is related to your relations (http://en.wikipedia.org/wiki/Guanxi). A Chinese business card will always name a few of your best and strongest contacts. This is more important than your address, for example. When a conference starts, people exchange name cards before they sit down and discuss. This principle of guanxi is also reflected in the style presentations are made. Here are some basic rules:

- Show pictures of people you have worked with.
- Show pictures of groups from events you organized.
- Show pictures of the panels that ran those events.
- Show your partners (for business, not only clients but also people you buy from or work with in general).

My way of respecting these principles:

- I first showed a group picture of our institute!
- For almost every project, I showed pictures of the people responsible for it, wherever I could get hold of them.
- I did not only show the European research projects our university is in, but listed all the different partners and showed their logos.

Family

The second thing is that in China the concept of family is very important. As a rule of thumb, if you want to do business with someone in China and you haven't been introduced to their family, things are not going to go the way you might expect. For this reason I included some slides with a world map, zooming down to the place where I was born, where I studied, and where my parents still live!

Localizing

When I chose a world map, I did not only take one in the Chinese language, I also took one with China at the center. In my contact data I also put Chinese social networks. Remember that Twitter, Facebook, and many other sites are blocked in China. So if you really want to communicate with Chinese people, why not get a QQ number or a Weibo account?

Design of the Slides

You have seen this at conferences many times: Chinese presenters put a heck of a lot of stuff on a slide. I strongly believe this is because reading and recognizing Chinese characters is much faster than reading Western characters. So if your presentation is in the Chinese language, don't be afraid to fill your slides with information. I have seen many talks by Chinese speakers who were literally reading word for word what was written on the slides. Whereas in Western countries this is considered bad practice, in China it is all right.

Language

Speaking of language: of course, if you know some Chinese, it shows respect if you at least try to include some. I split my presentation into two parts, one in Chinese and one in English.

Have an Interesting Take-Away Message

In my case, I included the fact that we have PhD positions and scholarships open, and that our institute is really international with English as the working language. Of course I also included some slides about my past and current research, like Graphity and Typology.

During the Presentation

In China it is not rude at all if someone's cellphone rings and they have more important things to do. You as the presenter should switch off your phone, but you should not be disturbed or annoyed if people in the audience receive phone calls and leave the room to take care of that business. This is very common in China.

I am sure there are many more rules on how to give a presentation in China, and maybe I even made some mistakes in my presentation, but at least I have the feeling that the reaction was quite positive. So if you have questions, suggestions, or feedback, feel free to drop a line. I am more than happy to discuss cultural topics!
November 11, 2012
by René Pickhardt
· 16,434 Views
Applying a Namespace During JAXB Unmarshal
For some, an XML schema is a strict set of rules for how an XML document must be structured. For others, it is a general guideline indicating what the XML should look like. This means that sometimes people want to accept input that doesn't conform to the XML schema for some reason. In this example I will demonstrate how this can be done by leveraging a SAX XMLFilter.

Java Model

Below is the Java model that will be used for this example.

Customer

package blog.namespace.sax;

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Customer {

    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

package-info

We will use the package-level @XmlSchema annotation to specify the namespace qualification for our model.

@XmlSchema(
    namespace="http://www.example.com/customer",
    elementFormDefault=XmlNsForm.QUALIFIED)
package blog.namespace.sax;

import javax.xml.bind.annotation.*;

XML Input (input.xml)

Even though our metadata specifies that all the elements should be qualified with a namespace (http://www.example.com/customer), our input document is not namespace qualified. An XMLFilter will be used to add the namespace during the unmarshal operation.

<customer>
    <name>Jane Doe</name>
</customer>

XMLFilter (NamespaceFilter)

The easiest way to create an XMLFilter is to extend XMLFilterImpl. For our use case we will override the startElement and endElement methods. In each of these methods we will call the corresponding super method, passing in the default namespace as the URI parameter.

package blog.namespace.sax;

import org.xml.sax.*;
import org.xml.sax.helpers.XMLFilterImpl;

public class NamespaceFilter extends XMLFilterImpl {

    private static final String NAMESPACE = "http://www.example.com/customer";

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(NAMESPACE, localName, qName);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        super.startElement(NAMESPACE, localName, qName, atts);
    }
}

Demo

In the demo code below we will do a SAX parse of the XML document. The XMLReader will be wrapped in our XMLFilter. We will leverage JAXB's UnmarshallerHandler as the ContentHandler. Once the parse has been done, we can ask the UnmarshallerHandler for the resulting Customer object.

package blog.namespace.sax;

import javax.xml.bind.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        // Create the JAXBContext
        JAXBContext jc = JAXBContext.newInstance(Customer.class);

        // Create the XMLFilter
        XMLFilter filter = new NamespaceFilter();

        // Set the parent XMLReader on the XMLFilter
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xr = sp.getXMLReader();
        filter.setParent(xr);

        // Set UnmarshallerHandler as ContentHandler on XMLFilter
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        UnmarshallerHandler unmarshallerHandler = unmarshaller.getUnmarshallerHandler();
        filter.setContentHandler(unmarshallerHandler);

        // Parse the XML
        InputSource xml = new InputSource("src/blog/namespace/sax/input.xml");
        filter.parse(xml);
        Customer customer = (Customer) unmarshallerHandler.getResult();

        // Marshal the Customer object back to XML
        Marshaller marshaller = jc.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        marshaller.marshal(customer, System.out);
    }
}

Output

Below is the output from running the demo code. Note how the output contains the namespace qualification based on the metadata.

<customer xmlns="http://www.example.com/customer">
    <name>Jane Doe</name>
</customer>

Further Reading

If you enjoyed this post then you may also be interested in:

- JAXB & Namespaces
- Preventing Entity Expansion Attacks in JAXB
November 10, 2012
by Blaise Doughan
· 63,455 Views · 1 Like
Bluetooth Data Transfer with Android
To develop an Android application making use of data transfers via Bluetooth (BT), one would logically start at the Android developer's Bluetooth page, where all the required steps are described in detail: device discovery, pairing, client/server sockets, RFCOMM channels, etc. But before jumping into socket and thread programming just to perform a basic BT operation, let's consider a simpler alternative, based on one of Android's most important features: the ability for a given application to send the user to another one, which, in this case, would be the device's default BT application. Doing so will have the Android OS itself do all the low-level work for us.

First things first, a bit of defensive programming:

import android.bluetooth.BluetoothAdapter;
// ...
// inside a method
// Check if Bluetooth is supported
BluetoothAdapter btAdapter = BluetoothAdapter.getDefaultAdapter();
if (btAdapter == null) {
    // Device does not support Bluetooth
    // Inform user that we're done.
}

The above is the first check we need to perform. That done, let's see how we can start BT from within our own application. In a previous post on SMS programming, we talked about implicit intents, which basically allow us to specify the action we would like the system to handle for us. Android will then display all the activities that are able to complete the action we want, in a chooser list. Here's an example:

// Bring up the Android chooser
Intent intent = new Intent();
intent.setAction(Intent.ACTION_SEND);
intent.setType("text/plain");
intent.putExtra(Intent.EXTRA_STREAM, Uri.fromFile(file_to_transfer));
// ...
startActivity(intent);

In the code snippet above, we are letting the Android system know that we intend to send a text file. The system then displays all installed applications capable of handling that action, and we can see that the BT application is among those handlers. We could of course let the user pick that application from the list and be done with it. But if we want to be a tad more user-friendly, we need to go further and start the application ourselves, instead of simply displaying it amidst other unnecessary options... but how? One way to do that would be to use Android's PackageManager this way:

// List of apps that can handle our intent
PackageManager pm = getPackageManager();
List<ResolveInfo> appsList = pm.queryIntentActivities(intent, 0);
if (appsList.size() > 0) {
    // proceed
}

The above PackageManager method returns the list we saw earlier of all activities able to handle our file transfer intent, in the form of a list of ResolveInfo objects that encapsulate the information we need:

// Select Bluetooth
String packageName = null;
String className = null;
boolean found = false;
for (ResolveInfo info : appsList) {
    packageName = info.activityInfo.packageName;
    if (packageName.equals("com.android.bluetooth")) {
        className = info.activityInfo.name;
        found = true;
        break; // found
    }
}
if (!found) {
    Toast.makeText(this, R.string.blu_notfound_inlist, Toast.LENGTH_SHORT).show();
    // exit
}

We now have the necessary information to start BT ourselves:

// Set our intent to launch Bluetooth
intent.setClassName(packageName, className);
startActivity(intent);

What we did was use the package and its corresponding class retrieved earlier. Since we are a curious bunch, we may wonder what the class name for the "com.android.bluetooth" package is. This is what we would get if we were to print it out: com.broadcom.bt.app.opp.OppLauncherActivity.

OPP stands for Object Push Profile, and it is the Android component that allows files to be shared wirelessly. All fine and dandy, but in order for all the above code to be of any use, BT doesn't simply need to be supported by the device, it also needs to be enabled by the user. So one of the first things we want to do is ask the user to enable BT for the time we deem necessary (here, 300 seconds):

import android.bluetooth.BluetoothAdapter;
// ...
// Duration that the device is discoverable
private static final int DISCOVER_DURATION = 300;
// Our request code (must be greater than zero)
private static final int REQUEST_BLU = 1;
// ...
public void enableBlu() {
    // Enable device discovery - this will automatically enable Bluetooth
    Intent discoveryIntent = new Intent(BluetoothAdapter.ACTION_REQUEST_DISCOVERABLE);
    discoveryIntent.putExtra(BluetoothAdapter.EXTRA_DISCOVERABLE_DURATION, DISCOVER_DURATION);
    startActivityForResult(discoveryIntent, REQUEST_BLU);
}

Once we specify that we want to get a result back from our activity with startActivityForResult, an enabling dialog is presented to the user. Now whenever the activity finishes, it will return the request code we have sent (REQUEST_BLU), along with the data and a result code, to our main activity through the onActivityResult callback method. We know which request code we have to check against, but how about the result code? Simple: if the user responds "no" to the permission request (or if an error occurs), the result code will be RESULT_CANCELED. On the other hand, if the user accepts, the BT documentation specifies that the result code will be equal to the duration that the device is discoverable (i.e. DISCOVER_DURATION, i.e. 300). So the way to process the BT dialog would be:

// When startActivityForResult completes...
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (resultCode == DISCOVER_DURATION && requestCode == REQUEST_BLU) {
        // processing code goes here
    } else {
        // cancelled or error
        Toast.makeText(this, R.string.blu_cancelled, Toast.LENGTH_SHORT).show();
    }
}

That puts our processing flow in order. Are we done yet? Almost. Last but not least, we need to ask for the BT permissions in the Android manifest:

<uses-permission android:name="android.permission.BLUETOOTH" />
<uses-permission android:name="android.permission.BLUETOOTH_ADMIN" />

We're ready to deploy now. To test all this, we need at least two Android devices, one being the file sender (where our application is installed) and the other any receiving device supporting BT. Note that, once the receiver accepts the connection, the received file (kmemo.dat) is saved inside the BT folder on the SD card. All the lower-level data transfer has been handled by the Android OS.

Source: Tony's blog.
November 6, 2012
by Tony Siciliani
· 77,752 Views
20 Database Design Best Practices
1. Use well-defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID).
2. Use singular for table names (i.e. use StudentCourse instead of StudentCourses). A table represents a collection of entities; there is no need for plural names.
3. Don't use spaces in table names. Otherwise you will have to use '{', '[', '"' etc. characters to refer to tables (i.e. for accessing the table Student Course you'll have to write "Student Course"; StudentCourse is much better).
4. Don't use unnecessary prefixes or suffixes for table names (i.e. use School instead of TblSchool, SchoolTable, etc.).
5. Keep passwords encrypted for security. Decrypt them in the application when required.
6. Use integer id fields for all tables. Even if an id is not required for the time being, it may be required in the future (for association tables, indexing, etc.).
7. Choose columns with the integer data type (or its variants) for indexing. Indexing varchar columns will cause performance problems.
8. Use bit fields for boolean values. Using integer or varchar is unnecessarily storage consuming. Also, start those column names with "Is".
9. Provide authentication for database access. Don't give the admin role to every user.
10. Avoid "select *" queries unless they are really needed. Use "select [required_columns_list]" for better performance.
11. Use an ORM (object-relational mapping) framework (e.g. Hibernate, iBatis) if the application code is big enough. Performance issues of ORM frameworks can be handled by detailed configuration parameters.
12. Partition big and unused/rarely used tables or table parts to different physical storage for better query performance.
13. For big, sensitive, and mission-critical database systems, use disaster recovery and security services like failover clustering, automatic backups, replication, etc.
14. Use constraints (foreign key, check, not null, ...) for data integrity. Don't give whole control to application code.
15. Lack of database documentation is evil. Document your database design with ER schemas and instructions. Also write comment lines for your triggers, stored procedures, and other scripts.
16. Use indexes for frequently used queries on big tables. Analyzer tools can be used to determine where indexes should be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better.
17. The database server and the web server must be placed on different machines. This provides more security (attackers can't access data directly), and server CPU and memory performance will be better because of the reduced request count and process usage.
18. Image and blob data columns must not be defined in frequently queried tables because of performance issues. These data must be placed in separate tables, and a pointer to them can be used in the queried tables.
19. Normalization must be used as required to optimize performance. Under-normalization causes excessive repetition of data; over-normalization causes excessive joins across too many tables. Both hurt performance.
20. Spend as much time as required on database modeling and design. Otherwise the "saved" design time will cost 10/100/1000 times as much in maintenance and redesign time.
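As a small, hedged illustration of point 10 in Java/JDBC (the table, column names, and connection URL are hypothetical and reuse the StudentCourse example from the list), a query can name exactly the columns it needs instead of relying on "select *":

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StudentCourseDao {

    // Selects only the columns that are actually needed, via a parameterized query.
    public void printCourses(int studentId) throws Exception {
        String sql = "SELECT CourseID, CourseName FROM StudentCourse WHERE StudentID = ?";
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:school"); // hypothetical URL
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, studentId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("CourseID") + " " + rs.getString("CourseName"));
                }
            }
        }
    }
}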
November 4, 2012
by Cagdas Basaraner
· 256,338 Views · 13 Likes
What's New in JAX-RS 2.0
JAX-RS is a framework designed to help you write RESTful applications on both the client and server side. With Java EE 7 slated to be released next year, in 2013, JAX-RS is one of the specifications getting a deep revision. JAX-RS 2.0 is currently in the Public Draft phase at the JCP, so now is a good time to discuss some of the key new features so that you can start playing with your favorite JAX-RS implementation and give the expert group the valuable feedback it needs to finalize the specification.

Key features in 2.0:

- Client API
- Server-side asynchronous HTTP
- Filters and interceptors

This article gives a brief overview of each of these features.

Client Framework

One huge thing missing from JAX-RS 1.0 was a client API. While it was easy to write a portable JAX-RS service, each JAX-RS implementation defined its own proprietary client API. JAX-RS 2.0 fills in this gap with a fluent, low-level, request-building API. Here's a simple example:

Client client = ClientFactory.newClient();
WebTarget target = client.target("http://example.com/shop");
Form form = new Form().param("customer", "Bill")
                      .param("product", "IPhone 5")
                      .param("CC", "4444 4444 4444 4444");
Response response = target.request().post(Entity.form(form));
assert response.getStatus() == 200;
Order order = response.readEntity(Order.class);

Let's dissect this example code. The Client interface manages and configures HTTP connections. It is also a factory for WebTargets. A WebTarget represents a specific URI; you build and execute requests from a WebTarget instance. Response is the same class defined in JAX-RS 1.0, but it has been expanded to support the client side. The example client allocates an instance of a Client, creates a WebTarget, then posts form data to the URI represented by the WebTarget. The Response object is tested to see if the status is 200, then the application class Order is extracted from the response using the readEntity() method.

The MessageBodyReader and MessageBodyWriter content handler interfaces defined in JAX-RS 1.0 are reused on the client side. When the readEntity() method is invoked in the example code, a MessageBodyReader is matched with the response's Content-Type and the Java type (Order) passed in as a parameter to the readEntity() method.

If you are optimistic that your service will return a successful response, there are some nice helper methods that allow you to get the Java object directly without having to interact with and write additional code around a Response object.

Customer cust = client.target("http://example.com/customers")
                      .queryParam("name", "Bill Burke")
                      .request().get(Customer.class);

In this example we target a URI and specify an additional query parameter we want appended to the request URI. The get() method takes a parameter of the Java type we want to unmarshal the HTTP response to. If the HTTP response code is something other than 200, OK, JAX-RS picks an exception that maps to the error code from a defined exception hierarchy in the JAX-RS client API.

Asynchronous Client API

The JAX-RS 2.0 client framework also supports an asynchronous API and a callback API. This allows you to execute HTTP requests in the background and either poll for a response or receive a callback when the request finishes.
Future<Customer> future = client.target("http://e.com/customers")
        .queryParam("name", "Bill Burke")
        .request()
        .async()
        .get(Customer.class);

try {
    Customer cust = future.get(1, TimeUnit.MINUTES);
} catch (TimeoutException ex) {
    System.err.println("timeout");
}

The Future interface is a JDK interface that has been around since JDK 5.0. The above code executes an HTTP request in the background, then blocks for one minute while it waits for a response. You could also use the Future to poll to see if the request is finished or not. Here's an example of using a callback interface.

InvocationCallback<Response> callback = new InvocationCallback<Response>() {
    public void completed(Response res) {
        System.out.println("Request success!");
    }

    public void failed(ClientException e) {
        System.out.println("Request failed!");
    }
};

client.target("http://example.com/customers")
      .queryParam("name", "Bill Burke")
      .request()
      .async()
      .get(callback);

In this example, we instantiate an implementation of the InvocationCallback interface. We invoke a GET request in the background and register this callback instance with the request. The callback interface will output a message on whether the request executed successfully or not. Those are the main features of the client API. I suggest browsing the specification and Javadoc to learn more.

Server-Side Asynchronous HTTP

On the server side, JAX-RS 2.0 provides support for asynchronous HTTP. Asynchronous HTTP is generally used to implement long-polling interfaces or server-side push. JAX-RS 2.0 support for asynchronous HTTP is annotation driven and is very analogous to how the Servlet 3.0 specification handles asynchronous HTTP support through the AsyncContext interface. Here's an example of writing a crude chat program.

@Path("/listener")
public class ChatListener {

    List<AsyncResponse> listeners = ...some global list...;

    @GET
    public void listen(@Suspended AsyncResponse res) {
        listeners.add(res);
    }
}

For those of you who have used the Servlet 3.0 asynchronous interfaces, the above code may look familiar. An AsyncResponse is injected into the JAX-RS resource method via the @Suspended annotation. This act disassociates the calling thread from the HTTP socket connection. The example code takes the AsyncResponse instance and adds it to an application-defined global List object. When the JAX-RS method returns, the JAX-RS runtime will do no response processing; a different thread will handle response processing.

@Path("/speaker")
public class ChatSpeaker {

    List<AsyncResponse> listeners = ...some global list...;

    @POST
    @Consumes("text/plain")
    public void speak(String speech) {
        for (AsyncResponse res : listeners) {
            res.resume(Response.ok(speech, "text/plain").build());
        }
    }
}

When a client posts text to this ChatSpeaker interface, the speak() method loops through the list of registered AsyncResponses and sends back a 200, OK response with the posted text. Those are the main features of the asynchronous HTTP interface; check out the Javadocs for more detail.

Filters and Entity Interceptors

JAX-RS 2.0 has an interceptor API that allows framework developers to intercept request and response processing. This powerful API allows framework developers to transparently add orthogonal concerns like authentication, caching, and encoding without polluting application code. Prior to JAX-RS 2.0, many JAX-RS providers like Resteasy, Jersey, and Apache CXF wrote their own proprietary interceptor frameworks to deliver various features in their implementations.
So, while JAX-RS 2.0 filters and interceptors can be a bit complex to understand, please note that the design is very use-case driven, based on real-world examples. I wrote a blog on JAX-RS interceptor requirements a while back to help guide the JAX-RS 2.0 JSR expert group in defining such an API. The blog is a bit dated, but hopefully you can get the gist of why we did what we did.

JAX-RS 2.0 has two different concepts for interception: filters and entity interceptors. Filters are mainly used to modify or process incoming and outgoing request headers or response headers. They execute before and after request and response processing. Entity interceptors are concerned with marshalling and unmarshalling of HTTP message bodies; they wrap around the execution of MessageBodyReader and MessageBodyWriter instances.

Server-Side Filters

On the server side you have two different types of filters. ContainerRequestFilters run before your JAX-RS resource method is invoked. ContainerResponseFilters run after your JAX-RS resource method is invoked. As an added caveat, ContainerRequestFilters come in two flavors: pre-matching and post-matching. Pre-matching ContainerRequestFilters are designated with the @PreMatching annotation and will execute before the JAX-RS resource method is matched with the incoming HTTP request. Pre-matching filters are often used to modify request attributes to change how the request matches to a specific resource. For example, some firewalls do not allow PUT and/or DELETE invocations. To circumvent this limitation, many applications tunnel the HTTP method through the HTTP header X-Http-Method-Override. A pre-matching ContainerRequestFilter could implement this behavior.

@Provider
@PreMatching
public class HttpOverride implements ContainerRequestFilter {
    public void filter(ContainerRequestContext ctx) {
        String method = ctx.getHeaderString("X-Http-Method-Override");
        if (method != null) ctx.setMethod(method);
    }
}

Post-matching ContainerRequestFilters execute after the Java resource method has been matched. These filters can implement a range of features, for example annotation-driven security protocols. After the resource class method is executed, JAX-RS will run all ContainerResponseFilters. These filters allow you to modify the outgoing response before it is marshalled and sent to the client. One example here is a filter that automatically sets a Cache-Control header.

@Provider
public class CacheControlFilter implements ContainerResponseFilter {
    public void filter(ContainerRequestContext req, ContainerResponseContext res) {
        if (req.getMethod().equals("GET")) {
            res.getHeaders().add("Cache-Control", cacheControl);
        }
    }
}

Client-Side Filters

On the client side you also have two types of filters: ClientRequestFilter and ClientResponseFilter. ClientRequestFilters run before your HTTP request is sent over the wire to the server. ClientResponseFilters run after a response is received from the server, but before the response body is unmarshalled. A good example of client request and response filters working together is a client-side cache that supports conditional GETs. The ClientRequestFilter would be responsible for setting the If-None-Match or If-Modified-Since headers if the requested URI is already cached. Here's what that code might look like.
@Provider
public class ConditionalGetFilter implements ClientRequestFilter {
    public void filter(ClientRequestContext req) {
        if (req.getMethod().equals("GET")) {
            CacheEntry entry = cache.getEntry(req.getURI());
            if (entry != null) {
                req.getHeaders().putSingle("If-Modified-Since", entry.getLastModified());
            }
        }
    }
}

The ClientResponseFilter would be responsible for either buffering and caching the response, or, if a 304, Not Modified response was sent back, editing the Response object to change its status to 200, set the appropriate headers, and buffer the currently cached entry. This code would be a bit more complicated, so for brevity we're not going to illustrate it within this article.

Reader and Writer Interceptors

While filters modify request or response headers, interceptors deal with message bodies. Interceptors are executed in the same call stack as their corresponding reader or writer. ReaderInterceptors wrap around the execution of MessageBodyReaders. WriterInterceptors wrap around the execution of MessageBodyWriters. They can be used to implement a specific content encoding, to generate digital signatures, or to post- or pre-process a Java object model before or after it is marshalled. Here's an example of a GZIP-encoding WriterInterceptor.

@Provider
public class GZIPEncoder implements WriterInterceptor {
    public void aroundWriteTo(WriterInterceptorContext ctx) throws IOException, WebApplicationException {
        GZIPOutputStream os = new GZIPOutputStream(ctx.getOutputStream());
        try {
            ctx.setOutputStream(os);
            ctx.proceed();
        } finally {
            os.finish();
        }
    }
}

Resource Method Filters and Interceptors

Sometimes you want a filter or interceptor to run only for a specific resource method. You can do this in two different ways: register an implementation of DynamicFeature or use the @NameBinding annotation. The DynamicFeature interface is executed at deployment time for each resource method. You just use the Configurable interface to register the filters and interceptors you want for the specific resource method.

@Provider
public class ServerCachingFeature implements DynamicFeature {
    public void configure(ResourceInfo resourceInfo, Configurable configurable) {
        if (resourceInfo.getMethod().isAnnotationPresent(GET.class)) {
            configurable.register(ServerCacheFilter.class);
        }
    }
}

On the other hand, @NameBinding works a lot like CDI interceptors. You annotate a custom annotation with @NameBinding and then apply that custom annotation to your filter and resource method.

@NameBinding
public @interface DoIt {}

@DoIt
public class MyFilter implements ContainerRequestFilter {...}

@Path
public class MyResource {
    @GET
    @DoIt
    public String get() {...}
}

Wrapping Up

Well, those are the main features of JAX-RS 2.0. There are also a bunch of minor features here and there, but you'll have to explore them yourselves. If you want to test-drive JAX-RS 2.0 (and hopefully also give feedback to the expert group), Red Hat's Resteasy 3.0 and Oracle's Jersey project have implementations you can download and use.

Useful Links

Below are some useful links. I've also included links to some features in Resteasy that make use of filters and interceptors. This code might give you a more in-depth look into what you can do with this new JAX-RS 2.0 feature.
- JAX-RS 2.0 Public Draft Specification
- Resteasy 3.0 Download
- Jersey
- Resteasy 3.0 client cache implementation code (to see how filters and interceptors work on the client side)
- Doseta digital signature headers (good use case for interceptors)
- File suffix content negotiation implementation (server-side filter example)
- Other server-side examples (cache-control annotations, gzip encoding, role-based security)
November 1, 2012
by Bill Burke
· 90,521 Views · 3 Likes
Algorithm of the Week: Shortest Path in a Directed Acyclic Graph
Introduction

We saw how to find the shortest path in a graph with positive edges using Dijkstra's algorithm. We also know how to find the shortest paths from a given source node to all other nodes, even when there are negative edges, using the Bellman-Ford algorithm. Now we'll see that there's a faster algorithm, running in linear time, that can find the shortest paths from a given source node to all other reachable vertices in a directed acyclic graph, also known as a DAG.

Because a DAG is acyclic, we don't have to worry about negative cycles. As we already know, it's pointless to speak about shortest paths in the presence of negative cycles, because we can "loop" over such a cycle and make our path shorter and shorter; a negative cycle makes the attempt to find a shortest path meaningless. Thus we had two problems to overcome with the Dijkstra and Bellman-Ford algorithms: first, we needed only positive weights, and second, we didn't want cycles. We can handle both cases in this algorithm.

Overview

The first thing we know about DAGs is that they can easily be topologically sorted. Topological sort can be used in many practical cases, but perhaps the most common one is scheduling dependent tasks. After a topological sort we end up with a list of the DAG's vertices, and we're sure that if there's an edge (u, v), then u will precede v in the topologically sorted list. The more general consequence is that two vertices without a direct edge between them, say B and D, can still end up ordered, with B preceding D. This information is precious, and the only thing we need to do is pass through this sorted list and calculate the shortest-path distances, just as in Dijkstra's algorithm.

So let's summarize the algorithm:

1. First we topologically sort the DAG.
2. Then we set the distance to the source to 0 and the distance to all other vertices to infinity.
3. Then, for each vertex in the sorted list, we pass through all its neighbors and check for a shorter path.

It's pretty much like Dijkstra's algorithm, with the main difference that there we used a priority queue, while this time we use the list from the topological sort.

Code

This time the code is actually pseudocode. Although all the examples so far were in PHP, pseudocode is perhaps easier to understand and doesn't bind you to a specific language implementation. Also, if you don't feel comfortable with a given programming language, it can be harder to understand the code than to read pseudocode.

1. Topologically sort G into L;
2. Set the distance to the source to 0;
3. Set the distances to all other vertices to infinity;
4. For each vertex u in L
5. - Walk through all neighbors v of u;
6. - If dist(v) > dist(u) + w(u, v)
7. - Set dist(v) <- dist(u) + w(u, v);

Application

It's clear why and where we must use this algorithm. The only problem is that we must be sure that the graph doesn't have cycles. However, if we're aware of how the graph is created, we may have additional information about whether there are cycles or not, and then this linear-time algorithm is very applicable.
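As an illustration (not part of the original article, which sticks to pseudocode), the same steps can be sketched in Java along the following lines; the edge representation, class names, and the choice of a Kahn-style topological sort are arbitrary assumptions:

import java.util.*;

class Edge {
    final int to, weight;
    Edge(int to, int weight) { this.to = to; this.weight = weight; }
}

public class DagShortestPath {

    // graph.get(u) holds the outgoing weighted edges of vertex u; vertices are 0..n-1.
    static int[] shortestPaths(List<List<Edge>> graph, int source) {
        int n = graph.size();
        List<Integer> order = topologicalOrder(graph);

        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;

        // Relax edges in topological order, exactly as in the pseudocode above.
        for (int u : order) {
            if (dist[u] == Integer.MAX_VALUE) continue; // u is unreachable from the source
            for (Edge e : graph.get(u)) {
                if (dist[u] + e.weight < dist[e.to]) {
                    dist[e.to] = dist[u] + e.weight;
                }
            }
        }
        return dist;
    }

    // Standard Kahn-style topological sort; assumes the graph really is a DAG.
    static List<Integer> topologicalOrder(List<List<Edge>> graph) {
        int n = graph.size();
        int[] inDegree = new int[n];
        for (List<Edge> edges : graph)
            for (Edge e : edges) inDegree[e.to]++;

        Deque<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (inDegree[v] == 0) queue.add(v);

        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            int u = queue.poll();
            order.add(u);
            for (Edge e : graph.get(u))
                if (--inDegree[e.to] == 0) queue.add(e.to);
        }
        return order;
    }
}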
October 30, 2012
by Stoimen Popov
· 28,093 Views
article thumbnail
Exporting and Importing VM Settings with Azure Command-Line Tools
We've talked previously about the Windows Azure command-line tools, and have used them in a few posts such as Brian's Migrating Drupal to a Windows Azure VM. While the tools are generally useful for tons of stuff, one of the things that's been painful to do with the command-line is export the settings for a VM, and then recreate the VM from those settings. You might be wondering why you'd want to export a VM and then recreate it. For me, cost is the first thing that comes to mind. It costs more to keep a VM running than it does to just keep the disk in storage. So if I had something in a VM that I'm only using a few hours a day, I'd delete the VM when I'm not using it and recreate it when I need it again. Another potential reason is that you want to create a copy of the disk so that you can create a duplicate virtual machine. The export process used to be pretty arcane stuff; using the azure vm show command with a --json parameter and piping the output to file. Then hacking the .json file to fix it up so it could be used with the azure vm create-from command. It was bad. It was so bad, the developers added a new export command to create the .json file for you. Here's the basic process: Create a VM VM creation has been covered multiple ways already; you're either going to use the portal or command line tools, and you're either going to select an image from the library or upload a VHD. In my case, I used the following command: azure vm create larryubuntu CANONICAL__Canonical-Ubuntu-12-04-amd64-server-20120528.1.3-en-us-30GB.vhd larry NotaRe This command creates a new VM in the East US data center, enables SSH on port 22 and then stores a disk image for this VM in a blob. You can see the new disk image in blob storage by running: azure vm disk list The results should return something like: info: Executing command vm disk list + Fetching disk images data: Name OS data: ---------------------------------------- ------- data: larryubuntu-larryubuntu-0-20121019170709 Linux info: vm disk list command OK That's the actual disk image that is mounted by the VM. Export and Delete the VM Alright, I've done my work and it's the weekend. I need to export the VM settings so I can recreate it on Monday, then delete the VM so I won't get charged for the next 48 hours of not working. To export the settings for the VM, I use the following command: azure vm export larryubuntu c:\stuff\vminfo.json This tells Windows Azure to find the VM named larryubuntu and export its settings to c:\stuff\vminfo.json. The .json file will contain something like this: { "RoleName":"larryubuntu", "RoleType":"PersistentVMRole", "ConfigurationSets": [ { "ConfigurationSetType":"NetworkConfiguration", "InputEndpoints": [ { "LocalPort":"22", "Name":"ssh", "Port":"22", "Protocol":"tcp", "Vip":"168.62.177.227" } ], "SubnetNames":[] } ], "DataVirtualHardDisks":[], "OSVirtualHardDisk": { "HostCaching":"ReadWrite", "DiskName":"larryubuntu-larryubuntu-0-20121024155441", "OS":"Linux" }, "RoleSize":"Small" } If you're like me, you'll immediately start thinking "Hrmmm, I wonder if I can mess around with things like RoleSize." And yes, you can. If you wanted to bump this up to medium, you'd just change that parameter to medium. If you want to play around more with the various settings, it looks like the schema is maintained at https://github.com/WindowsAzure/azure-sdk-for-node/blob/master/lib/services/serviceManagement/models/roleschema.json. Once I've got the file, I can safely delete the VM by using the following command. 
azure vm delete larryubuntu It spins a bit and then no more VM. Recreate the VM Ugh, Monday. Time to go back to work, and I need my VM back up and running. So I run the following command: azure vm create-from larryubuntu c:\stuff\vminfo.json --location "East US" It takes only a minute or two to spin up the VM and it's ready for work. That's it - fast, simple, and far easier than the old process of generating the .json settings file. Note that I haven't played around much with the various settings described in the schema for the json file that I linked above. If you find anything useful or interesting that can be accomplished by hacking around with the .json, leave a comment about it.
October 29, 2012
by Larry Franks
· 6,148 Views
article thumbnail
Exploring the HTML5 Web Audio: Visualizing Sound
If you've read some of my other articles on this blog you probably know I'm a fan of HTML5. With HTML5 we get all this interesting functionality, directly in the browser, in a way that, eventually, is standard across browsers. One of the new HTML5 APIs that is slowly moving through the standardization process is the Web Audio API. With this API, currently only supported in Chrome, we get access to all kinds of interesting audio components you can use to create, modify and visualize sounds (such as the following spectrogram). So why do I start with visualizations? It looks nice, that's one reason, but not the important one. This API provides a number of more complex components, whose behavior is much easier to explain when you can see what happens. With a filter you can instantly see whether some frequencies are filtered, instead of trying to listen to the resulting audio for thse changes. There are many interesting examples that use this API. The problem is, though, that getting started with this API and with digital signal processing (DSP) usually isn't explained. In this article I'll walk you through a couple of steps that shows how to do the following: Create a signal volume meter Visualize the frequencies using a spectrum analyzer And show a time based spectrogram We start with the basic setup that we can use as the basis for the components we'll create. Setting up the basic If we want to experiment with sound, we need some sound source. We could use the microphone (as we'll do later in this series), but to keep it simple, for now we'll just use an mp3 as our input. To get this working using web audio we have to take the following steps: Load the data Read it in a buffer node and play the sound Load the data With the web audio we can use different types of audio sources. We've got a MediaElementAudioSourceNode that can be used to use the audio provided by a media element. There's also a MediaStreamAudioSourceNode. With this audio source node we can use the microphone as input (see my previous article on sound recognition). Finally there is the AudioBufferSourceNode. With this node we can load the data from an existing audio file (e.g mp3) and use that as input. For this example we'll use this last approach. // create the audio context (chrome only for now) var context = new webkitAudioContext(); var audioBuffer; var sourceNode; // load the sound setupAudioNodes(); loadSound("wagner-short.ogg"); function setupAudioNodes() { // create a buffer source node sourceNode = context.createBufferSource(); // and connect to destination sourceNode.connect(context.destination); } // load the specified sound function loadSound(url) { var request = new XMLHttpRequest(); request.open('GET', url, true); request.responseType = 'arraybuffer'; // When loaded decode the data request.onload = function() { // decode the data context.decodeAudioData(request.response, function(buffer) { // when the audio is decoded play the sound playSound(buffer); }, onError); } request.send(); } function playSound(buffer) { sourceNode.buffer = buffer; sourceNode.noteOn(0); } // log if an error occurs function onError(e) { console.log(e); } In this example you can see a couple of functions. The setupAudioNodes function creates a BufferSource audio node and connects it to the destination. The loadSound function shows how you can load an audio file. The buffer which is passed into the playSound function contains decoded audio that can be used by the web audio API. 
In this example I use an .ogg file, for a complete overview of the formats supported look at: https://sites.google.com/a/chromium.org/dev/audio-video Play the sound To play this audio file, all we have to do is turn the source node on, this is done in the playSound function: function playSound(buffer) { sourceNode.buffer = buffer; sourceNode.noteOn(0); } You can test this out at the following page: Example 1: Loading and playing a sound with Web Audio API. When you open that page, you'll hear some music. Nothing to spectacular for now, but nevertheless an easy way to load audio that'll use for the rest of this article. The first item on our list was the volume meter. Create a volume meter One of the basic scenario's, and often one of the first steps someone new to this API tries to create, is a simple signal volume meter (or an UV meter). I expected this to be a standard component in this API, where I could just read off the signal strength as a property. But, no such node exists. But not to worry, with the components that are available, it's pretty easy (not straightforward, but easy nevertheless) to get an indication of the signal strength of your audio file. Int this section we'll create the following simple volume meter: As you can see this is a simple volume meter where we measure the signal strength for the left and the right audio channel. This is drawn on the canvas, but you could have also used divs or svg to visualize this. Lets start with a single volume meter, instead of one for each channel. For this we need to do the following: Create an analyzer node: With this node we get realtime information about the data that is processed. This data we use to determine the signal strength Create a javascript node: We use this node as a timer to update the volume meters with new information Connect everything together Analyser node With the analyser node we can perform real-time frequency and time domain analysis. From the specification: a node which is able to provide real-time frequency and time-domain analysis information. The audio stream will be passed un-processed from input to output. I won't go into the mathematical details behind this node, since there are many articles out there that explain how this works (a good one is the chapter on fourier transformation from here). What you should now about this node is that it splits up the signal in frequency buckets and we get the amplitude (the signal strenght) for each set of frequencies (the bucket). The best way to understand this, is to skip a bit ahead in this article and look at the frequency distribution we'll create later on. This image plots the result from the analyser node. The frequencies increase from left to right, and the height of the bar shows the strength of that specific frequency bucket. More on this later on in the article. For now we don't want to see the strength of the separate frequency buckets, but the strength of the total signal. For this we'll just add all the strenghts from each bucket and divide it by the number of buckets. First we need to create an analyzer node // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; This creates an analyzer node whose result will be used to create the volume meter. We use a smoothingTimeConstant to make the meter less jittery. With this variable we use input from a longer time period to calculate the amplitudes, this results in a more smooth meter. 
The fftSize determine how many buckets we get containing frequency information. If we have a fftSize of 1024 we get 512 buckets (more info on this in the book on DPS and fourier transformations). When this node receives a stream of data, it analyzes this stream and provides us with information about the frequencies in that signal and their strengths. We now need a timer to update the meter at regular intervals. We could use the standard javascript setInterval function, but since we're looking at the Web Audio API lets use one of its nodes. The JavaScriptNode. The javascript node With the javascriptnode we can process the raw audio data directly from javascript. We can use this to write our own analyzers or complex components. We're not going to do that, though. When creating the javascript node, you can specify the interval at which it is called. We'll use that feature to update the meter at regulat intervals. Creating a javascript node is very easy. // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); This will create a javascriptnode that is called whenever the 2048 frames have been sampled. Since our data is sampled at 44.1k, this function will be called approximately 21 times a second. Now what happens when this function is called: // when the javascript node is called // we use information from the analyzer node // to draw the volume javascriptNode.onaudioprocess = function() { // get the average, bincount is fftsize / 2 var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); var average = getAverageVolume(array) // clear the current state ctx.clearRect(0, 0, 60, 130); // set the fill style ctx.fillStyle=gradient; // create the meters ctx.fillRect(0,130-average,25,130); } function getAverageVolume(array) { var values = 0; var average; var length = array.length; // get all the frequency amplitudes for (var i = 0; i < length; i++) { values += array[i]; } average = values / length; return average; } In these two functions we calculate the average and draw the meter directly on the canvas (using a gradient so we have nice colors). Now all we have to do is connect the output from the audiosource to the analyser, the analyser to the javasource node (and if we want audio to hear, we also need to connect something to the destionation). Connect everything together Connecting everything together is easy: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; // create a buffer source node sourceNode = context.createBufferSource(); // connect the source to the analyser sourceNode.connect(analyser); // we use the javascript node to draw at a specific interval. analyser.connect(javascriptNode); // and connect to destination, if you want audio sourceNode.connect(context.destination); } And that's it. This will draw a single volume meter, for the complete signal. Now what do we do when we want to have a volume meter for each channel. For this we use a ChannelSplitter. 
Let's dive right into the code to connect everything: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; analyser2 = context.createAnalyser(); analyser2.smoothingTimeConstant = 0.0; analyser2.fftSize = 1024; // create a buffer source node sourceNode = context.createBufferSource(); splitter = context.createChannelSplitter(); // connect the source to the analyser and the splitter sourceNode.connect(splitter); // connect one of the outputs from the splitter to // the analyser splitter.connect(analyser,0,0); splitter.connect(analyser2,1,0); // we use the javascript node to draw at a // specific interval. analyser.connect(javascriptNode); // and connect to destination sourceNode.connect(context.destination); } As you can see we don't really change much. We introduce a new node, the splitter node. This node splits the sound into a left and a right channel. These channels can be processed separately. With this layout the following happens: The audiosource creates a signal based on the buffered audio. This signal is sent to the splitter, who splits the signal into a left and right stream. Each of these two streams is processed by their own realtime analyser. From the javascript node, we now get the information from both analysers and plot both meters I've shown step 1 through 3, let's quickly move on the step 4. For this we simply add the following to the onaudioprocess node: javascriptNode.onaudioprocess = function() { // get the average for the first channel var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); var average = getAverageVolume(array); // get the average for the second channel var array2 = new Uint8Array(analyser2.frequencyBinCount); analyser2.getByteFrequencyData(array2); var average2 = getAverageVolume(array2); // clear the current state ctx.clearRect(0, 0, 60, 130); // set the fill style ctx.fillStyle=gradient; // create the meters ctx.fillRect(0,130-average,25,130); ctx.fillRect(30,130-average2,25,130); } And now we've got two signal meters, one for each channel. Example 2: Visualize the signal strength with a volume meter. Or view the result on youtube: Now lets see how we can get the view of the frequencies I showed earlier. Create a frequency spectrum With all the work we already did in the previous section, creating a frequency spectrum overview is now very easy. We're going to aim for this: We set up the nodes just like we did in the first example: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 512; // create a buffer source node sourceNode = context.createBufferSource(); sourceNode.connect(analyser); analyser.connect(javascriptNode); // sourceNode.connect(context.destination); } So this time we don't split the channels and we set the fftSize to 512. This means we get 256 bars that represent our frequency. 
We now just need to alter the onaudioprocess method and the gradient we use: var gradient = ctx.createLinearGradient(0,0,0,300); gradient.addColorStop(1,'#000000'); gradient.addColorStop(0.75,'#ff0000'); gradient.addColorStop(0.25,'#ffff00'); gradient.addColorStop(0,'#ffffff'); // when the javascript node is called // we use information from the analyzer node // to draw the volume javascriptNode.onaudioprocess = function() { // get the average for the first channel var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); // clear the current state ctx.clearRect(0, 0, 1000, 325); // set the fill style ctx.fillStyle=gradient; drawSpectrum(array); } function drawSpectrum(array) { for ( var i = 0; i < (array.length); i++ ){ var value = array[i]; ctx.fillRect(i*5,325-value,3,325); } }; In the drawSpectrum function we iterate over the array, and draw a vertical bar based on the value. That's it. For a live example, click on the following link: Example 3: Visualize the frequency spectrum. Or view it on youtube: And then the final one. The spectrogram. Time based spectrogram When you run the previous demo you see the strength of the various frequency buckets in real time. While this is a nice visualization, it doesn't allow you to analyze information over a period of time. If you want to do that you can create a spectrogram. With a spectrogram we plot a single line for each measurement. The y-axis represents the frequency, the x-asis the time and the color of a pixel the strength of that frequency. It can be used to analyze the received audio, and also creates nice looking images. The good thing, is that to output this data we don't have to change much from what we've already got in place. The only function that'll change is the onaudioprocess node and we'll create a slightly different analyser. analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0; analyser.fftSize = 1024; The enalyser we create here has an fftSize of 1024, this means we get 512 frequency buckets with strengths. So we can draw a spectrogram that has a height of 512 pixels. Also note that the smoothingTimeConstant is set to 0. This means we don't use any of the previous results in the analysis. We want to show the real information, not provide a smooth volume meter or frequency spectrum analysis. The easiest way to draw a spectrogram is by just start drawing the line at the left, and for each new set of frequencies increase the x-coordinate by one. The problem is that this will quickly fill up our canvas, and we'll only be able to see the first half a minute of the audio. To fix this, we need some creative canvas copying. The complete code for drawing the spectrogram is shown here: // create a temp canvas we use for copying and scrolling var tempCanvas = document.createElement("canvas"), tempCtx = tempCanvas.getContext("2d"); tempCanvas.width=800; tempCanvas.height=512; // used for color distribution var hot = new chroma.ColorScale({ colors:['#000000', '#ff0000', '#ffff00', '#ffffff'], positions:[0, .25, .75, 1], mode:'rgb', limits:[0, 300] }); ... 
// when the javascript node is called // we use information from the analyzer node // to draw the volume javascriptNode.onaudioprocess = function () { // get the average for the first channel var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); // draw the spectrogram if (sourceNode.playbackState == sourceNode.PLAYING_STATE) { drawSpectrogram(array); } } function drawSpectrogram(array) { // copy the current canvas onto the temp canvas var canvas = document.getElementById("canvas"); tempCtx.drawImage(canvas, 0, 0, 800, 512); // iterate over the elements from the array for (var i = 0; i < array.length; i++) { // draw each pixel with the specific color var value = array[i]; ctx.fillStyle = hot.getColor(value).hex(); // draw the line at the right side of the canvas ctx.fillRect(800 - 1, 512 - i, 1, 1); } // set translate on the canvas ctx.translate(-1, 0); // draw the copied image ctx.drawImage(tempCanvas, 0, 0, 800, 512, 0, 0, 800, 512); // reset the transformation matrix ctx.setTransform(1, 0, 0, 1, 0, 0); } To draw the spectrogram we do the following: We copy what is currently drawn to a hidden canvas Next we draw a line of the current values at the far right of the canvas We set the translate on the canvas to -1 We copy the copied information back to the original canvas (that is now drawn 1 pixel to the left) And reset the transformation matrix See a running example here: Example 4: Create a spectrogram Or view it here: Last thing I'd like to mention regarding the code is the chroma.js library I used for the colors. If you ever need to draw something color or gradient related (e.g maps, strengths, levels) you can easily create color scales with this library. Two final pointers, I know I'll get questions about: Volume could be represented as a magnitude, just didn't want to complicate matters for this. The spectogram doesn't use logarithmic scales. Once again, didn't want to complicate things
October 23, 2012
by Jos Dirksen
· 68,750 Views · 1 Like
article thumbnail
Understanding JVM Internals, from Basic Structure to Java SE 7 Features
Learn about the structure of JVM, how it works, executes Java bytecode, the order of execution, examples of common mistakes and their solutions, new Java SE 7 features.
October 19, 2012
by Esen Sagynov
· 178,536 Views · 20 Likes
article thumbnail
PartitionKey and RowKey in Windows Azure Table Storage
For the past few months, I’ve been coaching a “Microsoft Student Partner” (who has a great blog on Kinect for Windows by the way!) on Windows Azure. One of the questions he recently had was around PartitionKey and RowKey in Windows Azure Table Storage. What are these for? Do I have to specify them manually? Let’s explain… Windows Azure storage partitions All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (whitepaper here). While there’s much more to tell about it, the reason why it scales is because of its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system. Partitions are used for scale out in the system. Imagine that there’s only 3 physical machines that are used for storing data in Windows Azure storage: Based on the size and load of a partition, partitions are fanned out across these machines. Whenever a partition gets a high load or grows in size, the Windows Azure storage management can kick in and move a partition to another machine: By doing this, Windows Azure can ensure a high throughput as well as its storage guarantees. If a partition gets busy, it’s moved to a server which can support the higher load. If it gets large, it’s moved to a location where there’s enough disk space available. Partitions are different for every storage mechanism: In blob storage, each blob is in a separate partition. This means that every blob can get the maximal throughput guaranteed by the system. In queues, every queue is a separate partition. In tables, it’s different: you decide how data is co-located in the system. PartitionKey in Table Storage In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput as there’s lots of entities in the same partition. Should you set the PartitionKey to the same value for every entity stored? No. You’ll end up with scaling issues at some point. Should you set the PartitionKey to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult. And that’s where our next concept kicks in… RowKey in Table Storage A RowKey in Table Storage is a very simple thing: it’s your “primary key” within a partition. PartitionKey + RowKey form the composite unique identifier for an entity. Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition. So in essence, a RowKey is just the identifier of an entity within a partition. PartitionKey and RowKey and performance Before building your code, it’s a good idea to think about both properties. Don’t just assign them a guid or a random string as it does matter for performance. The fastest way of querying? Specifying both PartitionKey and RowKey. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition. Less fast but still fast enough will be querying by specifying PartitionKey: table storage will know which partition to query. Less fast: querying on only RowKey. 
Doing this will give table storage no pointer on which partition to search in, resulting in a query that possibly spans multiple partitions, possibly multiple storage nodes as well. Within a partition, searching on RowKey is still pretty fast as it’s a unique index. Slow: searching on other properties (again, spans multiple partitions and properties). Note that Windows Azure storage may decide to group partitions in so-called "Range partitions" - see http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx. In order to improve query performance, think about your PartitionKey and RowKey upfront, as they are the fast way into your datasets. Deciding on PartitionKey and RowKey Here’s an exercise: say you want to store customers, orders and orderlines. What will you choose as the PartitionKey (PK) / RowKey (RK)? Let’s use three tables: Customer, Order and Orderline. An ideal setup may be this one, depending on how you want to query everything: Customer (PK: sales region, RK: customer id) – it enables fast searches on region and on customer id Order (PK: customer id, RK: order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), and it still allows fast querying on a specific order id as well Orderline (PK: order id, RK: order line id) – allows fast querying on both order id as well as order line id. Of course, depending on the system you are building, the following may be a better setup: Customer (PK: customer id, RK: display name) – it enables fast searches on customer id and display name Order (PK: customer id, RK: order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), and it still allows fast querying on a specific order id as well Orderline (PK: order id, RK: item id) – allows fast querying on both order id as well as the item bought, of course given that one order can only contain one order line for a specific item (PK + RK should be unique) You see? Choose them wisely, depending on your queries. And maybe an important sidenote: don’t be afraid of denormalizing your data and storing data twice in a different format, supporting more query variations. There’s one additional “index” That’s right! People have been asking Microsoft for a secondary index. And it’s already there… The table name itself! Take our customer – order – orderline sample again… Having a Customer table containing all customers may be interesting to search within that data. But having an Orders table containing every order for every customer may not be the ideal solution. Maybe you want to create an order table per customer? Doing that, you can easily query the order id (it’s the table name) and within the order table, you can have more detail in PK and RK. And there's one more: your account name. Split data over multiple storage accounts and you have yet another "partition". Conclusion In conclusion? Choose PartitionKey and RowKey wisely. The more meaningful they are to your application or business domain, the faster querying will be and the more efficient table storage will work in the long run.
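To make the customer/order exercise a bit more concrete, here is a small hedged sketch of the fast "point query" path using the later .NET storage client library (Microsoft.WindowsAzure.Storage); the OrderEntity class, table name and key values are illustrative assumptions, not code from this article.

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Illustrative entity following the suggestion above: PartitionKey = customer id, RowKey = order id.
public class OrderEntity : TableEntity
{
    public OrderEntity() { } // parameterless constructor required by the table client
    public OrderEntity(string customerId, string orderId)
    {
        PartitionKey = customerId;
        RowKey = orderId;
    }
    public double Total { get; set; }
}

public static class OrderStore
{
    public static OrderEntity SaveAndFetch(string connectionString)
    {
        CloudTable table = CloudStorageAccount.Parse(connectionString)
            .CreateCloudTableClient()
            .GetTableReference("Orders");
        table.CreateIfNotExists();

        // All orders for customer "cust-42" land in the same partition.
        table.Execute(TableOperation.Insert(new OrderEntity("cust-42", "order-1001") { Total = 99.95 }));

        // Fastest possible read: a point lookup on PartitionKey + RowKey.
        TableResult result = table.Execute(TableOperation.Retrieve<OrderEntity>("cust-42", "order-1001"));
        return (OrderEntity)result.Result;
    }
}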
October 19, 2012
by Maarten Balliauw
· 56,799 Views · 10 Likes
article thumbnail
Debugging Hibernate Envers - Historical Data
Recently in our project we reported a strange bug. In one report where we display historical data provided by Hibernate Envers, users encountered duplicated records in the dropdown used for filtering. We tried to find the source of this bug, but after spending a few hours looking at the code responsible for this functionality we had to give up and ask for a dump of the production database to check what is actually stored in one table. And when we got it and started investigating, it turned out that there is a bug in Hibernate Envers 3.6 that is the cause of our problems. But luckily, after some investigation and invaluable help from Adam Warski (author of Envers), we were able to fix this issue. The bug itself Let’s consider the following scenario: a transaction is started; we insert some audited entities during it and then it is rolled back; the same EntityManager is reused to start another transaction; the second transaction is committed. But when we check the audit tables for the entities that were created and then rolled back in step one, we will notice that they are still there and were not rolled back as we expected. We were able to reproduce it in a failing test in our project, so the next step was to prepare a failing test in Envers so we could verify that our fix was working. Failing test The simplest test cases already present in Envers are located in the Simple.java class and they look quite straightforward: public class Simple extends AbstractEntityTest { private Integer id1; public void configure(Ejb3Configuration cfg) { cfg.addAnnotatedClass(IntTestEntity.class); } @Test public void initData() { EntityManager em = getEntityManager(); em.getTransaction().begin(); IntTestEntity ite = new IntTestEntity(10); em.persist(ite); id1 = ite.getId(); em.getTransaction().commit(); em.getTransaction().begin(); ite = em.find(IntTestEntity.class, id1); ite.setNumber(20); em.getTransaction().commit(); } @Test(dependsOnMethods = "initData") public void testRevisionsCounts() { assert Arrays.asList(1, 2).equals(getAuditReader().getRevisions(IntTestEntity.class, id1)); } @Test(dependsOnMethods = "initData") public void testHistoryOfId1() { IntTestEntity ver1 = new IntTestEntity(10, id1); IntTestEntity ver2 = new IntTestEntity(20, id1); assert getAuditReader().find(IntTestEntity.class, id1, 1).equals(ver1); assert getAuditReader().find(IntTestEntity.class, id1, 2).equals(ver2); } } So preparing my failing test executing the scenario described above wasn’t rocket science: /** * @author Tomasz Dziurko (tdziurko at gmail dot com) */ public class TransactionRollbackBehaviour extends AbstractEntityTest { public void configure(Ejb3Configuration cfg) { cfg.addAnnotatedClass(IntTestEntity.class); } @Test public void testAuditRecordsRollback() { // given EntityManager em = getEntityManager(); em.getTransaction().begin(); IntTestEntity iteToRollback = new IntTestEntity(30); em.persist(iteToRollback); Integer rollbackedIteId = iteToRollback.getId(); em.getTransaction().rollback(); // when em.getTransaction().begin(); IntTestEntity ite2 = new IntTestEntity(50); em.persist(ite2); Integer ite2Id = ite2.getId(); em.getTransaction().commit(); // then List<Number> revisionsForSavedClass = getAuditReader().getRevisions(IntTestEntity.class, ite2Id); assertEquals(revisionsForSavedClass.size(), 1, "There should be one revision for the inserted entity"); List<Number> revisionsForRolledBackClass = getAuditReader().getRevisions(IntTestEntity.class, rollbackedIteId); assertEquals(revisionsForRolledBackClass.size(), 0, "There should be no revisions for the insert that was rolled back"); } }
Now I could verify that the tests were failing on the forked 3.6 branch and check if the fix that we had was making this test green. The fix After writing a failing test in our project, I placed several breakpoints in the Envers code to understand better what was wrong there. But imagine being thrown into a project developed for a few years by many programmers smarter than you. I felt overwhelmed and had no idea where the fix should be applied and what exactly was not working as expected. Luckily, in my company we have Adam Warski on board. He is the initial author of Envers and he actually pointed us to the solution. The fix itself contains only one check that registers audit processes to be executed on transaction completion only when such processes are still in the map for the given transaction. It sounds complicated, but if you look at the class AuditProcessManager in this commit it should be clearer what is happening there. Official path Besides locating a problem and fixing it, there are some more official steps that must be performed to have the fix included in Envers. Step 1: Create a JIRA issue with the bug - https://hibernate.onjira.com/browse/hhh-7682 Step 2: Create a local branch envers-bugfix-hhh-7682 of the forked Hibernate 3.6 Step 3: Commit and push the failing test and fix to your local and remote repository on GitHub Step 4: Create a pull request - https://github.com/hibernate/hibernate-orm/pull/393 Step 5: Wait for the merge. And that’s all. Now the fix is merged into the main repository and we have one less bug in the world of open source
October 17, 2012
by Tomasz Dziurko
· 7,528 Views
article thumbnail
Create a Java App Server on a Virtual Machine
Curator's note: This tutorial originally appeared at the Windows Azure Java Developer Center. With Windows Azure, you can use a virtual machine to provide server capabilities. As an example, a virtual machine running on Windows Azure can be configured to host a Java application server, such as Apache Tomcat. On completing this guide, you will have an understanding of how to create a virtual machine running on Windows Azure and configure it to run a Java application server. You will learn: How to create a virtual machine. How to remotely log in to your virtual machine. How to install a JDK on your virtual machine. How to install a Java application server on your virtual machine. How to create an endpoint for your virtual machine. How to open a port in the firewall for your application server. For purposes of this tutorial, an Apache Tomcat application server will be installed on a virtual machine. The completed installation will result in a Tomcat installation such as the following. Note To complete this tutorial, you need a Windows Azure account that has the Windows Azure Virtual Machines feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see Create a Windows Azure account and enable preview features. To create a virtual machine Log in to the Windows Azure Preview Management Portal. Click New. Click Virtual machine. Click Quick create. In the Create virtual machine screen, enter a value for DNS name. From the Image dropdown list, select an image, such as Windows Server 2008 R2 SP1. Enter a password in the New password field, and re-enter it in the Confirm field. This is the Administrator account password. Remember this password, you will use it when you remotely log in to the virtual machine. From the Location drop down list, select the data center location for your virtual machine; for example, West US. Your screen will look similar to the following. Click Create virtual machine. Your virtual machine will be created. You can monitor the status in the Virtual machines section of the management portal. To remotely log in to your virtual machine Log in to the Preview Management Portal. Click Virtual Machines, and then select the MyTestVM1 virtual machine that you previously created. On the command bar, click Connect. Click Open to use the remote desktop protocol file that was automatically created for the virtual machine Click Connect to proceed with the connection process. Type the password that you specified as the password of the Administrator account when you created the virtual machine, and then click OK. Click Yes to verify the identity of the virtual machine. To install a JDK on your virtual machine You can copy a Java Developer Kit (JDK) to your virtual machine, or install a JDK through an installer. For purposes of this tutorial, a JDK will be installed from Oracle's site. Log in to your virtual machine. Within your browser, open http://www.oracle.com/technetwork/java/javase/downloads/index.html. Click the Download button for the JDK that you want to download. For purposes of this tutorial, the Download button for the Java SE 6 Update 32 JDK was used. Accept the license agreement. Click the download executable for Windows x64 (64-bit). Follow the prompts and respond as needed to install the JDK to your virtual machine. To install a Java application server on your virtual machine You can copy a Java application server to your virtual machine, or install a Java application server through an installer. 
For purposes of this tutorial, a Java application server will be installed by copying a zip file from Apache's site. Log in to your virtual machine. Within your browser, open http://tomcat.apache.org/download-70.cgi. Double-click 64-bit Windows zip. (This tutorial used the zip for Apache Tomcat 7.0.27.) When prompted, choose to save the zip. When the zip is saved, open the folder that contains the zip and double-click the zip. Extract the zip. For purposes of this tutorial, the path used was C:\program files\apache-tomcat-7.0.27-windows-x64. To run the Java application server privately on your virtual machine The following steps show you how to run the Java application server and test it within the virtual machine's browser. It won't be usable by external computers until you create an endpoint and open a port (those steps are described later). Log in to your virtual machine. Add the JDK bin folder to the Path environment variable: Click Windows Start. Right-click Computer. Click Properties. Click Advanced system settings. Click Advanced. Click Environment variables. In the System variables section, click the Path variable and then click Edit. Add a trailing ; to the Path variable value (if there is not one already) and then add c:\program files\java\jdk\bin to the end of the Path variable value (adjust the path as needed if you did not use c:\program files\java\jdk as the path for your JDK installation). Press OK on the opened dialogs to save your Path change. Set the JRE_HOME environment variable: Click Windows Start. Right-click Computer. Click Properties. Click Advanced system settings. Click Advanced. Click Environment variables. In the System variables section, click New. Create a variable named JRE_HOME and set its value to c:\program files\java\jdk\jre (adjust the path as needed if you did not use c:\program files\java\jdk as the path for your JDK installation). Press OK on the open dialogs to save your JRE_HOME environment variable. Start Tomcat: Open a command prompt. Change the current directory to the Apache Tomcat bin folder. For example: cd c:\program files\apache-tomcat-7.0.27-windows-x64\apache-tomcat-7.0.27\bin (Adjust the path as needed if you used a different installation path for Tomcat.) Run catalina.bat start. You should now see Tomcat running if you run the virtual machine's browser and open http://localhost:8080. To see Tomcat running from external machines, you'll need to create an endpoint and open a port. To create an endpoint for your virtual machine Log in to the Preview Management Portal. Click Virtual machines. Click the name of the virtual machine that is running your Java application server. Click Endpoints. Click Add endpoint. In the Add endpoint dialog, ensure Add endpoint is checked and click the Next button. In the New endpoint details dialog: Specify a name for the endpoint; for example, HttpIn. Specify TCP for the protocol. Specify 80 for the public port. Specify 8080 for the private port. Your screen should look similar to the following: Click the Check button to close the dialog. Your endpoint will now be created. To open a port in the firewall for your virtual machine Log in to your virtual machine. Click Windows Start. Click Control Panel. Click System and Security, click Windows Firewall, and then click Advanced Settings. Click Inbound Rules and then click New Rule. For the new rule, select Port for the Rule type and click Next. Select TCP for the protocol and specify 8080 for the port, and click Next. Choose Allow the connection and click Next.
Ensure Domain, Private, and Public are checked for the profile and click Next. Specify a name for the rule, such as HttpIn (the rule name is not required to match the endpoint name, however), and then click Finish. At this point, your Tomcat web site should now be viewable from an external browser, using a URL of the form http://your_DNS_name.cloudapp.net, where your_DNS_name is the DNS name you specified when you created the virtual machine. Application lifecycle considerations You could create your own application web archive (WAR) and add it to the webapps folder. For example, create a basic Java Service Page (JSP) dynamic web project and export it as a WAR file, copy the WAR to the Apache Tomcat webapps folder on the virtual machine, then run it in a browser. This tutorial runs Tomcat through a command prompt where catalina.bat start was called. You may instead want to run Tomcat as a service, a key benefit being to have it automatically start if the virtual machine is rebooted. To run Tomcat as a service, you can install it as a service via the service.bat file in the Apache Tomcat bin folder, and then you could set it up to run automatically via the Services snap-in. You can start the Services snap-in by clicking Windows Start, Administrative Tools, and then Services. If you run service.bat install MyTomcat in the Apache Tomcat bin folder, then within the Services snap-in, your service name will appear as Apache Tomcat MyTomcat. By default when the service is installed, it will be set to start manually. To set it to start automatically, double-click the service in the Services snap-in and set Startup Type to Automatic, as shown in the following. You'll need to start the service the first time, which you can do through the Services snap-in (alternatively, you can reboot the virtual machine). Close the running occurrence of catalina.bat start if it is still running before starting the service.
October 15, 2012
by Eric Gregory
· 28,474 Views
article thumbnail
Implementing Repository Pattern with Entity Framework
When working with the Entity Framework Code First model approach, a developer creates POCO entities for database tables. The benefit of using the Code First model is to have a POCO entity for each table that can be used either as a WCF Data Contract or with your own custom attributes to handle Security, Logging, etc., and there is no mapping needed as we used to do in the Entity Framework (Model First) approach if the application architecture is n-tier based. Considering the Data Access layer, we will implement a repository pattern that encapsulates the persistence logic in a separate class. This class will be responsible for performing database operations. Let’s suppose the application is based on an n-tier architecture having 3 tiers, namely Presentation, Business and Data Access, and a Common library that contains all our POCO entities and is used by all the layers. Presentation Layer: Contains Views, Forms Business Layer: Managers that handle the business logic Data Access Layer: Contains the Repository class that handles CRUD operations Common Library: Contains the POCO entities. We will implement an interface named “IRepository” that defines the signatures of all the appropriate generic methods needed to perform CRUD operations and then implement the Repository class that provides the actual implementation of each method. We can also instantiate the Repository object using Dependency Injection or apply the Factory pattern. Code Snippet: IRepository public interface IRepository : IDisposable { /// Gets all objects from database IQueryable<T> All<T>() where T : class; /// Gets objects from database by filter. IQueryable<T> Filter<T>(Expression<Func<T, bool>> predicate) where T : class; /// Gets objects from database with filtering and paging; returns the total record count of the filter through the out parameter. IQueryable<T> Filter<T>(Expression<Func<T, bool>> filter, out int total, int index = 0, int size = 50) where T : class; /// Checks whether object(s) matching the specified filter exist in database. bool Contains<T>(Expression<Func<T, bool>> predicate) where T : class; /// Find object by keys. T Find<T>(params object[] keys) where T : class; /// Find object by specified expression. T Find<T>(Expression<Func<T, bool>> predicate) where T : class; /// Create a new object in database. T Create<T>(T t) where T : class; /// Delete the object from database. int Delete<T>(T t) where T : class; /// Delete objects from database by specified filter expression. int Delete<T>(Expression<Func<T, bool>> predicate) where T : class; /// Update object changes and save to database. int Update<T>(T t) where T : class; /// Select a single item by specified expression. T Single<T>(Expression<Func<T, bool>> expression) where T : class; void SaveChanges(); void ExecuteProcedure(String procedureCommand, params SqlParameter[] sqlParams); } Code Snippet: Repository public class Repository : IRepository { DbContext Context; public Repository() { Context = new DBContext(); } public Repository(DBContext context) { Context = context; } public void CommitChanges() { Context.SaveChanges(); } public T Single<T>(Expression<Func<T, bool>> expression) where T : class { return All<T>().FirstOrDefault(expression); } public IQueryable<T> All<T>() where T : class { return Context.Set<T>().AsQueryable(); } public virtual IQueryable<T> Filter<T>(Expression<Func<T, bool>> predicate) where T : class { return Context.Set<T>().Where(predicate).AsQueryable(); } public virtual IQueryable<T> Filter<T>(Expression<Func<T, bool>> filter, out int total, int index = 0, int size = 50) where T : class { int skipCount = index * size; var _resetSet = filter != null ? Context.Set<T>().Where(filter).AsQueryable() : Context.Set<T>().AsQueryable(); _resetSet = skipCount == 0 ? _resetSet.Take(size) : _resetSet.Skip(skipCount).Take(size); total = _resetSet.Count(); return _resetSet.AsQueryable(); } public virtual T Create<T>(T TObject) where T : class { var newEntry = Context.Set<T>().Add(TObject); Context.SaveChanges(); return newEntry; } public virtual int Delete<T>(T TObject) where T : class { Context.Set<T>().Remove(TObject); return Context.SaveChanges(); } public virtual int Update<T>(T TObject) where T : class { try { var entry = Context.Entry(TObject); Context.Set<T>().Attach(TObject); entry.State = EntityState.Modified; return Context.SaveChanges(); } catch (OptimisticConcurrencyException ex) { throw ex; } } public virtual int Delete<T>(Expression<Func<T, bool>> predicate) where T : class { var objects = Filter<T>(predicate); foreach (var obj in objects) Context.Set<T>().Remove(obj); return Context.SaveChanges(); } public bool Contains<T>(Expression<Func<T, bool>> predicate) where T : class { return Context.Set<T>().Count(predicate) > 0; } public virtual T Find<T>(params object[] keys) where T : class { return (T)Context.Set<T>().Find(keys); } public virtual T Find<T>(Expression<Func<T, bool>> predicate) where T : class { return Context.Set<T>().FirstOrDefault(predicate); } public virtual void ExecuteProcedure(String procedureCommand, params SqlParameter[] sqlParams) { Context.Database.ExecuteSqlCommand(procedureCommand, sqlParams); } public virtual void SaveChanges() { Context.SaveChanges(); } public void Dispose() { if (Context != null) Context.Dispose(); } } The benefit of using the Repository pattern is that all the database operations are managed centrally, and in the future, if you want to change the underlying database connector, you can add another Repository class that defines its own implementation or change the existing one.
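To show how a Business-layer manager might consume the repository above, here is a small usage sketch; the Customer POCO, the region values and the page size are made-up examples, and the snippet assumes the IRepository/Repository types from the article are referenced.

using System;
using System.Linq;

// Hypothetical POCO from the Common library (made up for illustration).
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Region { get; set; }
}

public class CustomerManager
{
    public void Demo()
    {
        // Disposing the repository also disposes its underlying DbContext.
        using (IRepository repo = new Repository())
        {
            // Create
            Customer created = repo.Create(new Customer { Name = "Contoso", Region = "EMEA" });

            // Filter with an expression
            var emeaCustomers = repo.Filter<Customer>(c => c.Region == "EMEA").ToList();

            // Paged filter: page 0, 20 rows, with the total count returned via the out parameter
            int total;
            var firstPage = repo.Filter<Customer>(c => c.Region == "EMEA", out total, index: 0, size: 20).ToList();

            // Update and delete
            created.Name = "Contoso Ltd";
            repo.Update(created);
            repo.Delete(created);
        }
    }
}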
October 13, 2012
by Ovais Mehboob Ahmed Khan
· 31,503 Views
article thumbnail
Bug Fixing: To Estimate, or Not to Estimate: That is The Question
According to Steve McConnell in Code Complete (data from 1975-1992) most bugs don’t take long to fix. About 85% of errors can be fixed in less than a few hours. Some more can be fixed in a few hours to a few days. But the rest take longer, sometimes much longer – as I talked about in an earlier post. Given all of these factors and uncertainty, how to you estimate a bug fix? Or should you bother? Block out some time for bug fixing Some teams don’t estimate bug fixes upfront. Instead they allocate a block of time, some kind of buffer for bug fixing as a regular part of the team’s work, especially if they are working in time boxes. Developers come back with an estimate only if it looks like the fix will require a substantial change – after they’ve dug into the code and found out that the fix isn’t going to be easy, that it may require a redesign or require changes to complex or critical code that needs careful review and testing. Use a rule of thumb placeholder for each bug fix Another approach is to use a rough rule of thumb, a standard place holder for every bug fix. Estimate ½ day of development work for each bug, for example. According to this post on Stack Overflow the ½ day suggestion comes from Jeff Sutherland, one of the inventors of Scrum. This place holder should work for most bugs. If it takes a developer more than ½ day to come up with a fix, then they probably need help and people need to know anyways. Pick a place holder and use it for a while. If it seems too small or too big, change it. Iterate. You will always have bugs to fix. You might get better at fixing them over time, or they might get harder to find and fix once you’ve got past the obvious ones. Or you could use the data earlier from Capers Jones on how long it takes to fix a bug by the type of bug. A day or half day works well on average, especially since most bugs are coding bugs (on average 3 hours) or data bugs (6.5 hours). Even design bugs on average only take little more than a day to resolve. Collect some data – and use it Steve McConnell, In Software Estimation: Demystifying the Black Art says that it’s always better to use data than to guess. He suggests collecting time data for as little as a few weeks or maybe a couple of months on how long on average it takes to fix a bug, and use this as a guide for estimating bug fixes going forward. If you have enough defect data, you can be smarter about how to use it. If you are tracking bugs in a bug database like Jira, and if programmers are tracking how much time they spend on fixing each bug for billing or time accounting purposes (which you can also do in Jira), then you can mine the bug database for similar bugs and see how long they took to fix – and maybe get some ideas on how to fix the bug that you are working on by reviewing what other people did on other bugs before you. You can group different bugs into buckets (by size – small, medium, large, x-large – or type) and then come up with an average estimate, and maybe even a best case, worst case and most likely for each type. Use Benchmarks For a maintenance team (a sustaining engineering or break/fix team responsible for software repairs only), you could use industry productivity benchmarks to project how many bugs your team can handle. Capers Jones in Estimating Software Costs says that the average programmer (in the US, in 2009), can fix 8-10 bugs per month (of course, if you’re an above-average programmer working in Canada in 2012, you’ll have to set these numbers much higher). 
Inexperienced programmers can be expected to fix 6 a month, while experienced developers using good tools can fix up to 20 per month. If you’re focusing on fixing security vulnerabilities reported by a pen tester or a scan, check out the remediation statistical data that Denim Group has started to collect, to get an idea on how long it might take to fix a SQL injection bug or an XSS vulnerability. So, do you estimate bug fixes, or not? Because you can’t estimate how long it will take to fix a bug until you’ve figured out what’s wrong, and most of the work in fixing a bug involves figuring out what’s wrong, it doesn’t make sense to try to do an in-depth estimate of how long it will take to fix each bug as they come up. Using simple historical data, a benchmark, or even a rough guess place holder as a rule-of-thumb all seem to work just as well. Whatever you do, do it in the simplest and most efficient way possible, don’t waste time trying to get it perfect – and realize that you won’t always be able to depend on it. Remember the 10x rule – some outlier bugs can take up to 10x as long to find and fix than an average bug. And some bugs can’t be found or fixed at all – or at least not with the information that you have today. When you’re wrong (and sometimes you’re going to be wrong), you can be really wrong, and even careful estimating isn’t going to help. So stick with a simple, efficient approach, and be prepared when you hit a hard problem, because it's gonna happen.
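In the spirit of "collect some data and use it", here is a tiny sketch, with made-up numbers, of turning tracked fix times from your issue tracker into an average and a high percentile you can use as a planning place holder; the figures and the choice of the 90th percentile are illustrative assumptions.

using System;
using System.Linq;

public class BugFixStats
{
    public static void Main()
    {
        // Hours spent on recent bug fixes, e.g. exported from your issue tracker (sample data, made up).
        double[] fixHours = { 1.5, 2, 3, 0.5, 4, 2.5, 16, 1, 3, 6, 2, 40 };

        double average = fixHours.Average();

        // A high percentile is a reminder that outliers exist; it is not a promise.
        double[] sorted = fixHours.OrderBy(h => h).ToArray();
        double p90 = sorted[(int)Math.Ceiling(0.9 * sorted.Length) - 1];

        Console.WriteLine("Average fix: {0:F1}h, 90th percentile: {1}h, over {2} bugs",
            average, p90, fixHours.Length);
    }
}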
October 12, 2012
by Jim Bird
· 22,499 Views
article thumbnail
MongoDB Aggregation Framework Examples in C#
MongoDB version 2.2 was released in late August and the biggest change it brought was the addition of the Aggregation Framework. Previously the aggregations required the usage of map/reduce, which in MongoDB doesn’t perform that well, mainly because of the single-threaded Javascript-based execution. The aggregation framework steps away from the Javascript and is implemented in C++, with an aim to accelerate performance of analytics and reporting up to 80 percent compared to using MapReduce. The aim of this post is to show examples of running the MongoDB Aggregation Framework with the official MongoDB C# drivers. Aggregation Framework and Linq Even though the current version of the MongoDB C# drivers (1.6) supports Linq, the support doesn’t extend to the aggregation framework. It’s highly probable that the Linq-support will be added later on and there’s already some hints about this in the driver’s source code. But at this point the execution of the aggregations requires the usage of the BsonDocument-objects. Aggregation Framework and GUIDs If you use GUIDs in your documents, the aggregation framework doesn’t work. This is because by default the GUIDs are stored in binary format and the aggregations won’t work against documents which contain binary data.. The solution is to store the GUIDs as strings. You can force the C# drivers to make this conversion automatically by configuring the mapping. Given that your C# class has Id-property defined as a GUID, the following code tells the driver to serialize the GUID as a string: BsonClassMap.RegisterClassMap(cm => { cm.AutoMap(); cm.GetMemberMap(c => c.Id) .SetRepresentation( BsonType.String); }); The example data These examples use the following documents: > db.examples.find() { "_id" : "1", "User" : "Tom", "Country" : "Finland", "Count" : 1 } { "_id" : "2", "User" : "Tom", "Country" : "Finland", "Count" : 3 } { "_id" : "3", "User" : "Tom", "Country" : "Finland", "Count" : 2 } { "_id" : "4", "User" : "Mary", "Country" : "Sweden", "Count" : 1 } { "_id" : "5", "User" : "Mary", "Country" : "Sweden", "Count" : 7 } Example 1: Aggregation Framework Basic usage This example shows how the aggregation framework can be executed through C#. We’re not going run any calculations to the data, we’re just going to filter it by the User. To run the aggregations, you can use either the MongoDatabase.RunCommand –method or the helper MongoCollection.Aggregate. We’re going to use the latter: var coll = localDb.GetCollection("examples"); ... coll.Aggregate(pipeline); The hardest part when working with Aggregation Framework through C# is building the pipeline. The pipeline is similar concept to the piping in PowerShell. Each operation in the pipeline will make modifications to the data: the operations can for example filter, group and project the data. In C#, the pipeline is a collection of BsonDocument object. Each document represents one operation. In our first example we need to do only one operation: $match. This operator will filter out the given documents. The following BsonDocument is a pipeline operation which filters out all the documents which don’t have User-field set to “Tom”. var match = new BsonDocument { { "$match", new BsonDocument { {"User", "Tom"} } } }; To execute this operation we add it to an array and pass the array to the MongoCollection.Aggregate-method: var pipeline = new[] { match }; var result = coll.Aggregate(pipeline); The MongoCollection.Aggregate-method returns an AggregateResult-object. 
Its ResultDocuments property (an IEnumerable) contains the documents which are the output of the aggregation. To check how many results there were, we can get the Count:

var result = coll.Aggregate(pipeline);
Console.WriteLine(result.ResultDocuments.Count());

The result documents are BsonDocument objects. If you have a C# class which represents the documents, you can cast the results:

var matchingExamples = result.ResultDocuments
    .Select(BsonSerializer.Deserialize)
    .ToList();

foreach (var example in matchingExamples)
{
    var message = string.Format("{0} - {1}", example.User, example.Count);
    Console.WriteLine(message);
}

Another alternative is to use C#'s dynamic type. The following extension method uses JSON.NET to convert a BsonDocument into a dynamic:

public static class MongoExtensions
{
    public static dynamic ToDynamic(this BsonDocument doc)
    {
        var json = doc.ToJson();
        dynamic obj = JToken.Parse(json);
        return obj;
    }
}

Here's a way to convert all the result documents into dynamic objects:

var matchingExamples = result.ResultDocuments
    .Select(x => x.ToDynamic())
    .ToList();

Example 2: Multiple filters & comparison operators

This example filters the data with the following criteria:

User: Tom
Count: >= 2

var match = new BsonDocument
{
    {
        "$match",
        new BsonDocument
        {
            { "User", "Tom" },
            { "Count", new BsonDocument { { "$gte", 2 } } }
        }
    }
};

The execution of this operation is identical to the first example:

var pipeline = new[] { match };
var result = coll.Aggregate(pipeline);
var matchingExamples = result.ResultDocuments
    .Select(x => x.ToDynamic())
    .ToList();

And the results are as expected:

foreach (var example in matchingExamples)
{
    var message = string.Format("{0} - {1}", example.User, example.Count);
    Console.WriteLine(message);
}

Example 3: Multiple operations

In our first two examples the pipeline was as simple as possible: it contained only one operation. This example filters the data with exactly the same criteria as the second example, but this time using two $match operations:

User: Tom
Count: >= 2

var match = new BsonDocument
{
    {
        "$match",
        new BsonDocument { { "User", "Tom" } }
    }
};

var match2 = new BsonDocument
{
    {
        "$match",
        new BsonDocument
        {
            { "Count", new BsonDocument { { "$gte", 2 } } }
        }
    }
};

var pipeline = new[] { match, match2 };

The output stays the same. The first operation, "match", takes all the documents from the examples collection and removes every document which doesn't match the criterion User = Tom. The output of this operation (3 documents) then moves to the second operation of the pipeline, "match2". This operation only sees those 3 documents, not the original collection. It filters those documents based on its own criterion and moves the result (2 documents) forward. This is where our pipeline ends, and this is also our result.

Example 4: Group and sum

Thus far we've used the Aggregation Framework only to filter the data. The true strength of the framework is its ability to run calculations on the documents. This example shows how we can calculate how many documents there are in the collection, grouped by user. This is done using the $group operator:

var group = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", new BsonDocument { { "MyUser", "$User" } } },
            { "Count", new BsonDocument { { "$sum", 1 } } }
        }
    }
};

The grouping key (in our case the User field) is defined with _id. The above example states that the grouping key has one field ("MyUser") and the value for that field comes from the document's User field ($User).
In the $group operation, the other fields are aggregate functions. This example defines the field "Count" and adds 1 to it for every document that matches the group key (_id).

var pipeline = new[] { group };
var result = coll.Aggregate(pipeline);

var matchingExamples = result.ResultDocuments
    .Select(x => x.ToDynamic())
    .ToList();

foreach (var example in matchingExamples)
{
    var message = string.Format("{0} - {1}", example._id.MyUser, example.Count);
    Console.WriteLine(message);
}

Note the format in which the results are output: the user's name is accessed through the _id.MyUser property.

Example 5: Group and sum by field

This example is similar to example 4, but instead of counting the documents, we calculate the sum of the Count fields per user:

var group = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", new BsonDocument { { "MyUser", "$User" } } },
            { "Count", new BsonDocument { { "$sum", "$Count" } } }
        }
    }
};

The only change is that instead of adding 1, we add the value from the Count field ("$Count").

Example 6: Projections

This example shows how the $project operator can be used to change the format of the output. The grouping in example 5 works well, but to access the user's name we currently have to point to the _id.MyUser property. Let's change this so that the user's name is available directly through a UserName property:

var group = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            { "_id", new BsonDocument { { "MyUser", "$User" } } },
            { "Count", new BsonDocument { { "$sum", "$Count" } } }
        }
    }
};

var project = new BsonDocument
{
    {
        "$project",
        new BsonDocument
        {
            { "_id", 0 },
            { "UserName", "$_id.MyUser" },
            { "Count", 1 },
        }
    }
};

var pipeline = new[] { group, project };

The code removes the _id property from the output and adds the UserName property, whose value is taken from the field _id.MyUser. The projection operation also states that the Count value should stay as it is.
var matchingExamples = result.ResultDocuments
    .Select(x => x.ToDynamic())
    .ToList();

foreach (var example in matchingExamples)
{
    var message = string.Format("{0} - {1}", example.UserName, example.Count);
    Console.WriteLine(message);
}

Example 7: Group with multiple fields in the key

For this example we add a new row to our document collection, leaving us with the following:

{ "_id" : "1", "User" : "Tom", "Country" : "Finland", "Count" : 1 }
{ "_id" : "2", "User" : "Tom", "Country" : "Finland", "Count" : 3 }
{ "_id" : "3", "User" : "Tom", "Country" : "Finland", "Count" : 2 }
{ "_id" : "4", "User" : "Mary", "Country" : "Sweden", "Count" : 1 }
{ "_id" : "5", "User" : "Mary", "Country" : "Sweden", "Count" : 7 }
{ "_id" : "6", "User" : "Tom", "Country" : "England", "Count" : 3 }

This example shows how you can group the data using multiple fields in the grouping key:

var group = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            {
                "_id", new BsonDocument
                {
                    { "MyUser", "$User" },
                    { "Country", "$Country" },
                }
            },
            { "Count", new BsonDocument { { "$sum", "$Count" } } }
        }
    }
};

var project = new BsonDocument
{
    {
        "$project",
        new BsonDocument
        {
            { "_id", 0 },
            { "UserName", "$_id.MyUser" },
            { "Country", "$_id.Country" },
            { "Count", 1 },
        }
    }
};

var pipeline = new[] { group, project };
var result = coll.Aggregate(pipeline);

var matchingExamples = result.ResultDocuments
    .Select(x => x.ToDynamic())
    .ToList();

foreach (var example in matchingExamples)
{
    var message = string.Format("{0} - {1} - {2}", example.UserName, example.Country, example.Count);
    Console.WriteLine(message);
}

Example 8: Match, group and project

This example shows how you can combine several different pipeline operations. The data is first filtered ($match) by User = Tom, then grouped by the Country ($group), and finally the output is formatted into a readable form ($project).

Match:

var match = new BsonDocument
{
    {
        "$match",
        new BsonDocument { { "User", "Tom" } }
    }
};

Group:

var group = new BsonDocument
{
    {
        "$group",
        new BsonDocument
        {
            {
                "_id", new BsonDocument
                {
                    { "Country", "$Country" },
                }
            },
            { "Count", new BsonDocument { { "$sum", "$Count" } } }
        }
    }
};

Project:

var project = new BsonDocument
{
    {
        "$project",
        new BsonDocument
        {
            { "_id", 0 },
            { "Country", "$_id.Country" },
            { "Count", 1 },
        }
    }
};

Result:

var pipeline = new[] { match, group, project };
var result = coll.Aggregate(pipeline);

var matchingExamples = result.ResultDocuments
    .Select(x => x.ToDynamic())
    .ToList();

foreach (var example in matchingExamples)
{
    var message = string.Format("{0} - {1}", example.Country, example.Count);
    Console.WriteLine(message);
}

More

There are many other interesting operators in the MongoDB Aggregation Framework, like $unwind and $sort. The usage of these operators is identical to the ones used above, so it should be possible to copy-paste one of the examples and use it as a basis for the other operations.

Links

MongoDB C# Language Center
MongoDB Aggregation Framework
Easy to follow blog post about the aggregation framework
October 11, 2012
by Mikael Koskinen
· 47,403 Views · 2 Likes
article thumbnail
How to Create and Deploy a Website with Windows Azure
Curator's note: This article originally appeared at WindowsAzure.com. To use this feature and other new Windows Azure capabilities, sign up for the free preview.

Just as you can quickly create and deploy a web application created from the gallery, you can also deploy a website created on a workstation with traditional developer tools from Microsoft or other companies.

Table of Contents

  • Deployment Options
  • How to: Create a Website Using the Management Portal
  • How to: Create a Website from the Gallery
  • How to: Delete a Website
  • Next Steps

Deployment Options

Windows Azure supports deploying websites from remote computers using WebDeploy, FTP, Git or TFS. Many development tools provide integrated support for publication using one or more of these methods and may only require that you provide the necessary credentials, site URL and hostname or URL for your chosen deployment method. Credentials and deployment URLs for all enabled deployment methods are stored in the website's publish profile, a file which can be downloaded in the Windows Azure (Preview) Management Portal from the Quick Start page or the quick glance section of the Dashboard page. If you prefer to deploy your website with a separate client application, high-quality open source Git and FTP clients are available for download on the Internet for this purpose.

How to: Create a Website Using the Management Portal

Follow these steps to create a website in Windows Azure.

1. Log in to the Windows Azure (Preview) Management Portal.
2. Click the Create New icon on the bottom left of the Management Portal.
3. Click the Web Site icon, click the Quick Create icon, enter a value for URL and then click the check mark next to create web site on the bottom right corner of the page.
4. When the website has been created you will see the text Creation of Web Site '[SITENAME]' Completed. Click the name of the website displayed in the list of websites to open the website's Quick Start management page.
5. On the Quick Start page you are provided with options to set up TFS or Git publishing if you would like to deploy your finished website to Windows Azure using these methods. FTP publishing is set up by default for websites, and the FTP host name is displayed under FTP Hostname on the Quick Start and Dashboard pages.
6. Before publishing with FTP or Git, choose the option to Reset deployment credentials on the Dashboard page. Then specify the new credentials (username and password) to authenticate against the FTP host or the Git repository when deploying content to the website.

The Configure management page exposes several configurable application settings in the following sections:

  • Framework: Set the version of the .NET Framework or PHP required by your web application.
  • Diagnostics: Set logging options for gathering diagnostic information for your website in this section.
  • App Settings: Specify name/value pairs that will be loaded by your web application on start up. For .NET sites, these settings will be injected into your .NET configuration AppSettings at runtime, overriding existing settings. For PHP and Node sites these settings will be available as environment variables at runtime.
  • Connection Strings: View connection strings for linked resources. For .NET sites, these connection strings will be injected into your .NET configuration connectionStrings settings at runtime, overriding existing entries where the key equals the linked database name. For PHP and Node sites these settings will be available as environment variables at runtime.
  • Default Documents: Add your web application's default document to this list if it is not already in the list. If your web application contains more than one of the files in the list, then make sure your website's default document appears at the top of the list.

How to: Create a Website from the Gallery

The gallery makes available a wide range of popular web applications developed by Microsoft, third-party companies, and open source software initiatives. Web applications created from the gallery do not require installation of any software other than the browser used to connect to the Windows Azure Management Portal. In this tutorial, you'll learn:

  • How to create a new site through the gallery.
  • How to deploy the site through the Windows Azure Portal.

You'll build a WordPress blog that uses a default template. The following illustration shows the completed application:

Note: To complete this tutorial, you need a Windows Azure account that has the Windows Azure Web Sites feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see Create a Windows Azure account and enable preview features.

Create a web site in the portal

1. Log in to the Windows Azure Management Portal.
2. Click the New icon on the bottom left of the dashboard.
3. Click the Web Site icon, and click From Gallery.
4. Locate and click the WordPress icon in the list, and then click Next.
5. On the Configure Your App page, enter or select values for all fields: enter a URL name of your choice, leave Create a new MySQL database selected in the Database field, and select the region closest to you. Then click Next.
6. On the Create New Database page, you can specify a name for your new MySQL database or use the default name. Select the region closest to you as the hosting location. Select the box at the bottom of the screen to agree to ClearDB's usage terms for your hosted MySQL database. Then click the check to complete the site creation.

After you click Complete, Windows Azure will initiate build and deploy operations. While the web site is being built and deployed, the status of these operations is displayed at the bottom of the Web Sites page. After all operations are complete, a final status message indicates that the site has been successfully deployed.

Launch and manage your WordPress site

1. Click your new site on the Web Sites page to open the dashboard for the site.
2. On the Dashboard management page, scroll down and click the link on the left under Site Url to open the site's welcome page.
3. Enter the configuration information required by WordPress and click Install WordPress to finalize configuration and open the web site's login page.
4. Log in to the new WordPress web site by entering the username and password that you specified on the welcome page. You'll have a new WordPress site that looks similar to the site below.

How to: Delete a Website

Websites are deleted using the Delete icon in the Windows Azure Management Portal. The Delete icon is available in the Windows Azure Portal when you click Web Sites to list all of your websites, and at the bottom of each of the website management pages.

Next Steps

For more information about Websites, see the following:

  • Walkthrough: Troubleshooting a Website on Windows Azure
October 9, 2012
by Eric Gregory
· 85,012 Views
article thumbnail
Spring 3.1: Caching and EhCache
If you look around the web for examples of using Spring 3.1's built-in caching then you'll usually bump into Spring's SimpleCacheManager, which the Guys at Spring say is "Useful for testing or simple caching declarations". I actually prefer to think of SimpleCacheManager as lightweight rather than simple; useful in those situations where you want a small in-memory cache on a per-JVM basis. If the Guys at Spring were running a supermarket then SimpleCacheManager would be in their own-brand 'basics' product range. If, on the other hand, you need a heavy-duty cache, one that's scalable, persistent and distributed, then Spring also comes with a built-in EhCache wrapper.

The good news is that swapping between Spring's caching implementations is easy. In theory it's all a matter of configuration and, to prove the theory correct, I took the sample code from my Caching and @Cacheable blog and ran it using an EhCache implementation. The configuration steps are similar to those described in my last blog, Caching and Config, in that you still need to specify the cache:annotation-driven element in your Spring config file to switch caching on. You also need to define a bean with an id of cacheManager, only this time you reference Spring's EhCacheCacheManager class instead of SimpleCacheManager. The EhCacheCacheManager bean references a second bean with an id of 'ehcache', which is defined using Spring's EhCacheManagerFactoryBean and has two properties: configLocation and shared.

'configLocation' is an optional attribute that's used to specify the location of an ehcache configuration file. In my test code I used an example file which creates two caches: a default cache and one named "employee". If this file is missing then the EhCacheManagerFactoryBean simply picks up a default ehcache config file, ehcache-failsafe.xml, which is located in EhCache's ehcache-core jar file.

The other EhCacheManagerFactoryBean attribute is 'shared'. This is supposed to be optional, as the documentation states that it defines "whether the EHCache CacheManager should be shared (as a singleton at the VM level) or independent (typically local within the application). Default is 'false', creating an independent instance." However, if this is set to false then you'll get the following exception when you try to run a bunch of unit tests:

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.cache.interceptor.CacheInterceptor#0': Cannot resolve reference to bean 'cacheManager' while setting bean property 'cacheManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cacheManager' defined in class path resource [ehcache-example.xml]: Cannot resolve reference to bean 'ehcache' while setting bean property 'cacheManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ehcache' defined in class path resource [ehcache-example.xml]: Invocation of init method failed; nested exception is net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following:
1. Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary
2. Shutdown the earlier cacheManager before creating new one with same name.
The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414]
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:106)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1360)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1118)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:517)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
... stack trace shortened for clarity
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cacheManager' defined in class path resource [ehcache-example.xml]: Cannot resolve reference to bean 'ehcache' while setting bean property 'cacheManager'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ehcache' defined in class path resource [ehcache-example.xml]: Invocation of init method failed; nested exception is net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following:
1. Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary
2. Shutdown the earlier cacheManager before creating new one with same name.
The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414]
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:328)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:106)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1360)
... stack trace shortened for clarity
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:322)
... 38 more
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ehcache' defined in class path resource [ehcache-example.xml]: Invocation of init method failed; nested exception is net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following:
1. Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary
2. Shutdown the earlier cacheManager before creating new one with same name.
The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1455)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:294)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:225)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:291)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193)
at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveReference(BeanDefinitionValueResolver.java:322)
... 48 more
Caused by: net.sf.ehcache.CacheException: Another unnamed CacheManager already exists in the same VM. Please provide unique names for each CacheManager in the config or do one of following:
1. Use one of the CacheManager.create() static factory methods to reuse same CacheManager with same name or create one if necessary
2. Shutdown the earlier cacheManager before creating new one with same name.
The source of the existing CacheManager is: InputStreamConfigurationSource [stream=java.io.BufferedInputStream@424c414]
at net.sf.ehcache.CacheManager.assertNoCacheManagerExistsWithSameName(CacheManager.java:521)
at net.sf.ehcache.CacheManager.init(CacheManager.java:371)
at net.sf.ehcache.CacheManager.<init>(CacheManager.java:339)
at org.springframework.cache.ehcache.EhCacheManagerFactoryBean.afterPropertiesSet(EhCacheManagerFactoryBean.java:104)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1514)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1452)
... 55 more

I think this comes down to a simple bug in Spring's EhCache manager factory, as it's trying to create multiple cache instances using new() rather than using, as the exception states, "one of the CacheManager.create() static factory methods", which would allow it to reuse the same CacheManager with the same name. Hence, my first JUnit test works okay, but all the others fail. The offending line of code is:

this.cacheManager = (this.shared ? CacheManager.create() : new CacheManager());

In using EhCache, the only other configuration detail to consider is the Maven dependencies. These are pretty straightforward, as the Guys at Ehcache have combined all the various EhCache jars into one Maven POM module. This POM module can be added to your project's POM file using the XML below:

<dependency>
    <groupId>net.sf.ehcache</groupId>
    <artifactId>ehcache</artifactId>
    <version>2.6.0</version>
    <type>pom</type>
    <scope>test</scope>
</dependency>

Finally, the EhCache jar files are available from both the Maven Central and Sourceforge repositories:

<repository>
    <id>sourceforge</id>
    <url>http://oss.sonatype.org/content/groups/sourceforge/</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
</repository>
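As a rough orientation, here is a minimal Java-config sketch of the setup described above; it is not the author's own XML configuration. It assumes Spring 3.1's @EnableCaching, an ehcache.xml on the classpath declaring the "employee" cache, and it sets shared to true to sidestep the "another unnamed CacheManager" exception. The EmployeeService and Employee names are made up purely for illustration.

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.ehcache.EhCacheCacheManager;
import org.springframework.cache.ehcache.EhCacheManagerFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
@EnableCaching
public class EhCacheConfig {

    // Creates the native EhCache CacheManager; 'shared' avoids the
    // "Another unnamed CacheManager already exists" error when several
    // Spring contexts (for example a suite of JUnit tests) start in the same VM.
    @Bean
    public EhCacheManagerFactoryBean ehcache() {
        EhCacheManagerFactoryBean factory = new EhCacheManagerFactoryBean();
        factory.setConfigLocation(new ClassPathResource("ehcache.xml"));
        factory.setShared(true);
        return factory;
    }

    // The Spring CacheManager that @Cacheable uses, backed by EhCache.
    @Bean
    public CacheManager cacheManager() {
        EhCacheCacheManager cacheManager = new EhCacheCacheManager();
        cacheManager.setCacheManager(ehcache().getObject());
        return cacheManager;
    }
}

// Hypothetical service: results land in the "employee" cache from ehcache.xml,
// so repeated calls with the same id skip the expensive lookup.
class EmployeeService {

    @Cacheable("employee")
    public Employee findEmployee(String id) {
        return loadFromDatabase(id);
    }

    private Employee loadFromDatabase(String id) {
        return new Employee(id);
    }
}

// Hypothetical value object used by the sketch above.
class Employee {
    private final String id;
    Employee(String id) { this.id = id; }
}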
October 5, 2012
by Roger Hughes
· 105,636 Views
article thumbnail
Record Audio Using webrtc in Chrome and Speech Recognition With Websockets
There are many different web API standards that are turning the web browser into a complete application platform. With WebSockets we get nice asynchronous communication, various standards allow us access to sensors in laptops and mobile devices, and we can even determine how full the battery is. One of the standards I'm really interested in is WebRTC. With WebRTC we can get real-time audio and video communication between browsers without needing plugins or additional tools. A couple of months ago I wrote about how you can use WebRTC to access the webcam and use it for face recognition. At that time, none of the browsers allowed you to access the microphone. A couple of months later, though, both the developer version of Firefox and the developer version of Chrome allow you to access the microphone! So let's see what we can do with this.

Most of the examples I've seen so far focus on processing the input directly, within the browser, using the Web Audio API. You get synthesizers, audio visualizations, spectrometers etc. What was missing, however, was a means of recording the audio data and storing it for further processing at the server side. In this article I'll show you just that. I'm going to show you how you can create the following (you might need to enlarge it to read the response from the server). In this screencast you can see the following:

• a simple HTML page that accesses your microphone
• the speech is recorded and sent to a backend using WebSockets
• the backend combines the audio data and sends it to Google's speech-to-text API
• the result from this API call is returned to the browser

And all this is done without any plugins in the browser! So what's involved in accomplishing all this?

Allowing access to your microphone

The first thing you need to do is make sure you've got an up-to-date version of Chrome; I use the dev build. Since this is still an experimental feature, we need to enable it using the Chrome flags: make sure the "Web Audio Input" flag is enabled. With this configuration out of the way we can start to access our microphone.

Access the audio stream from the microphone

This is actually very easy:

function callback(stream) {
    var context = new webkitAudioContext();
    var mediaStreamSource = context.createMediaStreamSource(stream);
    ...
}

$(document).ready(function() {
    navigator.webkitGetUserMedia({audio: true}, callback);
    ...
});

As you can see, I use the webkit-prefixed functions directly; you could, of course, also use a shim so it is browser independent. What happens in the code above is rather straightforward. We ask, using getUserMedia, for access to the microphone. If this is successful, our callback gets called with the audio stream as its parameter. In this callback we use the Web Audio specification to create a MediaStreamSource from our microphone. With this MediaStreamSource we can do all the nice Web Audio tricks you can see here. But we don't want that; we want to record the stream and send it to a backend server for further processing. In future versions this will probably be possible directly from the WebRTC API; at this time, however, this isn't possible yet. Luckily, though, we can use a feature from the Web Audio API to get access to the raw data: with the JavaScriptAudioNode we can create a custom node, which we can use to access the raw data (which is PCM encoded). Before I started my own work on this I searched around a bit and came across the recorder.js project from here: https://github.com/mattdiamond/recorderjs.
Matt created a recorder that can record the output from Web Audio nodes, and that's exactly what I needed. All I needed to do now was connect the stream we just created to the recorder library:

function callback(stream) {
    var context = new webkitAudioContext();
    var mediaStreamSource = context.createMediaStreamSource(stream);
    rec = new Recorder(mediaStreamSource);
}

With this code, we create a recorder from our stream. This recorder provides the following functions:

• record: start recording from the input
• stop: stop recording
• clear: clear the current recording
• exportWAV: export the data as a WAV file

Connect the recorder to the buttons

I've created a simple webpage with an output for the text and two buttons to control the recording. The 'record' button starts the recording, and once you hit the 'export' button the recording stops and is sent to the backend for processing.

Record button:

$('#record').click(function() {
    rec.record();
    ws.send("start");
    $("#message").text("Click export to stop recording and analyze the input");

    // export a wav every second, so we can send it using websockets
    intervalKey = setInterval(function() {
        rec.exportWAV(function(blob) {
            rec.clear();
            ws.send(blob);
        });
    }, 1000);
});

This function (connected to the button using jQuery) starts the recording when clicked. It also uses a WebSocket (ws; see further down for how to set up the WebSocket) to tell the backend server to expect a new recording (more on this later). Finally, when the button is clicked, an interval is created that passes the data to the backend, encoded as a WAV file, every second. We do this to avoid sending too large chunks of data to the backend and to improve performance.

Export button:

$('#export').click(function() {
    // first send the stop command
    rec.stop();
    ws.send("stop");
    clearInterval(intervalKey);
    ws.send("analyze");
    $("#message").text("");
});

The export button (bad naming, I think, now that I'm writing this) stops the recording and the interval, and informs the backend server that it can send the received data to the Google API for further processing.

Connecting the frontend to the backend

To connect the web application to the backend server we use WebSockets. In the previous code fragments you've already seen how they are used. We create them with the following:

var ws = new WebSocket("ws://127.0.0.1:9999");

ws.onopen = function () {
    console.log("Opened connection to websocket");
};

ws.onmessage = function(e) {
    var jsonResponse = jQuery.parseJSON(e.data);
    console.log(jsonResponse);
    if (jsonResponse.hypotheses.length > 0) {
        var bestMatch = jsonResponse.hypotheses[0].utterance;
        $("#outputText").text(bestMatch);
    }
}

We create a connection, and when we receive a message from the backend we just assume it contains the response to our speech analysis. And that's it for the complete frontend of the application. We use getUserMedia to access the microphone, use the Web Audio API to get access to the raw data, and communicate with the backend server over WebSockets.

The backend server

Our backend server needs to do a couple of things. First it needs to combine the incoming chunks into a single audio file; next it needs to convert this to the format the Google API expects, which is FLAC. Finally, we make a call to the Google API and return the response. I've used Jetty as the WebSocket server for this example. If you want to know the details about setting this up, look at the face detection example. In this article I'll only show the code to process the incoming messages.
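Purely as orientation, since the Jetty bootstrap itself is deferred to the face detection post, here is a minimal sketch of how such a backend could be wired up with Jetty's legacy (7/8-era) WebSocket API, whose onMessage signatures match the snippets below. The SpeechSocketServer and SpeechWebSocket class names and the empty method bodies are assumptions for illustration, not code from the original post.

import javax.servlet.http.HttpServletRequest;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.websocket.WebSocket;
import org.eclipse.jetty.websocket.WebSocketHandler;

public class SpeechSocketServer {

    public static void main(String[] args) throws Exception {
        // embedded Jetty server listening on the port the frontend connects to (ws://127.0.0.1:9999)
        Server server = new Server(9999);
        server.setHandler(new WebSocketHandler() {
            @Override
            public WebSocket doWebSocketConnect(HttpServletRequest request, String protocol) {
                // every browser connection gets a socket instance holding the
                // onMessage(String) and onMessage(byte[], int, int) methods shown below
                return new SpeechWebSocket();
            }
        });
        server.start();
        server.join();
    }
}

// Skeleton of the socket class; the bodies of the two onMessage methods
// are the snippets discussed in the rest of the article.
class SpeechWebSocket implements WebSocket.OnTextMessage, WebSocket.OnBinaryMessage {

    private WebSocket.Connection connection;
    private String currentCommand = "";

    public void onOpen(Connection connection) { this.connection = connection; }

    public void onClose(int closeCode, String message) { }

    public void onMessage(String data) { /* control messages: start, stop, clear, analyze */ }

    public void onMessage(byte[] data, int offset, int length) { /* one-second WAV chunks */ }
}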
First step: combine the incoming data

The data we receive is encoded as WAV (thanks to the recorder.js library we don't have to do this ourselves). In our backend we thus receive sound fragments with a length of one second. We can't just concatenate these together, since WAV files have a header that tells how long the fragment is (amongst other things), so we have to combine them and rewrite the header. Let's first look at the code (ugly code, but it works well enough for now):

public void onMessage(byte[] data, int offset, int length) {
    if (currentCommand.equals("start")) {
        try {
            // the temporary file that contains our captured audio stream
            File f = new File("out.wav");

            // if the file already exists we append to it
            if (f.exists()) {
                log.info("Adding received block to existing file.");

                // two clips are used to concat the data
                AudioInputStream clip1 = AudioSystem.getAudioInputStream(f);
                AudioInputStream clip2 = AudioSystem.getAudioInputStream(new ByteArrayInputStream(data));

                // use a SequenceInputStream to cat them together
                AudioInputStream appendedFiles = new AudioInputStream(
                        new SequenceInputStream(clip1, clip2),
                        clip1.getFormat(),
                        clip1.getFrameLength() + clip2.getFrameLength());

                // write out the output to a temporary file
                AudioSystem.write(appendedFiles, AudioFileFormat.Type.WAVE, new File("out2.wav"));

                // rename the files and delete the old one
                File f1 = new File("out.wav");
                File f2 = new File("out2.wav");
                f1.delete();
                f2.renameTo(new File("out.wav"));
            } else {
                log.info("Starting new recording.");
                FileOutputStream fOut = new FileOutputStream("out.wav", true);
                fOut.write(data);
                fOut.close();
            }
        } catch (Exception e) { ... }
    }
}

This method gets called for each chunk of audio we receive from the browser. What we do here is the following:

• First, we check whether we have a temp audio file; if not, we create it.
• If the file exists, we use Java's AudioSystem to create an audio sequence.
• This sequence is then written to another file.
• The original is deleted and the new one is renamed.

We repeat this for each chunk, so at this point we have a WAV file that keeps on growing with each added chunk. Now, before we convert this, let's look at the code we use to control the backend:

public void onMessage(String data) {
    if (data.startsWith("start")) {
        // before we start we clean up anything left over
        cleanup();
        currentCommand = "start";
    } else if (data.startsWith("stop")) {
        currentCommand = "stop";
    } else if (data.startsWith("clear")) {
        // just remove the current recording
        cleanup();
    } else if (data.startsWith("analyze")) {
        // convert to flac
        ...
        // send the request to the google speech to text service
        ...
    }
}

The previous method responded to binary WebSocket messages; the one shown above responds to string messages. We use this to control, from the browser, what the backend should do (the cleanup() helper isn't shown in the post; a possible implementation is sketched a little further down). Let's look at the analyze command, since that is the interesting one. When this command is issued from the frontend, the backend needs to convert the WAV file to FLAC and send it to the Google service.

Convert to FLAC

For the conversion to FLAC we need an external library, since standard Java has no support for this. I used the javaFlacEncoder for this:

// get an encoder
FLAC_FileEncoder flacEncoder = new FLAC_FileEncoder();

// point to the input file
File inputFile = new File("out.wav");
File outputFile = new File("out2.flac");

// encode the file
log.info("Start encoding wav file to flac.");
flacEncoder.encode(inputFile, outputFile);
log.info("Finished encoding wav file to flac.");

Easy as that.
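As promised above, here is a purely hypothetical sketch of the cleanup() helper that the control code calls but the post never shows; a minimal version would simply delete the temporary files used by the snippets in this article.

// hypothetical helper: remove leftovers from a previous recording
private void cleanup() {
    // File.delete() simply returns false if the file doesn't exist, which is fine here
    new File("out.wav").delete();
    new File("out2.wav").delete();
    new File("out2.flac").delete();
}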
Now we've got a FLAC file that we can send to Google for analysis.

Send to Google for analysis

A couple of weeks ago I ran across an article that explained how someone analyzed Chrome and found out about an undocumented Google API you can use for speech-to-text. If you POST a FLAC file to this URL:

https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&l...

...you receive a response like this:

{
    "status": 0,
    "id": "ae466ffa24a1213f5611f32a17d5a42b-1",
    "hypotheses": [
        {
            "utterance": "the quick brown fox",
            "confidence": 0.857393
        }
    ]
}

To do this from Java code, using HttpClient, you do the following:

// send the request to the google speech to text service
log.info("Sending file to Google for speech2text");
HttpClient client = new DefaultHttpClient();
HttpPost p = new HttpPost(url);
p.addHeader("Content-Type", "audio/x-flac; rate=44100");
p.setEntity(new FileEntity(outputFile, "audio/x-flac; rate=44100"));

HttpResponse response = client.execute(p);
if (response.getStatusLine().getStatusCode() == 200) {
    log.info("Received valid response, sending back to browser.");
    String result = new String(IOUtils.toByteArray(response.getEntity().getContent()));
    this.connection.sendMessage(result);
}

And those are all the steps that are needed.
October 5, 2012
by Jos Dirksen
· 20,079 Views · 1 Like