DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
Composite Keys in Cassandra
Introduction A composite key consists of one or more primary key fields. Each field must be of data type supported by underlying data-store. In JPA (Java Persistence API), there are two ways of specifying composite keys: 1. Composite Primary Key: @Entity @IdClass(TimelineId.class) public class Timeline { @Id int userId; @Id long tweetId; //Other non-primary key fields } Class TimelineId { int userId; long tweetId; } 2. Embedded Primary Key: @Entity public class Timeline { @EmbeddedId TimelineId id; //Other non-primary key fields } @Embeddable Class TimelineId { int userId; long tweetId; } Above Timeline entity is inspired from famous twissandra example. Starting 1.1 release, Cassandra supports composite keys. Cassandra Composite Keys in Action Visit this page in order to understand Cassandra Schema in general. In this section I will give you a feel of how composite keys are stored in Cassandra. Let's start Cassandra 1.1.x server and run following commands from Cassandra/bin directory: CQL: ./cqlsh -3 localhost 9160 CREATE KEYSPACE twissandra with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1; use twissandra; CREATE TABLE timeline( user_id varchar, tweet_id varchar, tweet_device varchar, author varchar, body varchar, PRIMARY KEY(user_id,tweet_id,tweet_device)); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't1', 'web', 'Amresh', 'Here is my first tweet'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't2', 'sms', 'Saurabh', 'Howz life Xamry'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't1', 'iPad', 'Kuldeep', 'You der?'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't2', 'mobile', 'Vivek', 'Yep, I suppose'); cqlsh:twissandra> select * from timeline; user_id | tweet_id | author | body ---------+----------+---------+------------------------ xamry | t1 | Amresh | Here is my first tweet xamry | t2 | Saurabh | Howz life Xamry mevivs | t1 | Kuldeep | You der? mevivs | t2 | Vivek | Yep, I suppose cqlsh:twissandra> SELECT * FROM timeline WHERE user_id='xamry'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+---------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet xamry | t2 | sms | Saurabh | Howz life Xamry cqlsh:twissandra> select * from timeline where tweet_id = 't1'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+---------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet mevivs | t1 | iPad | Kuldeep | You der? cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id='t1'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+--------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet cqlsh:twissandra> select * from timeline where user_id = 'xamry' and author='Amresh'; Bad Request: No indexed columns present in by-columns clause with Equal operator cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_device='web'; Bad Request: PRIMARY KEY part tweet_device cannot be restricted (preceding part tweet_id is either not restricted or by a non-EQ relation) cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id = 't1' and tweet_device='web'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+--------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet Cassandra-cli: impadmin@impetus-ubuntu:/usr/local/apache-cassandra-1.1.2/bin$ ./cassandra-cli -h localhost -p 9160 Connected to: "Test Cluster" on localhost/9160 Welcome to Cassandra CLI version 1.1.2 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] use twissandra; Authenticated to keyspace: twissandra [default@twissandra] list timeline; Using default limit of 100 Using default column limit of 100 ------------------- RowKey: xamry => (column=t1:web:author, value=Amresh, timestamp=1343729388951000) => (column=t1:web:body, value=Here is my first tweet, timestamp=1343729388951001) => (column=t2:sms:author, value=Saurabh, timestamp=1343729388973000) => (column=t2:sms:body, value=Howz life Xamry, timestamp=1343729388973001) ------------------- RowKey: mevivs => (column=t1:iPad:author, value=Kuldeep, timestamp=1343729388991000) => (column=t1:iPad:body, value=You der?, timestamp=1343729388991001) => (column=t2:mobile:author, value=Vivek, timestamp=1343729389941000) => (column=t2:mobile:body, value=Yep, I suppose, timestamp=1343729389941001) Observations First part of composite key (user_id) is called "Partition Key", rest (tweet_id, tweet_device) are remaining keys. Cassandra stores columns differently when composite keys are used. Partition key becomes row key. Remaining keys are concatenated with each column name (":" as separator) to form column names. Column values remain unchanged. Remaining keys (other than partition keys) are ordered, and it's not allowed to search on any random column, you have to start with the first one and then you can move to the second one and so on. This is evident from "Bad Request" error.
November 14, 2012
by Amresh Singh
· 20,364 Views · 1 Like
article thumbnail
Integration Testing with MongoDB & Spring Data
Integration Testing is an often overlooked area in enterprise development. This is primarily due to the associated complexities in setting up the necessary infrastructure for an integration test. For applications backed by databases, it’s fairly complicated and time-consuming to setup databases for integration tests, and also to clean those up once test is complete (ex. data files, schemas etc.), to ensure repeatability of tests. While there have been many tools (ex. DBUnit) and mechanisms (ex. rollback after test) to assist in this, the inherent complexity and issues have been there always. But if you are working with MongoDB, there’s a cool and easy way to do your unit tests, with almost the simplicity of writing a unit test with mocks. With ‘EmbedMongo’, we can easily setup an embedded MongoDB instance for testing, with in-built clean up support once tests are complete. In this article, we will walkthrough an example where EmbedMongo is used with JUnit for integration testing a Repository Implementation. Here’s the technology stack that we will be using. MongoDB 2.2.0 EmbedMongo 1.26 Spring Data – Mongo 1.0.3 Spring Framework 3.1 The Maven POM for the above setup looks like this. 4.0.0 com.yohanliyanage.blog.mongoit mongo-it 1.0 org.springframework.data spring-data-mongodb 1.0.3.RELEASE compile junit junit 4.10 test org.springframework spring-context 3.1.3.RELEASE compile de.flapdoodle.embed de.flapdoodle.embed.mongo 1.26 test Or if you prefer Gradle (by the way, Gradle is an awesome build tool which you should check out if you haven’t done so already). apply plugin: 'java' apply plugin: 'eclipse' sourceCompatibility = 1.6 group = "com.yohanliyanage.blog.mongoit" version = '1.0' ext.springVersion = '3.1.3.RELEASE' ext.junitVersion = '4.10' ext.springMongoVersion = '1.0.3.RELEASE' ext.embedMongoVersion = '1.26' repositories { mavenCentral() maven { url 'http://repo.springsource.org/release' } } dependencies { compile "org.springframework:spring-context:${springVersion}" compile "org.springframework.data:spring-data-mongodb:${springMongoVersion}" testCompile "junit:junit:${junitVersion}" testCompile "de.flapdoodle.embed:de.flapdoodle.embed.mongo:${embedMongoVersion}" } To begin with, here’s the document that we will be storing in Mongo. package com.yohanliyanage.blog.mongoit.model; import org.springframework.data.mongodb.core.index.Indexed; import org.springframework.data.mongodb.core.mapping.Document; /** * A Sample Document. * * @author Yohan Liyanage * */ @Document public class Sample { @Indexed private String key; private String value; public Sample(String key, String value) { super(); this.key = key; this.value = value; } public String getKey() { return key; } public void setKey(String key) { this.key = key; } public String getValue() { return value; } public void setValue(String value) { this.value = value; } } To assist with storing and managing this document, let’s write up a simple Repository implementation. The Repository Interface is as follows. package com.yohanliyanage.blog.mongoit.repository; import java.util.List; import com.yohanliyanage.blog.mongoit.model.Sample; /** * Sample Repository API. * * @author Yohan Liyanage * */ public interface SampleRepository { /** * Persists the given Sample. * @param sample */ void save(Sample sample); /** * Returns the list of samples with given key. * @param sample * @return */ List findByKey(String key); } And the implementation… package com.yohanliyanage.blog.mongoit.repository; import java.util.List; import static org.springframework.data.mongodb.core.query.Query.query; import static org.springframework.data.mongodb.core.query.Criteria.*; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.data.mongodb.core.MongoOperations; import org.springframework.stereotype.Repository; import com.yohanliyanage.blog.mongoit.model.Sample; /** * Sample Repository MongoDB Implementation. * * @author Yohan Liyanage * */ @Repository public class SampleRepositoryMongoImpl implements SampleRepository { @Autowired private MongoOperations mongoOps; /** * {@inheritDoc} */ public void save(Sample sample) { mongoOps.save(sample); } /** * {@inheritDoc} */ public List findByKey(String key) { return mongoOps.find(query(where("key").is(key)), Sample.class); } /** * Sets the MongoOps implementation. * * @param mongoOps the mongoOps to set */ public void setMongoOps(MongoOperations mongoOps) { this.mongoOps = mongoOps; } } To wire this up, we need a Spring Bean Configuration. Note that we do not need this for testing. But for the sake of completion, I have included this. The XML configuration is as follows. And now we are ready to write the Integration Test for our Repository Implementation using Embed Mongo. Ideally, the integration tests should be placed in a separate source directory, just like we place our unit tests (ex. src/test/java => src/integration-test/java). However, neither Maven nor Gradle supports this out of the box (yet – v1.2. For Gradle, there’s an on going discussion for this facility). Nevertheless, both Maven and Gradle are flexible, so you can configure the POM / build.gradle to handle this. However, to keep this discussion simple and focused, I will be placing the Integration Tests in the ‘src/test/java’, but I do not recommend this for a real application. Let’s start writing up the Integration Test. First, let’s begin with a simple JUnit based Test for the methods. package com.yohanliyanage.blog.mongoit.repository; import static org.junit.Assert.fail; import org.junit.After; import org.junit.Before; import org.junit.Test; /** * Integration Test for {@link SampleRepositoryMongoImpl}. * * @author Yohan Liyanage */ public class SampleRepositoryMongoImplIntegrationTest { private SampleRepositoryMongoImpl repoImpl; @Before public void setUp() throws Exception { repoImpl = new SampleRepositoryMongoImpl(); } @After public void tearDown() throws Exception { } @Test public void testSave() { fail("Not yet implemented"); } @Test public void testFindByKey() { fail("Not yet implemented"); } } When this JUnit Test Case initializes, we need to fire up EmbedMongo to start an embedded Mongo server. Also, when the Test Case ends, we need to cleanup the DB. The below code snippet does this. package com.yohanliyanage.blog.mongoit.repository; import static org.junit.Assert.fail; import java.io.IOException; import org.junit.*; import org.springframework.data.mongodb.core.MongoTemplate; import com.mongodb.Mongo; import com.yohanliyanage.blog.mongoit.model.Sample; import de.flapdoodle.embed.mongo.MongodExecutable; import de.flapdoodle.embed.mongo.MongodProcess; import de.flapdoodle.embed.mongo.MongodStarter; import de.flapdoodle.embed.mongo.config.MongodConfig; import de.flapdoodle.embed.mongo.config.RuntimeConfig; import de.flapdoodle.embed.mongo.distribution.Version; import de.flapdoodle.embed.process.extract.UserTempNaming; /** * Integration Test for {@link SampleRepositoryMongoImpl}. * * @author Yohan Liyanage */ public class SampleRepositoryMongoImplIntegrationTest { private static final String LOCALHOST = "127.0.0.1"; private static final String DB_NAME = "itest"; private static final int MONGO_TEST_PORT = 27028; private SampleRepositoryMongoImpl repoImpl; private static MongodProcess mongoProcess; private static Mongo mongo; private MongoTemplate template; @BeforeClass public static void initializeDB() throws IOException { RuntimeConfig config = new RuntimeConfig(); config.setExecutableNaming(new UserTempNaming()); MongodStarter starter = MongodStarter.getInstance(config); MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false)); mongoProcess = mongoExecutable.start(); mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT); mongo.getDB(DB_NAME); } @AfterClass public static void shutdownDB() throws InterruptedException { mongo.close(); mongoProcess.stop(); } @Before public void setUp() throws Exception { repoImpl = new SampleRepositoryMongoImpl(); template = new MongoTemplate(mongo, DB_NAME); repoImpl.setMongoOps(template); } @After public void tearDown() throws Exception { template.dropCollection(Sample.class); } @Test public void testSave() { fail("Not yet implemented"); } @Test public void testFindByKey() { fail("Not yet implemented"); } } The initializeDB() method is annotated with @BeforeClass to start this before test case beings. This method fires up an embedded MongoDB instance which is bound to the given port, and exposes a Mongo object which is set to use the given database. Internally, EmbedMongo creates the necessary data files in temporary directories. When this method executes for the first time, EmbedMongo will download the necessary Mongo implementation (denoted by Version.V2_2_0 in above code) if it does not exist already. This is a nice facility specially when it comes to Continuous Integration servers. You don’t have to manually setup Mongo in each of the CI servers. That’s one less external dependency for the tests. In the shutdownDB() method, which is annotated with @AfterClass, we stop the EmbedMongo process. This triggers the necessary cleanups in EmbedMongo to remove the temporary data files, restoring the state to where it was before Test Case was executed. We have now updated setUp() method to build a Spring MongoTemplate object which is backed by the Mongo instance exposed by EmbedMongo, and to setup our RepoImpl with that template. The tearDown() method is updated to drop the ‘Sample’ collection to ensure that each of our test methods start with a clean state. Now it’s just a matter of writing the actual test methods. Let’s start with the save method test. @Test public void testSave() { Sample sample = new Sample("TEST", "2"); repoImpl.save(sample); int samplesInCollection = template.findAll(Sample.class).size(); assertEquals("Only 1 Sample should exist collection, but there are " + samplesInCollection, 1, samplesInCollection); } We create a Sample object, pass it to repoImpl.save(), and assert to make sure that there’s only one Sample in the Sample collection. Simple, straight-forward stuff. And here’s the test method for findByKey method. @Test public void testFindByKey() { // Setup Test Data List samples = Arrays.asList( new Sample("TEST", "1"), new Sample("TEST", "25"), new Sample("TEST2", "66"), new Sample("TEST2", "99")); for (Sample sample : samples) { template.save(sample); } // Execute Test List matches = repoImpl.findByKey("TEST"); // Note: Since our test data (populateDummies) have only 2 // records with key "TEST", this should be 2 assertEquals("Expected only two samples with key TEST, but there are " + matches.size(), 2, matches.size()); } Initially, we setup the data by adding a set of Sample objects into the data store. It’s important that we directly use template.save() here, because repoImpl.save() is a method under-test. We are not testing that here, so we use the underlying “trusted” template.save() during data setup. This is a basic concept in Unit / Integration testing. Then we execute the method under test ‘findByKey’, and assert to ensure that only two Samples matched our query. Likewise, we can continue to write more tests for each of the repository methods, including negative tests. And here’s the final Integration Test file. package com.yohanliyanage.blog.mongoit.repository; import static org.junit.Assert.*; import java.io.IOException; import java.util.Arrays; import java.util.List; import org.junit.*; import org.springframework.data.mongodb.core.MongoTemplate; import com.mongodb.Mongo; import com.yohanliyanage.blog.mongoit.model.Sample; import de.flapdoodle.embed.mongo.MongodExecutable; import de.flapdoodle.embed.mongo.MongodProcess; import de.flapdoodle.embed.mongo.MongodStarter; import de.flapdoodle.embed.mongo.config.MongodConfig; import de.flapdoodle.embed.mongo.config.RuntimeConfig; import de.flapdoodle.embed.mongo.distribution.Version; import de.flapdoodle.embed.process.extract.UserTempNaming; /** * Integration Test for {@link SampleRepositoryMongoImpl}. * * @author Yohan Liyanage */ public class SampleRepositoryMongoImplIntegrationTest { private static final String LOCALHOST = "127.0.0.1"; private static final String DB_NAME = "itest"; private static final int MONGO_TEST_PORT = 27028; private SampleRepositoryMongoImpl repoImpl; private static MongodProcess mongoProcess; private static Mongo mongo; private MongoTemplate template; @BeforeClass public static void initializeDB() throws IOException { RuntimeConfig config = new RuntimeConfig(); config.setExecutableNaming(new UserTempNaming()); MongodStarter starter = MongodStarter.getInstance(config); MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false)); mongoProcess = mongoExecutable.start(); mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT); mongo.getDB(DB_NAME); } @AfterClass public static void shutdownDB() throws InterruptedException { mongo.close(); mongoProcess.stop(); } @Before public void setUp() throws Exception { repoImpl = new SampleRepositoryMongoImpl(); template = new MongoTemplate(mongo, DB_NAME); repoImpl.setMongoOps(template); } @After public void tearDown() throws Exception { template.dropCollection(Sample.class); } @Test public void testSave() { Sample sample = new Sample("TEST", "2"); repoImpl.save(sample); int samplesInCollection = template.findAll(Sample.class).size(); assertEquals("Only 1 Sample should exist in collection, but there are " + samplesInCollection, 1, samplesInCollection); } @Test public void testFindByKey() { // Setup Test Data List samples = Arrays.asList( new Sample("TEST", "1"), new Sample("TEST", "25"), new Sample("TEST2", "66"), new Sample("TEST2", "99")); for (Sample sample : samples) { template.save(sample); } // Execute Test List matches = repoImpl.findByKey("TEST"); // Note: Since our test data (populateDummies) have only 2 // records with key "TEST", this should be 2 assertEquals("Expected only two samples with key TEST, but there are " + matches.size(), 2, matches.size()); } } On a side note, one of the key concerns with Integration Tests is the execution time. We all want to keep our test execution times as low as possible, ideally a couple of seconds to make sure that we can run all the tests during CI, with minimal build and verification times. However, since Integration Tests rely on underlying infrastructure, usually Integration Tests take time to run. But with EmbedMongo, this is not the case. In my machine, above test suite runs in 1.8 seconds, and each test method takes only .166 seconds max. See the screenshot below. I have uploaded the code for above project into GitHub. You can download / clone it from here: https://github.com/yohanliyanage/blog-mongo-integration-tests. For more information regarding EmbedMongo, refer to their site at GitHub https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de.
November 11, 2012
by Yohan Liyanage
· 26,478 Views
article thumbnail
Applying a Namespace During JAXB Unmarshal
For some an XML schema is a strict set of rules for how the XML document must be structured. But for others it is a general guideline to indicate what the XML should look like. This means that sometimes people want to accept input that doesn't conform to the XML schema for some reason. In this example I will demonstrate how this can be done by leveraging a SAX XMLFilter. Java Model Below is the Java model that will be used for this example. Customer package blog.namespace.sax; import javax.xml.bind.annotation.XmlRootElement; @XmlRootElement public class Customer { private String name; public String getName() { return name; } public void setName(String name) { this.name = name; } } package-info We will use the package level @XmlSchema annotation to specify the namespace qualification for our model. @XmlSchema( namespace="http://www.example.com/customer", elementFormDefault=XmlNsForm.QUALIFIED) package blog.namespace.sax; import javax.xml.bind.annotation.*; XML Input (input.xml) Even though our metadata specified that all the elements should be qualified with a namespace (http://www.example.com/customer) our input document is not namespace qualified. An XMLFilter will be used to add the namespace during the unmarshal operation. Jane Doe XMLFilter (NamespaceFilter) The easiest way to create an XMLFilter is to extend XMLFilterImpl. For our use case we will override the startElement and endElement methods. In each of these methods we will call the corresponding super method passing in the default namespace as the URI parameter. package blog.namespace.sax; import org.xml.sax.*; import org.xml.sax.helpers.XMLFilterImpl; public class NamespaceFilter extends XMLFilterImpl { private static final String NAMESPACE = "http://www.example.com/customer"; @Override public void endElement(String uri, String localName, String qName) throws SAXException { super.endElement(NAMESPACE, localName, qName); } @Override public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException { super.startElement(NAMESPACE, localName, qName, atts); } } Demo In the demo code below we will do a SAX parse of the XML document. The XMLReader will be wrapped in our XMLFilter. We will leverage JAXB's UnmarshallerHandler as the ContentHandler. Once the parse has been done we can ask the UnmarshallerHandler for the resulting Customer object. package blog.namespace.sax; import javax.xml.bind.*; import javax.xml.parsers.*; import org.xml.sax.*; public class Demo { public static void main(String[] args) throws Exception { // Create the JAXBContext JAXBContext jc = JAXBContext.newInstance(Customer.class); // Create the XMLFilter XMLFilter filter = new NamespaceFilter(); // Set the parent XMLReader on the XMLFilter SAXParserFactory spf = SAXParserFactory.newInstance(); SAXParser sp = spf.newSAXParser(); XMLReader xr = sp.getXMLReader(); filter.setParent(xr); // Set UnmarshallerHandler as ContentHandler on XMLFilter Unmarshaller unmarshaller = jc.createUnmarshaller(); UnmarshallerHandler unmarshallerHandler = unmarshaller .getUnmarshallerHandler(); filter.setContentHandler(unmarshallerHandler); // Parse the XML InputSource xml = new InputSource("src/blog/namespace/sax/input.xml"); filter.parse(xml); Customer customer = (Customer) unmarshallerHandler.getResult(); // Marshal the Customer object back to XML Marshaller marshaller = jc.createMarshaller(); marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); marshaller.marshal(customer, System.out); } } Output Below is the output from running the demo code. Note how the output contains the namespace qualification based on the metadata. Jane Doe Further Reading If you enjoyed this post then you may also be interested in: JAXB & Namespaces Preventing Entity Expansion Attacks in JAXB
November 10, 2012
by Blaise Doughan
· 64,594 Views · 1 Like
article thumbnail
20 Database Design Best Practices
Use well defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID ...). Use singular for table names (i.e. use StudentCourse instead of StudentCourses). Table represents a collection of entities, there is no need for plural names. Don’t use spaces for table names. Otherwise you will have to use ‘{‘, ‘[‘, ‘“’ etc. characters to define tables (i.e. for accesing table Student Course you'll write “Student Course”. StudentCourse is much better). Don’t use unnecessary prefixes or suffixes for table names (i.e. use School instead of TblSchool, SchoolTable etc.). Keep passwords as encrypted for security. Decrypt them in application when required. Use integer id fields for all tables. If id is not required for the time being, it may be required in the future (for association tables, indexing ...). Choose columns with the integer data type (or its variants) for indexing. varchar column indexing will cause performance problems. Use bit fields for boolean values. Using integer or varchar is unnecessarily storage consuming. Also start those column names with “Is”. Provide authentication for database access. Don’t give admin role to each user. Avoid “select *” queries until it is really needed. Use "select [required_columns_list]" for better performance. Use an ORM (object relational mapping) framework (i.e. hibernate, iBatis ...) if application code is big enough. Performance issues of ORM frameworks can be handled by detailed configuration parameters. Partition big and unused/rarely used tables/table parts to different physical storages for better query performance. For big, sensitive and mission critic database systems, use disaster recovery and security services like failover clustering, auto backups, replication etc. Use constraints (foreign key, check, not null ...) for data integrity. Don’t give whole control to application code. Lack of database documentation is evil. Document your database design with ER schemas and instructions. Also write comment lines for your triggers, stored procedures and other scripts. Use indexes for frequently used queries on big tables. Analyser tools can be used to determine where indexes will be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better. Database server and the web server must be placed in different machines. This will provide more security (attackers can’t access data directly) and server CPU and memory performance will be better because of reduced request number and process usage. Image and blob data columns must not be defined in frequently queried tables because of performance issues. These data must be placed in separate tables and their pointer can be used in queried tables. Normalization must be used as required, to optimize the performance. Under-normalization will cause excessive repetition of data, over-normalization will cause excessive joins across too many tables. Both of them will get worse performance. Spend time for database modeling and design as much as required. Otherwise saved(!) design time will cause (saved(!) design time) * 10/100/1000 maintenance and re-design time.
November 4, 2012
by Cagdas Basaraner
· 257,202 Views · 13 Likes
article thumbnail
What's New in JAX-RS 2.0
JAX-RS is a framework designed to help you write RESTful applications both on the client and server side. With Java EE 7 is being slated to be released next year, 2013, JAX-RS is one of the specifications getting a deep revision. JAX-RS 2.0 is currently in the Public Draft phase at the JCP, so now is a good time to discuss some of the key new features so that you can start playing with your favorite JAX-RS implementation and give some valuable feedback the expert group needs to finalize the specification. Key features in 2.0: Client API Server-side Asynchronous HTTP Filters and Interceptors This article gives a brief overview of each of these features. Client Framework One huge thing missing from JAX-RS 1.0 was a client API. While it was easy to write a portable JAX-RS service, each JAX-RS implementation defined their own proprietary API. JAX-RS 2.0 fills in this gap with a fluent, low-level, request building API. Here's a simple example: Client client = ClientFactory.newClient(); WebTarget target = client.target("http://example.com/shop"); Form form = new Form().param("customer", "Bill") .param("product", "IPhone 5") .param("CC", "4444 4444 4444 4444"); Response response = target.request().post(Entity.form(form)); assert response.getStatus() == 200; Order order = response.readEntity(Order.class); Let's dissect this example code. The Client interface manages and configures HTTP connections. It is also a factory for WebTargets. WebTargets represent a specific URI. You build and execute requests from a WebTarget instance. Response is the same class defined in JAX-RS 1.0, but it has been expanded to support the client side. The example client, allocates an instance of a Client, creates a WebTarget, then posts form data to the URI represented by the WebTarget. The Response object is tested to see if the status is 200, then the application class Order is extracted from the response using the readEntity() method. The MessageBodyReader and MessageBodyWriter content handler interfaces defined in JAX-RS 1.0 are reused on the client side. When the readEntity() method is invoked in the example code, a MessageBodyReader is matched with the response's Content-Type and the Java type (Order) passed in as a parameter to the readEntity() method. If you are optimistic that your service will return a successful response, there are some nice helper methods that allow you to get the Java object directly without having to interact with and write additional code around a Response object. Customer cust = client.target("http://example.com/customers") .queryParam("name", "Bill Burke") .request().get(Customer.class); In this example we target a URI and specify an additional query parameter we want appended to the request URI. The get() method has an additional parameter of the Java type we want to unmarshal the HTTP response to. If the HTTP response code is something other than 200, OK, JAX-RS picks an exception that maps to the error code from a defined exception hierarchy in the JAX-RS client API. Asynchronous Client API The JAX-RS 2.0 client framework also supports an asynchronous API and a callback API. This allows you to execute HTTP requests in the background and either poll for a response or receive a callback when the request finishes. Future future = client.target("http://e.com/customers") .queryParam("name", "Bill Burke") .request() .async() .get(Customer.class); try { Customer cust = future.get(1, TimeUnit.MINUTES); } catch (TimeoutException ex) { System.err.println("timeout"); } The Future interface is a JDK interface that has been around since JDK 5.0. The above code executes an HTTP request in the background, then blocks for one minute while it waits for a response. You could also use the Future to poll to see if the request is finished or not. Here's an example of using a callback interface. InvocationCallback callback = new InvocationCallback { public void completed(Response res) { System.out.println("Request success!"); } public void failed(ClientException e) { System.out.println("Request failed!");n } }; client.target("http://example.com/customers") .queryParam("name", "Bill Burke") .request() .async() .get(callback); In this example, we instantiate an implementation of the InvocationCallback interface. We invoke a GET request in the background and register this callback instance with the request. The callback interface will output a message on whether the request executed successfully or not. Those are the main features of the client API. I suggest browsing the specification and Javadoc to learn more. Server-Side Asynchronous HTTP On the server-side, JAX-RS 2.0 provides support for asynchronous HTTP. Asynchronous HTTP is generally used to implement long-polling interfaces or server-side push. JAX-RS 2.0 support for Asynchronous HTTP is annotation driven and is very analogous with how the Servlet 3.0 specification handles asynchronous HTTP support through the AsyncContext interface. Here's an example of writing a crude chat program. @Path("/listener") public class ChatListener { List listeners = ...some global list...; @GET public void listen(@Suspended AsyncResponse res) { list.add(res); } } For those of you who have used the Servlet 3.0 asynchronous interfaces, the above code may look familiar to you. An AsyncResponse is injected into the JAX-RS resource method via the @Suspended annotation. This act disassociates the calling thread to the HTTP socket connection. The example code takes the AsyncResponse instance and adds it to a application-defined global List object. When the JAX-RS method returns, the JAX-RS runtime will do no response processing. A different thread will handle response processing. @Path("/speaker") public class ChatSpeaker { List listeners = ...some global list...; @POST @Consumes("text/plain") public void speak(String speech) { for (AsyncResponse res : listeners) { res.resume(Response.ok(speech, "text/plain").build());n } } } When a client posts text to this ChatSpeaker interface, the speak() method loops through the list of registered AsyncResponses and sends back an 200, OK response with the posted text. Those are the main features of the asynchronous HTTP interface, check out the Javadocs for a deeper detail. Filters and Entity Interceptors JAX-RS 2.0 has an interceptor API that allows framework developers to intercept request and response processing. This powerful API allows framework developers to transparently add orthogonal concerns like authentication, caching, and encoding without polluting application code. Prior to JAX-RS 2.0 many JAX-RS providers like Resteasy, Jersey, and Apache CXF wrote their own proprietary interceptor frameworks to deliver various features in their implementations. So, while JAX-RS 2.0 filters and interceptors can be a bit complex to understand please note that it is very use-case driven based on real-world examples. I wrote a blog on JAX-RS interceptor requirements awhile back to help guide the JAX-RS 2.0 JSR Expert Group on defining such an API. The blog is a bit dated, but hopefully you can get the gist of why we did what we did. JAX-RS 2.0 has two different concepts for interceptions: Filters and Entity Interceptors. Filters are mainly used to modify or process incoming and outgoing request headers or response headers. They execute before and after request and response processing. Entity Interceptors are concerned with marshaling and unmarshalling of HTTP message bodies. They wrap around the execution of MessageBodyReader and MessageBodyWriter instances. Server Side Filters On the server-side you have two different types of filters. ContainerRequestFilters run before your JAX-RS resource method is invoked. ContainerResponseFilters run after your JAX-RS resource method is invoked. As an added caveat, ContainerRequestFilters come in two flavors: pre-match and post-matching. Pre-matching ContainerRequestFilters are designated with the @PreMatching annotation and will execute before the JAX-RS resource method is matched with the incoming HTTP request. Pre-matching filters often are used to modify request attributes to change how it matches to a specific resource. For example, some firewalls do not allow PUT and/or DELETE invocations. To circumvent this limitation many applications tunnel the HTTP method through the HTTP header X-Http-Method-Override. A pre-matching ContainerRequestFilter could implement this behavior. @Provider public class HttpOverride implements ContainerRequestFilter { public void filter(ContainerRequestContext ctx) { String method = ctx.getHeaderString("X-Http-Method-Override"); if (method != null) ctx.setMethod(method); } } Post matching ContainerRequestFilters execute after the Java resource method has been matched. These filters can implement a range of features for example, annotation driven security protocols. After the resource class method is executed, JAX-RS will run all ContainerResponseFilters. These filters allow you to modify the outgoing response before it is marshalled and sent to the client. One example here is a filter that automatically sets a Cache-Control header. @Provider public class CacheControlFilter implements ContainerResponseFilter { public void filter(ContainerRequestContext req, ContainerResponseContext res) { if (req.getMethod().equals("GET")) { req.getHeaders().add("Cache-Control", cacheControl); } } } Client Side Filters On the client side you also have two types of filters: ClientRequestFilter and ClientResponseFilter. ClientRequestFilters run before your HTTP request is sent over the wire to the server. ClientResponseFilters run after a response is received from the server, but before the response body is unmarshalled. A good example of client request and response filters working together is a client-side cache that supports conditional GETs. The ClientRequestFilter would be responsible for setting the If-None-Match or If-Modified-Since headers if the requested URI is already cached. Here's what that code might look like. @Provider public class ConditionalGetFilter implements ClientRequestFilter { public void filter(ClientRequestContext req) { if (req.getMethod().equals("GET")) { CacheEntry entry = cache.getEntry(req.getURI()); if (entry != null) { req.getHeaders().putSngle("If-Modified-Since", entry.getLastModified()); } } } } The ClientResponseFilter would be responsible for either buffering and caching the response, or, if a 302, Not Modified response was sent back, to edit the Response object to change its status to 200, set the appropriate headers and buffer to the currently cached entry. This code would be a bit more complicated, so for brevity, we're not going to illustrate it within this article. Reader and Writer Interceptors While filters modify request or response headers, interceptors deal with message bodies. Interceptors are executed in the same call stack as their corresponding reader or writer. ReaderInterceptors wrap around the execution of MessageBodyReaders. WriterInterceptors wrap around the execution of MessageBodyWriters. They can be used to implement a specific content-encoding. They can be used to generate digital signatures or to post or pre-process a Java object model before or after it is marshalled. Here's an example of a GZIP encoding WriterInterceptor. @Provider public class GZIPEndoer implements WriterInterceptor { public void aroundWriteTo(WriterInterceptorContext ctx) throws IOException, WebApplicationException { GZIPOutputStream os = new GZIPOutputStream(ctx.getOutputStream()); try { ctx.setOutputStream(os); return ctx.proceed(); } finally { os.finish(); } } } Resource Method Filters and Interceptors Sometimes you want a filter or interceptor to only run for a specific resource method. You can do this in two different ways: register an implementation of DynamicFeature or use the @NameBinding annotation. The DynamicFeature interface is executed at deployment time for each resource method. You just use the Configurable interface to register the filters and interceptors you want for the specific resource method. @Provider public class ServerCachingFeature implements DynamicFeature { public void configure(ResourceInfo resourceInfo, Configurable configurable) { if (resourceInfo.getMethod().isAnnotationPresent(GET.class)) { configurable.register(ServerCacheFilter.class); } } } On the other hande, @NameBinding works a lot like CDI interceptors. You annotate a custom annotation with @NameBinding and then apply that custom annotation to your filter and resource method @NameBinding public @interface DoIt {} @DoIt public class MyFilter implements ContainerRequestFilter {...} @Path public class MyResource { @GET @DoIt public String get() {...} Wrapping Up Well, those are the main features of JAX-RS 2.0. There's also a bunch of minor features here and there, but youll have to explore them yourselves. If you want to testdrive JAX-RS 2.0 (and hopefully also give feedback to the expert group), Red Hat's Resteasy 3.0 and Oracle's Jersey project have implementations you can download and use. Useful Links Below are some useful links. I've also included links to some features in Resteasy that make use of filters and interceptors. This code might give you a more in-depth look into what you can do with this new JAX-RS 2.0 feature. JAX-RS 2.0 Public Draft Specification Resteasy 3.0 Download Jersey Resteasy 3.0 client cache implementation code (to see how filters interceptors work on client side) Doseta digital signature headers (good use case or interceptors) File suffix content negotiation implementation (server-side filter example) Other server-side examples (cache-control annotations, gzip encoding, role-based security)
November 1, 2012
by Bill Burke
· 94,206 Views · 3 Likes
article thumbnail
Exporting and Importing VM Settings with Azure Command-Line Tools
We've talked previously about the Windows Azure command-line tools, and have used them in a few posts such as Brian's Migrating Drupal to a Windows Azure VM. While the tools are generally useful for tons of stuff, one of the things that's been painful to do with the command-line is export the settings for a VM, and then recreate the VM from those settings. You might be wondering why you'd want to export a VM and then recreate it. For me, cost is the first thing that comes to mind. It costs more to keep a VM running than it does to just keep the disk in storage. So if I had something in a VM that I'm only using a few hours a day, I'd delete the VM when I'm not using it and recreate it when I need it again. Another potential reason is that you want to create a copy of the disk so that you can create a duplicate virtual machine. The export process used to be pretty arcane stuff; using the azure vm show command with a --json parameter and piping the output to file. Then hacking the .json file to fix it up so it could be used with the azure vm create-from command. It was bad. It was so bad, the developers added a new export command to create the .json file for you. Here's the basic process: Create a VM VM creation has been covered multiple ways already; you're either going to use the portal or command line tools, and you're either going to select an image from the library or upload a VHD. In my case, I used the following command: azure vm create larryubuntu CANONICAL__Canonical-Ubuntu-12-04-amd64-server-20120528.1.3-en-us-30GB.vhd larry NotaRe This command creates a new VM in the East US data center, enables SSH on port 22 and then stores a disk image for this VM in a blob. You can see the new disk image in blob storage by running: azure vm disk list The results should return something like: info: Executing command vm disk list + Fetching disk images data: Name OS data: ---------------------------------------- ------- data: larryubuntu-larryubuntu-0-20121019170709 Linux info: vm disk list command OK That's the actual disk image that is mounted by the VM. Export and Delete the VM Alright, I've done my work and it's the weekend. I need to export the VM settings so I can recreate it on Monday, then delete the VM so I won't get charged for the next 48 hours of not working. To export the settings for the VM, I use the following command: azure vm export larryubuntu c:\stuff\vminfo.json This tells Windows Azure to find the VM named larryubuntu and export its settings to c:\stuff\vminfo.json. The .json file will contain something like this: { "RoleName":"larryubuntu", "RoleType":"PersistentVMRole", "ConfigurationSets": [ { "ConfigurationSetType":"NetworkConfiguration", "InputEndpoints": [ { "LocalPort":"22", "Name":"ssh", "Port":"22", "Protocol":"tcp", "Vip":"168.62.177.227" } ], "SubnetNames":[] } ], "DataVirtualHardDisks":[], "OSVirtualHardDisk": { "HostCaching":"ReadWrite", "DiskName":"larryubuntu-larryubuntu-0-20121024155441", "OS":"Linux" }, "RoleSize":"Small" } If you're like me, you'll immediately start thinking "Hrmmm, I wonder if I can mess around with things like RoleSize." And yes, you can. If you wanted to bump this up to medium, you'd just change that parameter to medium. If you want to play around more with the various settings, it looks like the schema is maintained at https://github.com/WindowsAzure/azure-sdk-for-node/blob/master/lib/services/serviceManagement/models/roleschema.json. Once I've got the file, I can safely delete the VM by using the following command. azure vm delete larryubuntu It spins a bit and then no more VM. Recreate the VM Ugh, Monday. Time to go back to work, and I need my VM back up and running. So I run the following command: azure vm create-from larryubuntu c:\stuff\vminfo.json --location "East US" It takes only a minute or two to spin up the VM and it's ready for work. That's it - fast, simple, and far easier than the old process of generating the .json settings file. Note that I haven't played around much with the various settings described in the schema for the json file that I linked above. If you find anything useful or interesting that can be accomplished by hacking around with the .json, leave a comment about it.
October 29, 2012
by Larry Franks
· 6,429 Views
article thumbnail
Exploring the HTML5 Web Audio: Visualizing Sound
If you've read some of my other articles on this blog you probably know I'm a fan of HTML5. With HTML5 we get all this interesting functionality, directly in the browser, in a way that, eventually, is standard across browsers. One of the new HTML5 APIs that is slowly moving through the standardization process is the Web Audio API. With this API, currently only supported in Chrome, we get access to all kinds of interesting audio components you can use to create, modify and visualize sounds (such as the following spectrogram). So why do I start with visualizations? It looks nice, that's one reason, but not the important one. This API provides a number of more complex components, whose behavior is much easier to explain when you can see what happens. With a filter you can instantly see whether some frequencies are filtered, instead of trying to listen to the resulting audio for thse changes. There are many interesting examples that use this API. The problem is, though, that getting started with this API and with digital signal processing (DSP) usually isn't explained. In this article I'll walk you through a couple of steps that shows how to do the following: Create a signal volume meter Visualize the frequencies using a spectrum analyzer And show a time based spectrogram We start with the basic setup that we can use as the basis for the components we'll create. Setting up the basic If we want to experiment with sound, we need some sound source. We could use the microphone (as we'll do later in this series), but to keep it simple, for now we'll just use an mp3 as our input. To get this working using web audio we have to take the following steps: Load the data Read it in a buffer node and play the sound Load the data With the web audio we can use different types of audio sources. We've got a MediaElementAudioSourceNode that can be used to use the audio provided by a media element. There's also a MediaStreamAudioSourceNode. With this audio source node we can use the microphone as input (see my previous article on sound recognition). Finally there is the AudioBufferSourceNode. With this node we can load the data from an existing audio file (e.g mp3) and use that as input. For this example we'll use this last approach. // create the audio context (chrome only for now) var context = new webkitAudioContext(); var audioBuffer; var sourceNode; // load the sound setupAudioNodes(); loadSound("wagner-short.ogg"); function setupAudioNodes() { // create a buffer source node sourceNode = context.createBufferSource(); // and connect to destination sourceNode.connect(context.destination); } // load the specified sound function loadSound(url) { var request = new XMLHttpRequest(); request.open('GET', url, true); request.responseType = 'arraybuffer'; // When loaded decode the data request.onload = function() { // decode the data context.decodeAudioData(request.response, function(buffer) { // when the audio is decoded play the sound playSound(buffer); }, onError); } request.send(); } function playSound(buffer) { sourceNode.buffer = buffer; sourceNode.noteOn(0); } // log if an error occurs function onError(e) { console.log(e); } In this example you can see a couple of functions. The setupAudioNodes function creates a BufferSource audio node and connects it to the destination. The loadSound function shows how you can load an audio file. The buffer which is passed into the playSound function contains decoded audio that can be used by the web audio API. In this example I use an .ogg file, for a complete overview of the formats supported look at: https://sites.google.com/a/chromium.org/dev/audio-video Play the sound To play this audio file, all we have to do is turn the source node on, this is done in the playSound function: function playSound(buffer) { sourceNode.buffer = buffer; sourceNode.noteOn(0); } You can test this out at the following page: Example 1: Loading and playing a sound with Web Audio API. When you open that page, you'll hear some music. Nothing to spectacular for now, but nevertheless an easy way to load audio that'll use for the rest of this article. The first item on our list was the volume meter. Create a volume meter One of the basic scenario's, and often one of the first steps someone new to this API tries to create, is a simple signal volume meter (or an UV meter). I expected this to be a standard component in this API, where I could just read off the signal strength as a property. But, no such node exists. But not to worry, with the components that are available, it's pretty easy (not straightforward, but easy nevertheless) to get an indication of the signal strength of your audio file. Int this section we'll create the following simple volume meter: As you can see this is a simple volume meter where we measure the signal strength for the left and the right audio channel. This is drawn on the canvas, but you could have also used divs or svg to visualize this. Lets start with a single volume meter, instead of one for each channel. For this we need to do the following: Create an analyzer node: With this node we get realtime information about the data that is processed. This data we use to determine the signal strength Create a javascript node: We use this node as a timer to update the volume meters with new information Connect everything together Analyser node With the analyser node we can perform real-time frequency and time domain analysis. From the specification: a node which is able to provide real-time frequency and time-domain analysis information. The audio stream will be passed un-processed from input to output. I won't go into the mathematical details behind this node, since there are many articles out there that explain how this works (a good one is the chapter on fourier transformation from here). What you should now about this node is that it splits up the signal in frequency buckets and we get the amplitude (the signal strenght) for each set of frequencies (the bucket). The best way to understand this, is to skip a bit ahead in this article and look at the frequency distribution we'll create later on. This image plots the result from the analyser node. The frequencies increase from left to right, and the height of the bar shows the strength of that specific frequency bucket. More on this later on in the article. For now we don't want to see the strength of the separate frequency buckets, but the strength of the total signal. For this we'll just add all the strenghts from each bucket and divide it by the number of buckets. First we need to create an analyzer node // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; This creates an analyzer node whose result will be used to create the volume meter. We use a smoothingTimeConstant to make the meter less jittery. With this variable we use input from a longer time period to calculate the amplitudes, this results in a more smooth meter. The fftSize determine how many buckets we get containing frequency information. If we have a fftSize of 1024 we get 512 buckets (more info on this in the book on DPS and fourier transformations). When this node receives a stream of data, it analyzes this stream and provides us with information about the frequencies in that signal and their strengths. We now need a timer to update the meter at regular intervals. We could use the standard javascript setInterval function, but since we're looking at the Web Audio API lets use one of its nodes. The JavaScriptNode. The javascript node With the javascriptnode we can process the raw audio data directly from javascript. We can use this to write our own analyzers or complex components. We're not going to do that, though. When creating the javascript node, you can specify the interval at which it is called. We'll use that feature to update the meter at regulat intervals. Creating a javascript node is very easy. // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); This will create a javascriptnode that is called whenever the 2048 frames have been sampled. Since our data is sampled at 44.1k, this function will be called approximately 21 times a second. Now what happens when this function is called: // when the javascript node is called // we use information from the analyzer node // to draw the volume javascriptNode.onaudioprocess = function() { // get the average, bincount is fftsize / 2 var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); var average = getAverageVolume(array) // clear the current state ctx.clearRect(0, 0, 60, 130); // set the fill style ctx.fillStyle=gradient; // create the meters ctx.fillRect(0,130-average,25,130); } function getAverageVolume(array) { var values = 0; var average; var length = array.length; // get all the frequency amplitudes for (var i = 0; i < length; i++) { values += array[i]; } average = values / length; return average; } In these two functions we calculate the average and draw the meter directly on the canvas (using a gradient so we have nice colors). Now all we have to do is connect the output from the audiosource to the analyser, the analyser to the javasource node (and if we want audio to hear, we also need to connect something to the destionation). Connect everything together Connecting everything together is easy: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; // create a buffer source node sourceNode = context.createBufferSource(); // connect the source to the analyser sourceNode.connect(analyser); // we use the javascript node to draw at a specific interval. analyser.connect(javascriptNode); // and connect to destination, if you want audio sourceNode.connect(context.destination); } And that's it. This will draw a single volume meter, for the complete signal. Now what do we do when we want to have a volume meter for each channel. For this we use a ChannelSplitter. Let's dive right into the code to connect everything: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 1024; analyser2 = context.createAnalyser(); analyser2.smoothingTimeConstant = 0.0; analyser2.fftSize = 1024; // create a buffer source node sourceNode = context.createBufferSource(); splitter = context.createChannelSplitter(); // connect the source to the analyser and the splitter sourceNode.connect(splitter); // connect one of the outputs from the splitter to // the analyser splitter.connect(analyser,0,0); splitter.connect(analyser2,1,0); // we use the javascript node to draw at a // specific interval. analyser.connect(javascriptNode); // and connect to destination sourceNode.connect(context.destination); } As you can see we don't really change much. We introduce a new node, the splitter node. This node splits the sound into a left and a right channel. These channels can be processed separately. With this layout the following happens: The audiosource creates a signal based on the buffered audio. This signal is sent to the splitter, who splits the signal into a left and right stream. Each of these two streams is processed by their own realtime analyser. From the javascript node, we now get the information from both analysers and plot both meters I've shown step 1 through 3, let's quickly move on the step 4. For this we simply add the following to the onaudioprocess node: javascriptNode.onaudioprocess = function() { // get the average for the first channel var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); var average = getAverageVolume(array); // get the average for the second channel var array2 = new Uint8Array(analyser2.frequencyBinCount); analyser2.getByteFrequencyData(array2); var average2 = getAverageVolume(array2); // clear the current state ctx.clearRect(0, 0, 60, 130); // set the fill style ctx.fillStyle=gradient; // create the meters ctx.fillRect(0,130-average,25,130); ctx.fillRect(30,130-average2,25,130); } And now we've got two signal meters, one for each channel. Example 2: Visualize the signal strength with a volume meter. Or view the result on youtube: Now lets see how we can get the view of the frequencies I showed earlier. Create a frequency spectrum With all the work we already did in the previous section, creating a frequency spectrum overview is now very easy. We're going to aim for this: We set up the nodes just like we did in the first example: function setupAudioNodes() { // setup a javascript node javascriptNode = context.createJavaScriptNode(2048, 1, 1); // connect to destination, else it isn't called javascriptNode.connect(context.destination); // setup a analyzer analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0.3; analyser.fftSize = 512; // create a buffer source node sourceNode = context.createBufferSource(); sourceNode.connect(analyser); analyser.connect(javascriptNode); // sourceNode.connect(context.destination); } So this time we don't split the channels and we set the fftSize to 512. This means we get 256 bars that represent our frequency. We now just need to alter the onaudioprocess method and the gradient we use: var gradient = ctx.createLinearGradient(0,0,0,300); gradient.addColorStop(1,'#000000'); gradient.addColorStop(0.75,'#ff0000'); gradient.addColorStop(0.25,'#ffff00'); gradient.addColorStop(0,'#ffffff'); // when the javascript node is called // we use information from the analyzer node // to draw the volume javascriptNode.onaudioprocess = function() { // get the average for the first channel var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); // clear the current state ctx.clearRect(0, 0, 1000, 325); // set the fill style ctx.fillStyle=gradient; drawSpectrum(array); } function drawSpectrum(array) { for ( var i = 0; i < (array.length); i++ ){ var value = array[i]; ctx.fillRect(i*5,325-value,3,325); } }; In the drawSpectrum function we iterate over the array, and draw a vertical bar based on the value. That's it. For a live example, click on the following link: Example 3: Visualize the frequency spectrum. Or view it on youtube: And then the final one. The spectrogram. Time based spectrogram When you run the previous demo you see the strength of the various frequency buckets in real time. While this is a nice visualization, it doesn't allow you to analyze information over a period of time. If you want to do that you can create a spectrogram. With a spectrogram we plot a single line for each measurement. The y-axis represents the frequency, the x-asis the time and the color of a pixel the strength of that frequency. It can be used to analyze the received audio, and also creates nice looking images. The good thing, is that to output this data we don't have to change much from what we've already got in place. The only function that'll change is the onaudioprocess node and we'll create a slightly different analyser. analyser = context.createAnalyser(); analyser.smoothingTimeConstant = 0; analyser.fftSize = 1024; The enalyser we create here has an fftSize of 1024, this means we get 512 frequency buckets with strengths. So we can draw a spectrogram that has a height of 512 pixels. Also note that the smoothingTimeConstant is set to 0. This means we don't use any of the previous results in the analysis. We want to show the real information, not provide a smooth volume meter or frequency spectrum analysis. The easiest way to draw a spectrogram is by just start drawing the line at the left, and for each new set of frequencies increase the x-coordinate by one. The problem is that this will quickly fill up our canvas, and we'll only be able to see the first half a minute of the audio. To fix this, we need some creative canvas copying. The complete code for drawing the spectrogram is shown here: // create a temp canvas we use for copying and scrolling var tempCanvas = document.createElement("canvas"), tempCtx = tempCanvas.getContext("2d"); tempCanvas.width=800; tempCanvas.height=512; // used for color distribution var hot = new chroma.ColorScale({ colors:['#000000', '#ff0000', '#ffff00', '#ffffff'], positions:[0, .25, .75, 1], mode:'rgb', limits:[0, 300] }); ... // when the javascript node is called // we use information from the analyzer node // to draw the volume javascriptNode.onaudioprocess = function () { // get the average for the first channel var array = new Uint8Array(analyser.frequencyBinCount); analyser.getByteFrequencyData(array); // draw the spectrogram if (sourceNode.playbackState == sourceNode.PLAYING_STATE) { drawSpectrogram(array); } } function drawSpectrogram(array) { // copy the current canvas onto the temp canvas var canvas = document.getElementById("canvas"); tempCtx.drawImage(canvas, 0, 0, 800, 512); // iterate over the elements from the array for (var i = 0; i < array.length; i++) { // draw each pixel with the specific color var value = array[i]; ctx.fillStyle = hot.getColor(value).hex(); // draw the line at the right side of the canvas ctx.fillRect(800 - 1, 512 - i, 1, 1); } // set translate on the canvas ctx.translate(-1, 0); // draw the copied image ctx.drawImage(tempCanvas, 0, 0, 800, 512, 0, 0, 800, 512); // reset the transformation matrix ctx.setTransform(1, 0, 0, 1, 0, 0); } To draw the spectrogram we do the following: We copy what is currently drawn to a hidden canvas Next we draw a line of the current values at the far right of the canvas We set the translate on the canvas to -1 We copy the copied information back to the original canvas (that is now drawn 1 pixel to the left) And reset the transformation matrix See a running example here: Example 4: Create a spectrogram Or view it here: Last thing I'd like to mention regarding the code is the chroma.js library I used for the colors. If you ever need to draw something color or gradient related (e.g maps, strengths, levels) you can easily create color scales with this library. Two final pointers, I know I'll get questions about: Volume could be represented as a magnitude, just didn't want to complicate matters for this. The spectogram doesn't use logarithmic scales. Once again, didn't want to complicate things
October 23, 2012
by Jos Dirksen
· 70,043 Views · 1 Like
article thumbnail
PartitionKey and RowKey in Windows Azure Table Storage
For the past few months, I’ve been coaching a “Microsoft Student Partner” (who has a great blog on Kinect for Windows by the way!) on Windows Azure. One of the questions he recently had was around PartitionKey and RowKey in Windows Azure Table Storage. What are these for? Do I have to specify them manually? Let’s explain… Windows Azure storage partitions All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (whitepaper here). While there’s much more to tell about it, the reason why it scales is because of its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system. Partitions are used for scale out in the system. Imagine that there’s only 3 physical machines that are used for storing data in Windows Azure storage: Based on the size and load of a partition, partitions are fanned out across these machines. Whenever a partition gets a high load or grows in size, the Windows Azure storage management can kick in and move a partition to another machine: By doing this, Windows Azure can ensure a high throughput as well as its storage guarantees. If a partition gets busy, it’s moved to a server which can support the higher load. If it gets large, it’s moved to a location where there’s enough disk space available. Partitions are different for every storage mechanism: In blob storage, each blob is in a separate partition. This means that every blob can get the maximal throughput guaranteed by the system. In queues, every queue is a separate partition. In tables, it’s different: you decide how data is co-located in the system. PartitionKey in Table Storage In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput as there’s lots of entities in the same partition. Should you set the PartitionKey to the same value for every entity stored? No. You’ll end up with scaling issues at some point. Should you set the PartitionKey to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult. And that’s where our next concept kicks in… RowKey in Table Storage A RowKey in Table Storage is a very simple thing: it’s your “primary key” within a partition. PartitionKey + RowKey form the composite unique identifier for an entity. Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition. So in essence, a RowKey is just the identifier of an entity within a partition. PartitionKey and RowKey and performance Before building your code, it’s a good idea to think about both properties. Don’t just assign them a guid or a random string as it does matter for performance. The fastest way of querying? Specifying both PartitionKey and RowKey. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition. Less fast but still fast enough will be querying by specifying PartitionKey: table storage will know which partition to query. Less fast: querying on only RowKey. Doing this will give table storage no pointer on which partition to search in, resulting in a query that possibly spans multiple partitions, possibly multiple storage nodes as well. Wihtin a partition, searching on RowKey is still pretty fast as it’s a unique index. Slow: searching on other properties (again, spans multiple partitions and properties). Note that Windows Azure storage may decide to group partitions in so-called "Range partitions" - see http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx. In order to improve query performance, think about your PartitionKey and RowKey upfront, as they are the fast way into your datasets. Deciding on PartitionKey and RowKey Here’s an exercise: say you want to store customers, orders and orderlines. What will you choose as the PartitionKey (PK) / RowKey (RK)? Let’s use three tables: Customer, Order and Orderline. An ideal setup may be this one, depending on how you want to query everything: Customer (PK: sales region, RK: customer id) – it enables fast searches on region and on customer id Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: order line id) – allows fast querying on both order id as well as order line id. Of course, depending on the system you are building, the following may be a better setup: Customer (PK: customer id, RK: display name) – it enables fast searches on customer id and display name Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: item id) – allows fast querying on both order id as well as the item bought, of course given that one order can only contain one order line for a specific item (PK + RK should be unique) You see? Choose them wisely, depending on your queries. And maybe an important sidenote: don’t be afraid of denormalizing your data and storing data twice in a different format, supporting more query variations. There’s one additional “index” That’s right! People have been asking Microsoft for a secondary index. And it’s already there… The table name itself! Take our customer – order – orderline sample again… Having a Customer table containing all customers may be interesting to search within that data. But having an Orders table containing every order for every customer may not be the ideal solution. Maybe you want to create an order table per customer? Doing that, you can easily query the order id (it’s the table name) and within the order table, you can have more detail in PK and RK. And there's one more: your account name. Split data over multiple storage accounts and you have yet another "partition". Conclusion In conclusion? Choose PartitionKey and RowKey wisely. The more meaningful to your application or business domain, the faster querying will be and the more efficient table storage will work in the long run.
October 19, 2012
by Maarten Balliauw
· 57,698 Views · 10 Likes
article thumbnail
Debugging Hibernate Envers - Historical Data
recently in our project we reported a strange bug. in one report where we display historical data provided by hibernate envers , users encountered duplicated records in the dropdown used for filtering. we tried to find the source of this bug, but after spending a few hours looking at the code responsible for this functionality we had to give up and ask for a dump from production database to check what actually is stored in one table. and when we got it and started investigating, it turned out that there is a bug in hibernate envers 3.6 that is a cause of our problems. but luckily after some investigation and invaluable help from adam warski (author of envers) we were able to fix this issue. bug itself let’s consider following scenario: a transaction is started. we insert some audited entities during it and then it is rolled back. the same entitymanager is reused to start another transaction second transaction is committed but when we check audit tables for entities that were created and then rolled back in step one, we will notice that they are still there and were not rolled back as we expected. we were able to reproduce it in a failing test in our project, so the next step was to prepare failing test in envers so we could verify if our fix is working. failing test the simplest test cases already present in envers are located in simple.java class and they look quite straightforward: public class simple extends abstractentitytest { private integer id1; public void configure(ejb3configuration cfg) { cfg.addannotatedclass(inttestentity.class); } @test public void initdata() { entitymanager em = getentitymanager(); em.gettransaction().begin(); inttestentity ite = new inttestentity(10); em.persist(ite); id1 = ite.getid(); em.gettransaction().commit(); em.gettransaction().begin(); ite = em.find(inttestentity.class, id1); ite.setnumber(20); em.gettransaction().commit(); } @test(dependsonmethods = "initdata") public void testrevisionscounts() { assert arrays.aslist(1, 2).equals(getauditreader().getrevisions(inttestentity.class, id1)); } @test(dependsonmethods = "initdata") public void testhistoryofid1() { inttestentity ver1 = new inttestentity(10, id1); inttestentity ver2 = new inttestentity(20, id1); assert getauditreader().find(inttestentity.class, id1, 1).equals(ver1); assert getauditreader().find(inttestentity.class, id1, 2).equals(ver2); } } so preparing my failing test executing scenario described above wasn’t a rocket science: /** * @author tomasz dziurko (tdziurko at gmail dot com) */ public class transactionrollbackbehaviour extends abstractentitytest { public void configure(ejb3configuration cfg) { cfg.addannotatedclass(inttestentity.class); } @test public void testauditrecordsrollback() { // given entitymanager em = getentitymanager(); em.gettransaction().begin(); inttestentity itetorollback = new inttestentity(30); em.persist(itetorollback); integer rollbackediteid = itetorollback.getid(); em.gettransaction().rollback(); // when em.gettransaction().begin(); inttestentity ite2 = new inttestentity(50); em.persist(ite2); integer ite2id = ite2.getid(); em.gettransaction().commit(); // then list revisionsforsavedclass = getauditreader().getrevisions(inttestentity.class, ite2id); assertequals(revisionsforsavedclass.size(), 1, "there should be one revision for inserted entity"); list revisionsforrolledbackclass = getauditreader().getrevisions(inttestentity.class, rollbackediteid); assertequals(revisionsforrolledbackclass.size(), 0, "there should be no revisions for insert that was rolled back"); } } now i could verify that tests are failing on the forked 3.6 branch and check if the fix that we had is making this test green. the fix after writing a failing test in our project, i placed several breakpoints in envers code to understand better what is wrong there. but imagine being thrown in a project developed for a few years by many programmers smarter than you. i felt overwhelmed and had no idea where the fix should be applied and what exactly is not working as expected. luckily in my company we have adam warski on board. he is the initial author of envers and actually he pointed us the solution. the fix itself contains only one check that registers audit processes that will be executed on transaction completion only when such processes iare still in the map for the given transaction. it sounds complicated, but if you look at the class auditprocessmanager in this commit it should be more clear what is happening there. official path besides locating a problem and fixing it, there are some more official steps that must be performed to have fix included in envers. step 1. create jira issue with bug - https://hibernate.onjira.com/browse/hhh-7682 step 2: create local branch envers-bugfix-hhh-7682 of forked hibernate 3.6 step 3: commit and push failing test and fix to your local and remote repository on github step 4: create pull request - https://github.com/hibernate/hibernate-orm/pull/393 step 5: wait for merge and that’s all. now fix is merged into main repository and we have one bug less in the world of open source
October 17, 2012
by Tomasz Dziurko
· 7,792 Views
article thumbnail
Implementing Repository Pattern with Entity Framework
When working with Entity Framework - Code First model approach, a developer creates POCO entities for database tables. The benefit of using Code First model is to have POCO entity for each table that can be used as either WCF Data Contracts or you can apply your own custom attributes to handle Security, Logging, etc. and there is no mapping needed as we used to do in Entity Framework (Model First) approach if the application architecture is n-tier based. Considering the Data Access layer, we will implement a repository pattern that encapsulates the persistence logic in a separate class. This class will be responsible to perform database operations. Let’s suppose the application is based on n-tier architecture and having 3 tiers namely Presentation, Business and Data Access. Common library contains all our POCO entities that will be used by all the layers. Presentation Layer: Contains Views, Forms Business Layer: Managers that handle logic functionality Data Access Layer: Contains Repository class that handles CRUD operations Common Library: Contain POCO entities. We will implement an interface named “IRepository” that defines the signature of all the appropriate generic methods needed to perform CRUD operation and then implement the Repository class that defines the actual implementation of each method. We can also instantiate Repository object using Dependency Injection or apply Factory pattern. Code Snippet: IRepository public interface IRepository : IDisposable { /// /// Gets all objects from database /// /// IQueryable All() where T : class; /// /// Gets objects from database by filter. /// /// Specified a filter /// IQueryable Filter(Expression> predicate) where T : class; /// /// Gets objects from database with filting and paging. /// /// /// Specified a filter /// Returns the total records count of the filter. /// Specified the page index. /// Specified the page size /// IQueryable Filter(Expression> filter, out int total, int index = 0, int size = 50) where T : class; /// /// Gets the object(s) is exists in database by specified filter. /// /// Specified the filter expression /// bool Contains(Expression> predicate) where T : class; /// /// Find object by keys. /// /// Specified the search keys. /// T Find(params object[] keys) where T : class; /// /// Find object by specified expression. /// /// /// T Find(Expression> predicate) where T : class; /// /// Create a new object to database. /// /// Specified a new object to create. /// T Create(T t) where T : class; /// /// Delete the object from database. /// /// Specified a existing object to delete. int Delete(T t) where T : class; /// /// Delete objects from database by specified filter expression. /// /// /// int Delete(Expression> predicate) where T : class; /// /// Update object changes and save to database. /// /// Specified the object to save. /// int Update(T t) where T : class; /// /// Select Single Item by specified expression. /// /// /// /// T Single(Expression> expression) where T : class; void SaveChanges(); void ExecuteProcedure(String procedureCommand, params SqlParameter[] sqlParams); } Code Snippet: Repository public class Repository : IRepository { DbContext Context; public Repository() { Context = new DBContext(); } public Repository(DBContext context) { Context = context; } public void CommitChanges() { Context.SaveChanges(); } public T Single(Expression> expression) where T : class { return All().FirstOrDefault(expression); } public IQueryable All() where T : class { return Context.Set().AsQueryable(); } public virtual IQueryable Filter(Expression> predicate) where T : class { return Context.Set().Where(predicate).AsQueryable(); } public virtual IQueryable Filter(Expression> filter, out int total, int index = 0, int size = 50) where T : class { int skipCount = index * size; var _resetSet = filter != null ? Context.Set().Where(filter).AsQueryable() : Context.Set().AsQueryable(); _resetSet = skipCount == 0 ? _resetSet.Take(size) : _resetSet.Skip(skipCount).Take(size); total = _resetSet.Count(); return _resetSet.AsQueryable(); } public virtual T Create(T TObject) where T : class { var newEntry = Context.Set().Add(TObject); Context.SaveChanges(); return newEntry; } public virtual int Delete(T TObject) where T : class { Context.Set().Remove(TObject); return Context.SaveChanges(); } public virtual int Update(T TObject) where T : class { try { var entry = Context.Entry(TObject); Context.Set().Attach(TObject); entry.State = EntityState.Modified; return Context.SaveChanges(); } catch (OptimisticConcurrencyException ex) { throw ex; } } public virtual int Delete(Expression> predicate) where T : class { var objects = Filter(predicate); foreach (var obj in objects) Context.Set().Remove(obj); return Context.SaveChanges(); } public bool Contains(Expression> predicate) where T : class { return Context.Set().Count(predicate) > 0; } public virtual T Find(params object[] keys) where T : class { return (T)Context.Set().Find(keys); } public virtual T Find(Expression> predicate) where T : class { return Context.Set().FirstOrDefault(predicate); } public virtual void ExecuteProcedure(String procedureCommand, params SqlParameter[] sqlParams){ Context.Database.ExecuteSqlCommand(procedureCommand, sqlParams); } public virtual void SaveChanges() { Context.SaveChanges(); } public void Dispose() { if (Context != null) Context.Dispose(); } } The benefit of using Repository pattern is that all the database operations will be managed centrally and in future if you want to change the underlying database connector you can add another Repository class and defines its own implementation or change the existing one.
October 13, 2012
by Ovais Mehboob Ahmed Khan
· 31,747 Views
article thumbnail
MongoDB Aggregation Framework Examples in C#
MongoDB version 2.2 was released in late August and the biggest change it brought was the addition of the Aggregation Framework. Previously the aggregations required the usage of map/reduce, which in MongoDB doesn’t perform that well, mainly because of the single-threaded Javascript-based execution. The aggregation framework steps away from the Javascript and is implemented in C++, with an aim to accelerate performance of analytics and reporting up to 80 percent compared to using MapReduce. The aim of this post is to show examples of running the MongoDB Aggregation Framework with the official MongoDB C# drivers. Aggregation Framework and Linq Even though the current version of the MongoDB C# drivers (1.6) supports Linq, the support doesn’t extend to the aggregation framework. It’s highly probable that the Linq-support will be added later on and there’s already some hints about this in the driver’s source code. But at this point the execution of the aggregations requires the usage of the BsonDocument-objects. Aggregation Framework and GUIDs If you use GUIDs in your documents, the aggregation framework doesn’t work. This is because by default the GUIDs are stored in binary format and the aggregations won’t work against documents which contain binary data.. The solution is to store the GUIDs as strings. You can force the C# drivers to make this conversion automatically by configuring the mapping. Given that your C# class has Id-property defined as a GUID, the following code tells the driver to serialize the GUID as a string: BsonClassMap.RegisterClassMap(cm => { cm.AutoMap(); cm.GetMemberMap(c => c.Id) .SetRepresentation( BsonType.String); }); The example data These examples use the following documents: > db.examples.find() { "_id" : "1", "User" : "Tom", "Country" : "Finland", "Count" : 1 } { "_id" : "2", "User" : "Tom", "Country" : "Finland", "Count" : 3 } { "_id" : "3", "User" : "Tom", "Country" : "Finland", "Count" : 2 } { "_id" : "4", "User" : "Mary", "Country" : "Sweden", "Count" : 1 } { "_id" : "5", "User" : "Mary", "Country" : "Sweden", "Count" : 7 } Example 1: Aggregation Framework Basic usage This example shows how the aggregation framework can be executed through C#. We’re not going run any calculations to the data, we’re just going to filter it by the User. To run the aggregations, you can use either the MongoDatabase.RunCommand –method or the helper MongoCollection.Aggregate. We’re going to use the latter: var coll = localDb.GetCollection("examples"); ... coll.Aggregate(pipeline); The hardest part when working with Aggregation Framework through C# is building the pipeline. The pipeline is similar concept to the piping in PowerShell. Each operation in the pipeline will make modifications to the data: the operations can for example filter, group and project the data. In C#, the pipeline is a collection of BsonDocument object. Each document represents one operation. In our first example we need to do only one operation: $match. This operator will filter out the given documents. The following BsonDocument is a pipeline operation which filters out all the documents which don’t have User-field set to “Tom”. var match = new BsonDocument { { "$match", new BsonDocument { {"User", "Tom"} } } }; To execute this operation we add it to an array and pass the array to the MongoCollection.Aggregate-method: var pipeline = new[] { match }; var result = coll.Aggregate(pipeline); The MongoCollection.Aggregate-method returns an AggregateResult-object. It’s ResultDocuments-property (IEnumarable) contains the documents which are the output of the aggregation. To check how many results there were, we can get the Count: var result = coll.Aggregate(pipeline); Console.WriteLine(result.ResultDocuments.Count()); The result documents are BsonDocument-objects. If you have a C#-class which represent the documents, you can cast the results: var matchingExamples = result.ResultDocuments .Select(BsonSerializer.Deserialize) .ToList(); foreach (var example in matchingExamples) { var message = string.Format("{0} - {1}", example.User, example.Count); Console.WriteLine(message); } Another alternative is to use C#’s dynamic type. The following extension method uses JSON.net to convert a BsonDocument into a dynamic: public static class MongoExtensions { public static dynamic ToDynamic(this BsonDocument doc) { var json = doc.ToJson(); dynamic obj = JToken.Parse(json); return obj; } } Here’s a way to convert all the result documents into dynamic objects: var matchingExamples = result.ResultDocuments .Select(x => x.ToDynamic()) .ToList(); Example 2: Multiple filters & comparison operators This example filters the data with the following criteria: User: Tom Count: >= 2 var match = new BsonDocument { { "$match", new BsonDocument { {"User", "Tom"}, {"Count", new BsonDocument { { "$gte", 2 } } } } }; The execution of this operation is identical to the first example: var pipeline = new[] { match }; var result = coll.Aggregate(pipeline); var matchingExamples = result.ResultDocuments .Select(x => x.ToDynamic()) .ToList(); Also the result are as expected: foreach (var example in matchingExamples) { var message = string.Format("{0} - {1}", example.User, example.Count); Console.WriteLine(message); } Example 3: Multiple operations In our first two examples, the pipeline was as simple as possible: It contained only one operation. This example will filter the data with the same exact criteria as the second example, but this time using two $match operations: User: Tom Count: >= 2 var match = new BsonDocument { { "$match", new BsonDocument { {"User", "Tom"} } } }; var match2 = new BsonDocument { { "$match", new BsonDocument { {"Count", new BsonDocument { { "$gte", 2 } } } } }; var pipeline = new[] { match, match2 }; The output stays the same: The first operation “match” takes all the documents from the examples collection and removes every document which doesn’t match the criteria User = Tom. The output of this operation (3 documents) then moves to the second operation “match2” of the pipeline. This operation only sees those 3 documents, not the original collection. The operation filters out these documents based on its criteria and moves the result (2 documents) forward. This is where our pipeline ends and this is also our result. Example 4: Group and sum Thus far we’ve used the aggregation framework to just filter out the data. The true strength of the framework is its ability to run calculations on the documents. This example shows how we can calculate how many documents there are in the collection, grouped by the user. This is done using the $group-operator: var group = new BsonDocument { { "$group", new BsonDocument { { "_id", new BsonDocument { { "MyUser","$User" } } }, { "Count", new BsonDocument { { "$sum", 1 } } } } } }; The grouping key (in our case the User-field) is defined with the _id. The above example states that the grouping key has one field (“MyUser”) and the value for that field comes from the document’s User-field ($User). In the $group operation the other fields are aggregate functions. This example defines the field “Count” and adds 1 to it for every document that matches the group key (_id). var pipeline = new[] { group }; var result = coll.Aggregate(pipeline); var matchingExamples = result.ResultDocuments .Select(x => x.ToDynamic()) .ToList(); foreach (var example in matchingExamples) { var message = string.Format("{0} - {1}", example._id.MyUser, example.Count); Console.WriteLine(message); } Note the format in which the results are outputted: The user’s name is accessed through _id.MyUser-property. Example 5: Group and sum by field This example is similar to example 4. But instead of calculating the amount of documents, we calculate the sum of the Count-fields by the user: var group = new BsonDocument { { "$group", new BsonDocument { { "_id", new BsonDocument { { "MyUser","$User" } } }, { "Count", new BsonDocument { { "$sum", "$Count" } } } } } }; The only change is that instead of adding 1, we add the value from the Count-field (“$Count”). Example 6: Projections This example shows how the $project operator can be used to change the format of the output. The grouping in example 5 works well, but to access the user’s name we currently have to point to the _id.MyUser-property. Let’s change this so that user’s name is available directly through UserName-property: var group = new BsonDocument { { "$group", new BsonDocument { { "_id", new BsonDocument { { "MyUser","$User" } } }, { "Count", new BsonDocument { { "$sum", "$Count" } } } } } }; var project = new BsonDocument { { "$project", new BsonDocument { {"_id", 0}, {"UserName","$_id.MyUser"}, {"Count", 1}, } } }; var pipeline = new[] { group, project }; The code removes the _id –property from the output. It adds the UserName-property, which value is accessed from field _id.MyUser. The projection operations also states that the Count-value should stay as it is. var matchingExamples = result.ResultDocuments .Select(x => x.ToDynamic()) .ToList(); foreach (var example in matchingExamples) { var message = string.Format("{0} - {1}", example.UserName, example.Count); Console.WriteLine(message); } Example 7: Group with multiple fields in the keys For this example we add a new row into our document collection, leaving us with the following: { "_id" : "1", "User" : "Tom", "Country" : "Finland", "Count" : 1 } { "_id" : "2", "User" : "Tom", "Country" : "Finland", "Count" : 3 } { "_id" : "3", "User" : "Tom", "Country" : "Finland", "Count" : 2 } { "_id" : "4", "User" : "Mary", "Country" : "Sweden", "Count" : 1 } { "_id" : "5", "User" : "Mary", "Country" : "Sweden", "Count" : 7 } { "_id" : "6", "User" : "Tom", "Country" : "England", "Count" : 3 } This example shows how you can group the data by using multiple fields in the grouping key: var group = new BsonDocument { { "$group", new BsonDocument { { "_id", new BsonDocument { { "MyUser","$User" }, { "Country","$Country" }, } }, { "Count", new BsonDocument { { "$sum", "$Count" } } } } } }; var project = new BsonDocument { { "$project", new BsonDocument { {"_id", 0}, {"UserName","$_id.MyUser"}, {"Country", "$_id.Country"}, {"Count", 1}, } } }; var pipeline = new[] { group, project }; var result = coll.Aggregate(pipeline); var matchingExamples = result.ResultDocuments .Select(x => x.ToDynamic()) .ToList(); foreach (var example in matchingExamples) { var message = string.Format("{0} - {1} - {2}", example.UserName, example.Country, example.Count); Console.WriteLine(message); } Example 8: Match, group and project This example shows how you can combine many different pipeline operations. The data is first filtered ($match) by User=Tom, then grouped by the Country (“$group”) and finally the output is formatted into a readable format ($project). Match: var match = new BsonDocument { { "$match", new BsonDocument { {"User", "Tom"} } } }; Group: var group = new BsonDocument { { "$group", new BsonDocument { { "_id", new BsonDocument { { "Country","$Country" }, } }, { "Count", new BsonDocument { { "$sum", "$Count" } } } } } }; Project: var project = new BsonDocument { { "$project", new BsonDocument { {"_id", 0}, {"Country", "$_id.Country"}, {"Count", 1}, } } }; Result: var pipeline = new[] { match, group, project }; var result = coll.Aggregate(pipeline); var matchingExamples = result.ResultDocuments .Select(x => x.ToDynamic()) .ToList(); foreach (var example in matchingExamples) { var message = string.Format("{0} - {1}", example.Country, example.Count); Console.WriteLine(message); } More There are many other interesting operators in the MongoDB Aggregation Framework, like $unwind and $sort. The usage of these operators is identical to ones we used above so it should be possible to copy-paste one of the examples and use it as a basis for these other operations. Links MongoDB C# Language Center MongoDB Aggregation Framework Easy to follow blog post about the aggregation framework
October 11, 2012
by Mikael Koskinen
· 47,769 Views · 2 Likes
article thumbnail
How to Create and Deploy a Website with Windows Azure
Curator's note: This article originally appeared at WindowsAzure.com. To use this feature and other new Windows Azure capabilities, sign up for the free preview. Just as you can quickly create and deploy a web application created from the gallery, you can also deploy a website created on a workstation with traditional developer tools from Microsoft or other companies. Table of Contents Deployment Options How to: Create a Website Using the Management Portal How to: Create a Website from the Gallery How to: Delete a Website Next Steps Deployment Options Windows Azure supports deploying websites from remote computers using WebDeploy, FTP, GIT or TFS. Many development tools provide integrated support for publication using one or more of these methods and may only require that you provide the necessary credentials, site URL and hostname or URL for your chosen deployment method. Credentials and deployment URLs for all enabled deployment methods are stored in the website's publish profile, a file which can be downloaded in the Windows Azure (Preview) Management Portal from the Quick Start page or the quick glance section of the Dashboard page. If you prefer to deploy your website with a separate client application, high quality open source GIT and FTP clients are available for download on the Internet for this purpose. How to: Create a Website Using the Management Portal Follow these steps to create a website in Windows Azure. Login to the Windows Azure (Preview) Management Portal. Click the Create New icon on the bottom left of the Management Portal. Click the Web Site icon, click the Quick Create icon, enter a value for URL and then click the check mark next to create web site on the bottom right corner of the page. When the website has been created you will see the text Creation of Web Site '[SITENAME]' Completed. Click the name of the website displayed in the list of websites to open the website's Quick Start management page. On the Quick Start page you are provided with options to set up TFS or GIT publishing if you would like to deploy your finished website to Windows Azure using these methods. FTP publishing is set up by default for websites and the FTP Host name is displayed under FTP Hostname on the Quick Start and Dashboard pages. Before publishing with FTP or GIT choose the option to Reset deployment credentials on the Dashboard page. Then specify the new credentials (username and password) to authenticate against the FTP Host or the Git Repository when deploying content to the website. The Configure management page exposes several configurable application settings in the following sections: Framework: Set the version of .NET framework or PHP required by your web application. Diagnostics: Set logging options for gathering diagnostic information for your website in this section. App Settings: Specify name/value pairs that will be loaded by your web application on start up. For .NET sites, these settings will be injected into your .NET configuration AppSettings at runtime, overriding existing settings. For PHP and Node sites these settings will be available as environment variables at runtime. Connection Strings: View connection strings for linked resources. For .NET sites, these connection strings will be injected into your .NET configuration connectionStrings settings at runtime, overriding existing entries where the key equals the linked database name. For PHP and Node sites these settings will be available as environment variables at runtime. Default Documents: Add your web application's default document to this list if it is not already in the list. If your web application contains more than one of the files in the list then make sure your website's default document appears at the top of the list. How to: Create a Website from the Gallery The gallery makes available a wide range of popular web applications developed by Microsoft, third party companies, and open source software initiatives. Web applications created from the gallery do not require installation of any software other than the browser used to connect to the Windows Azure Management Portal. In this tutorial, you'll learn: How to create a new site through the gallery. How to deploy the site through the Windows Azure Portal. You'll build a Word press blog that uses a default template. The following illustration shows the completed application: Note To complete this tutorial, you need a Windows Azure account that has the Windows Azure Web Sites feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see Create a Windows Azure account and enable preview features. Create a web site in the portal Login to the Windows Azure Management Portal. Click the New icon on the bottom left of the dashboard. Click the Web Site icon, and click From Gallery. Locate and click the WordPress icon in list, and then click Next. On the Configure Your App page, enter or select values for all fields: Enter a URL name of your choice Leave Create a new MySQL database selected in the Database field Select the region closest to you Then click Next. On the Create New Database page, you can specify a name for your new MySQL database or use the default name. Select the region closest to you as the hosting location. Select the box at the bottom of the screen to agree to ClearDB's usage terms for your hosted MySQL database. Then click the check to complete the site creation. After you click Complete Windows Azure will initiate build and deploy operations. While the web site is being built and deployed the status of these operations is displayed at the bottom of the Web Sites page. After all operations are performed, A final status message when the site has been successfully deployed. Launch and manage your WordPress site Click on your new site from the Web Sites page to open the dashboard for the site. On the Dashboard management page, scroll down and click the link on the left under Site Url to open the site’s welcome page. Enter appropriate configuration information required by WordPress and click Install WordPress to finalize configuration and open the web site’s login page. Login to the new WordPress web site by entering the username and password that you specified on the Welcome page. You'll have a new WordPress site that looks similar to the site below. How to: Delete a Website Websites are deleted using the Delete icon in the Windows Azure Management Portal. The Delete icon is available in the Windows Azure Portal when you click Web Sites to list all of your websites and at the bottom of each of the website management pages. Next Steps For more information about Websites, see the following: Walkthrough: Troubleshooting a Website on Windows Azure
October 9, 2012
by Eric Gregory
· 85,321 Views
article thumbnail
SQL Query Optimization and Normalization
Explore SQL query optimization and normalization.
October 4, 2012
by Michael Georgiou
· 37,781 Views · 2 Likes
article thumbnail
Difference Between Mysql Replace and Insert on Duplicate Key Update
While me and my friend roshan recently working as a support developers at Australia famous e-commerce website. recently roshan as assign a new bug in this site it’s related to the product synchronize process in the ware house product table and the e-commerce site, his main task was check the quickly the site product table and check with ware house product table product if the either insert new data into a site database, or update an existing record on the site database, Of course, doing a lookup to see if the record exists already and then either updating or inserting would be an expensive process (existing items are defined either by a unique key or a primary key). Luckily, MySQL offers two functions to combat this (each with two very different approaches). 1. REPLACE = DELETE+INSERT 2. INSERT ON DUPLICATE KEY UPDATE = UPDATE + INSERT 1 . REPLACE This syntax is the same as the INSERT function. When dealing with a record with a unique or primary key, REPLACE will either do a DELETE and then an INSERT, or just an INSERT if use this this function will cause a record to be removed, and inserted at the end. It will cause the indexing to get broken apart, decreasing the efficiency of the table. If, however REPLACE INTO ds_product SET pID = 3112, catID = 231, uniCost = 232.50, salePrice = 250.23; 2. ON DUPLICATE KEY UPDATE ON DUPLICATE KEY UPDATE clause to the INSERT function. This one actively hunts down an existing record in the table which has the same UNIQUE or PRIMARY KEY as the one we’re trying to update. If it finds an existing one, you specify a clause for which column(s) you would like to UPDATE. Otherwise, it will do a normal INSERT. INSERT INTO ds_product SET pID = 3112, catID = 231, uniCost = 232.50, salePrice = 250.23, ON DUPLICATE KEY UPDATE uniCost = 232.50, salePrice = 250.23; This should be helpful when trying to create database queries that add and update information, without having to go through the extra step. Thanks Have a Nice Day
October 3, 2012
by Prathap Givantha Kalansuriya
· 14,401 Views
article thumbnail
Customizing Spring Data JPA Repository
Spring Data is a very convenient library. However, as the project as quite new, it is not well featured. By default, Spring Data JPA will provide implementation of the DAO based on SimpleJpaRepository. In recent project, I have developed a customize repository base class so that I could add more features on it. You could add vendor specific features to this repository base class as you like. Configuration You have to add the following configuration to you spring beans configuration file. You have to specified a new repository factory class. We will develop the class later. extends SimpleJpaRepository implements GenericRepository , Serializable{ private static final long serialVersionUID = 1L; static Logger logger = Logger.getLogger(GenericRepositoryImpl.class); private final JpaEntityInformation entityInformation; private final EntityManager em; private final DefaultPersistenceProvider provider; private Class springDataRepositoryInterface; public Class getSpringDataRepositoryInterface() { return springDataRepositoryInterface; } public void setSpringDataRepositoryInterface( Class springDataRepositoryInterface) { this.springDataRepositoryInterface = springDataRepositoryInterface; } /** * Creates a new {@link SimpleJpaRepository} to manage objects of the given * {@link JpaEntityInformation}. * * @param entityInformation * @param entityManager */ public GenericRepositoryImpl (JpaEntityInformation entityInformation, EntityManager entityManager , Class springDataRepositoryInterface) { super(entityInformation, entityManager); this.entityInformation = entityInformation; this.em = entityManager; this.provider = DefaultPersistenceProvider.fromEntityManager(entityManager); this.springDataRepositoryInterface = springDataRepositoryInterface; } /** * Creates a new {@link SimpleJpaRepository} to manage objects of the given * domain type. * * @param domainClass * @param em */ public GenericRepositoryImpl(Class domainClass, EntityManager em) { this(JpaEntityInformationSupport.getMetadata(domainClass, em), em, null); } public S save(S entity) { if (this.entityInformation.isNew(entity)) { this.em.persist(entity); flush(); return entity; } entity = this.em.merge(entity); flush(); return entity; } public T saveWithoutFlush(T entity) { return super.save(entity); } public List saveWithoutFlush(Iterable entities) { List result = new ArrayList(); if (entities == null) { return result; } for (T entity : entities) { result.add(saveWithoutFlush(entity)); } return result; } } As a simple example here, I just override the default save method of the SimpleJPARepository. The default behaviour of the save method will not flush after persist. I modified to make it flush after persist. On the other hand, I add another method called saveWithoutFlush() to allow developer to call save the entity without flush. Define Custom repository factory bean The last step is to create a factory bean class and factory class to produce repository based on your customized base repository class. public class DefaultRepositoryFactoryBean , S, ID extends Serializable> extends JpaRepositoryFactoryBean { /** * Returns a {@link RepositoryFactorySupport}. * * @param entityManager * @return */ protected RepositoryFactorySupport createRepositoryFactory( EntityManager entityManager) { return new DefaultRepositoryFactory(entityManager); } } /** * * The purpose of this class is to override the default behaviour of the spring JpaRepositoryFactory class. * It will produce a GenericRepositoryImpl object instead of SimpleJpaRepository. * */ public class DefaultRepositoryFactory extends JpaRepositoryFactory{ private final EntityManager entityManager; private final QueryExtractor extractor; public DefaultRepositoryFactory(EntityManager entityManager) { super(entityManager); Assert.notNull(entityManager); this.entityManager = entityManager; this.extractor = DefaultPersistenceProvider.fromEntityManager(entityManager); } @SuppressWarnings({ "unchecked", "rawtypes" }) protected JpaRepository getTargetRepository( RepositoryMetadata metadata, EntityManager entityManager) { Class repositoryInterface = metadata.getRepositoryInterface(); JpaEntityInformation entityInformation = getEntityInformation(metadata.getDomainType()); if (isQueryDslExecutor(repositoryInterface)) { return new QueryDslJpaRepository(entityInformation, entityManager); } else { return new GenericRepositoryImpl(entityInformation, entityManager, repositoryInterface); //custom implementation } } @Override protected Class getRepositoryBaseClass(RepositoryMetadata metadata) { if (isQueryDslExecutor(metadata.getRepositoryInterface())) { return QueryDslJpaRepository.class; } else { return GenericRepositoryImpl.class; } } /** * Returns whether the given repository interface requires a QueryDsl * specific implementation to be chosen. * * @param repositoryInterface * @return */ private boolean isQueryDslExecutor(Class repositoryInterface) { return QUERY_DSL_PRESENT && QueryDslPredicateExecutor.class .isAssignableFrom(repositoryInterface); } } Conclusion You could now add more features to base repository class. In your program, you could now create your own repository interface extending GenericRepository instead of JpaRepository. public interface MyRepository extends GenericRepository { void someCustomMethod(ID id); } In next post, I will show you how to add hibernate filter features to this GenericRepository.
September 27, 2012
by Boris Lam
· 98,096 Views · 4 Likes
article thumbnail
Introducing the New Date and Time API for JDK 8
Date and time handling in Java is a somewhat tricky part when you are new to the language. Time can be accessed via the static method System.currentTimeMillis() which returns the current time in milliseconds from January 1st 1970. If you prefer to work with Objects instead you can use java.util.Date, a class whose methods are mostly deprecated in recent versions of Java. To work with time offsets, say add one month to a date, there is java.util.GregorianCalendar. All in all, those methods described here are not very convenient to work with. Java 7 and below are lacking a good date and time API. The Joda Time library is a common drop-in if you need to work with date/time. With JSR 310 (Java Specification Request) this is about to change. JSR 310 adds a new date, time and calendar API to Java 8. The ThreeTen project provides a reference implementation to this new API and can already be utilized in current Java projects (I however recommend not to do this for production). As the README states: The API is currently considered usable and accurate, yet incomplete and subject to change. If you use this API you must be able to handle incompatible changes in later versions. Building ThreeTen Building the ThreeTen project is relatively easy. It requires both Git and Ant to be installed on your system. git clone git://github.com/ThreeTen/threeten.git cd threeten ant This will first fetch the most recent version of ThreeTen and then start the build process using ant. Note that building the library also requires either OpenJDK 1.6 or Oracle JDK 1.6. JSR 310 The new API specifies a number of new classes which are divided into the categories of continuous and human time. Continuous time is based on Unix time and is represented as a single incrementing number. Class Description Instant A point in time in nanoseconds from January 1st 1970 Duration An amount of time measured in nanoseconds Human time is based on fields that we use in our daily lifes such as day, hour, minute and second. It is represented by a group of classes, some of which we will discuss in this article. Class Description LocalDate a date, without time of day, offset or zone LocalTime the time of day, without date, offset or zone LocalDateTime the date and time, without offset or zone OffsetDate a date with an offset such as +02:00, without time of day or zone OffsetTime the time of day with an offset such as +02:00, without date or zone OffsetDateTime the date and time with an offset such as +02:00, without a zone ZonedDateTime the date and time with a time zone and offset YearMonth a year and month MonthDay month and day Year/MonthOfDay/DayOfWeek/... classes for the important fields DateTimeFields stores a map of field-value pairs which may be invalid Calendrical access to the low-level API Period a descriptive amount of time, such as "2 months and 3 days" In addition to the above classes three support classes have been implemented. The Clock class wraps the current time and date, ZoneOffset is a time offset from UTC and ZoneId defines a time zone such as 'Australia/Brisbane'. Using the API Getting the current time The current time is represented by the Clock class. The class is abstract, so you can not create instances of it. The systemUTC() static method will return the current time based on your system clock and set to UTC. import javax.time.Clock; Clock clock = Clock.systemUTC(); To use the default time zone on your system there also is systemDefaultZone(). Clock clock = Clock.systemDefaultZone(); The millis() method can then be used to access the current time in milliseconds from January 1st, 1970. This shows, that the Clock class and all subclasses are wrapped around System.currentTimeMillis(). Clock clock = Clock.systemDefaultZone(); long time = clock.millis(); Working with time zones To work with time zones you need to import the ZoneId class. The class provides a method to get the default system time zone: import javax.time.ZoneId; import javax.time.Clock; ZoneId zone = ZoneId.systemDefault(); Clock clock = Clock.system(zone); As seen above, the ZoneId can then be used to get an instance of a Clock with that time zone. Other time zones can be accessed by their name, e.g.: ZoneId zone = ZoneId.of("Europe/Berlin"); Clock clock = Clock.system(zone); Getting human date and time Working with a time represented in a single long variable is not what we wanted. We want to work with objects that represent human readable time. The LocalDate, LocalTime and LocalDateTime classes do just that. import javax.time.LocalDate; // The now() method returns the current DateTime LocalDate date = LocalDate.now(); System.out.printf("%s-%s-%s", date.getYear(), date.getMonthValue(), date.getDayOfMonth() ); Using LocalDate to print the current date Doing calculations with times and dates One of the most important functionalities of JSR-310 is that you can do calculations with dates and times. The API makes it very easy to do that. import javax.time.LocalTime; import javax.time.Period; import static javax.time.calendrical.LocalPeriodUnit.HOURS; Period p = Period.of(5, HOURS); LocalTime time = LocalTime.now(); LocalTime newTime; newTime = time.plus(5, HOURS); // or newTime = time.plusHours(5); // or newTime = time.plus(p); Three ways of adding 5 hours to the current time Each class that represents human time implements the AdjustableDateTime interface. The interface requires the plus and the minus method that take a value and a PeriodUnit as argument. Conclusion This article gave a (very) brief introduction into the new date and time API that will ship with Java 8. The API seems to be very consistent and well thought through and provides many ways to interact with dates and times. Upon release of Java 8 the API will be moved from the javax.time package over to java.time, so there will be no conflict if you start using the current implementation.
September 25, 2012
by Fabian Becker
· 78,544 Views
article thumbnail
Nested Data Structures, and non-1NF design in PostgreSQL
This has been adapted from an ongoing series currently running on my blog. It has been adapted to be more self-contained, and rely less on other blog entries. For more see http://ledgersmbdev.blogspot.com PostgreSQL provides a very advanced set of tools for doing data modelling in ways which drift back and forth across a relational and non-relational divide. While it is generally a good idea to make the database relational first, and add objects later, the principles of object-relational database design allow you to do a lot more with PostgreSQL than you can on many other database platforms. This article will discuss the use of non-first-normal-form designs, in particular the storage of arrays of tuples in columns to simulate a nested table. The possible uses and problems of such a design will be discussed in detail. One of the promises of object-relational modelling is the ability to address information modelling on complex and nested data structures. Nested data structures bring considerable richness to the database, which is lost in a pure, flat, relational model. Nested data structures can be used to model tuple constraints in ways that are impossible to do when looking at flat data structures, at least as long as those constraints are limited to the information in a single tuple. At the same time there are cases where they simplify things and cases where they complicate things. This is true both in the case of using these for storage and for interfacing with stored procedures. PostgreSQL allows for nested tuples to be stored in a database, and for arrays of tuples. Other ORDBMS's allow something similar (Informix, DB2, and Oracle all support nested tables). Nested tables in PostgreSQL provide a number of gotchas, and additionally exposing the data in them to relational queries takes some extra work. In this post we will look at modelling general ledger transactions using a nested table approach, and both the benefits and limitations of this approach. In general this trades one set of problems for another and it is important to recognize the problems going in. The storage example came out of a brainstorming session I had with Marc Balmer of Micro Systems, though it is worth noting that this is not the solution they use in their products, nor is it the approach currently used by LedgerSMB. Basic Table Structure: The basic data schema will end up looking like this: CREATE TABLE journal_type ( id serial not null unique, label text primary key ); CREATE TABLE account ( id serial not null unique, control_code text primary key, -- account number description text ); CREATE TYPE journal_line_type AS ( account_id int, amount numeric ); CREATE TABLE journal_entry ( id serial not null unique, journal_type int references journal_type(id), source_document_id text,-- for example invoice number date_posted date not null, description text, line_items journal_line_type[], PRIMARY KEY (journal_type, source_document_id) ); This schema has a number of obvious gotchas and cannot, by itself, guarantee the sorts of things we want to do. However, using object-relational modelling we can fix these in ways that cannot do in a purely relational schema. The main problems are: First, since this is a double entry model, we need a constraint that says that the sum of the amounts of the lines must always equal zero. However, if we just add a sum() aggregate, we will end up with it summing every record in the db every time we do an insert, which is not what we want. We also want to make sure that no account_id's are null and no amounts are null. Additionally it is not possible in the schema above to easily expose the journal line information to purely relational tools. However we can use a VIEW to do this, though this produces yet more problems. Finally referential integrity enforcement between the account lines and accounts cannot be done declaratively. We will have to create TRIGGERs to enforce this manually. These problems are traded off against the fact that the relational model does not allow for the first problem to be solved at all so we trade off the fact that we have some solutions which are a bit of a pain for the fact that we have some solutions at all. Nested Table Constraints If we simply had a tuple as a column, we could look inside the tuple with check constraints. Something like check((column).subcolumn is not null). However in this case we cannot do that because we need to aggregate on a set of tuples attached to the row. To do this instead we create a set of table methods for managing the constraints: CREATE OR REPLACE FUNCTION is_balanced(journal_entry) RETURNS BOOL LANGUAGE SQL AS $$ SELECT sum(amount) = 0 FROM unnest($1.line_items); $$; CREATE OR REPLACE FUNCTION has_no_null_account_ids(journal_entry) RETURNS BOOL LANGUAGE SQL AS $$ SELECT bool_and(account_id is not null) FROM unnest($1.line_items); $$; CREATE OR REPLACE FUNCTION has_no_null_amounts(journal_entry) RETURNS BOOL LANGUAGE SQL AS $$ select bool_and(amount is not null) from unnest($1.line_items); $$; We can then create our constraints. Note that because we have to create the methods first, we have to add our constraints after the functions are defined, and these are added after the table is constructed. I have gone ahead and given these friendly names so that errors are easier for people (and machines) to process and handle. ALTER TABLE journal_entry ADD CONSTRAINT is_balanced CHECK ((journal_entry).is_balanced); ALTER TABLE journal_entry ADD CONSTRAINT has_no_null_account_ids CHECK ((journal_entry).has_no_null_account_ids); ALTER TABLE journal_entry ADD CONSTRAINT has_no_null_amounts CHECK ((journal_entry).has_no_null_amounts); Now we have integrity constraints reaching into our nested data. So let's test this out. insert into journal_type (label) values ('General'); We will re-use the account data from the previous post: or_examples=# select * from account; id | control_code | description ----+--------------+------------- 1 | 1500 | Inventory 2 | 4500 | Sales 3 | 5500 | Purchase (3 rows) Let's try inserting a few meaningless transactions, some of which violate our constraints: insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type]); ERROR: new row for relation "journal_entry" violates check constraint "is_balanced" So far so good. insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type, row(null, -100)::journal_line_type]); ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_account_ids" Still good. insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type, row(2, -100)::journal_line_type, row(3, NULL)::journal_line_type]) ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_amounts" Great. All constraints working properly. Let's try inserting a valid row: insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10001', now()::date, 'This is a test', ARRAY[row(1, 100)::journal_line_type, row(2, -100)::journal_line_type]); And it works! or_examples=# select * from journal_entry; id | journal_type | source_document_id | date_posted | description | li ne_items ----+--------------+--------------------+-------------+----------------+------------------------ 5 | 1 | ref-10001 | 2012-08-23 | This is a test | {"(1,100)","(2,-100)"} (1 row) Break-Out Views A second major problem that we will be facing with this schema is that if someone wants to create a report using a reporting tool that only really supports relational data very well, then the financial data will be opaque and not available. This scenario is one of the reasons why I think it is important generally to push the relational model to its breaking point before looking at object-relational functions. Consequently I think when doing nested tables it is important to ensure that the data in them is available through a relational interface, in this case, a view. In this case, we may want to model debits and credits in a way which is re-usable, so we will start by creating two type methods: CREATE OR REPLACE FUNCTION debits(journal_line_type) RETURNS NUMERIC LANGUAGE SQL AS $$ SELECT CASE WHEN $1.amount < 0 THEN $1.amount * -1 ELSE NULL END $$; CREATE OR REPLACE FUNCTION credits(journal_line_type) RETURNS NUMERIC LANGUAGE SQL AS $$ SELECT CASE WHEN $1.amount > 0 THEN $1.amount ELSE NULL END $$; Now we can use these as virtual columns anywhere a journal_line_type is used. The view definition itself is rather convoluted and this may impact performance. I am waiting for the LATERAL construct to become available which will make this easier. CREATE VIEW journal_line_items AS SELECT id AS journal_entry_id, (li).*, (li).debits, (li).credits FROM (SELECT je.*, unnest(line_items) li FROM journal_entry je) j; Remember li.debits and li.credits gets turned by the parser into debits(li) and credits(li), allowing for class.method notation here. Testing this out: SELECT * FROM journal_line_items; gives us journal_entry_id | account_id | amount | debits | credits ------------------+------------+--------+--------+--------- 5 | 1 | 100 | | 100 5 | 2 | -100 | 100 | 6 | 1 | 200 | | 200 6 | 3 | -200 | 200 | As you can see, this works. Now people with purely relational tools can access the information in the nested table. In general it is almost always worth creating break-out views of this sort where nested data is stored. However it is important to note that with larger data sets this is insufficient because indexing considerations makes it hard to look up specific information on a row level. This may or may not be the end of the world depending on data set size. Referential Integrity Controls The final problem is that relational integrity is not a well defined concept for nested data. For this reason, if we value relational integrity and foreign keys are involved, we must find ways of enforcing these. The simplest solution is a trigger which runs on insert, update, or delete, and manages another relation which can be used as a proxy for relational integrity checks. For example, we could: CREATE TABLE je_account ( je_id int references journal_entry (id), account_id int references account(id), primary key (je_id, account_id) ); This will be a very narrow table and so should be quick to search. It may also be useful in determining which accounts to look at for transactions if we need to do that. This table could then be used to optimize queries. To maintain the table we need to recognize that never ever will a journal entry's line items be updated or deleted. This is due to the need to maintain clear audit controls and trails. We may add other flags to the table to indicate transactions but we can handle insert, update, and delete conditions with a trigger, namely: CREATE FUNCTION je_ri_management() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ DECLARE accounts int[]; BEGIN IF TG_OP ILIKE 'INSERT' THEN INSERT INTO je_account (je_id, account_id) SELECT NEW.id, account_id FROM unnest(NEW.line_items) GROUP BY account_id; RETURN NEW; ELSIF TG_OP ILIKE 'UPDATE' THEN IF NEW.line_items <> OLD.line_items THEN RAISE EXCEPTION 'Cannot journal entry line items!'; ELSE RETURN NEW; END IF; ELSIF TG_OP ILIKE 'DELETE' THEN RAISE EXCEPTION 'Cannot delete journal entries!'; ELSE RAISE EXCEPTION 'Invalid TG_OP in trigger'; END IF; END; $$; Then we add the trigger with: CREATE TRIGGER je_breakout_for_ri AFTER INSERT OR UPDATE OR DELETE ON journal_entry FOR EACH ROW EXECUTE PROCEDURE je_ri_management(); The final invalid TG_OP could be omitted but this is not a bad check to have. Let's try this out: insert into journal_entry (journal_type, source_document_id, date_posted, description, line_items) values (1, 'ref-10003', now()::date, 'This is a test', ARRAY[row(1, 200)::journal_line_type, row(3, -200)::journal_line_type]); or_examples=# select * from je_account; je_id | account_id -------+------------ 10 | 3 10 | 1 (2 rows) In this way referential integrity can be enforced. Solution 2.0: Refactoring the above to eliminate the view. The above solution will work great for small businesses but for larger businesses, querying this data will become slow for certain kinds of reports. Storage here is tied to a specific criteria, and indexing is somewhat problematic. There are ways we can address this, but they are not always optimal. At the same time our work is simplified because the actual accounting details are append-only. One solution to this is to refactor the above solution. Instead of: Main table Relational view Materialized view for referential integrity checking we can have: Main table, with tweaked storage for line items Materialized view for RI checking and relational access Unfortunately this sort of refactoring after the fact isn't simple. Typically you want to convert the journal_line_type type to a journal_line_type table, and inherit this in your materialized view table. You cannot simply drop and recreate since the column you are storing the data in is dependent on the structure. The solution is to rename the type, create a new one in its place. This must be done manually and there is no current capability to copy a composite type's structure into a table. You will then need to create a cast and a cast function. Then, when you can afford the downtime, you will want to convert the table to the new type. It is quite possible that the downtime will be delayed and you will have an extended time period where you are half-way through migrating the structure of your database. You can, however, decide to create a cast between the table and the type, perhaps an implicit one (though this is not inherited) and use this to centralize your logic. Unfortunately this leads to duplication-related complexity and in an ideal world would be avoided. However, assuming that the downtime ends up being tolerable, the resulting structures will end up such that they can be more readily optimized for a variety of workloads. In this regard you would have a main table, most likely with line_items moved to extended storage, whose function is to model journal entries as journal entries and apply relevant constraints, and a second table which models journal entry lines as independent lines. This also simplifies some of the constraint issues on the first table, and makes the modelling easier because we only have to look into the nested storage where we are looking at subset constraints. This section then provides a warning regarding the use of advanced ORDBMS functionality, namely that it is easy to get tunnel vision and create problems for the future. The complexity cost here is so high, that the primary model should generally remain relational, with things like nested storage primarily used to create constraints that cannot be effectively modelled otherwise. However, this becomes a great deal more complicated where values may be update or deleted. Here, however, we have a relatively simple case regarding data writes combined with complex constraints that cannot be effectively expressed in normalized, relational SQL. Therefore the standard maintenance concerns that counsel against duplicating information may give way to the fact that such duplication allows for richer constraints. Now, if we had been aware of the problems going in we would have chosen this structure all along. Our design would have been: CREATE TYPE journal_line AS ( entry_id bigserial primary key, --only possible key je_id int not null, account_id int, amount numeric ); After creating the journal entry table we'd: ALTER TABLE journal_line ADD FOREIGN KEY (je_id) REFERENCES journal_entry(id); If we have to handle purging old data we can make that key ON DELETE CASCADE. And the lines would have been of this type instead. We can then get rid of all constraints and their supporting functions other than the is_balanced one. Our debit and credit functions then also reference this type. Our trigger then looks like: CREATE FUNCTION je_ri_management() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ DECLARE accounts int[]; BEGIN IF TG_OP ILIKE 'INSERT' THEN INSERT INTO journal_line (je_id, account_id, amount) SELECT NEW.id, account_id, amount FROM unnest(NEW.line_items); RETURN NEW; ELSIF TG_OP ILIKE 'UPDATE' THEN RAISE EXCEPTION 'Cannot journal entry line items!'; ELSIF TG_OP ILIKE 'DELETE' THEN RAISE EXCEPTION 'Cannot delete journal entries!'; ELSE RAISE EXCEPTION 'Invalid TG_OP in trigger'; END IF; END; $$; Approval workflows can be handled with a separate status table with its own constraints. Deletions of old information (up to a specific snapshot) can be handled by a stored procedure which is unit tested and disables this trigger before purging data. This system has the advantage of having several small components which are all complete and easily understood, and it is made possible because the data is exclusively append-only. As you can see from the above examples, nested data structures greatly complicate the data model and create problems with relational math that must be addressed if data logic will remain meaningful. This is a complex field, and it adds a lot of complexity to storage. In general, these are best avoided in actual data storage except where this approach makes formerly insurmountable problems manageable. Moreover, they add complexity to optimization once data gets large. Thus while non-atomic fields in this regard make sense as an initial point of entry in some narrow cases, as a point of actual query, they are very rarely the right approaches. It is possible that, at some point, nested storage will be able to have its own indexes, foreign keys, etc. but I cannot imagine this being a high priority and so it isn't clear that this will ever happen. In general, it usually makes the most sense to simply store the data in a pseudo-normalized way, with any non-1NF designs being the initial point of entry in a linear write model. Nested Data Structures as Interfaces Nested data structures as interfaces to stored procedures are a little more manageable. The main difficulties are in application-side data construction and output parsing. Some languages handle this more easily than others. Upper-level construction and handling of these structures is relatively straight-forward on the database-side and poses none of these problems. However, they do cause additional complexity and this must be managed carefully. The biggest issue when interfacing with an application is that ROW types are not usually automatically constructed by application-level frameworks even if they have arrays. This leaves the programmer to choose between unstructured text arrays which are fundamentally non-discoverable (and thus brittle), and arrays of tuples which are discoverable but require a lot of additional application code to handle. At the same time as a chicken and egg problem, frameworks will not add handling for this sort of problem unless people are already trying to do it. So my general recommendation is to use nested data types everywhere in the database sparingly, only where the benefits clearly outweigh the complexity costs. Complexity costs are certainly lower in the interface level and there are many more cases where it these techniques are net wins there, but that does not mean that they should be routinely used even there.
September 25, 2012
by Chris Travers
· 20,825 Views
article thumbnail
Asynchronous WMI Queries: Stay Away From Them
So, it turns out that I have a WMI category on my blog. During the last couple of years I almost forgot about it, but WMI got a chance to wrap its poisonous tentacles around me again yesterday. Here’s another story. WMI is known for requiring lots of attention to security. To establish a WMI connection to a remote machine, you need to muck around with registry settings, DCOM configuration, group policy details, and other infernal things which we developers like to defer to someone else. But at least you know that once a machine has been configured properly to give you access through WMI, you can then access it from any other machine. Right? Right? Not so much. WMI has a concept of asynchronous queries, which are notably used for receiving event notifications. For example, the following code registers for an event notification whenever a process is created on my desktop machine: ManagementScope scope = new ManagementScope(@"\\sasha-desktop\root\cimv2"); WqlEventQuery query = new WqlEventQuery( "SELECT * FROM Win32_ProcessStartTrace"); ManagementEventWatcher watcher = new ManagementEventWatcher(scope, query); watcher.EventArrived += (o, e) => ...; //TODO: process the event watcher.Start(); Indeed, this thing works just fine if you point it to a local machine; but it fails when you call the Start method when you connect it to a remote machine. You could now strip the remote machine bare and have it expose its very innate networking guts to the entire Internet, and it still wouldn’t help you establish the connection. Interesting. When troubleshooting this nasty bug, I looked up a VBScript sample that receives new process creation events on another machine. Here it is: Set wmi = GetObject("winmgmts:\\sasha-desktop\root\cimv2") Set query = wmi.ExecNotificationQuery _ ("SELECT * FROM Win32_ProcessStartTrace'") Set process = query.NextEvent VBScript and all, it worked just fine. I started to suspect something smelly in the kingdom of .NET, so I rewrote the VBScript sample in C#, using the long-forgotten Microsoft.VisualBasic.Interaction class: dynamic wmi = Microsoft.VisualBasic.Interaction.GetObject( "winmgmts:\\sasha-desktop\root\cimv2"); dynamic query = wmi.ExecNotificationQuery( "SELECT * FROM Win32_ProcessStartTrace"); dynamic evt = query.NextEvent; This, too, worked just fine – although it’s not much a surprise, as it’s pretty much equivalent to the VBScript code at this time. Still interesting. This is when it hit me – the asynchronous nature of the ManagementEventWatcher.EventArrived event relies on an asynchronous WMI query, which requires a reverse connection to the client machine! This is configuration inferno, x2, on the client machine now, what with the DCOM security settings and sacrifices to the gods of group policy. Unless, of course, we give away the asynchrony and rely on the ManagementEventWatcher.WaitForNextEvent method. It’s synchronous. It burns a thread that has to sit idly by and wait while its siblings execute useful work. But it doesn’t establish a reverse DCOM connection to the caller. At least that.
September 22, 2012
by Sasha Goldshtein
· 11,029 Views
article thumbnail
How To Create A Theme Options Page For WordPress
If you have ever used a WordPress premium theme then you would of seen the custom theme options page that is available. The theme options page that is found under the appearance menu which allows the admin of the WordPress site to change some of the settings on the theme. Most premium themes will come with options to change the colors of fonts, backgrounds, change images or font types...anything that allows you to style the WordPress theme. Some of the most common fields to change are: Theme Options - To edit the theme logo, change the stylesheet, upload a new favicon, Add Google analytics code, enter your feedburner URL and add custom CSS. Styling Options - Change the background colour or change the background image. Fonts - Change the font on all your header tags or the main content text. Social - Providing you theme with your social media profiles will make it easier to link to them in parts of your theme or display your latest tweets. Option pages can also be used on plugins to change settings and to customize the plugin. Examples Of Theme Options Pages Here is what some of the theme options page from premium themes look like. How To Build A Theme Option Page When creating an option page there are a few things you need to setup. Add Menu - If you want to display the menu under the appearance menu or if you want to give the options page it's own menu. Add Sections - These are sections of settings you are adding to the options page. Register Settings - Settings are the different fields you are adding to the options page, they need to be registered with the settings API. Display Settings - The settings API will be used to call a function to display the setting. Validate Setting - When the user saves the settings field the input will need to be validated before stored in the options table. Feedback Messages - When the settings are saved you need to be able to feedback to the user if the settings were saved successfully or if there was an error during validation. To help us perform all these tasks there is a WordPress API called the Settings API. This API allows admin pages to handle setting forms semi-automatically. With the API you can define pages for the settings, sections for the settings and fields for the settings. This works by registering setting fields to be displayed within sections and page will display these sections. WordPress uses the Settings API by default on existing admin pages, this means that by using the Settings API you can add to existing pages by registering new settings. All validation must be performed by the developer of the settings pages but the Settings API will control the creation of the form and storing the values in the form in the options table. Add Menu To WordPress Admin When adding a menu to the WordPress admin screen you have loads of flexibility you have the option of adding brand new menu items or adding the menu as a sub menu. To add a top level menu just use the following function add_menu_page(). $page_title - The title used on the settings page. $menu_title - The title used on the menu. $capability - Only displays the menu if the user matches this capability. $menu_slug - The unique name of the menu slug. $function - This is the callback function to run to display the page. $icon_url - Display a icon just for the menu. $position - This allows you to choose when the menu item appears in the list. If you prefer to have the menu under the appearance parent menu you can use the following code snippet. Or you can use the function add_theme_page() which will add a sub-menu under the appearance menu. add_theme_page( $page_title, $menu_title, $capability, $menu_slug, $function); Registering The Settings To start off we need to register the settings group we are going to store the settings page values. This will use the Settings API to define the group of settings, we will then add the settings to a group. When you store the settings in this group they are stored in the wp_options database table so you can get these values out at a later date. The wp_options table is a key value pairing stored in the database. This is what you should use when storing long term data on your WordPress site. If you are storing a lot of data it's best practice to turn the data into an array and store it under one key, instead of storing all the values over multiple keys. This means that if you have a settings page to change the site logo, background color, font, font size etc, you won't have an option for each of these but you will group them into an option group. The reason you do this is to increase on database efficiency by not adding too many rows to the options database. To register settings on the Settings API you need to use the function register_setting(). The parameters you pass into this are: Option Group - The name of the group of settings you are going to store. This must match the group name used in the settings_field() function. Option name - The name of the option which will be saved, this is the key that is used in the options table. Sanitize Callback - This is the function that is used to validate the settings for this option group. Add Sections To Settings Once the settings are registered we can add section groups to the Settings API. This will allow us to organise the settings on the page, so that you can add styles to display these differently on the page. The benefit of adding sections on your Settings API is so that we can call the function do_settings_sections() as this will display all the settings under this one section. To create you own settings all you have to do is use the function add_settings_section(). The parameters you need to use on this function are: Id - String to use for the ID of the section. Title - The title to use on the section. Callback - This is the function that will display the settings on the page. Page - This is the page that is displaying the section, should match the menu slug of the page. Add Fields To The Sections The last important function we need to use to add settings to the page is the add_settings_field() function, this is used as part of the Settings API to define fields to a section. The function will need to know the page slug and the section Id before you can define the settings to use. All the settings which you setup here will be stored in the options table under the key used in the register_settings() function. To use this function you need to add the following parameters. ID - ID of the field Title - Title of the field. Callback - Function used to display the setting. This is very important as it is used to display the input field you want. Page - Page which is going to display the field should be the same as the menu slug on the section. Section - Section Id which the field will be added to. $args - Additional arguments which are passed to the callback function. Example Of Using The Settings API There is a lot of information to take in above to create this settings page so it can seem a bit complicated but once you get your head around the structure the Settings API uses it's actually quite easy to understand. The best way to understand how this all works is to show you with an example. Create A Theme Option Page With A Textbox Field In this example we will create a theme option page and add a textbox on the page to add additional text to the index.php. Just add the following to your functions.php file to create a theme options page. First we start off by creating the menu item under the appearance menu by using the add_theme_page() function on the admin_menu action. /** * Theme Option Page Example */ function pu_theme_menu() { add_theme_page( 'Theme Option', 'Theme Options', 'manage_options', 'pu_theme_options.php', 'pu_theme_page'); } add_action('admin_menu', 'pu_theme_menu'); As you can see above we set the callback function to the theme options page to be pu_theme_page so we need to create this function to display our page. Here we create a form to submit to the options.php so that we can save in the options table, we call settings_fields() to the get the settings in register_settings() and use the do_settings_sections() function to display our settings. /** * Callback function to the add_theme_page * Will display the theme options page */ function pu_theme_page() { ?> Custom Theme Options Created by Paulund. 'text', 'id' => 'pu_textbox', 'name' => 'pu_textbox', 'desc' => 'Example of textbox description', 'std' => '', 'label_for' => 'pu_textbox', 'class' => 'css_class' ); add_settings_field( 'example_textbox', 'Example Textbox', 'pu_display_setting', 'pu_theme_options.php', 'pu_text_section', $field_args ); } The callback function on creating sections can be used to add addition information that will appear above every section, on this example we are just leaving it blank. /** * Function to add extra text to display on each section */ function pu_display_section($section){ } The callback function on the add_settings_field() function is pu_display_setting, this is the function that is going to echo the display of any input's on the page. The parameter to this function is the $args value on the add_settings_field() we can use this to add things like id, name, default value etc. We want to get any existing values from the wp_option table to display any values which previously typed in by the user, do to this we get the values from the table by using the get_option() function. /** * Function to display the settings on the page * This is setup to be expandable by using a switch on the type variable. * In future you can add multiple types to be display from this function, * Such as checkboxes, select boxes, file upload boxes etc. */ function pu_display_setting($args) { extract( $args ); $option_name = 'pu_theme_options'; $options = get_option( $option_name ); switch ( $type ) { case 'text': $options[$id] = stripslashes($options[$id]); $options[$id] = esc_attr( $options[$id]); echo ""; echo ($desc != '') ? "$desc" : ""; break; } } Finally we can validate the values added to the form by creating the validation callback function pu_validate_settings. All this does at the moment is loop through the inputs passed to it and checks if it's a letter or a number. The return of this function is what will be added to the database. /** * Callback function to the register_settings function will pass through an input variable * You can then validate the values and the return variable will be the values stored in the database. */ function pu_validate_settings($input) { foreach($input as $k => $v) { $newinput[$k] = trim($v); // Check the input is a letter or a number if(!preg_match('/^[A-Z0-9 _]*$/i', $v)) { $newinput[$k] = ''; } } return $newinput; } If you copy all the snippets above into your functions.php file you will see this options form under the appearance menu. Using Theme Options Within Your Theme Now that you understand how to create a theme options page you need to be able to use this value in your theme so you can change the settings. All the settings are stored in the wp_options table with WordPress it's very easy to get these values out all you have to do is use the get_option() function. The option name is the name you put on the register_settings() function. So in our example above you will use this code. The $options variable will now store an array of the values from the theme options, which you can display the value of the textbox we put on the page by using this snippet. Conclusion That's the basics that you need to understand to use the Settings API, now you can take this information and create your own theme options page. Experiment with different input types you can add to the form, experiment with different validation methods you want to use. In future tutorials I will post how you can use some of the inbuilt WordPress third party applications to create a better user experience on your theme options panel. This will include things like color pickers, date pickers, jQuery UI features etc. As you can see we have created a settings option page in just over 100 lines of code, so it's not a hard thing to do but that are a few steps to it and the features can be expanded on. For this reason people have created theme option frameworks to allow you to easily create a theme option page with much higher level of complexity with the options. But like many other frameworks I always recommend you learn the basics before using a framework, this is why it's important to understand how the Settings API works before using or creating a settings page framework.
September 18, 2012
by Paul Underwood
· 23,708 Views
article thumbnail
Fixing Bugs - If You Can't Reproduce a Bug, You Can't Fix It
Fixing a problem usually starts with reproducing it – what Steve McConnell calls “stabilizing the error.” Technically speaking, you can’t be sure you are fixing the problem unless you can run through the same steps, see the problem happen yourself, fix it, and then run through the same steps and make sure that the problem went away. If you can’t reproduce it, then you are only guessing at what’s wrong, and that means you are only guessing that your fix is going to work. But let’s face it – it’s not always practical or even possible to reproduce a problem. Lots of bug reports don’t include enough information for you to understand what the hell the problem actually was, never mind what was going on when the problem occurred – especially bug reports from the field. Rahul Premraj and Thomas Zimmermann found in The Art of Collecting Bug Reports (from the book Making Software), that the two most important factors in determining whether a bug report will get fixed or not are: Is the description well-written, can the programmer understand what was wrong or why the customer thought something was wrong? Does it include steps to reproduce the problem, even basic information about what they were doing when the problem happened? It’s not a lot to ask – from a good tester at least. But you can’t reasonably expect this from customers. There are other cases where you have enough information, but don’t have the tools or expertise to reproduce a problem – for example, when a pen tester has found a security bug using specialist tools that you don’t have or don’t understand how to use. Sometimes you can fix a problem without being able to see it happen in front of you, come up with a theory on your own, trusting your gut – especially if this is code that you recently worked on. But reproducing the problem first gives you the confidence that you aren’t wasting your time and that you actually fixed the right issue. Trying to reproduce the problem should almost always be your first step. What’s involved in reproducing a bug? What you want to do is to find, as quickly as possible, a simple test that consistently shows the problem, so that you can then run a set of experiments, trace through the code, isolate what’s wrong, and prove that it went away after you fixed the code. The best explanation that I’ve found of how to reproduce a bug is in Debug It! where Paul Butcher patiently explains the pre-conditions (identifying the differences between your test environment and the customer’s environment, and trying to control as many of them as possible), and then how to walk backwards from the error to recreate the conditions required to make the problem happen again. Butcher is confident that if you take a methodical approach, you will (almost) always be able to reproduce the problem successfully. In Why Programs Fail: A guide to Systematic Debugging, Andreas Zeller, a German Comp Sci professor, explains that it’s not enough just to make the problem happen again. Your goal is to come up with the simplest set of circumstances that will trigger the problem – the smallest set of data and dependencies, the simplest and most efficient test(s) with the fewest variables, the shortest path to making the problem happen. You need to understand what is not relevant to the problem, what’s just noise that adds to the cost and time of debugging and testing – and get rid of it. You do this using binary techniques to slice up the input data set, narrowing in on the data and other variables that you actually need, repeating this until the problem starts to become clear. Code Complete’s chapter on Debugging is another good guide on how to reproduce a problem following a set of iterative steps, and how to narrow in on the simplest and most useful set of test conditions required to make the problem happen; as well as common places to look for bugs: checking for code that has been changed recently, code that has a history of other bugs, code that is difficult to understand (if you find it hard to understand, there’s a good chance that the programmers who worked on it before you did too). Replay Tools One of the most efficient ways to reproduce a problem, especially in server code, is by automatically replaying the events that led up to the problem. To do this you’ll need to capture a time-sequenced record of what happened, usually from an audit log, and a driver to read and play the events against the system. And for this to work properly, the behavior of the system needs to be deterministic – given the same set of inputs in the same sequence, the same results will occur each time. Otherwise you’ll have to replay the logs over and over and hope for the right set of circumstances to occur again. On one system that I worked on, the back-end engine was a deterministic state machine designed specifically to support replay. All of the data and events, including configuration and control data and timer events, were recorded in an inbound event log that we could replay. There were no random factors or unpredictable external events – the behavior of the system could always be recreated exactly by replaying the log, making it easy to reproduce bugs from the field. It was a beautiful thing, but most code isn’t designed to support replay in this way. Recent research in virtual machine technology has led to the development of replay tools to snapshot and replay events in a virtual machine. VMWare Workstation, for example, included a cool replay debugging facility for C/C++ programmers which was “guaranteed to have instruction-by-instruction identical behavior each time.” Unfortunately, this was an expensive thing to make work, and it was dropped in version 8, at the end of last year. Replay Solutions provides replay for Java programs, creating a virtual machine to record the complete stream of events (including database I/O, network I/O, system calls, interrupts) as the application is running, and then later letting you simulate and replay the same events against a copy of the running system, so that you can debug the application and observe its behavior. They also offer similar application record and replay technology for mobile HTML5 and JavaScript applications. This is exciting stuff, especially for complex systems where it is difficult to setup and reproduce problems in different environments. Fuzzing and Randomness If the problem is non-deterministic, or you can't come up with the right set of inputs, one approach to try is to simulate random data inputs and watch to see what happens - hoping to happen on a set of input variables that will trigger the problem. This is called fuzzing. Fuzzing is a brute force testing technique that is used to uncover data validation weaknesses that can cause reliability and security problems. It's effective at finding bugs, but it’s a terribly inefficient way to reproduce a specific problem. First you need to setup something to fuzz the inputs (this is easy if a program is reading from a file, or a web form – there are fuzzing tools to help with this – but a hassle if you need to write your own smart protocol fuzzer to test against internal APIs). Then you need time to run through all of the tests (with mutation fuzzing, you may need to run tens of thousands or hundreds of thousands of tests to get enough interesting combinations) and more time to sift through and review all of the test results and understand any problems that are found. Through fuzzing you will get new information about the system to help you identity problem areas in the code, and maybe find new bugs, but you may not end up any closer to fixing the problem that you started on. Reproducing problems, especially when you are working from a bad bug report (“the system was running fine all day, then it crashed… the error said something about a null pointer I think?”) can be a serious time sink. But what if you can’t reproduce the problem at all? Let’s look at that next…
September 9, 2012
by Jim Bird
· 45,531 Views
  • Previous
  • ...
  • 511
  • 512
  • 513
  • 514
  • 515
  • 516
  • 517
  • 518
  • 519
  • 520
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×