Data Resources

The Latest Data Topics

Groovy Goodness: Remove Part of String With Regular Expression Pattern

Since Groovy 2.2 we can subtract a part of a String value using a regular expression pattern. Only the first match found is replaced with an empty String. In the following sample code we see how the first match of the pattern is removed from the String:

    // Define regex pattern to find words starting with gr (case-insensitive).
    def wordStartsWithGr = ~/(?i)\s+Gr\w+/
    assert ('Hello Groovy world!' - wordStartsWithGr) == 'Hello world!'
    assert ('Hi Grails users' - wordStartsWithGr) == 'Hi users'

    // Remove first match of a word with 5 characters.
    // Removing "first" leaves two spaces between "Remove" and "match".
    assert ('Remove first match of 5 letter word' - ~/\b\w{5}\b/) == 'Remove  match of 5 letter word'

    // Remove first found numbers followed by a whitespace character.
    assert ('Line contains 20 characters' - ~/\d+\s+/) == 'Line contains characters'

Code written with Groovy 2.2.
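JavaScript has no `-` operator for strings, but the same "remove only the first match" behaviour can be sketched with `String.prototype.replace` and a regular expression without the `g` flag. This is an analogy for readers who know JavaScript, not how Groovy implements the operator; the helper name `minusFirstMatch` is made up for illustration.

```javascript
// Hypothetical helper: a JavaScript analogue of Groovy's `String - Pattern`.
// replace() with a non-global regex only replaces the first match.
function minusFirstMatch(text, pattern) {
  // Drop the "g" flag (if present) so at most one match is removed.
  const firstOnly = new RegExp(pattern.source, pattern.flags.replace("g", ""));
  return text.replace(firstOnly, "");
}

// Case-insensitivity comes from the `i` flag instead of Groovy's inline (?i).
const wordStartsWithGr = /\s+Gr\w+/i;

console.log(minusFirstMatch("Hello Groovy world!", wordStartsWithGr)); // Hello world!
console.log(minusFirstMatch("Hi Grails users", wordStartsWithGr)); // Hi users
// Only "first" is removed, so two spaces remain where it used to be.
console.log(minusFirstMatch("Remove first match of 5 letter word", /\b\w{5}\b/)); // Remove  match of 5 letter word
console.log(minusFirstMatch("Line contains 20 characters", /\d+\s+/)); // Line contains characters
```

Stripping the `g` flag inside the helper keeps the "first match only" semantics even if a caller passes a global regex.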

November 23, 2013

by Hubert Klein Ikkink

Deep Dive into Connection Pooling

As your application grows in functionality and/or usage, managing resources becomes increasingly important. Failure to properly utilize connection pooling is one major “gotcha” that we’ve seen greatly impact MongoDB performance and trip up developers of all levels.

Connection Pools

Creating new authenticated connections to the database is expensive. So, instead of creating and destroying connections for each request to the database, you want to re-use existing connections as much as possible. This is where connection pooling comes in. A connection pool is a cache of database connections maintained by your driver so that connections can be re-used when new connections to the database are required. When properly used, connection pools allow you to minimize the frequency and number of new connections to your database.

Connection Churn

If connection pools are used improperly, however, or not at all, your application will likely open and close new database connections too often, resulting in what we call “connection churn”. In a high-throughput application this can result in a constant flood of new connection requests to your database, which will adversely affect the performance of both your database and your application.

Opening Too Many Connections

A different, although less common, problem is creating too many MongoClient objects that are never closed. In this case, instead of churn, you get a steady increase in the number of connections to your database, such that you have tens of thousands of connections open when your application could almost certainly do with far fewer. Since each connection takes RAM, you may find yourself wasting a good portion of your memory on connections, which will also adversely affect your application’s performance.
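The difference between reusing one pool and churning through connections can be sketched with a toy, in-memory pool. Everything here is hypothetical: `FakeConnection`, `ConnectionPool` and the counter are made up for illustration and are not part of any MongoDB driver, which would also handle authentication, health checks and timeouts.

```javascript
// Toy model of a connection pool; not a real driver API.
let connectionsOpened = 0; // how many (expensive) connections were ever created

class FakeConnection {
  constructor() {
    connectionsOpened += 1; // stands in for the network handshake + authentication
  }
}

class ConnectionPool {
  constructor() {
    this.idle = []; // connections waiting to be reused
  }
  acquire() {
    // Reuse an idle connection if one exists; otherwise open a new one.
    return this.idle.pop() ?? new FakeConnection();
  }
  release(conn) {
    this.idle.push(conn); // return it to the pool instead of closing it
  }
}

// Simulates `requests` sequential requests; `sharedPool` picks the pattern.
function simulate(requests, sharedPool) {
  connectionsOpened = 0;
  const pool = new ConnectionPool();
  for (let i = 0; i < requests; i++) {
    // Churn: a brand-new pool (and therefore a new connection) per request.
    const p = sharedPool ? pool : new ConnectionPool();
    const conn = p.acquire();
    // ... run the query with `conn` ...
    p.release(conn);
  }
  return connectionsOpened;
}

console.log(simulate(1000, true)); // 1
console.log(simulate(1000, false)); // 1000
```

With a shared pool, sequential requests keep reusing the same connection; creating a pool per request opens a new connection every time, which is the churn described above.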
Although every application is different and the total number of connections to your database will greatly depend on how many client processes or application servers are connected, in our experience any connection count greater than 1,000 to 1,500 should raise an eyebrow, and most of the time your application will require far fewer than that.

MongoClient and Connection Pooling

Most MongoDB language drivers implement the MongoClient class which, if used properly, will handle connection pooling for you automatically. The syntax differs per language, but often you do something like this to create a new connection-pool-enabled client to your database:

    mongoClient = new MongoClient(URI, connectionOptions);

Here the mongoClient object holds your connection pool, and will give your app connections as needed. You should strive to create this object once as your application initializes and re-use this object throughout your application to talk to your database. The most common connection pooling problem we see results from applications that create a MongoClient object way too often, sometimes on each database request. If you do this you will not be using your connection pool, as each MongoClient object maintains a separate pool that is not being reused by your application.

Example with Node.js

Let’s look at a concrete example using the Node.js driver. Creating new connections to the database using the Node.js driver is done like this:

    mongodb.MongoClient.connect(URI, function(err, db) {
      // database operations
    });

The syntax for using MongoClient is slightly different here than with other drivers, given Node’s single-threaded nature, but the concept is the same. You only want to call ‘connect’ once during your app’s initialization phase rather than on each database request. Let’s take a closer look at the difference between doing the right thing and doing the wrong thing.
Note: If you clone the repo from here, the logger will output your logs in your console so you can follow along.

Consider the following examples:

    var express = require('express');
    var mongodb = require('mongodb');
    var app = express();

    var MONGODB_URI = 'mongo-uri';

    app.get('/', function(req, res) {
      // BAD! Creates a new connection pool for every request
      mongodb.MongoClient.connect(MONGODB_URI, function(err, db) {
        if(err) throw err;

        var coll = db.collection('test');

        coll.find({}, function(err, docs) {
          docs.each(function(err, doc) {
            if(doc) {
              res.write(JSON.stringify(doc) + "\n");
            }
            else {
              res.end();
            }
          });
        });
      });
    });

    // App may initialize before DB connection is ready
    app.listen(3000);
    console.log('Listening on port 3000');

The first (no pooling) calls connect() in every request handler, establishes new connections for every request (connection churn), and initializes the app (app.listen()) before database connections are made.

    var express = require('express');
    var mongodb = require('mongodb');
    var app = express();

    var MONGODB_URI = 'mongodb-uri';
    var db;
    var coll;

    // Initialize connection once
    mongodb.MongoClient.connect(MONGODB_URI, function(err, database) {
      if(err) throw err;

      db = database;
      coll = db.collection('test');

      app.listen(3000);
      console.log('Listening on port 3000');
    });

    // Reuse database/collection object
    app.get('/', function(req, res) {
      coll.find({}, function(err, docs) {
        docs.each(function(err, doc) {
          if(doc) {
            res.write(JSON.stringify(doc) + "\n");
          }
          else {
            res.end();
          }
        });
      });
    });

The second (with pooling) calls connect() once, reuses the database/collection variable (reusing existing connections), and waits to initialize the app until after the database connection is established.

If you run the first example and refresh your browser enough times, you’ll quickly see that your MongoDB has a hard time handling the flood of connections and will terminate.
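A common alternative to nesting app.listen() inside the connect callback is to memoize the connection promise so every caller shares one client. The sketch below uses a hypothetical `openConnection()` in place of a real driver call; it only counts how often the "expensive" connect runs.

```javascript
// Hypothetical stand-in for an expensive driver call such as MongoClient.connect;
// it only counts how often it runs and resolves with a fake client object.
let connectCalls = 0;
function openConnection() {
  connectCalls += 1;
  const client = { id: connectCalls };
  return new Promise((resolve) => setTimeout(() => resolve(client), 10));
}

// Memoize the promise itself, so requests arriving before the first connect()
// finishes still share it instead of starting their own connection.
let clientPromise = null;
function getClient() {
  if (clientPromise === null) {
    clientPromise = openConnection().catch((err) => {
      clientPromise = null; // allow a later call to retry after a failed connect
      throw err;
    });
  }
  return clientPromise;
}

// Simulate `n` requests arriving at the same time.
async function handleRequests(n) {
  const clients = await Promise.all(Array.from({ length: n }, () => getClient()));
  return { connectCalls, sameClient: clients.every((c) => c === clients[0]) };
}

handleRequests(50).then((result) => console.log(result)); // { connectCalls: 1, sameClient: true }
```

Caching the promise rather than the resolved client avoids a race in which several early requests each see "no client yet" and each start their own connection.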
Further Consideration – Connection Pool Size

Most MongoDB drivers support a parameter that sets the max number of connections (pool size) available to your application. The connection pool size can be thought of as the max number of concurrent requests that your driver can service. The default pool size varies from driver to driver; for example, for Node it is 5, whereas for Python it is 100. If you anticipate your application receiving many concurrent or long-running requests, we recommend increasing your pool size accordingly.
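Because a pool of size N hands out at most N connections at once, extra requests have to wait until a connection is released. The sketch below models that with a small counting semaphore; `BoundedPool` and `withConnection` are hypothetical names, and the limit of 5 simply mirrors the Node default quoted above.

```javascript
// Hypothetical bounded pool: at most `maxSize` callers hold a connection at once.
class BoundedPool {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.inUse = 0;
    this.waiters = []; // resolve functions of callers queued for a connection
    this.peakInUse = 0; // highest concurrency observed, for illustration
  }

  async acquire() {
    if (this.inUse >= this.maxSize) {
      // Pool exhausted: wait until release() hands this caller a slot.
      await new Promise((resolve) => this.waiters.push(resolve));
    } else {
      this.inUse += 1;
    }
    this.peakInUse = Math.max(this.peakInUse, this.inUse);
  }

  release() {
    const next = this.waiters.shift();
    if (next) {
      next(); // transfer the slot directly; inUse stays the same
    } else {
      this.inUse -= 1;
    }
  }

  async withConnection(work) {
    await this.acquire();
    try {
      return await work();
    } finally {
      this.release();
    }
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// 20 concurrent "queries" against a pool of 5: never more than 5 run at once.
const pool = new BoundedPool(5);
const queries = Array.from({ length: 20 }, (_, i) =>
  pool.withConnection(async () => {
    await sleep(5);
    return i;
  })
);
Promise.all(queries).then((results) => {
  console.log(results.length, pool.peakInUse); // 20 5
});
```

The other 15 queries are not rejected, they just queue; with long-running queries that queueing time is what a larger pool size reduces.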

November 7, 2013

by Chris Chang

Data Access Module using Groovy with Spock testing

This blog is more of a tutorial where we describe the development of a simple data access module, more for fun and learning than anything else. All code can be found here for those who don’t want to type along: https://github.com/ricston-git/tododb

As a heads-up, we will be covering the following: using Groovy in a Maven project within Eclipse; using Groovy to interact with our database; testing our code using the Spock framework; and including Spring in our tests with ContextConfiguration.

A good place to start is to write a pom file as shown here. The only dependencies we want packaged with this artifact are groovy-all and commons-lang. The others are either going to be provided by Tomcat or are only used during testing (hence the scope tags in the pom). For example, we would put the jar with the PostgreSQL driver in Tomcat’s lib, and tomcat-jdbc and tomcat-dbcp are already there. (Note: regarding the PostgreSQL jar, we would also have to do some minor configuration in Tomcat to define a DataSource which we can get in our app through JNDI, but that’s beyond the scope of this blog. See here for more info.) Testing-wise, I’m depending on spring-test, spock-core, and spock-spring (the latter is to get Spock to work with spring-test). Another significant addition in the pom is the maven-compiler-plugin. I have tried to get gmaven to work with Groovy in Eclipse, but I have found the maven-compiler-plugin to be a lot easier to work with.

With your pom in an empty directory, go ahead and run:

    mkdir -p src/main/groovy src/main/java src/test/groovy src/test/java src/main/resources src/test/resources

This gives us a directory structure according to the Maven convention. Now you can go ahead and import the project as a Maven project in Eclipse (install the m2e plugin if you don’t already have it). It is important that you do not run mvn eclipse:eclipse in your project.
The .classpath it generates will conflict with your m2e plugin and (at least in my case), when you update your pom.xml the plugin will not update your dependencies inside Eclipse. So just import as a Maven project once you have your pom.xml and directory structure set up.

Okay, so our tests are going to be integration tests, actually using a PostgreSQL database. Since that’s the case, let’s set up our database with some data. First go ahead and create a tododbtest database which will only be used for testing purposes. Next, put the following files in your src/test/resources (note: fill in your username/password):

schema.sql:

    DROP TABLE IF EXISTS todouser CASCADE;

    CREATE TABLE todouser (
      id SERIAL,
      email varchar(80) UNIQUE NOT NULL,
      password varchar(80),
      registered boolean DEFAULT FALSE,
      confirmationCode varchar(280),
      CONSTRAINT todouser_pkey PRIMARY KEY (id)
    );

test-data.sql:

    insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'abc123', FALSE, 'abcdefg');
    insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'pass1516', FALSE, '123456');
    insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'anon', FALSE, 'codeA');
    insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'anon2', FALSE, 'codeB');

Basically, testContext.xml is what we’ll be configuring our test’s context with. The sub-division into datasource.xml and initdb.xml may be a little too much for this example, but changes are usually easier that way. The gist is that we configure our data source in datasource.xml (this is what we will be injecting in our tests), and initdb.xml will run schema.sql and test-data.sql to create our table and populate it with data.

So let’s create our test, or should I say, our specification. Spock is a specification framework that allows us to write more descriptive tests.
In general, it makes our tests easier to read and understand, and since we’ll be using Groovy, we might as well make use of the extra readability Spock gives us.

    package com.ricston.blog.sample.model.spec;

    import javax.sql.DataSource

    import org.springframework.beans.factory.annotation.Autowired
    import org.springframework.test.annotation.DirtiesContext
    import org.springframework.test.annotation.DirtiesContext.ClassMode
    import org.springframework.test.context.ContextConfiguration

    import spock.lang.Specification

    import com.ricston.blog.sample.model.data.TodoUser
    import com.ricston.blog.sample.model.dao.postgre.PostgreTodoUserDAO

    // because it supplies a new application context after each test, the initialize-database in initdb.xml is
    // executed for each test/specification
    @DirtiesContext(classMode=ClassMode.AFTER_EACH_TEST_METHOD)
    @ContextConfiguration('classpath:testContext.xml')
    class PostgreTodoUserDAOSpec extends Specification {

      @Autowired
      DataSource dataSource

      PostgreTodoUserDAO postgreTodoUserDAO

      def setup() {
        postgreTodoUserDAO = new PostgreTodoUserDAO(dataSource)
      }

      def "findTodoUserByEmail when user exists in db"() {
        given: "a db populated with a TodoUser with email [email protected] and the password given below"
        String email = '[email protected]'
        String password = 'anon'

        when: "searching for a TodoUser with that email"
        TodoUser user = postgreTodoUserDAO.findTodoUserByEmail email

        then: "the row is found such that the user returned by findTodoUserByEmail has the correct password"
        user.password == password
      }
    }

One specification is enough for now, just to make sure that all the moving parts are working nicely together. The specification itself is easy enough to understand. We’re just exercising the findTodoUserByEmail method of PostgreTodoUserDAO, which we will be writing soon. Using ContextConfiguration from Spring Test we are able to inject beans defined in our context (the dataSource in our case) through the use of annotations.
This keeps our tests short and makes them easier to modify later on. Additionally, note the use of DirtiesContext. Basically, after each specification is executed, we cannot rely on the state of the database remaining intact. I am using DirtiesContext to get a new Spring context for each specification run. That way, the table creation and test data insertions happen all over again for each specification we run.

Before we can run our specification, we need to create at least the following two classes used in the spec, TodoUser and PostgreTodoUserDAO, along with the TodoUserDAO interface that PostgreTodoUserDAO implements:

    package com.ricston.blog.sample.model.data

    import org.apache.commons.lang.builder.ToStringBuilder

    class TodoUser {
      long id;
      String email;
      String password;
      String confirmationCode;
      boolean registered;

      @Override
      public String toString() {
        ToStringBuilder.reflectionToString(this);
      }
    }

    package com.ricston.blog.sample.model.dao.postgre

    import groovy.sql.Sql

    import javax.sql.DataSource

    import com.ricston.blog.sample.model.dao.TodoUserDAO
    import com.ricston.blog.sample.model.data.TodoUser

    class PostgreTodoUserDAO implements TodoUserDAO {

      private Sql sql

      public PostgreTodoUserDAO(DataSource dataSource) {
        sql = new Sql(dataSource)
      }

      /**
       *
       * @param email
       * @return the TodoUser with the given email
       */
      public TodoUser findTodoUserByEmail(String email) {
        sql.firstRow """SELECT * FROM todouser WHERE email = $email"""
      }
    }

    package com.ricston.blog.sample.model.dao;

    import com.ricston.blog.sample.model.data.TodoUser;

    public interface TodoUserDAO {

      /**
       *
       * @param email
       * @return the TodoUser with the given email
       */
      public TodoUser findTodoUserByEmail(String email);
    }

We’re just creating a POGO in TodoUser, implementing its toString using Commons Lang’s ToStringBuilder. In PostgreTodoUserDAO we’re using Groovy’s Sql to access the database, for now only implementing the findTodoUserByEmail method. PostgreTodoUserDAO implements TodoUserDAO, an interface which specifies the required methods a TodoUserDAO must have.
Okay, so now we have all we need to run our specification. Go ahead and run it as a JUnit test from Eclipse. You should get back the following error message:

    org.codehaus.groovy.runtime.typehandling.GroovyCastException: Cannot cast object '{id=3, [email protected], password=anon, registered=false, confirmationcode=codeA}' with class 'groovy.sql.GroovyRowResult' to class 'com.ricston.blog.sample.model.data.TodoUser' due to: org.codehaus.groovy.runtime.metaclass.MissingPropertyExceptionNoStack: No such property: confirmationcode for class: com.ricston.blog.sample.model.data.TodoUser
    Possible solutions: confirmationCode
        at com.ricston.blog.sample.model.dao.postgre.PostgreTodoUserDAO.findTodoUserByEmail(PostgreTodoUserDAO.groovy:23)
        at com.ricston.blog.sample.model.spec.PostgreTodoUserDAOSpec.findTodoUserByEmail when user exists in db(PostgreTodoUserDAOSpec.groovy:37)

Go ahead and connect to your tododbtest database and run select * from todouser;. As you can see, our confirmationCode varchar(280) column ended up as confirmationcode, with a lower-case ‘c’, because PostgreSQL folds unquoted identifiers to lower case.

In PostgreTodoUserDAO’s findTodoUserByEmail, we are getting back a GroovyRowResult from our firstRow invocation. GroovyRowResult implements Map, and Groovy is able to create a POGO (in our case TodoUser) from a Map. However, in order for Groovy to automatically coerce the GroovyRowResult into a TodoUser, the keys in the Map (or GroovyRowResult) must match the property names in our POGO. We are using confirmationCode in our TodoUser, and we would like to stick to the camel case convention. What can we do to get around this?

Well, first of all, let’s change our schema to use confirmation_code. That’s a little more readable. Of course, we still have the same problem as before, since confirmation_code will not map to confirmationCode by itself. (Note: remember to change the insert statements in test-data.sql too.)
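The mismatch is purely about key names: the row has confirmation_code (or confirmationcode) while the object expects confirmationCode. As a language-neutral illustration, here is a JavaScript sketch of a mapper that converts snake_case column names to camelCase and rejects anything still unknown, much like the cast failure above. `TODO_USER_FIELDS`, `snakeToCamel` and `rowToTodoUser` are hypothetical names, and the sample row is made up.

```javascript
// Hypothetical property list for a TodoUser-like object (mirrors the POGO above).
const TODO_USER_FIELDS = ["id", "email", "password", "registered", "confirmationCode"];

// Convert a snake_case column name to camelCase: "confirmation_code" -> "confirmationCode".
function snakeToCamel(name) {
  return name.replace(/_([a-z0-9])/g, (_match, ch) => ch.toUpperCase());
}

// Build a plain object from a row (column -> value), rejecting columns that
// still don't match a known property.
function rowToTodoUser(row) {
  const user = {};
  for (const [column, value] of Object.entries(row)) {
    const property = snakeToCamel(column);
    if (!TODO_USER_FIELDS.includes(property)) {
      throw new Error(`No such property: ${column}`);
    }
    user[property] = value;
  }
  return user;
}

// Made-up sample row; the e-mail address is a placeholder.
const row = { id: 3, email: "anon@example.com", password: "anon", registered: false, confirmation_code: "codeA" };
console.log(rowToTodoUser(row).confirmationCode); // codeA

try {
  rowToTodoUser({ confirmationcode: "codeA" }); // lower-cased column, as PostgreSQL returned it
} catch (err) {
  console.log(err.message); // No such property: confirmationcode
}
```

Converting names in one place scales to many columns, whereas a per-property hook handles one name at a time; in Groovy the same idea could presumably be applied by rebuilding the row map with converted keys before coercion.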
One way to get around this is to use Groovy’s propertyMissing methods as shown below:

    def propertyMissing(String name, value) {
      if(isConfirmationCode(name)) {
        this.confirmationCode = value
      } else {
        unknownProperty(name)
      }
    }

    def propertyMissing(String name) {
      if(isConfirmationCode(name)) {
        return confirmationCode
      } else {
        unknownProperty(name)
      }
    }

    private boolean isConfirmationCode(String name) {
      'confirmation_code'.equals(name)
    }

    def unknownProperty(String name) {
      throw new MissingPropertyException(name, this.class)
    }

By adding this to our TodoUser.groovy we are effectively tapping into how Groovy resolves property access. When we do something like user.confirmationCode, Groovy automatically calls getConfirmationCode(), a method which we got for free when we declared the property confirmationCode in our TodoUser. Now, when user.confirmation_code is accessed, Groovy doesn’t find any getter to invoke, since we never declared the property confirmation_code; however, since we have now implemented the propertyMissing methods, Groovy will use them as a last resort when resolving the property, before throwing any exceptions. In our case we are checking whether a get or set on confirmation_code is being made and mapping the respective operation to our confirmationCode property. It’s as simple as that. Now we can keep the auto coercion in our data access object and the property name we choose to have in our TodoUser.

Assuming you’ve made the changes to the schema and test-data.sql to use confirmation_code, go ahead and run the spec file and this time it should pass.

That’s it for this tutorial. In conclusion, I would like to discuss some finer points which someone who’s never used Groovy’s SQL before might not know. As you can see in PostgreTodoUserDAO.groovy, our database interaction is pretty much a one-liner. What about resource handling (e.g. properly closing the connection when we’re done), error logging, and prepared statements?
Resource handling and error logging are done automatically; you just have to worry about writing your SQL. When you do write your SQL, try to stick to GStrings with embedded $ expressions, such as the triple-double-quoted string used in PostgreTodoUserDAO.groovy. Groovy’s Sql turns each embedded expression into a prepared-statement parameter, which protects against SQL injection and saves us from putting ‘?’ all over the place and lining up the arguments to pass in to the SQL statement. Note that transaction management is something the code using our artifact will have to take care of.

Finally, note that a bunch of other operations (apart from findTodoUserByEmail) are implemented in the project on GitHub: https://github.com/ricston-git/tododb. Additionally, there is also a specification test for TodoUser, making sure that the property mapping works correctly. Also, in the pom.xml, there is some maven-surefire-plugin configuration in order to get the surefire plugin to pick up our Spock specifications as well as any JUnit tests which we might have in our project. This allows us to run our specifications when we, for example, run mvn clean package. After implementing all the operations you require in PostgreTodoUserDAO.groovy, you can go ahead and compile the jar or include it in a Maven multi-module project to get a data access module you can use in other applications.
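JavaScript’s tagged template literals allow the same idea as Groovy’s GString-to-prepared-statement conversion: the literal SQL parts and the interpolated values arrive separately, so the values never need to be spliced into the SQL text. The `sql` tag below is a hypothetical helper, not a specific library’s API; it only shows how the statement text and bound values are separated.

```javascript
// Hypothetical `sql` tag: interpolated values become bind parameters and the
// statement text gets a "?" placeholder for each one.
function sql(strings, ...values) {
  // `strings` always has exactly one more element than `values`,
  // so joining the literal parts with "?" yields one placeholder per value.
  const text = strings.reduce((acc, part) => acc + "?" + part);
  return { text, values };
}

// Even a hostile-looking input stays a value; it never becomes part of the SQL text.
const email = "x' OR '1'='1";
const query = sql`SELECT * FROM todouser WHERE email = ${email}`;
console.log(query.text); // SELECT * FROM todouser WHERE email = ?
console.log(query.values);
```

A real driver would then pass `text` and `values` to its prepared-statement API; the point of the sketch is only that the input is never concatenated into the statement.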

November 6, 2013

by Justin Calleja

Modeling Data in Neo4j: Bidirectional Relationships

Transitioning from the relational world to the beautiful world of graphs requires a shift in thinking about data. Although graphs are often much more intuitive than tables, there are certain mistakes people tend to make when modelling their data as a graph for the first time. In this article, we look at one common source of confusion: bidirectional relationships.

Directed Relationships

Relationships in Neo4j must have a type, giving the relationship a semantic meaning, and a direction. Frequently, the direction becomes part of the relationship’s meaning. In other words, the relationship would be ambiguous without it. For example, the following graph shows that the Czech Republic defeated Sweden in ice hockey. Had the direction of the relationship been reversed, the Swedes would be much happier. With no direction at all, the relationship would be ambiguous, since it would not be clear who the winner was.

Note that the existence of this relationship implies a relationship of a different type going in the opposite direction, as the next graph illustrates. This is often the case. To give another example, the fact that Pulp Fiction was DIRECTED_BY Quentin Tarantino implies that Quentin Tarantino IS_DIRECTOR_OF Pulp Fiction. You could come up with a huge number of such relationship pairs.

One common mistake people make when modelling their domain in Neo4j is creating both types of relationships. Since one relationship implies the other, this is wasteful, both in terms of space and traversal time. Neo4j can traverse relationships in both directions. More importantly, thanks to the way Neo4j organizes its data, the speed of traversal does not depend on the direction of the relationships being traversed.

Bidirectional Relationships

Some relationships, on the other hand, are naturally bidirectional. A classic example is Facebook or real-life friendship. This relationship is mutual: when someone is your friend, you are (hopefully) his friend, too.
Depending on how we look at the model, we could also say such a relationship is undirected.

GraphAware and Neo Technology are partner companies. Since this is a mutual relationship, we could model it as a bidirectional or an undirected relationship. But since neither is directly possible in Neo4j, beginners often resort to the following model, which suffers from the exact same problem as the incorrect ice hockey model: an extra, unnecessary relationship.

Neo4j APIs allow developers to completely ignore relationship direction when querying the graph, if they so desire. For example, in Neo4j’s own query language, Cypher, the key part of a query finding all partner companies of Neo Technology would look something like:

    MATCH (neo)-[:PARTNER]-(partner)

The result would be the same as executing and merging the results of the following two different queries:

    MATCH (neo)-[:PARTNER]->(partner)

and

    MATCH (neo)<-[:PARTNER]-(partner)

Therefore, the correct (or at least most efficient) way of modelling the partner relationships is using a single PARTNER relationship with an arbitrary direction.

Conclusion

Relationships in Neo4j can be traversed in both directions with the same speed. Moreover, direction can be completely ignored. Therefore, there is no need to create two different relationships between nodes if one implies the other.
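The idea that one stored relationship is enough can be illustrated without Neo4j: store each relationship once, in an arbitrary direction, and answer the undirected question by checking both endpoints. This in-memory sketch only mimics the results of the Cypher patterns above; the edge list and the function names are hypothetical and say nothing about Neo4j’s storage internals.

```javascript
// Each relationship is stored exactly once, in an arbitrary direction.
const relationships = [
  { from: "GraphAware", type: "PARTNER", to: "Neo Technology" },
  { from: "Czech Republic", type: "DEFEATED", to: "Sweden" },
];

// Like (node)-[:TYPE]->(other)
function outgoing(node, type) {
  return relationships.filter((r) => r.type === type && r.from === node).map((r) => r.to);
}

// Like (node)<-[:TYPE]-(other)
function incoming(node, type) {
  return relationships.filter((r) => r.type === type && r.to === node).map((r) => r.from);
}

// Like (node)-[:TYPE]-(other): the union of both directed queries.
function undirected(node, type) {
  return [...outgoing(node, type), ...incoming(node, type)];
}

console.log(undirected("Neo Technology", "PARTNER")); // [ 'GraphAware' ]
console.log(undirected("GraphAware", "PARTNER")); // [ 'Neo Technology' ]
// The implied inverse ("Sweden was defeated by the Czech Republic") needs no extra edge.
console.log(incoming("Sweden", "DEFEATED")); // [ 'Czech Republic' ]
```

Whichever direction the PARTNER edge happens to be stored in, the undirected lookup returns the partner from either side.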

November 6, 2013

by Michal Bachman

LINQ - AddRange Method in C#

In this post I’m going to explain the AddRange method. Strictly speaking, AddRange is a method of List<T> in System.Collections.Generic rather than a LINQ operator, but it works nicely together with LINQ operators, as shown below. This method is quite useful when you want to add multiple elements to the end of a list. Following is the method signature:

    public void AddRange(IEnumerable<T> collection)

To understand it, let’s take a simple example like the following.

    using System;
    using System.Collections.Generic;

    namespace Linq
    {
        class Program
        {
            static void Main(string[] args)
            {
                List<string> names = new List<string> {"Jalpesh"};
                string[] newnames = new string[] {"Vishal", "Tushar", "Vikas", "Himanshu"};
                foreach (var newname in newnames)
                {
                    names.Add(newname);
                }
                foreach (var n in names)
                {
                    Console.WriteLine(n);
                }
            }
        }
    }

Here in the above code I am adding the contents of an array to an already created list via a foreach loop. You can use the AddRange method instead of the loop, like the following. It will produce the same output as above.

    using System;
    using System.Collections.Generic;

    namespace Linq
    {
        class Program
        {
            static void Main(string[] args)
            {
                List<string> names = new List<string> {"Jalpesh"};
                string[] newnames = new string[] {"Vishal", "Tushar", "Vikas", "Himanshu"};
                names.AddRange(newnames);
                foreach (var n in names)
                {
                    Console.WriteLine(n);
                }
            }
        }
    }

When you run that example, the output is:

    Jalpesh
    Vishal
    Tushar
    Vikas
    Himanshu

AddRange in more complex scenarios:

You can also use AddRange in more complex scenarios, combining it with other operators, like the following.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    namespace Linq
    {
        class Program
        {
            static void Main(string[] args)
            {
                List<string> names = new List<string> {"Jalpesh"};
                string[] newnames = new string[] {"Vishal", "Tushar", "Vikas", "Himanshu"};
                names.AddRange(newnames.Where(nn => nn.StartsWith("Vi")));
                foreach (var n in names)
                {
                    Console.WriteLine(n);
                }
            }
        }
    }

Here in the above code I have created an array of strings and filtered it with the Where operator while adding it to an existing list. Following is the output, as expected:

    Jalpesh
    Vishal
    Vikas

That’s it. Hope you like it. Stay tuned for more.
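For comparison, JavaScript arrays have no AddRange, but `push` with spread syntax appends a whole sequence in one call, and `filter` plays the role of `Where`. This is an analogy in another language, not part of the C# API.

```javascript
const names = ["Jalpesh"];
const newNames = ["Vishal", "Tushar", "Vikas", "Himanshu"];

// Equivalent of names.AddRange(newnames): append every element to the end.
const all = [...names];
all.push(...newNames);
console.log(all.join(", ")); // Jalpesh, Vishal, Tushar, Vikas, Himanshu

// Equivalent of names.AddRange(newnames.Where(nn => nn.StartsWith("Vi"))).
const filtered = [...names];
filtered.push(...newNames.filter((nn) => nn.startsWith("Vi")));
console.log(filtered.join(", ")); // Jalpesh, Vishal, Vikas
```

Spreading passes each element as a separate argument, so for very large arrays a plain loop or `concat` is the safer choice to avoid engine argument limits.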

November 3, 2013

by Jalpesh Vadgama

EasyNetQ: Publisher Confirms

Publisher confirms are a RabbitMQ addition to AMQP to guarantee message delivery. You can read all about them here and here. In short, they provide an asynchronous confirmation that a publish has successfully reached all the queues that it was routed to.

To turn on publisher confirms with EasyNetQ, set the publisherConfirms connection string parameter like this:

    var bus = RabbitHutch.CreateBus("host=localhost;publisherConfirms=true");

When you set this flag, EasyNetQ will wait for the confirmation, or a timeout, before returning from the Publish method:

    bus.Publish(new MyMessage { Text = "Hello World!" });
    // here the publish has been confirmed.

Nice and easy.

There’s a problem though. If I run the above code in a while loop without publisher confirms, I can publish around 4000 messages per second, but with publisher confirms switched on that drops to around 140 per second. Not so good.

With EasyNetQ 0.15 we introduced a new PublishAsync method that returns a Task. The Task completes when the publish is confirmed:

    bus.PublishAsync(message).ContinueWith(task =>
    {
        if (task.IsFaulted)
        {
            Console.WriteLine(task.Exception);
        }
        else if (task.Status == TaskStatus.RanToCompletion)
        {
            Console.WriteLine("Publish completed fine.");
        }
    });

Task.IsCompleted is also true for faulted and cancelled tasks, so the handler checks IsFaulted first and only reports success for a task that ran to completion.

Using this code in a while loop gets us back to 4000 messages per second with publisher confirms on.

Happy confirms!
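Why does PublishAsync recover the throughput? Waiting for each confirmation before sending the next message keeps only one publish in flight, so every message pays a full round trip; issuing publishes and collecting the confirmations as they arrive overlaps those round trips. The sketch below models this with a hypothetical `createFakeBroker` whose publishes "confirm" after a fixed delay; it is not EasyNetQ or RabbitMQ code, and it measures publishes in flight rather than real throughput.

```javascript
// Hypothetical broker stand-in: each publish is "confirmed" after latencyMs.
function createFakeBroker(latencyMs) {
  const stats = { inFlight: 0, peakInFlight: 0 };
  function publish(message) {
    stats.inFlight += 1;
    stats.peakInFlight = Math.max(stats.peakInFlight, stats.inFlight);
    return new Promise((resolve) =>
      setTimeout(() => {
        stats.inFlight -= 1;
        resolve(message); // resolving stands in for the publisher confirm
      }, latencyMs)
    );
  }
  return { publish, stats };
}

// Like a blocking Publish with confirms: wait for each confirm before sending the next.
async function publishOneByOne(broker, messages) {
  for (const m of messages) {
    await broker.publish(m);
  }
  return broker.stats.peakInFlight;
}

// Like PublishAsync: send everything, then wait for all confirms together.
async function publishPipelined(broker, messages) {
  await Promise.all(messages.map((m) => broker.publish(m)));
  return broker.stats.peakInFlight;
}

const messages = Array.from({ length: 20 }, (_, i) => `msg ${i}`);
(async () => {
  const oneByOne = await publishOneByOne(createFakeBroker(10), messages);
  const pipelined = await publishPipelined(createFakeBroker(10), messages);
  console.log(oneByOne, pipelined); // 1 20
})();
```

With a 10 ms confirm latency, the one-by-one loop needs roughly 20 × 10 ms, while the pipelined version finishes in about one latency period.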

November 1, 2013

by Mike Hadlow

Adding Appsec to Agile: Security Stories, Evil User Stories and Abuse(r) Stories

Because Agile development teams work from a backlog of stories, one way to inject application security into software development is by writing up application security risks and activities as stories, making them explicit and adding them to the backlog so that application security work can be managed, estimated, prioritized and done like everything else that the team has to do.

Security Stories

SAFECode has tried to do this by writing a set of common, non-functional Security Stories following the well-known “As a [type of user] I want {something} so that {reason}” template. These stories are not customer- or user-focused: not the kind that a Product Owner would understand or care about. Instead, they are meant for the development team (architects, developers and testers). Example:

As a(n) architect/developer, I want to ensure AND as QA, I want to verify that sensitive data is kept restricted to actors authorized to access it.

There are stories to prevent/check for the common security vulnerabilities in applications: XSS, path traversal, remote execution, CSRF, OS command injection, SQL injection and password brute forcing. There are also checks for information exposure through error messages, proper use of encryption, authentication and session management, transport layer security, restricted uploads and URL redirection to untrusted sites; and for basic code quality issues: NULL pointer checking, boundary checking, numeric conversion, initialization, thread/process synchronization, exception handling, and use of unsafe/restricted functions.

SAFECode also includes a list of secure development practices (operational tasks) for the team, including making sure that you’re using the latest compiler, patching the run-time and libraries, static analysis, vulnerability scanning, code reviews of high-risk code, and tracking and fixing security bugs; and more advanced practices that require help from security experts, like fuzzing, threat modeling, pen tests and environmental hardening.
Altogether this is a good list of problems that need to be watched out for and things that should be done on most projects. But although SAFECode’s stories look like stories, they can’t be used as stories by the team.

These Security Stories are non-functional requirements (NFRs) and technical constraints that (like requirements for scalability, maintainability and supportability) need to be considered in the design of the system, and may need to be included as part of the definition of done and conditions of acceptance for every user story that the team works on. Security Stories can’t be pulled from the backlog, delivered like other stories and removed from the backlog when they are done, because they are never “done”. The team has to keep worrying about them throughout the life of the project and of the system.

As Rohit Sethi points out, asking developers to juggle long lists of technical constraints like this is not practical:

“If you start adding in other NFR constraints, such as accessibility, the list of constraints can quickly grow overwhelming to developers. Once the list grows unwieldy, our experience is that developers tend to ignore the list entirely. They instead rely on their own memories to apply NFR constraints. Since the number of NFRs continues to grow in increasingly specialized domains such as application security, the cognitive burden on developers’ memories is substantial.”

OWASP Evil User Stories – Hacking the Backlog

Someone at OWASP has suggested an alternative, much smaller set of non-functional Evil User Stories that can be "hacked" into the backlog:

“A way for a security guy to get security on the agenda of the development team is by ‘hacking the backlog’. The way to do this is by crafting Evil User Stories, a few general negative cases that the team needs to consider when they implement other stories.”

Example #1. "As a hacker, I can send bad data in URLs, so I can access data and functions for which I'm not authorized."

Example #2. "As a hacker, I can send bad data in the content of requests, so I can access data and functions for which I'm not authorized."

Example #3. "As a hacker, I can send bad data in HTTP headers, so I can access data and functions for which I'm not authorized."

Example #4. "As a hacker, I can read and even modify all data that is input and output by your application."

Thinking like a Bad Guy – Abuse Cases and Abuser Stories

Another way to beef up security in software development is to get the team to carefully look at the system they are building from the bad guy's perspective.

In “Misuse and Abuse Cases: Getting Past the Positive”, Dr. Gary McGraw at Cigital talks about the importance of anticipating things going wrong, and thinking about behaviour that the system needs to prevent. Assume that the customer/user is not going to behave, or is actively out to attack the application. Question all of the assumptions in the design (the can’ts and won’ts), especially trust conditions: what if the bad guy can be anywhere along the path of an action (for example, using an attack proxy between the client and the server)?

Abuse Cases are created by security experts working with the team as part of a critical review, either of the design or of an existing application. The goal of a review like this is to understand how the system behaves under attack/failure conditions, and to document any weaknesses or gaps that need to be addressed.

At Agile 2013 Judy Neher presented a hands-on workshop on how to write Abuser Stories, a lighter-weight Agile practice which makes “thinking like a bad guy” part of the team’s job of defining and refining user requirements. Take a story, and as part of elaborating the story and listing the scenarios, step back and look at the story through a security lens. Don’t just think of what the user wants to do and can do; think about what they don’t want to do and can’t do.
Get the same people who are working on the story to “put their black hats on” and think evil for a little while, and brainstorm to come up with negative cases:

As {some kind of bad guy} I want to {do some bad thing}…

The {bad guy} doesn’t have to be a hacker. They could be an insider with a grudge, a selfish customer who is willing to take advantage of other users, an admin user who needs to be protected from making expensive mistakes, or an external system that may not always function correctly.

Ask questions like: How do I know who the user is and that I can trust them? Who is allowed to do what, and where are the authorization checks applied? Look for holes in multi-step workflows – what happens if somebody bypasses a check, tries to skip a step or does something out of sequence? What happens if an action or a check times out, blocks or fails – what access should be allowed, what kind of information should be shown, and what kind shouldn’t be? Are we interacting with children? Are we dealing with money? With dangerous command-and-control/admin functions? With confidential or private data?

Look closer at the data. Where is it coming from? Can I trust it? Is the source authenticated? Where is it validated – do I have to check it myself? Where is it stored (does it have to be stored)? If it has to be stored, should it be encrypted or masked (including in log files)? Who should be able to see it? Who shouldn’t be able to see it? Who can change it, and do the changes need to be audited? Do we need to make sure the data hasn't been tampered with (checksum, HMAC, digital signature)?

Use this exercise to come up with refutation criteria (user can do this, but can’t do that; they can see this but they can’t see that), instead of, or as part of, the conditions of acceptance for the story. Prioritize these cases based on risk, and add the cases that you agree need to be taken care of as scenarios to the current story, or as new stories to the backlog if they are big enough.
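The last data question above mentions an HMAC as one way to detect tampering. Below is a minimal sketch of that idea. It uses only the JDK's javax.crypto.Mac with the standard HmacSHA256 algorithm. The TamperCheck class, the demo key and the sample message are hypothetical and serve only to illustrate the check. A real key would come from proper secret storage, never from source code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: tag data with HMAC-SHA256 and later verify it was not modified. */
public class TamperCheck {
    private static final String ALGORITHM = "HmacSHA256";
    private final SecretKeySpec key;

    public TamperCheck(byte[] secret) {
        this.key = new SecretKeySpec(secret, ALGORITHM);
    }

    /** Computes a Base64-encoded HMAC tag over the UTF-8 bytes of the data. */
    public String sign(String data) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(key);
            byte[] tag = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 not available", e);
        }
    }

    /** Recomputes the tag for the data and compares it with the tag that travelled with it. */
    public boolean verify(String data, String expectedTag) {
        byte[] actual = sign(data).getBytes(StandardCharsets.US_ASCII);
        byte[] expected = expectedTag.getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(actual, expected);
    }

    public static void main(String[] args) {
        // Demo key only: never hard-code real secrets.
        TamperCheck check = new TamperCheck("demo-secret-key".getBytes(StandardCharsets.UTF_8));
        String tag = check.sign("account=42&amount=100");
        System.out.println(check.verify("account=42&amount=100", tag)); // true
        System.out.println(check.verify("account=42&amount=900", tag)); // false
    }
}
```

The comparison uses MessageDigest.isEqual rather than a plain string equals. That way the check is not meant to stop early at the first differing byte.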
“Thinking like a bad guy” as you are working on a story seems more useful and practical than other story-based approaches. It doesn’t take a lot of time, and it’s not expensive. You don’t need to write Abuser Stories for every user story, and the more Abuser Stories that you do, the easier it will get – you'll get better at it, and you’ll keep running into the same kinds of problems that can be solved with the same patterns. You end up with something concrete, functional and actionable: work that has to be done and can be tested. Concrete, actionable cases like this are easier for the team to understand and appreciate – including the Product Owner, which is critical in Scrum, because the Product Owner decides what is important and what gets done. And because Abuser Stories are done in phase, by the people who are already working on the stories (rather than as a separate activity that needs to be set up and scheduled), they are more likely to get done.

Simple, quick, informal threat modeling like this isn’t enough to make a system secure – the team won’t be able to find and plug all of the security holes in the system this way, even if the developers are well-trained in secure software development and take their work seriously. Abuser Stories are good for identifying business logic vulnerabilities, reviewing security features (authentication, access control, auditing, password management, licensing), improving error handling and basic validation, and staying on the right side of privacy regulations. Effective software security involves a lot more work than this: choosing a good framework and using it properly, watching out for changes to the system's attack surface, carefully reviewing high-risk code for design and coding errors, writing good defensive code as much as possible, using static analysis to catch common coding mistakes, and doing regular security testing (pen testing and dynamic analysis).
But getting developers and testers to think like a bad guy as they build a system should go a long way to improving the security and robustness of your app.
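Refutation criteria such as "they can see this but they can't see that" can be written down as executable checks. That keeps an abuser story from fading into a vague intention. Below is a minimal Java sketch. It assumes a hypothetical OrderService in which a customer may read their own orders but not anyone else's. All names are invented for illustration, and records require Java 16 or later.

```java
import java.util.Map;

/** Hypothetical example: a refutation criterion turned into an executable check. */
public class OrderAccessDemo {

    /** Hypothetical order: who owns it and what it contains. */
    record Order(String id, String ownerId, String details) {}

    /** Thrown when a caller asks for an order they may not see. */
    static class AccessDeniedException extends RuntimeException {
        AccessDeniedException(String message) { super(message); }
    }

    /** Hypothetical service that applies the authorization check on every read. */
    static class OrderService {
        private final Map<String, Order> orders;

        OrderService(Map<String, Order> orders) { this.orders = orders; }

        Order getOrder(String callerId, String orderId) {
            Order order = orders.get(orderId);
            // Same answer for "missing" and "someone else's", so callers cannot probe which IDs exist.
            if (order == null || !order.ownerId().equals(callerId)) {
                throw new AccessDeniedException("order not available");
            }
            return order;
        }
    }

    public static void main(String[] args) {
        OrderService service = new OrderService(Map.of(
                "o1", new Order("o1", "alice", "2 books"),
                "o2", new Order("o2", "bob", "1 lamp")));

        // "The user can see this": Alice reads her own order.
        System.out.println(service.getOrder("alice", "o1").details()); // 2 books

        // "...but can't see that": Alice tries to read Bob's order.
        try {
            service.getOrder("alice", "o2");
            System.out.println("FAIL: access granted");
        } catch (AccessDeniedException e) {
            System.out.println("denied: " + e.getMessage()); // denied: order not available
        }
    }
}
```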

October 31, 2013

by Jim Bird

· 21,894 Views · 1 Like

How to Use MongoDB as a Pure In-memory DB (Redis Style)

The Idea

There has been growing interest in using MongoDB as an in-memory database, meaning that the data is not stored on disk at all. This can be super useful for applications like:

- a write-heavy cache in front of a slower RDBMS system
- embedded systems
- PCI compliant systems where no data should be persisted
- unit testing, where the database should be light and easily cleaned

That would be really neat indeed if it were possible: one could leverage the advanced querying/indexing capabilities of MongoDB without hitting the disk. As you probably know, disk IO (especially random IO) is the system bottleneck in 99% of cases, and if you are writing data you cannot avoid hitting the disk.

One sweet design choice of MongoDB is that it uses memory-mapped files to handle access to data files on disk. This means that MongoDB does not know the difference between RAM and disk: it just accesses bytes at offsets in giant arrays representing files, and the OS takes care of the rest! It is this design decision that allows MongoDB to run in RAM with no modification.

How it is done

This is all achieved by using a special type of filesystem called tmpfs. Linux will make it appear as a regular FS, but it is entirely located in RAM (unless it is larger than RAM, in which case it can swap, which can be useful!). I have 32GB of RAM on this server; let’s create a 16GB tmpfs:

# mkdir /ramdata
# mount -t tmpfs -o size=16000M tmpfs /ramdata/
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvde1             5905712   4973924    871792  86% /
none                  15344936         0  15344936   0% /dev/shm
tmpfs                 16384000         0  16384000   0% /ramdata

Now let’s start MongoDB with the appropriate settings. smallfiles and noprealloc should be used to reduce the amount of RAM wasted, and will not affect performance since it’s all RAM based. nojournal should be used since it does not make sense to have a journal in this context!
dbpath=/ramdata
nojournal = true
smallFiles = true
noprealloc = true

After starting MongoDB, you will find that it works just fine and the files are as expected in the FS:

# mongo
MongoDB shell version: 2.3.2
connecting to: test
> db.test.insert({a:1})
> db.test.find()
{ "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 }
# ls -l /ramdata/
total 65684
-rw-------. 1 root root 16777216 Apr 30 15:52 local.0
-rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root        5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root 16777216 Apr 30 15:52 test.0
-rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root       40 Apr 30 15:52 _tmp

Now let’s add some data and make sure it behaves properly. We will create a 1KB document and add 4 million of them:

> str = ""
> aaa = "aaaaaaaaaa"
aaaaaaaaaa
> for (var i = 0; i < 100; ++i) { str += aaa; }
> for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str});}
> db.foo.stats()
{
    "ns" : "test.foo",
    "count" : 4000000,
    "size" : 4544000160,
    "avgObjSize" : 1136.00004,
    "storageSize" : 5030768544,
    "numExtents" : 26,
    "nindexes" : 1,
    "lastExtentSize" : 536600560,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 129794000,
    "indexSizes" : {
        "_id_" : 129794000
    },
    "ok" : 1
}

The average document size is 1136 bytes, and the collection takes up about 5GB of storage. The index on _id takes about 130MB.

Now we need to verify something very important: is the data duplicated in RAM, existing both within MongoDB and the filesystem? Remember that MongoDB does not buffer any data within its own process; instead, data is cached in the FS cache. Let’s drop the FS cache and see what is in RAM:

# echo 3 > /proc/sys/vm/drop_caches
# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6292780   24397096          0       1044    5817368
-/+ buffers/cache:     474368   30215508
Swap:            0          0          0

As you can see, there is 6.3GB of used RAM, of which 5.8GB is in the FS cache (the cached column).
Why is there still 5.8GB of FS cache even after all caches were dropped?? The reason is that Linux is smart and it does not duplicate the pages between tmpfs and its cache… Bingo! That means your data exists with a single copy in RAM. Let’s access all documents and verify RAM usage is unchanged:

> db.foo.find().itcount()
4000000
# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6327988   24361888          0       1324    5818012
-/+ buffers/cache:     508652   30181224
Swap:            0          0          0
# ls -l /ramdata/
total 5808780
-rw-------. 1 root root  16777216 Apr 30 15:52 local.0
-rw-------. 1 root root  16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root         5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root  16777216 Apr 30 16:00 test.0
-rw-------. 1 root root  33554432 Apr 30 16:00 test.1
-rw-------. 1 root root 536608768 Apr 30 16:02 test.10
-rw-------. 1 root root 536608768 Apr 30 16:03 test.11
-rw-------. 1 root root 536608768 Apr 30 16:03 test.12
-rw-------. 1 root root 536608768 Apr 30 16:04 test.13
-rw-------. 1 root root 536608768 Apr 30 16:04 test.14
-rw-------. 1 root root  67108864 Apr 30 16:00 test.2
-rw-------. 1 root root 134217728 Apr 30 16:00 test.3
-rw-------. 1 root root 268435456 Apr 30 16:00 test.4
-rw-------. 1 root root 536608768 Apr 30 16:01 test.5
-rw-------. 1 root root 536608768 Apr 30 16:01 test.6
-rw-------. 1 root root 536608768 Apr 30 16:04 test.7
-rw-------. 1 root root 536608768 Apr 30 16:03 test.8
-rw-------. 1 root root 536608768 Apr 30 16:02 test.9
-rw-------. 1 root root  16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root        40 Apr 30 16:04 _tmp
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvde1             5905712   4973960    871756  86% /
none                  15344936         0  15344936   0% /dev/shm
tmpfs                 16384000   5808780  10575220  36% /ramdata

And that verifies it! :)

What about replication?

You probably want to use replication since a server loses its RAM data upon reboot! Using a standard replica set you will get automatic failover and more read capacity.
If a server is rebooted, MongoDB will automatically rebuild its data by pulling it from another server in the same replica set (resync). This should be fast enough even in cases with a lot of data and indices, since all operations are RAM only :)

It is important to remember that write operations get written to a special collection called the oplog, which resides in the local database and takes 5% of the volume by default. In my case the oplog would take 5% of 16GB, which is 800MB. When in doubt, it is safer to choose a fixed oplog size using the oplogSize option (the value is in megabytes). If a secondary server is down for longer than the time span the oplog covers, it will have to be resynced. To set the oplog to 1GB, use:

oplogSize = 1000

What about sharding?

Now that you have all the querying capabilities of MongoDB, what if you want to implement a large service with it? Well, you can use sharding freely to implement a large, scalable in-memory store. Still, the config servers (which contain the chunk distribution) should be disk based, since their activity is small and rebuilding a cluster from scratch is not fun.

What to watch for

RAM is a scarce resource, and in this case you definitely want the entire data set to fit in RAM. Even though tmpfs can resort to swapping, the performance would drop dramatically. To make the best use of the RAM you should consider:

- the usePowerOf2Sizes option to normalize the storage buckets
- running a compact command or resyncing the node periodically
- a schema design that is fairly normalized (avoid large document growth)

Conclusion

Sweet, you can now use MongoDB and all its features as an in-memory RAM-only store! Its performance should be pretty impressive: during the test with a single thread/core I was achieving 20k writes per second, and it should scale linearly with the number of cores.

October 28, 2013

by Antoine Girbal

· 61,120 Views

HashMap – Single Key and Multiple Values Example

Sometimes you want to store multiple values for the same hash key. The following code examples show you three different ways to do this.
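The three original examples are not reproduced in this summary. One common way to get this behaviour in plain Java (not necessarily one of the author's three) is to map each key to a List of values. Map.computeIfAbsent creates the list the first time a key is used. The class name MultiValueMapDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical wrapper: one key, many values, backed by a HashMap of Lists. */
public class MultiValueMapDemo {
    private final Map<String, List<String>> map = new HashMap<>();

    /** Adds a value under the key, creating the list the first time the key is seen. */
    public void put(String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns a read-only view of all values for the key, or an empty list if there are none. */
    public List<String> get(String key) {
        return Collections.unmodifiableList(map.getOrDefault(key, List.of()));
    }

    public static void main(String[] args) {
        MultiValueMapDemo fruits = new MultiValueMapDemo();
        fruits.put("red", "apple");
        fruits.put("red", "cherry");
        fruits.put("yellow", "banana");
        System.out.println(fruits.get("red"));    // [apple, cherry]
        System.out.println(fruits.get("yellow")); // [banana]
        System.out.println(fruits.get("green"));  // []
    }
}
```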

October 26, 2013

by Jagadeesh Motamarri

· 799,629 Views · 9 Likes

Extracting File Metadata with C# and the .NET Framework

How to extract extended image metadata using C# and the Windows API Code Pack, simplifying access to detailed file properties typically seen in Windows Explorer.

October 26, 2013

by Rob Sanders

· 39,971 Views · 2 Likes

Too Many Parameters in Java Methods, Part 7: Mutable State

In this seventh post of my series on addressing the issue of too many parameters in a Java method or constructor, I look at using state to reduce the need to pass parameters. One of the reasons I have waited until the 7th post of this series to address this is that it is one of my least favorite approaches for reducing parameters passed to methods and constructors. That stated, there are multiple flavors of this approach, and I definitely prefer some flavors over others.

Perhaps the best known and most widely scorned approach in all of software development for using state to reduce method parameters is the use of global variables. Although it may be semantically accurate to say that Java does not have global variables, the reality is that, for good or for bad, the equivalent of global variables is achieved in Java via public static constructs. A particularly popular way to achieve this in Java is via the Stateful Singleton. In Patterns of Enterprise Application Architecture, Martin Fowler wrote that "any global data is always guilty until proven innocent." Global variables and "global-like" constructs in Java are considered bad form for several reasons. They can make it difficult for developers maintaining and reading code to know where the values are defined, where they were last changed, or even where they come from. By their very nature and intent, global data violates the principles of encapsulation and data hiding. Miško Hevery has written the following regarding the problems of static globals in an object-oriented language:

Accessing global state statically doesn’t clarify those shared dependencies to readers of the constructors and methods that use the Global State. Global State and Singletons make APIs lie about their true dependencies. ... The root problem with global state is that it is globally accessible. In an ideal world, an object should be able to interact only with other objects which were directly passed into it (through a constructor, or method call).
Having state available globally reduces the need for parameters because there is no need for one object to pass data to another object if both objects already have direct access to that data. However, as Hevery put it, that's completely orthogonal to the intent of object-oriented design.

Mutable state is also an increasing problem as concurrent applications become more common. In his JavaOne 2012 presentation on Scala, Scala creator Martin Odersky stated that "every piece of mutable state you have is a liability" in a highly concurrent world, and added that the problem is "non-determinism caused by concurrent threads accessing shared mutable state."

Although there are reasons to avoid mutable state, it remains a popular approach in software development. I think there are several reasons for this, including that it's superficially easy to write code that shares mutable state, and shared mutable data does provide ease of access. Some types of mutable data are popular because they have been taught and learned as effective for years. Finally, there are times when mutable state may be the most appropriate solution. For that last reason, and to be complete, I now look at how the use of mutable state can reduce the number of parameters a method must expect.

Stateful Singleton and Static Variables

A Java Singleton instance and other public Java static fields are generally available to any Java code within the same Java Virtual Machine (JVM) and loaded with the same classloader [for more details, see When is a Singleton not a Singleton?]. Any data stored universally (at least from the JVM/classloader perspective) is already available to client code in the same JVM and loaded with the same class loader. Because of this, there is no need to pass that data between clients and methods or constructors in that same JVM/classloader combination.
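To make the Stateful Singleton idea concrete, here is a minimal sketch. All names in it are hypothetical: StatefulSingletonDemo, AppContext, currentUser and formatGreeting. The no-argument formatGreeting() reads its input from static singleton state instead of a parameter. That is exactly the hidden dependency Hevery describes. The overload with an explicit parameter is included for contrast.

```java
/** Hypothetical demo of a stateful singleton removing the need for a parameter. */
public class StatefulSingletonDemo {

    /** Hypothetical stateful singleton: one instance per JVM/classloader. */
    static final class AppContext {
        private static final AppContext INSTANCE = new AppContext();

        // Mutable shared state that any code in the same JVM/classloader can change at any time.
        private volatile String currentUser = "anonymous";

        private AppContext() {}

        static AppContext getInstance() { return INSTANCE; }

        String getCurrentUser() { return currentUser; }

        void setCurrentUser(String user) { this.currentUser = user; }
    }

    /** No parameter needed, but the dependency on AppContext is invisible in the signature. */
    static String formatGreeting() {
        return "Hello, " + AppContext.getInstance().getCurrentUser();
    }

    /** The explicit alternative: the dependency is passed in and visible to callers. */
    static String formatGreeting(String user) {
        return "Hello, " + user;
    }

    public static void main(String[] args) {
        System.out.println(formatGreeting());        // Hello, anonymous
        AppContext.getInstance().setCurrentUser("Fred");
        System.out.println(formatGreeting());        // Hello, Fred
        System.out.println(formatGreeting("Wilma")); // Hello, Wilma
    }
}
```

The volatile modifier only makes the latest write visible to other threads. It does nothing to stop two callers from overwriting each other's currentUser, which is the kind of shared-mutable-state hazard Odersky refers to.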
Instance State

While "statics" are considered "globally available," narrower instance-level state can also be used in a similar fashion to reduce the need to pass parameters between methods of the same class. An advantage of this over global variables is that accessibility is limited to instances of the class (private fields) or instances of the class's children (protected fields). Of course, if the fields are public, accessibility is pretty wide open, but the same data is not automatically available to other code in the same JVM/classloader. The next code listing demonstrates how state data can be, and sometimes is, used to reduce the need for parameters between two methods internal to a given class.

Example of Instance State Used to Avoid Passing Parameters

/**
 * Simple example of using instance variable so that there is no need to
 * pass parameters to other methods defined in the same class.
 */
public void doSomethingGoodWithInstanceVariables()
{
   this.person =
      Person.createInstanceWithNameAndAddressOnly(
         new FullName.FullNameBuilder(new Name("Flintstone"), new Name("Fred")).createFullName(),
         new Address.AddressBuilder(new City("Bedrock"), State.UN).createAddress());
   printPerson();
}

/**
 * Prints instance of Person without requiring it to be passed in because it
 * is an instance variable.
 */
public void printPerson()
{
   out.println(this.person);
}

The above example is somewhat contrived and simplified, but it does illustrate the point: the instance variable person can be accessed by other instance methods defined in the same class, so that instance does not need to be passed between those instance methods. This does shorten the signatures of potentially internal methods (public accessibility means they may also be used by external methods), but it also introduces state and means that the invoked method now impacts the state of that same object. In other words, the benefit of not having to pass the parameter comes at the cost of another piece of mutable state.
The other side of the trade-off, needing to pass the instance of Person because it is not an instance variable, is shown in the next code listing for comparison.

Example of Passing Parameter Rather than Using Instance Variable

/**
 * Simple example of passing a parameter rather than using an instance variable.
 */
public void doSomethingGoodWithoutInstanceVariables()
{
   final Person person =
      Person.createInstanceWithNameAndAddressOnly(
         new FullName.FullNameBuilder(new Name("Flintstone"), new Name("Fred")).createFullName(),
         new Address.AddressBuilder(new City("Bedrock"), State.UN).createAddress());
   printPerson(person);
}

/**
 * Prints instance of Person that is passed in as a parameter.
 *
 * @param person Instance of Person to be printed.
 */
public void printPerson(final Person person)
{
   out.println(person);
}

The previous two code listings illustrate that parameter passing can be reduced by using instance state. I generally prefer not to use instance state solely to avoid parameter passing. If instance state is needed for other reasons, then the reduction of parameters to be passed is a nice side benefit, but I don't like introducing unnecessary instance state simply to remove or reduce the number of parameters. Although there was a time when the readability of reduced parameters might have justified instance state in a largely single-threaded environment, I feel that the slight readability gain from reduced parameters is not worth the cost of classes that are not thread-safe in an increasingly multi-threaded world. I still don't like to pass a whole lot of parameters between methods of the same class, but I can use a parameters object (perhaps with a package-private scope class) to reduce the number of these parameters and pass that parameters object around instead of the large number of parameters.

JavaBean Style Construction

The JavaBeans convention/style has become extremely popular in the Java development community.
Many frameworks, such as the Spring Framework and Hibernate, rely on classes adhering to the JavaBeans conventions, and some standards, like the Java Persistence API, are also built around the JavaBeans conventions. There are multiple reasons for the popularity of the JavaBeans style, including its ease of use and the ability to use reflection against code adhering to this convention to avoid additional configuration. The general idea behind the JavaBean style is to instantiate an object with a no-argument constructor, then set its fields via single-argument "set" methods and access its fields via no-argument "get" methods. This is demonstrated in the next code listings. The first listing shows a simple example of a PersonBean class with a no-arguments constructor and getter and setter methods. That code listing also includes some of the JavaBeans-style classes it uses. That code listing is followed by code using that JavaBean style class.

Examples of JavaBeans Style Class

public class PersonBean
{
   private FullNameBean name;
   private AddressBean address;
   private Gender gender;
   private EmploymentStatus employment;
   private HomeownerStatus homeOwnerStatus;

   /** No-arguments constructor. */
   public PersonBean() {}

   public FullNameBean getName() { return this.name; }
   public void setName(final FullNameBean newName) { this.name = newName; }

   public AddressBean getAddress() { return this.address; }
   public void setAddress(final AddressBean newAddress) { this.address = newAddress; }

   public Gender getGender() { return this.gender; }
   public void setGender(final Gender newGender) { this.gender = newGender; }

   public EmploymentStatus getEmployment() { return this.employment; }
   public void setEmployment(final EmploymentStatus newEmployment) { this.employment = newEmployment; }

   public HomeownerStatus getHomeOwnerStatus() { return this.homeOwnerStatus; }
   public void setHomeOwnerStatus(final HomeownerStatus newHomeOwnerStatus) { this.homeOwnerStatus = newHomeOwnerStatus; }
}

/**
 * Full name of a person in JavaBean style.
 *
 * @author Dustin
 */
public final class FullNameBean
{
   private Name lastName;
   private Name firstName;
   private Name middleName;
   private Salutation salutation;
   private Suffix suffix;

   /** No-args constructor for JavaBean style instantiation (public, so client code can call it). */
   public FullNameBean() {}

   public Name getFirstName() { return this.firstName; }
   public void setFirstName(final Name newFirstName) { this.firstName = newFirstName; }

   public Name getLastName() { return this.lastName; }
   public void setLastName(final Name newLastName) { this.lastName = newLastName; }

   public Name getMiddleName() { return this.middleName; }
   public void setMiddleName(final Name newMiddleName) { this.middleName = newMiddleName; }

   public Salutation getSalutation() { return this.salutation; }
   public void setSalutation(final Salutation newSalutation) { this.salutation = newSalutation; }

   public Suffix getSuffix() { return this.suffix; }
   public void setSuffix(final Suffix newSuffix) { this.suffix = newSuffix; }

   @Override
   public String toString()
   {
      return this.salutation + " " + this.firstName + " " + this.middleName + " "
           + this.lastName + ", " + this.suffix;
   }
}

package dustin.examples;

/**
 * Representation of a United States address (JavaBeans style).
 *
 * @author Dustin
 */
public final class AddressBean
{
   private StreetAddress streetAddress;
   private City city;
   private State state;

   /** No-arguments constructor for JavaBeans-style instantiation (public, so client code can call it). */
   public AddressBean() {}

   public StreetAddress getStreetAddress() { return this.streetAddress; }
   public void setStreetAddress(final StreetAddress newStreetAddress) { this.streetAddress = newStreetAddress; }

   public City getCity() { return this.city; }
   public void setCity(final City newCity) { this.city = newCity; }

   public State getState() { return this.state; }
   public void setState(final State newState) { this.state = newState; }

   @Override
   public String toString()
   {
      return this.streetAddress + ", " + this.city + ", " + this.state;
   }
}

Example of JavaBeans Style Instantiation and Population

public PersonBean createPerson()
{
   final PersonBean person = new PersonBean();
   final FullNameBean personName = new FullNameBean();
   personName.setFirstName(new Name("Fred"));
   personName.setLastName(new Name("Flintstone"));
   person.setName(personName);
   final AddressBean address = new AddressBean();
   address.setStreetAddress(new StreetAddress("345 Cave Stone Road"));
   address.setCity(new City("Bedrock"));
   person.setAddress(address);
   return person;
}

The examples just shown demonstrate how the JavaBeans style approach can be used. This approach makes some concessions to reduce the need to pass a large number of parameters to a class's constructor: instead, no parameters are passed to the constructor, and each individual attribute that is needed must be set. One of the advantages of the JavaBeans style approach is that readability is enhanced as compared to a constructor with a large number of parameters, because each of the "set" methods is hopefully named in a readable way. The JavaBeans approach is simple to understand and definitely achieves the goal of reducing lengthy parameter lists in the case of constructors. However, there are some disadvantages to this approach as well. One disadvantage is a lot of tedious client code for instantiating the object and setting its attributes one at a time.
It is easy with this approach to neglect to set a required attribute, because there is no way for the compiler to enforce that all required parameters are set without leaving the JavaBeans convention. Perhaps most damaging, there are several objects instantiated in this last code listing, and these objects exist in different incomplete states from the time they are instantiated until the time the final "set" method is called. During that time, the objects are in what is really an "undefined" or "incomplete" state. The existence of "set" methods necessarily means that the class's attributes cannot be final, rendering the entire object highly mutable.

Regarding the prevalent use of the JavaBeans pattern in Java, several credible authors have called into question its value. Allen Holub's controversial article Why getter and setter methods are evil starts off with no holds barred:

Though getter/setter methods are commonplace in Java, they are not particularly object oriented (OO). In fact, they can damage your code's maintainability. Moreover, the presence of numerous getter and setter methods is a red flag that the program isn't necessarily well designed from an OO perspective.

Josh Bloch, in his less forceful and more gently persuasive tone, says of the JavaBeans getter/setter style: "The JavaBeans pattern has serious disadvantages of its own" (Effective Java, Second Edition, Item #2). It is in this context that Bloch recommends the builder pattern instead for object construction.

I'm not against using the JavaBeans get/set style when the framework I've selected for other reasons requires it and the reasons for using that framework justify it. There are also areas where the JavaBeans style class is particularly well suited, such as interacting with a data store and holding data from the data store for use by the application. However, I am not a fan of using the JavaBeans style for instantiating an object simply to avoid the need to pass parameters.
I prefer one of the other approaches, such as the builder, for that purpose.

Benefits and Advantages

I've covered different approaches to reducing the number of arguments to a method or constructor in this post, but they all share the same trade-off: exposing mutable state to reduce or eliminate the number of parameters that must be passed to a method or to a constructor. The advantages of these approaches are simplicity, general readability (though "globals" can be difficult to read), and ease of initial writing and use. Of course, their biggest advantage from this post's perspective is that they generally eliminate the need for any parameter passing.

Costs and Disadvantages

The trait that all approaches covered in this post share is the exposure of mutable state. This can lead to an extremely high cost if the code is used in a highly concurrent environment. There is a certain degree of unpredictability when object state is exposed for anyone to tinker with as they like. It can be difficult to know which code made the wrong change or failed to make a necessary change (such as failing to call a "set" method when populating a newly instantiated object).

Conclusion

Some of the approaches covered in this post are highly popular despite their drawbacks. This may be for a variety of reasons, including prevalence of use in popular frameworks (forcing users of the framework to use that style and also providing examples to others for their own code development). Other reasons for these approaches' popularity are the relative ease of initial development and the seemingly (deceptively) small amount of thought that needs to go into design with these approaches. In general, I prefer to spend a little more design and implementation effort to use builders and less mutable approaches when practical. However, there are cases where these mutable approaches work well in reducing the number of parameters passed around and introduce no more risk than was already present.
My feeling is that Java developers should carefully consider use of any mutable Java classes and ensure that the mutability is either desired or is a cost that is justified by the reasons for using a mutable state approach.
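For comparison with the JavaBeans listing, here is a minimal sketch of the Bloch-style builder the author prefers. It assumes a simplified, hypothetical PersonWithBuilder with plain String fields rather than the Name, City and StreetAddress classes used in this series. Required values go to the Builder constructor, and optional ones are set through chained methods. The finished object is immutable, so it is never observed in a half-populated state.

```java
import java.util.Objects;

/** Hypothetical immutable person built with a Bloch-style builder. */
public final class PersonWithBuilder {
    private final String lastName;   // required
    private final String firstName;  // required
    private final String street;     // optional
    private final String city;       // optional

    private PersonWithBuilder(Builder builder) {
        this.lastName = builder.lastName;
        this.firstName = builder.firstName;
        this.street = builder.street;
        this.city = builder.city;
    }

    public static final class Builder {
        private final String lastName;
        private final String firstName;
        private String street = "";
        private String city = "";

        /** Required attributes cannot be forgotten: the compiler demands them here. */
        public Builder(String lastName, String firstName) {
            this.lastName = Objects.requireNonNull(lastName);
            this.firstName = Objects.requireNonNull(firstName);
        }

        public Builder street(String street) { this.street = street; return this; }

        public Builder city(String city) { this.city = city; return this; }

        /** The object only comes into existence once it is fully populated. */
        public PersonWithBuilder build() { return new PersonWithBuilder(this); }
    }

    @Override
    public String toString() {
        return firstName + " " + lastName + ", " + street + ", " + city;
    }

    public static void main(String[] args) {
        PersonWithBuilder fred = new PersonWithBuilder.Builder("Flintstone", "Fred")
                .street("345 Cave Stone Road")
                .city("Bedrock")
                .build();
        System.out.println(fred); // Fred Flintstone, 345 Cave Stone Road, Bedrock
    }
}
```

Unlike the JavaBeans version, every field here is final. The trade-off is the extra Builder class that has to be written and maintained.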

October 25, 2013

by Dustin Marx

· 16,649 Views

What’s the Difference Between System.String and string?

One question that a lot of developers ask is: is there any difference between string and System.String, and which should be used?

Short Answer

There is no difference between the two. You can use either of them in your code.

Explanation

System.String is a class (reference type) defined in mscorlib, in the System namespace. In other words, System.String is a type in the CLR. string is a keyword in C#.

Before looking at how the two relate, let us understand the terms BCL and FCL. The BCL (Base Class Library) is the standard library of the Common Language Infrastructure (CLI), available to languages like C#, A#, Boo, Cobra, F#, IronRuby, IronPython and other CLI languages. It includes common functionality such as file read/write (IO) and database/XML interactions. The BCL was first implemented in Microsoft .NET in the form of mscorlib.dll. The FCL (Framework Class Library) is the broader, Microsoft .NET-specific library, which includes the BCL and contains reusable classes/assets in namespaces like System, System.CodeDom, System.Collections, System.Diagnostics, System.Globalization, System.IO, System.Resources and System.Text.

Now, in C#, the keyword string maps directly to the type System.String. Similarly, int maps directly to System.Int32, a 32-bit integer type. These mappings are defined by the C# language, not by the library: another CLI language could, in principle, map a keyword of its own called int to a 64-bit integer type instead. Within C#, though, it is well established that using string or System.String makes no difference.

Is it still better to use string instead of System.String?

There is no universally agreed answer to this. But in my view, even though string and System.String mean the same thing and make no difference to the performance of the application, it is better to use string, because string is the C# language-specific keyword. The C# language specification also states: "As a matter of style, use of the keyword is favored over use of the complete system type name." Following this practice ensures that your code consistently uses keywords wherever possible, rather than mixing keywords with BCL and FCL type names.

October 25, 2013

by Punit Ganshani

· 11,387 Views · 3 Likes

Reasons to Move from DataTables to Generic Collections

These days, few community members write or speak about using DataTables and DataSets for data operations. But a number of real projects are built using them, and many developers are still happy to use them in their projects. Sometimes it is not easy to completely replace DataTables with typed generic lists, particularly in large projects. But now is the right time to move, as future developers may not even learn about DataTables :). Generic collections have a number of advantages over DataTables, and once you get to know how beneficial they are, it is hard to imagine a day without them. The following are the reasons to move from DataTables to collections that I could think of:

1. A DataTable stores its values as objects, so value types are boxed and must be unboxed when they are read. This adds runtime overhead. Values in generic collections are strongly typed, so no boxing is involved.
2. Unboxing happens at runtime, and so does the type checking. If the source and target types do not match, the result is a runtime exception, which can lead to a number of issues when using DataTables. With collections, types are checked at compile time, so such mismatches are caught during compilation.
3. .NET languages have very nice support for creating collections, such as object initializers and collection initializers. There are no such features for DataTables.
4. LINQ queries can be used on both DataTables and collections, but writing queries against generic collections is a better experience thanks to the IntelliSense support in Visual Studio.
5. DataTables are specific to the .NET Framework, and we often see issues serializing and deserializing them in web services. Generic collections are easier to serialize and deserialize, so they can easily be used in any service and consumed from a client written in any language.
6. ORMs are becoming increasingly popular, and they use generic collections for all data operations.
7. Mocking DataTables in unit tests is a pain, because it involves creating the table structure wherever it is needed. A generic collection needs a class defined just once.

These are my opinions on preferring collections over DataTables. Any feedback is welcome. Happy coding!
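The boxing and runtime type-checking points above are not unique to .NET. The sketch below is a rough Java analogue, not .NET code, using Java because it is the language used elsewhere on this page. It contrasts an untyped row, where a `Map<String, Object>` stands in for a DataTable row, with a typed class held in a `List`. The names `TypedVsUntyped` and `Employee` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Java analogue of "DataTable row vs. typed collection"; illustrative only, not .NET code.
public class TypedVsUntyped {

    /** Typed "row": the compiler knows every field's type. */
    public static final class Employee {
        private final String name;
        private final int age;

        public Employee(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String name() { return name; }
        public int age() { return age; }
    }

    /** Untyped "row": values are stored as Object, so the int age is boxed to Integer. */
    public static Map<String, Object> untypedRow(String name, int age) {
        Map<String, Object> row = new HashMap<>();
        row.put("Name", name);
        row.put("Age", age); // autoboxing: int -> Integer
        return row;
    }

    /** Reading back needs a cast plus unboxing; the cast is only checked at runtime. */
    public static int ageFromRow(Map<String, Object> row) {
        return (Integer) row.get("Age");
    }

    /** A wrong cast compiles fine and fails only when it executes. */
    public static boolean wrongCastFailsAtRuntime(Map<String, Object> row) {
        try {
            String age = (String) row.get("Age"); // compiles; throws ClassCastException for an Integer
            return false; // reached only if the cast succeeded
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** With a typed list no casts are needed; a type mismatch here would not compile. */
    public static int totalAge(List<Employee> employees) {
        int total = 0;
        for (Employee e : employees) {
            total += e.age();
        }
        return total;
    }

    public static void main(String[] args) {
        // List.of (Java 9+) is the closest Java analogue to a collection initializer here.
        List<Employee> staff = List.of(new Employee("Ann", 30), new Employee("Bob", 41));
        System.out.println(totalAge(staff));              // 71
        Map<String, Object> row = untypedRow("Ann", 30);
        System.out.println(ageFromRow(row));              // 30
        System.out.println(wrongCastFailsAtRuntime(row)); // true
    }
}
```

The typed version moves the mistake from runtime to compile time, which is the same trade-off the list above describes for DataTables versus generic collections.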

October 21, 2013

by Rabi Kiran Srirangam

· 30,177 Views · 3 Likes

Database vs. Data Science

One thing that Big Data certainly made happen is that it brought the database/infrastructure community and the data analysis/statistics/machine learning communities closer together. As always, each community had its own set of models, methods, and ideas about how to structure and interpret the world. You can still tell these differences when looking at current Big Data projects, and I think it’s important to be aware of the distinctions in order to better understand the relationships between different projects. Because, let’s face it, every project claims to re-invent Big Data. With Hadoop and MapReduce being something like the founding fathers of Big Data, other projects have since appeared. Most notably, there are stream processing projects like Twitter’s Storm, which move from batch-oriented processing to event-based processing, better suited to real-time, low-latency work. Spark is something different again: a bit like Hadoop, but with greater emphasis on iterative algorithms and in-memory processing to achieve that landmark “100x faster than Hadoop” every current project seems to need to sport. Twitter’s summingbird project tries to bridge the gap between MapReduce and stream processing by providing a high-level set of operators which can then run either on MapReduce or on Storm. However, both Spark and summingbird leave me sort of flat, because you can see that they come from a database background, which means that there will still be a considerable gap to serious machine learning. So, what exactly is the difference? In the end, it’s the difference between relational and linear algebra. In the database world, you model relationships between objects, which you encode in tables, with foreign keys to link up entries between different tables.
Probably the most important insight of the database world was to develop a query language, a declarative description of what you want to extract from your database, leaving the optimization of the query and the exact details of how to perform it efficiently to the database guys. The machine learning community, on the other hand, has its roots in linear algebra and probability theory. Objects are usually encoded as a feature vector, that is, a list of numbers describing different properties of an object. Data is often collected in matrices where each row corresponds to an object and each column to a feature, not unlike a table in a database. However, the operations you perform in order to do data analysis are quite different from the database world. Take something as basic as linear regression: you try to learn a linear function $f(x) = \sum_{i=1}^{d} w_i x_i$ in a $d$-dimensional space (that is, where your objects are described by a $d$-dimensional vector), given $n$ examples $X_i$ and $Y_i$, where $X_i$ are the features describing your objects and $Y_i$ is the real number you attach to $X_i$. One way to “learn” $w$ is to tune it such that the quadratic error on the training examples is minimal. The solution can be written in closed form as $w = (XX^T)^{-1} XY$, where $X$ is the matrix built from the $X_i$ (putting the $X_i$ in the columns of $X$), and $Y$ is the vector of outputs $Y_i$. In order to compute it, you need to solve the linear equation $(XX^T)w = XY$. There are a large number of algorithms for this, starting with Gaussian elimination, which you’ve probably learned in your undergrad studies, or the conjugate gradient algorithm, or first computing a Cholesky decomposition. All of these algorithms have in common that they are iterative. They go through a number of operations, for example $O(d^3)$ for Gaussian elimination. They also need to store intermediate results.
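To make the contrast concrete, here is a minimal Java sketch of the closed-form solution above. It builds $XX^T$ and $XY$ from a $d \times n$ array whose columns are the $X_i$, then solves $(XX^T)w = XY$ by Gaussian elimination with partial pivoting. The class and method names are hypothetical. A production solver would use a numerical library and, since $XX^T$ is symmetric positive definite when $X$ has full row rank, would more typically use a Cholesky decomposition.

```java
// Least-squares linear regression via the normal equations (X X^T) w = X Y.
// x is d x n: column j holds the feature vector X_j of example j; y has length n.
public class NormalEquations {

    public static double[] fit(double[][] x, double[] y) {
        int d = x.length, n = y.length;
        double[][] a = new double[d][d]; // A = X X^T  (d x d)
        double[] b = new double[d];      // b = X Y    (length d)
        for (int i = 0; i < d; i++) {
            for (int k = 0; k < d; k++) {
                double dot = 0;
                for (int j = 0; j < n; j++) dot += x[i][j] * x[k][j];
                a[i][k] = dot;
            }
            double xy = 0;
            for (int j = 0; j < n; j++) xy += x[i][j] * y[j];
            b[i] = xy;
        }
        return solve(a, b);
    }

    // Gaussian elimination with partial pivoting, then back substitution: O(d^3) operations.
    public static double[] solve(double[][] a, double[] b) {
        int d = b.length;
        double[][] m = new double[d][];
        for (int i = 0; i < d; i++) m[i] = a[i].clone(); // work on copies
        double[] v = b.clone();

        for (int col = 0; col < d; col++) {
            // Pick the row with the largest pivot to limit rounding error.
            int pivot = col;
            for (int r = col + 1; r < d; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            double[] tmpRow = m[col]; m[col] = m[pivot]; m[pivot] = tmpRow;
            double tmp = v[col]; v[col] = v[pivot]; v[pivot] = tmp;

            // Eliminate entries below the pivot; these are the intermediate results kept in m and v.
            for (int r = col + 1; r < d; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c < d; c++) m[r][c] -= f * m[col][c];
                v[r] -= f * v[col];
            }
        }

        double[] w = new double[d];
        for (int r = d - 1; r >= 0; r--) {
            double s = v[r];
            for (int c = r + 1; c < d; c++) s -= m[r][c] * w[c];
            w[r] = s / m[r][r];
        }
        return w;
    }
}
```

For example, with features $(1,0)$, $(0,1)$, $(1,1)$ and targets $2, 3, 5$, the normal equations are $2w_1 + w_2 = 7$ and $w_1 + 2w_2 = 8$, giving $w = (2, 3)$. None of these steps looks like a join or a group-by.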
Gaussian elimination and Cholesky decomposition have rather elementary operations acting on individual entries, while the conjugate gradient algorithm performs a matrix-vector multiplication in each iteration. Most importantly, these algorithms can only be expressed very badly in SQL! It’s certainly not impossible, but you’d need to store your data in very different ways than you would in idiomatic database usage. So, it’s not about whether or not your framework can support iterative algorithms without significant latency. It’s about understanding that joins, group bys, and count() won’t get you far; you need scalar products, matrix-vector and matrix-matrix multiplications. You don’t need indices for most ML algorithms, except perhaps for quickly finding the k-nearest neighbors. Most algorithms either take in the whole data set in each iteration, or stream the whole set through some model which is iteratively updated, as in stochastic gradient descent. I’m not sure projects like Spark or Stratosphere have fully grasped the significance of this yet. Database- and infrastructure-inspired Big Data has its place when it comes to extracting and preprocessing data, but eventually you move from database land to machine learning land, which invariably means linear algebra land (or probability theory land, which often also reduces to linear-algebra-like computations). What often happens today is that you either painstakingly have to break down your linear algebra into MapReduce jobs, or you actively look for algorithms which fit the database view better. I think we’re still at the beginning of what is possible. Or, to be a bit more aggressive, claims that existing (infrastructure-, database-, and parallelism-inspired) frameworks provide you with sophisticated data analytics are greatly exaggerated.
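The conjugate gradient algorithm mentioned above illustrates the point about matrix-vector products. Each iteration needs exactly one product $Ap$ plus a few scalar products, and none of them maps naturally onto joins or group-bys. Below is a minimal Java sketch for a symmetric positive-definite $A$ such as $XX^T$. The class name, tolerance and iteration cap are assumptions for illustration, not part of any framework mentioned here.

```java
// Conjugate gradient for a symmetric positive-definite system A w = b.
public class ConjugateGradient {

    // Dense matrix-vector product: the one expensive step per iteration.
    static double[] matVec(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double s = 0;
            for (int j = 0; j < v.length; j++) s += a[i][j] * v[j];
            out[i] = s;
        }
        return out;
    }

    // Scalar (dot) product.
    static double dot(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    public static double[] solve(double[][] a, double[] b, int maxIter, double tol) {
        int d = b.length;
        double[] w = new double[d]; // start from w = 0
        double[] r = b.clone();     // residual r = b - A w = b
        double[] p = r.clone();     // first search direction
        double rsOld = dot(r, r);

        for (int k = 0; k < maxIter && Math.sqrt(rsOld) > tol; k++) {
            double[] ap = matVec(a, p);           // one matrix-vector product per iteration
            double alpha = rsOld / dot(p, ap);    // step length along p
            for (int i = 0; i < d; i++) {
                w[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rsNew = dot(r, r);
            double beta = rsNew / rsOld;          // keeps new direction A-conjugate to the old ones
            for (int i = 0; i < d; i++) p[i] = r[i] + beta * p[i];
            rsOld = rsNew;
        }
        return w;
    }
}
```

In exact arithmetic CG converges in at most $d$ iterations. For $A = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$ and $b = (7, 8)$ it reaches $w \approx (2, 3)$ after two iterations.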
They take care of a very important problem by giving you a reliable infrastructure to scale your data analysis code, but there’s still a lot of work that needs to be done on your side. High-level DSLs like Apache Hive or Pig are a first step in this direction, but are still too rooted in the database world IMHO. In summary, one should be aware of the difference between a framework which is mostly concerned with scaling and a tool which actually provides some piece of data analysis. And even if it comes with basic database-like analytics mechanisms, there is still a long way to go to do some serious data science. That’s why we also think that streamdrill occupies an interesting spot. It is a bit of infrastructure, allowing you to process a serious amount of event data, but it also provides valuable analysis based on algorithms you wouldn’t want to implement yourself, even if you had a Big Data framework like Hadoop at hand. That’s a direction I would like to see more of in the future. Note: I just saw that Spark has a logistic regression example on their landing page. Well, doing matrix operations explicitly via map() on collections doesn’t count in my view ;)

October 18, 2013

by Mikio Braun

· 11,410 Views · 1 Like

Generating SQL Railroad Diagrams

Simple Talk: How to get SQL Railroad Diagrams from MSDN BNF syntax notation. In SQL Server Books Online, in the Transact-SQL Reference (Database Engine), every SQL statement has its syntax represented in ‘Backus–Naur Form’ (BNF) notation. For a programmer in a hurry, this should be ideal, because it is the only quick way to understand and appreciate all the permutations of the syntax. It is a great feature once you get your eye in. It isn’t the only way to get the information; you can, of course, reverse-engineer an understanding of the syntax from the examples, but your understanding won’t be complete, and you’ll have wasted time doing it. BNF is a good start in representing the syntax: Oracle and SQLite go one step further and have proper railroad diagrams for their syntax, which is a far more accessible way of doing it. There are three problems with the BNF on MSDN. Firstly, it isn’t a standard version of BNF, but an ancient fork from EBNF, inherited from Sybase. Secondly, it is excruciatingly difficult to understand. Thirdly, it has a number of syntactic and semantic errors. The page describing DML triggers, for example, currently has an absurd BNF error that makes it state that all statements in the body of the trigger must be separated by commas. There are a few other detail problems too. Here is the offending syntax for a DML trigger, pasted from MSDN. ... I’ve been trying to create railroad diagrams for all the important SQL Server SQL statements, as good as you’d find for Oracle, and have so far published the CREATE TABLE and ALTER TABLE railroad diagrams based on the BNF. Although I’ve been aware of the errors, I never realised until recently how many there are. Then Colin Daley created a translator for the SQL Server dialect of BNF which outputs the standard EBNF notation used by the W3C. The example MSDN BNF for the trigger would be rendered as … ...
Colin’s intention was to allow anyone to paste SQL Server’s BNF notation into his website-based parser, and from this generate classic railroad diagrams via Gunther Rademacher's Railroad Diagram Generator. Colin's application does this for you: you're not aware that you are moving to a different site. Because Colin's 'translator' is a parser, it will pick up syntax errors. Once you’ve fixed the syntax errors, you will get the syntax in the form of a human-readable railroad diagram and, in this form, the semantic mistakes become flamingly obvious. Gunther’s Railroad Diagram Generator is brilliant. It is a great boon to be able to correct the MSDN dialect of BNF, generate standard EBNF, and from there create railroad diagrams for SQL Server’s syntax that are as good as Oracle’s. Many thanks to Colin for the idea. Here is the result of the W3C EBNF from Colin’s application then being run through the Railroad Diagram Generator. Now that’s much better, you’ll agree. This is pretty easy to understand, and at this point any error is immediately obvious. This should be seriously useful, and it is to me. However, there is a snag. The BNF is generally incorrect, and you can’t expect the average visitor to mess about with it. The answer is, of course, to correct the BNF on MSDN and maybe even add railroad diagrams for the syntax. Stop giggling! I agree it won’t happen. In the meantime, we need to collaboratively store and publish these corrected syntaxes ourselves as we do them. How? GitHub? SQL Server Central? Simple-Talk? What should those of us who use the system do with our corrected EBNF so that anyone can use it without hassle?

Grammar Translator: If you are familiar with the Grammar Translator, go ahead and create railroad diagrams from the Transact-SQL Reference. Otherwise, please see the FAQ. In particular, be sure to try the tutorial.

Welcome to Railroad Diagram Generator!
This is a tool for creating syntax diagrams, also known as railroad diagrams, from context-free grammars specified in EBNF. Syntax diagrams have been used for decades now, so the concept is well-known, and some tools for diagram generation are in existence. The features of this one are usage of the W3C's EBNF notation, web-scraping of grammars from W3C specifications, online editing of grammars, diagram presentation in SVG, and it was completely written in web languages (XQuery, XHTML, CSS, JavaScript).

There's nothing like a diagram to help grok something (and the MSDN BNF SQL stuff really makes my brain hurt...)

October 18, 2013

by Greg Duncan

· 9,201 Views

Extracting File Metadata with C# and the .NET Framework

The Windows Explorer (shell) provides extended file property information which can be quite valuable. The challenge is how to extract this information, given that the .NET Framework has somewhat limited support for this type of extraction.

October 14, 2013

by Rob Sanders

· 64,287 Views

Oracle Weblogic Stuck Thread Detection

The following question will again test your knowledge of the Oracle Weblogic threading model. I’m looking forward to your comments and experience on the same. If you are a Weblogic administrator, I’m certain that you have heard of this common problem: stuck threads. This is one of the most common problems you will face when supporting a Weblogic production environment. A Weblogic stuck thread simply means a thread that has been performing the same request for a very long time, longer than the configurable Stuck Thread Max Time. Question: How can you detect the presence of STUCK threads during and following a production incident? Answer: As we saw in our last article, “Weblogic Thread Monitoring Tips”, Weblogic provides functionality that allows us to closely monitor its internal self-tuning thread pool, and it will also show you any stuck threads. This monitoring view is very useful for live analysis, but what about after a production incident? The good news is that Oracle Weblogic also logs any detected stuck thread to the server log. This information includes details on the request and, more importantly, the thread stack trace. This data is crucial and may help you understand the root cause of any slowdown that occurred at a certain time.

< ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'> <[STUCK] ExecuteThread: '35' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "608" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 608213 ms
POST /App1/jsp/test.jsp HTTP/1.1
Accept: application/x-ms-application...
Referer: http://..
Accept-Language: en-US
User-Agent: Mozilla/4.0 ..
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Content-Length: 539
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds.
Stack trace:
...................................
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:301)
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:184)
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction....
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run()
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2281)
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2180)
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1491)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
weblogic.work.ExecuteThread.run(ExecuteThread.java:221)

Here is one more tip: generating and analyzing a JVM thread dump will also reveal stuck threads. As we can see from the snapshot below, the Weblogic thread state is now updated to STUCK, which means that this particular request has been executing for at least 600 seconds (10 minutes). This is very useful information, since the native thread state will typically remain RUNNABLE. The native thread state only changes in cases such as BLOCKED threads. Keep in mind that RUNNABLE simply means that the thread is healthy from a JVM perspective. It does not mean that it truly is healthy from a middleware or Java EE container perspective. This is why Oracle Weblogic has its own internal ExecuteThread state.
Finally, if your organization or client is using any commercial monitoring tool, I recommend that you enable alerting for both hogging threads and stuck threads. This will allow your support team to take proactive action before the affected Weblogic managed server(s) become fully unresponsive.
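The detection rule described above is simple: a thread is flagged when it has been busy on one request for longer than StuckThreadMaxTime. It can be sketched in a few lines of Java. This is a hypothetical illustration of the idea, not WebLogic's implementation or API. The class `StuckThreadDetector`, its methods, and the explicit millisecond timestamps are all assumptions, chosen so the logic is easy to test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stuck-request detector; NOT WebLogic's implementation.
public class StuckThreadDetector {

    private final long maxTimeMillis; // plays the role of StuckThreadMaxTime
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>(); // thread name -> request start

    public StuckThreadDetector(long maxTimeMillis) {
        this.maxTimeMillis = maxTimeMillis;
    }

    // Called when a worker thread picks up a request.
    public void requestStarted(String threadName, long nowMillis) {
        startTimes.put(threadName, nowMillis);
    }

    // Called when the request completes; the thread is no longer a candidate.
    public void requestFinished(String threadName) {
        startTimes.remove(threadName);
    }

    // Threads busy on their current request for MORE than the configured maximum.
    public List<String> findStuck(long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : startTimes.entrySet()) {
            long busyMillis = nowMillis - e.getValue();
            if (busyMillis > maxTimeMillis) {
                stuck.add(e.getKey());
            }
        }
        Collections.sort(stuck); // deterministic order for reporting
        return stuck;
    }
}
```

With a 600-second limit, a thread whose request started 608,213 ms ago would be reported, mirroring the log entry above. A thread busy for only 108 seconds would not. A real container would also capture the thread's stack trace at that moment, which is what makes the log entry useful after the incident.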

October 9, 2013

by Pierre-Hugues Charbonneau

· 55,062 Views

Cache Scope with EHCache

In another blog post we explained how you can use a new feature of Mule 3.3 to cache data in your Mule flows. Here we look at how to configure Mule to use EHCache to handle the caching, rather than storing the data in the default InMemoryObjectStore. Let’s get going. First, let’s say that there are millions of different ways to do this... We’ve taken the route of configuring everything through Spring. So just because you configured your EHCache differently does not mean it’s wrong or that ours is better. We prefer this way since Mule integrates very nicely with Spring. Now that we have settled that, let’s make a list of what we need to do:

1. Define a cache manager.
2. Define a cache factory bean.
3. Create a custom object store.
4. Define a Mule caching strategy.

The first job is to define a cache manager and a cache factory bean. Spring provides two very handy classes for this, specific to EHCache: EhCacheManagerFactoryBean and EhCacheFactoryBean. The cache manager just needs to be defined. However, on the cache factory bean you can configure all the EHCache details, such as time to live, time to idle, when to overflow to disk, the eviction policy, and much more. For more information, you can check the API here or the EHCache website. Also, from the cache factory bean, you need to refer back to the cache manager. An example is shown in the following gist:

Once the cache and the cache manager are configured, we need to define a custom object store that uses EHCache to store and retrieve the data. This is very easy to do: we just need to create a new class that implements Mule’s standard ObjectStore interface and use EHCache to do the operations.
A working custom EHCache object store is shown in the following gist:

package com.ricston.cache;

import java.io.Serializable;

import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;

import org.mule.api.store.ObjectStore;
import org.mule.api.store.ObjectStoreException;

// Mule ObjectStore that delegates every operation to an EHCache instance (injected via Spring).
public class EhcacheObjectStore<T extends Serializable> implements ObjectStore<T> {

    private Ehcache cache;

    @Override
    public synchronized boolean contains(Serializable key) throws ObjectStoreException {
        return cache.isKeyInCache(key);
    }

    @Override
    public synchronized void store(Serializable key, T value) throws ObjectStoreException {
        // Wrap the key/value pair in an EHCache Element and put it in the cache
        Element element = new Element(key, value);
        cache.put(element);
    }

    @SuppressWarnings("unchecked")
    @Override
    public synchronized T retrieve(Serializable key) throws ObjectStoreException {
        Element element = cache.get(key);
        if (element == null) {
            return null; // nothing cached under this key
        }
        return (T) element.getValue();
    }

    @Override
    public synchronized T remove(Serializable key) throws ObjectStoreException {
        T value = retrieve(key);
        cache.remove(key);
        return value;
    }

    @Override
    public boolean isPersistent() {
        return false; // reported to Mule as a non-persistent store
    }

    public Ehcache getCache() {
        return cache;
    }

    public void setCache(Ehcache cache) {
        this.cache = cache;
    }
}

As you can see, this object store encapsulates an EHCache instance, which must be set before we start using the object store. As you can imagine, we will do this through Spring. The next step is to configure a caching strategy which uses our brand-new EHCache object store. The caching strategy uses our custom object store, and Spring injects the cache defined earlier in this blog post into the object store. The rest of the Mule configuration can be exactly the same as in the other blog post. So here we have shown how to use EHCache as the caching engine for the cache scopes provided by Mule 3.3. One reason to do this is that EHCache is a very good, proven caching product with a ton of settings that you can exploit and tune for your application.
As a side note, if you have issues with EHCache classloading in Mule, place the EHCache jars inside $MULE_HOME/lib/user rather than in your application. Enjoy.

October 4, 2013

by Alan Cassar

· 14,203 Views · 1 Like

TestNG @Test Annotation and DataProviderClass Example

In the previous post, we saw an example where the dataProvider attribute was used to run the same test method with different sets of input data. TestNG provides another attribute, dataProviderClass, which is used together with dataProvider to fetch the input data for test methods from an external class. The class that holds the input data is set in the dataProviderClass attribute, and dataProvider holds the name of the method that actually supplies the data (note that the provider methods below are declared static). Here is a quick example to show how to use the dataProviderClass and dataProvider attributes.

Code

Service Class

package com.skilledmonster.example;

/**
 * Simple calculator service to demonstrate TestNG Framework
 *
 * @author Jagadeesh Motamarri
 * @version 1.0
 */
public interface CalculatorService {
    int sum(int a, int b);
    int multiply(int a, int b);
    int div(int a, int b);
    int sub(int a, int b);
}

Service Implementation Class

package com.skilledmonster.example;

/**
 * Simple calculator service implementation to demonstrate TestNG Framework
 *
 * @author Jagadeesh Motamarri
 * @version 1.0
 */
public class SimpleCalculator implements CalculatorService {
    public int sum(int a, int b) {
        return a + b;
    }

    public int multiply(int a, int b) {
        return a * b;
    }

    public int div(int a, int b) {
        return a / b;
    }

    public int sub(int a, int b) {
        return a - b;
    }
}

Data Provider Class

package com.skilledmonster.common;

import org.testng.annotations.DataProvider;

/**
 * Data Provider class for TestNG test cases
 *
 * @author Jagadeesh Motamarri
 * @version 1.0
 */
public class TestNGDataProvider {
    /**
     * Data Provider for testing sum of 2 numbers
     *
     * @return rows of {a, b} input pairs
     */
    @DataProvider
    public static Object[][] testSumInput() {
        return new Object[][] { { 5, 5 }, { 10, 10 }, { 20, 20 } };
    }

    /**
     * Data Provider for testing multiplication of 2 numbers
     *
     * @return rows of {a, b} input pairs
     */
    @DataProvider
    public static Object[][] testMultipleInput() {
        return new Object[][] { { 5, 5 }, { 10, 10 }, { 20, 20 } };
    }
}

Finally, the test class that uses the dataProviderClass attribute to feed the input data to the test methods:

package com.skilledmonster.example;

import org.testng.Assert;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

import com.skilledmonster.common.TestNGDataProvider;

/**
 * Example to demonstrate use of dataProviderClass and dataProvider attributes of TestNG framework
 *
 * @author Jagadeesh Motamarri
 * @version 1.0
 */
public class TestNGAnnotationTestDataProviderExample {
    public CalculatorService service;

    @BeforeClass
    public void init() {
        System.out.println("@BeforeClass: The annotated method will be run before the first test method in the current class is invoked.");
        System.out.println("init service");
        service = new SimpleCalculator();
    }

    // dataProvider names the @DataProvider method (by default its method name) in dataProviderClass
    @Test(dataProviderClass = TestNGDataProvider.class, dataProvider = "testSumInput")
    public void testSum(int a, int b) {
        System.out.println("@Test : testSum()");
        int result = service.sum(a, b);
        Assert.assertEquals(result, a + b);
    }

    @Test(dataProviderClass = TestNGDataProvider.class, dataProvider = "testMultipleInput")
    public void testMultiple(int a, int b) {
        System.out.println("@Test : testMultiple()");
        int result = service.multiply(a, b);
        Assert.assertEquals(result, a * b);
    }
}

Output

As shown in the above console output, testSum() and testMultiple() are each invoked with different sets of input data supplied by an external class through the dataProviderClass attribute.

Advantage

More flexibility and reusability of commonly used data across several test classes.

Download

Download TestNG DataProvider Example

October 2, 2013

by Jagadeesh Motamarri

· 25,503 Views

Clojure: Converting a string to a date

I wanted to do some date manipulation in Clojure recently and figured that since clj-time is a wrapper around Joda Time, it’d probably do the trick. The first thing we need to do is add the dependency to our project file and then run lein deps to pull down the appropriate JARs. The project file should look something like this:

project.clj

(defproject ranking-algorithms "0.1.0-SNAPSHOT"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [clj-time "0.6.0"]])

Now let’s load the clj-time.format namespace into the REPL, since we know we’ll be parsing dates:

> (require '(clj-time [format :as f]))

The string that I want to convert into a date looks like this:

(def string-date "18 September 2012")

The first thing we should do is check whether there is an existing formatter that we can use, by evaluating the following function:

> (f/show-formatters)
...
:hour-minute 06:45
:hour-minute-second 06:45:22
:hour-minute-second-fraction 06:45:22.473
:hour-minute-second-ms 06:45:22.473
:mysql 2013-09-20 06:45:22
:ordinal-date 2013-263
:ordinal-date-time 2013-263T06:45:22.473Z
:ordinal-date-time-no-ms 2013-263T06:45:22Z
:rfc822 Fri, 20 Sep 2013 06:45:22 +0000
...

There are a lot of different built-in formatters, but unfortunately I couldn’t find one that exactly matched our date format, so we’ll have to write our own. For that we’ll need to refresh our knowledge of Java date formatting. We end up with the following formatter:

> (f/parse (f/formatter "dd MMM YYYY") string-date)
#

It took me much longer than it should have to remember that ‘MMM’ is the pattern to match a short form of a month, but it’s just the same as what we’d have to do in Java, with some neat wrapper functions on top.
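For comparison, here is a minimal sketch of the same parse in plain Java using java.time (Java 8+). It does not use Joda Time or clj-time, so the pattern rules are java.time's own. There, `MMMM` is the full month-name pattern and uppercase `Y` means week-based year, so the sketch uses `"dd MMMM uuuu"` with an explicit `Locale.ENGLISH`. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ParseDateExample {

    // "dd" = day of month, "MMMM" = full month name, "uuuu" = year.
    // Locale.ENGLISH keeps month-name parsing independent of the JVM's default locale.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd MMMM uuuu", Locale.ENGLISH);

    public static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("18 September 2012")); // prints 2012-09-18
    }
}
```

Unlike the Joda-based formatter above, this returns a `LocalDate` with no time zone or time of day attached, which is usually what you want for a plain calendar date.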

October 2, 2013

by Mark Needham

· 5,879 Views