DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
MongoDB and its locks
Sometimes, you need your jobs to be persisted to a database. Existing solutions such as Gearman only used relational or file-based persistence, so they were a no-go for us and we went with MongoDB. Fast-forward a few months, and we have some problems with the database load. However, it's not that workers are pestering it too much: the problem was related to locks. MongoDB locking model As of 2.4, MongoDB holds write locks on an entire database for each write operation. Since atomicity is guaranteed only on a single document, this isn't usually a problem because even if you are inserting thousands of documents you are doing so in thousands of different operations that can be interleaved with queries and other inserts with a fair policy. This sometimes results in count() queries being inconsistent as documents are moved and indexes are asynchronously updated. However, write corruption is inexistent as documents are a very cohesive entity. However, atomic operations over a single document still lock the whole database, as in the case of findAndModify(), which looks for a document matching a certain query and updates it with a $set operation before returning it; all in a single shot and with the guarantee no other process will be able to perform the same operation of reading and writing at the same time. You can see this operation is ideal for implementing workers based on a pull model, each asking the database for a new job to do and locking it with '$set: {locked: true}'. However, after the number of workers increases a little bit, locks become a problem. Lock duration We cleaned up the working space collection of our MongoDB database by keeping in it only the unfinished jobs, and moving all the rest (completed or failed) to a different collection for archival. As the load increases due to new contracts, we saw the locking time increase as well: the application and the workers were insisting on the same database. The first of the problems was that after reducing the specs of our primary server, we started seeing timeouts of unrelated code even if the CPU and IO usage were low. The locks taken by workers to pick jobs were starting to take seconds or tens of seconds. Moreover, the MongoDB server started filling the logs with: Fri Dec 6 00:01:07 [conn280998] warning: ClientCursor::yield can't unlock b/c of recursive lock... I'm a user, not MongoDB guru but that seems not very good, especially given hundreds of these messages were written every day (although the queues continued to work correctly.) We did not find any explanation for these messages in the documentation, but I suppose they mean some operations are taking so long that they have to yield to make room for others, but in the case of atomic operations they can't to preserve consistency. An easy solution Since MongoDB does not have collection-wide locks yet, we decided to move the job pool and the completed job collections to a different database. In this way, we had a main database with the usual collections and one containing just these two, named with a '_queue' suffix. Note that we're still writing to the same database server: there is still the same number of connections being created by each process. This solution preallocates more space given two databases are involved, but as you know space is cheap nowadays. Both insertion of jobs and worker reads must take place on the same database. Here is where we discovered cohesion pays: if you have this information in a single place it is very easy to change configuration. If you have a singleton database, because "we should only have one database in this application, it will never change" this feature would cost you a lot. Fortunately, in our case it was about 10 lines of code, including the refactoring on the Factory Methods that created MongoDB database objects. Long term This solution is not for the long term, as we know the numbers of machines and their workers pool will increase in the future; a sufficiently high number of workers will saturate the connections available on the MongoDB server and lock the common collection until a pick of a job takes dozens of seconds. The design towards which we are moving includes one "foreman" to each machine, and many workers under his control; only the foreman polls the database and may lock the common collection. Distributing the job pool is not what we want for ease of retrieval of a job in case something goes bad (ever done a query on multiple databases?). Also, we don't want a push solution as it will involve the registration of workers or foremen to a central point of failure that assignes them their jobs. Since most of our servers are shutdown and rebooted according to the user load, we prefer a dynamic solution where a server can start picking jobs whenever it wants and stop without notifying remote machines.
December 6, 2013
by Giorgio Sironi
· 27,560 Views
article thumbnail
Adding Java 8 Lambda Goodness to JDBC
Data access, specifically SQL access from within Java, has never been nice. This is in large part due to the fact that the JDBC api has a lot of ceremony. Java 7 vastly improved things with ARM blocks by taking away a lot of the ceremony around managing database objects such as Statements and ResultSets but fundamentally the code flow is still the same. Java 8 Lambdas gives us a very nice tool for improving the flow of JDBC. Out first attempt at improving things here is very simply to make it easy to work with ajava.sql.ResultSet. Here we simply wrap the ResultSet iteration and then delegate it to Lambda function. This is very similar in concept to Spring's JDBCTemplate. NOTE: I've released All the code snippets you see here under an Apache 2.0 license on Github. First we create a functional interface called ResultSetProcessor as follows: @FunctionalInterface public interface ResultSetProcessor { public void process(ResultSet resultSet, long currentRow) throws SQLException; } Very straightforward. This interface takes the ResultSet and the current row of theResultSet as a parameter. Next we write a simple utility to which executes a query and then calls ourResultSetProcessor each time we iterate over the ResultSet: public static void select(Connection connection, String sql, ResultSetProcessor processor, Object... params) { try (PreparedStatement ps = connection.prepareStatement(sql)) { int cnt = 0; for (Object param : params) { ps.setObject(++cnt, param)); } try (ResultSet rs = ps.executeQuery()) { long rowCnt = 0; while (rs.next()) { processor.process(rs, rowCnt++); } } catch (SQLException e) { throw new DataAccessException(e); } } catch (SQLException e) { throw new DataAccessException(e); } } Note I've wrapped the SQLException in my own unchecked DataAccessException. Now when we write a query it's as simple as calling the select method with a connection and a query: select(connection, "select * from MY_TABLE",(rs, cnt)-> { System.out.println(rs.getInt(1)+" "+cnt) }); So that's great, but I think we can do more... One of the nifty Lambda additions in Java is the new Streams API. This would allow us to add very powerful functionality with which to process a ResultSet. Using the Streams API over a ResultSet however creates a bit more of a challenge than the simple select with Lambda in the previous example. The way I decided to go about this is create my own Tuple type which represents a single row from a ResultSet. My Tuple here is the relational version where a Tuple is a collection of elements where each element is identified by an attribute, basically a collection of key value pairs. In our case the Tuple is ordered in terms of the order of the columns in the ResultSet. The code for the Tuple ended up being quite a bit so if you want to take a look, see the GitHub project in the resources at the end of the post. Currently the Java 8 API provides the java.util.stream.StreamSupport object which provides a set of static methods for creating instances of java.util.stream.Stream. We can use this object to create an instance of a Stream. But in order to create a Stream it needs an instance ofjava.util.stream.Spliterator. This is a specialised type for iterating and partitioning a sequence of elements, the Stream needs for handling operations in parallel. Fortunately the Java 8 api also provides the java.util.stream.Spliterators class which can wrap existing Collection and enumeration types. One of those types being ajava.util.Iterator. Now we wrap a query and ResultSet in an Iterator: public class ResultSetIterator implements Iterator { private ResultSet rs; private PreparedStatement ps; private Connection connection; private String sql; public ResultSetIterator(Connection connection, String sql) { assert connection != null; assert sql != null; this.connection = connection; this.sql = sql; } public void init() { try { ps = connection.prepareStatement(sql); rs = ps.executeQuery(); } catch (SQLException e) { close(); throw new DataAccessException(e); } } @Override public boolean hasNext() { if (ps == null) { init(); } try { boolean hasMore = rs.next(); if (!hasMore) { close(); } return hasMore; } catch (SQLException e) { close(); throw new DataAccessException(e); } } private void close() { try { rs.close(); try { ps.close(); } catch (SQLException e) { //nothing we can do here } } catch (SQLException e) { //nothing we can do here } } @Override public Tuple next() { try { return SQL.rowAsTuple(sql, rs); } catch (DataAccessException e) { close(); throw e; } } } This class basically delegates the iterator methods to the underlying result set and then on the next() call transforms the current row in the ResultSet into my Tuple type. And that's the basics done (This class will need a little bit more work though). All that's left is to wire it all together to make a Stream object. Note that due to the nature of a ResultSet it's not a good idea to try process them in parallel, so our stream cannot process in parallel. public static Stream stream(final Connection connection, final String sql, final Object... parms) { return StreamSupport .stream(Spliterators.spliteratorUnknownSize( new ResultSetIterator(connection, sql), 0), false); } Now it's straightforward to stream a query. In the usage example below I've got a table TEST_TABLE with an integer column TEST_ID which basically filters out all the non even numbers and then runs a count: long result = stream(connection, "select TEST_ID from TEST_TABLE") .filter((t) -> t.asInt("TEST_ID") % 2 == 0) .limit(100) .count(); And that's it! We now have a very powerful way of working with a ResultSet. So all this code is available under an Apache 2.0 license on GitHub here. I've rather lamely dubbed the project "lambda tuples," and the purpose really is to experiment and see where you can take Java 8 and Relational DB access, so please download or feel free to contribute.
December 5, 2013
by Julian Exenberger
· 78,239 Views · 6 Likes
article thumbnail
Implementing the “Card” UI Pattern in PhoneGap/HTML5 Applications
The Card UI pattern is a common look used by Pinterest and many other content sites. See how you can make a PhoneGap app with this look.
December 2, 2013
by Andrew Trice
· 116,230 Views · 2 Likes
article thumbnail
Deconstructing the Azure Point-to-Site VPN for Command Line usage
when configuring an azure virtual network one of the most common things you'll want to do is setup a point-to-site vpn so that you can actually get to your servers to manage and maintain them. azure point-to-site vpns use client certificates to secure connections which can be quite complicated to configure so microsoft has gone the extra mile to make it easy for you to configure and get setup – sadly at the cost of losing the ability to connect through the command line or through powershell – let's change that. current state of play == no command line vpn connections normally when you want to launch a vpn from the cli or powershell in windows you can simply use the following command: rasdial "my home vpn" the azure pre-packaged vpn doesn't allow this because it's really just not a normal vpn. it's something else , something mysterious - not a normal native windows vpn connection. when you run the azure vpn through the command line you get this (you'll see a hint as to why i'd be using azure point-to-site in this screenshot): azure vpns don't appear to support this. if you want to keep your servers behind a private network in azure and use continuous deployment to get your code into production this makes it hard to deploy without a human being around. not really the best case scenario – especially when you remind yourself that automated builds aim to do away with human error altogether. what the azure point-to-site looks like out of the box when you first go to setup a point-to-site vpn into your azure virtual network microsoft points you at a page that walks you through creating a client certificate on your local machine to use as authentication. they then get you to download a package for setting up the azure vpn ras dialler on your local machine. this is accessed from within the azure "networks" page for your virtual network. you install this package and then whenever connecting you're greeted with a connection screen that you might of seen in a previous life. and by seen i don't mean that windows azure virtual networks have been around for ages. but more that the login screen may look familiar. this is because this login screen is a microsoft " connection manager " login screen and has been around for a while. example from technet (note extremely dated bitmap awesomeness): connection manager is used to pre-package vpn and dial up connections for easy-install distribution in a large organisation. this also means we can reconstruct the underlying vpn connection and use it as a normal vpn – claiming back our cli super powers. digging through the details so what we really want to know is: what is this mystical vpn technology the people at microsoft have bestowed upon us? here's how i started getting more information about the implementation: connecting once successfully then disconnect. open it up again to connect and click on properties then clicking on view log you'll then be greeted by something that looks like this: ****************************************************************** operating system : windows nt 6.2 dialler version : 7.2.9200.16384 connection name : my azure virtual network all users/single user : single user start date/time : 24/11/2013, 7:50:31 ****************************************************************** module name, time, log id, log item name, other info for connection type, 0=dial-up, 1=vpn, 2=vpn over dial-up ****************************************************************** [cmdial32] 7:50:31 03 pre-init event callingprocess = c:\windows\system32\cmmon32.exe [cmdial32] 7:50:39 04 pre-connect event connectiontype = 1 [cmdial32] 7:50:39 06 pre-tunnel event username = myclientsslcertificate domain = dunsetting = [obfuscated azure gateway id] tunnel devicename = tunneladdress = [obfuscated azure gateway id].cloudapp.net [cmdial32] 7:50:44 07 connect event [cmdial32] 7:50:44 08 custom action dll actiontype = connect actions description = to update your routing table actionpath = c:\users\doug\appdata\roaming\microsoft\network\connections\cm\[obfuscated azure gateway id]\cmroute.dll returnvalue = 0x0 [cmmon32] 7:56:21 23 external disconnect [cmdial32] 7:56:21 13 disconnect event callingprocess = c:\windows\explorer.exe more importantly you'll see this path included in the connection: within this folder is all the magic connection manager odds and ends. apologies for the [obfuscated], simply the path contains information to my azure endpoint. within this folder you'll see a bunch of files: most importantly there is a pbk file – a personal phonebook. this is what stores the connect settings for the vpn as is a commonly distributed way of sending out connection settings in the enterprise. if you run this on its own you'll actually be able to connect to the vpn directly (without your network routes being updated). this phonebook is where we can steal our settings from to recreate a command line driven connection. setting it up open up the properties of your azure point-to-site vpn phonebook above, and copy the connection address. it will look like this: azuregateway-[guid].cloudapp.net open network sharing centre , and create a new connection. then select connect to a workplace . select that you'll "use my internet connection". then enter your azure point-to-site vpn address and then give your new connection a name. remember this name for later then click create to save your vpn. now open the connection properties for your newly created vpn. this is where we'll use the settings in your azure diallers config to setup your connection. i'll save you the hassle of showing you me copying the settings from one connection to another and instead i'll just focus on what you need to set them to. flick over to the options tab and then click ppp settings . click the 2 missing options enable software compression and negotiate multi-link for single-link connections . set the type of vpn to secure socket tunnelling protocol (sstp), turn on eap and select microsoft: smart card of other certificate as the authentication type. then click on properties . select "use a certificate on this computer", un-tick "connect to these servers", and then select the certificate that uses your azure endpoint uri as its certificate name and then save out. then flick over to the network tab. open tcp/ipv4 then advanced then untick use default gateway on remote network . this setting stops internet traffic going over the vpn while you're connected so you can still surf reddit while managing your azure environment. close the vpn configuration panel. you now have a working vpn connection to azure. when you connect using windows you'll be asked to select the name of the client certificate you'll be authenticating with. you select the certificate you created and uploaded into azure before you setup your connection. when you connect using the command line you don't need to specify your certificate: rasdial "azure vpn" but there's one catch: your local machine's route table doesn't know when to send any traffic to your azure virtual network. the network link is there, but windows doesn't know what to send over your internet link and what to send over the vpn link. you see microsoft did a few things when they packaged your connection manager, and one of these things was to also copy a file called "cmroute.dll" and call this after connection to route your traffic onto your virtual network. this file altered your routing table to route traffic to your virtual network subnets through the vpn connection . we can do the same thing – so lets go about it. what's this about routing... rooting (for the english speakers in the room) my azure virtual network consists of the following network range: 10.0.0.0/8 i also have the following subnets for different machines groups. 10.0.1.0/24 (web servers) 10.0.2.0/24 (application servers) 10.0.3.0/24 (management services) my pptp connections, or point-to-site connections sit on the range: 172.16.0/24 this means that when i connect to the azure vpn i will get an ip address in this range. example: 172.16.0.17 when this happens we need to tell windows to route all traffic going to my 10.0.x.x range ip addresses through the ip address that has been given to us by azure's vpn rras service. you can see your current routing table by entering route print into a command prompt or powershell console. automating the routing additions luckily the windows task scheduler supports event listeners that allow us to watch for vpn connections and run commands off the back of them. take the below powershell script below and save it for arguments sake in c:\scripts\updateroutetableforazurevpn.ps1 ############################################################# # adds ip routes to azure vpn through the point-to-site vpn ############################################################# # define your azure subnets $ips = @("10.0.1.0", "10.0.2.0","10.0.3.0") # point-to-site ip address range # should be the first 4 octets of the ip address '172.16.0.14' == '172.16.0. $azurepptprange = "172.16.0." # find the current new dhcp assigned ip address from azure $azureipaddress = ipconfig | findstr $azurepptprange # if azure hasn't given us one yet, exit and let u know if (!$azureipaddress){ "you do not currently have an ip address in your azure subnet." exit 1 } $azureipaddress = $azureipaddress.split(": ") $azureipaddress = $azureipaddress[$azureipaddress.length-1] $azureipaddress = $azureipaddress.trim() # delete any previous configured routes for these ip ranges foreach($ip in $ips) { $routeexists = route print | findstr $ip if($routeexists) { "deleting route to azure: " + $ip route delete $ip } } # add our new routes to azure virtual network foreach($subnet in $ips) { "adding route to azure: " + $subnet echo "route add $ip mask 255.255.255.0 $azureipaddress" route add $subnet mask 255.255.255.0 $azureipaddress } now execute the following from an elevated command prompt window. this tells windows to add an event listener based task that looks for events to our "azure vpn" connection and if it sees them, it runs our powershell script. schtasks /create /f /tn "vpn connection update" /tr "powershell.exe -noninteractive -command c:\scripts\updateroutetableforazurevpn.ps1" /sc onevent /ec application /mo "*[system[(level=4 or level=0) and (eventid=20225)]] and *[eventdata[data='azure vpn']] " if i then connect to my vpn the above script should execute. after connecting if i check my routing table by entering route print into a console application we have our routes to azure added correctly. we're done! with that we're now able to fully use an azure point-to-site vpn simply from the command line. this means we can use it as part of a build server deployment, or if you're working on it all the time you can simply set it up to connect every time you login to windows . command line usage rasdial "[connection name]" rasdial "[connection name]" /disconnect for my connection named "azure vpn" this command line usage becomes: rasdial "azure vpn" rasdial "azure vpn" /disconnect
November 29, 2013
by Douglas Rathbone
· 10,426 Views
article thumbnail
New in Neo4j: Optional Relationships with OPTIONAL MATCH
One of the breaking changes in Neo4j 2.0.0-RC1 compared to previous versions is that the -[?]-> syntax for matching optional relationships has been retired and replaced with the OPTIONAL MATCH construct. An example where we might want to match an optional relationship could be if we want to find colleagues that we haven’t worked with given the following model: Suppose we have the following data set: CREATE (steve:Person {name: "Steve"}) CREATE (john:Person {name: "John"}) CREATE (david:Person {name: "David"}) CREATE (paul:Person {name: "Paul"}) CREATE (sam:Person {name: "Sam"}) CREATE (londonOffice:Office {name: "London Office"}) CREATE UNIQUE (steve)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (john)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (david)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (paul)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (sam)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(john) CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(david) We might write the following query to find people from the same office as Steve but that he hasn’t worked with: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague ==> +----------------------+ ==> | potentialColleague | ==> +----------------------+ ==> | Node[4]{name:"Paul"} | ==> | Node[5]{name:"Sam"} | ==> +----------------------+ We first find which office Steve works in and find the people who also work in that office. Then we optionally match the ‘COLLEAGUES_WITH’ relationship and only return people who Steve doesn’t have that relationship with. If we run that query in 2.0.0-RC1 we get this exception: ==> SyntaxException: Question mark is no longer used for optional patterns - use OPTIONAL MATCH instead (line 1, column 199) ==> "MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague" ==> Based on that advice we might translate our query to read like this: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague If we run that we get back more people than we’d expect: ==> +------------------------+ ==> | potentialColleague | ==> +------------------------+ ==> | Node[15]{name:"John"} | ==> | Node[14]{name:"David"} | ==> | Node[13]{name:"Paul"} | ==> | Node[12]{name:"Sam"} | ==> +------------------------+ The reason this query doesn’t work as we’d expect is because the WHERE clause immediately following OPTIONAL MATCH is part of the pattern rather than being evaluated afterwards as we’ve become used to. The OPTIONAL MATCH part of the query matches a ‘COLLEAGUES_WITH’ relationship where the relationship is actually null, something of a contradiction! However, since the match is optional a row is still returned. If we include ‘c’ in the RETURN part of the query we can see that this is the case: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague, c ==> +---------------------------------+ ==> | potentialColleague | c | ==> +---------------------------------+ ==> | Node[15]{name:"John"} | | ==> | Node[14]{name:"David"} | | ==> | Node[13]{name:"Paul"} | | ==> | Node[12]{name:"Sam"} | | ==> +---------------------------------+ If we take out the WHERE part of the OPTIONAL MATCH the query is a bit closer to what we want: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) RETURN potentialColleague, c ==> +-----------------------------------------------+ ==> | potentialColleague | c | ==> +-----------------------------------------------+ ==> | Node[2]{name:"John"} | :COLLEAGUES_WITH[5]{} | ==> | Node[3]{name:"David"} | :COLLEAGUES_WITH[6]{} | ==> | Node[4]{name:"Paul"} | | ==> | Node[5]{name:"Sam"} | | ==> +-----------------------------------------------+ If we introduce a WITH after the OPTIONAL MATCH we can choose to filter out those people that we’ve already worked with: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WITH potentialColleague, c WHERE c IS null RETURN potentialColleague If we evaluate that query it returns the same output as our original query: ==> +----------------------+ ==> | potentialColleague | ==> +----------------------+ ==> | Node[4]{name:"Paul"} | ==> | Node[5]{name:"Sam"} | ==> +----------------------+
November 26, 2013
by Mark Needham
· 21,443 Views · 7 Likes
article thumbnail
How to Create a Range From 1 to 10 in SQL
How do you create a range from 1 to 10 in SQL? Have you ever thought about it? This is such an easy problem to solve in any imperative language, it’s ridiculous. Take Java (or C, whatever) for instance: for (int i = 1; i <= 10; i++) System.out.println(i); This was easy, right? Things even look more lean when using functional programming. Take Scala, for instance: (1 to 10) foreach { t => println(t) } We could fill about 25 pages about various ways to do the above in Scala, agreeing on how awesome Scala is (or what hipsters we are). But how to create a range in SQL? … And we’ll exclude using stored procedures, because that would be no fun. In SQL, the data source we’re operating on are tables. If we want a range from 1 to 10, we’d probably need a table containing exactly those ten values. Here are a couple of good, bad, and ugly options of doing precisely that in SQL. OK, they’re mostly bad and ugly. By creating a table The dumbest way to do this would be to create an actual temporary table just for that purpose: CREATE TABLE "1 to 10" AS SELECT 1 value FROM DUAL UNION ALL SELECT 2 FROM DUAL UNION ALL SELECT 3 FROM DUAL UNION ALL SELECT 4 FROM DUAL UNION ALL SELECT 5 FROM DUAL UNION ALL SELECT 6 FROM DUAL UNION ALL SELECT 7 FROM DUAL UNION ALL SELECT 8 FROM DUAL UNION ALL SELECT 9 FROM DUAL UNION ALL SELECT 10 FROM DUAL See also this SQLFiddle This table can then be used in any type of select. Now that’s pretty dumb but straightforward, right? I mean, how many actual records are you going to put in there? By using a VALUES() table constructor This solution isn’t that much better. You can create a derived table and manually add the values from 1 to 10 to that derived table using the VALUES() table constructor. In SQL Server, you could write: SELECT V FROM ( VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) ) [1 to 10](V) See also this SQLFiddle By creating enough self-joins of a sufficent number of values Another “dumb”, yet a bit more generic solution would be to create only a certain amount of constant values in a table, view or CTE (e.g. two) and then self join that table enough times to reach the desired range length (e.g. four times). The following example will produce values from 1 to 10, “easily”: WITH T(V) AS ( SELECT 0 FROM DUAL UNION ALL SELECT 1 FROM DUAL ) SELECT V FROM ( SELECT 1 + T1.V + 2 * T2.V + 4 * T3.V + 8 * T4.V V FROM T T1, T T2, T T3, T T4 ) WHERE V <= 10 ORDER BY V See also this SQLFiddle By using grouping sets Another way to generate large tables is by using grouping sets, or more specifically by using the CUBE() function. This works much in a similar way as the previous example when self-joining a table with two records: SELECT ROWNUM FROM ( SELECT 1 FROM DUAL GROUP BY CUBE(1, 2, 3, 4) ) WHERE ROWNUM <= 10 See also this SQLFiddle By just taking random records from a “large enough” table In Oracle, you could probably use ALL_OBJECTs. If you’re only counting to 10, you’ll certainly get enough results from that table: SELECT ROWNUM FROM ALL_OBJECTS WHERE ROWNUM <= 10 See also this SQLFiddle What’s so “awesome” about this solution is that you can cross join that table several times to be sure to get enough values: SELECT ROWNUM FROM ALL_OBJECTS, ALL_OBJECTS, ALL_OBJECTS, ALL_OBJECTS WHERE ROWNUM <= 10 OK. Just kidding. Don’t actually do that. Or if you do, don’t blame me if your productive system runs low on memory. By using the awesome PostgreSQL GENERATE_SERIES() function Incredibly, this isn’t part of the SQL standard. Neither is it available in most databases but PostgreSQL, which has the GENERATE_SERIES() function. This is much like Scala’s range notation: (1 to 10) SELECT * FROM GENERATE_SERIES(1, 10) See also this SQLFiddle By using CONNECT BY If you’re using Oracle, then there’s a really easy way to create such a table using the CONNECT BY clause, which is almost as convenient as PostgreSQL’s GENERATE_SERIES() function: SELECT LEVEL FROM DUAL CONNECT BY LEVEL < 10 See also this SQLFiddle By using a recursive CTE Recursive common table expressions are cool, yet utterly unreadable. the equivalent of the above Oracle CONNECT BY clause when written using a recursive CTE would look like this: WITH "1 to 10"(V) AS ( SELECT 1 FROM DUAL UNION ALL SELECT V + 1 FROM "1 to 10" WHERE V < 10 ) SELECT * FROM "1 to 10" See also this SQLFiddle By using Oracle’s MODEL clause A decent “best of” comparison of how to do things in SQL wouldn’t be complete without at least one example using Oracle’s MODEL clause (see this awesome use-case for Oracle’s spreadsheet feature). Use this clause only to make your co workers really angry when maintaining your SQL code. Bow before this beauty! SELECT V FROM ( SELECT 1 V FROM DUAL ) T MODEL DIMENSION BY (ROWNUM R) MEASURES (V) RULES ITERATE (10) ( V[ITERATION_NUMBER] = CV(R) + 1 ) ORDER BY 1 See also this SQLFiddle Conclusion There aren’t actually many nice solutions to do such a simple thing in SQL. Clearly, PostgreSQL’s GENERATE_SERIES() table function is the most beautiful solution. Oracle’s CONNECT BY clause comes close. For all other databases, some trickery has to be applied in one way or another. Unfortunately.
November 20, 2013
by Lukas Eder
· 30,217 Views
article thumbnail
Neo4j: Modeling Hyper Edges in a Property Graph
At the Graph Database meet up in Antwerp, we discussed how you would model a hyper edge in a property graph like Neo4j, and I realized that I’d done this in my football graph without realizing. A hyper edge is defined as follows: A hyperedge is a connection between two or more vertices, or nodes, of a hypergraph. A hypergraph is a graph in which generalized edges (called hyperedges) may connect more than two nodes with discrete properties. In Neo4j, an edge (or relationship) can only be between itself or another node; there’s no way of creating a relationship between more than 2 nodes. I had problems when trying to model the relationship between a player and a football match because I wanted to say that a player participated in a match and represented a specific team in that match. I started out with the following model: Unfortunately, creating a direct relationship from the player to the match means that there’s no way to work out which team they played for. This information is useful because sometimes players transfer teams in the middle of a season and we want to analyze how they performed for each team. In a property graph, we need to introduce an extra node which links the match, player and team together: Although we are forced to adopt this design it actually helps us realize an extra entity in our domain which wasn’t visible before – a player’s performance in a match. If we want to capture information about a players’ performance in a match we can store it on this node. We can also easily aggregate players stats by following the played relationship without needing to worry about the matches they played in. The Neo4j manual has a few more examples of domain models containing hyper edges which are worth having a look at if you want to learn more.
November 19, 2013
by Mark Needham
· 7,000 Views
article thumbnail
Camel CXF Service With Multiple Query Parameters
While the awesome Apache Camel team is busy fixing the handling of the multiple parameters in the query, here’s a workaround. Hopefully, this post will become obsolete with the next versions of Camel. (Currently, I use 2.7.5) Problem Query parameters more than 1 is passed as a null value into a Camel-CXF service. Say, if the URL has four query parameters as in name=arun&[email protected]&age=10&phone=123456 only the first one gets populated when you do a Multi Query Params @GET @Path("search") @Produces(MediaType.APPLICATION_JSON) public String sourceResultsFromTwoSources(@QueryParam("name") String name, @QueryParam("age") String age, @QueryParam("phone") String phone,@QueryParam("email") String email ); All other parameters are null. Final Output For url http://localhost:8181/cxf/karafcxfcamel/search?name=arun&[email protected]&age=31&phone=232323 the result expected is : Workaround Interestingly, we could get the entire query string in the header. QueryStringHeader 1.String queryString = exchange.getIn().getHeader(Exchange.HTTP_QUERY, String.class); We could then do a ExtractingParams MultivaluedMap queryMap = JAXRSUtils.getStructuredParams(queryString, "&", false, false); to get the query parameters as a multi valued Map. The query parameters could then be set as a property to the Exchange and used across the exchange. Code The entire code could be downloaded from github Please note that I am running Camel as part of OSGi inside the Karaf container. While the workaround does not differ because of the environment in which you are using Camel-CXF, please be wary of this fact when you download the code from github. Watch out for the blueprint xmls for Camel configuration. The most important piece here is the Router Router RestToBeanRouter package me.rerun.karafcxfcamel.camel.beans; import org.apache.camel.Exchange; import org.apache.camel.Processor; import org.apache.camel.builder.RouteBuilder; import org.apache.camel.model.dataformat.JsonLibrary; import org.apache.cxf.jaxrs.utils.JAXRSUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import javax.ws.rs.core.MultivaluedMap; import java.util.List; import java.util.Map; public class RestToBeanRouter extends RouteBuilder { private static Logger logger= LoggerFactory.getLogger(RouteBuilder.class); @Override public void configure() throws Exception { from ("cxfrs://bean://rsServer") .process(new ParamProcessor()) .multicast() .parallelProcessing() .aggregationStrategy(new ResultAggregator()) .beanRef("restServiceImpl", "getNameEmailResult") .beanRef("restServiceImpl", "getAgePhoneResult") .end() .marshal().json(JsonLibrary.Jackson) .to("log://camelLogger?level=DEBUG"); } private class ParamProcessor implements Processor { @Override public void process(Exchange exchange) throws Exception { String queryString = exchange.getIn().getHeader(Exchange.HTTP_QUERY, String.class); MultivaluedMap queryMap = JAXRSUtils.getStructuredParams(queryString, "&", false, false); for (Map.Entry> eachQueryParam : queryMap.entrySet()) { exchange.setProperty(eachQueryParam.getKey(), eachQueryParam.getValue()); } } } } Interface RestService package me.rerun.karafcxfcamel.rest; import javax.ws.rs.GET; import javax.ws.rs.Path; import javax.ws.rs.Produces; import javax.ws.rs.core.MediaType; public interface RestService { @GET @Path("search") @Produces(MediaType.APPLICATION_JSON) public String sourceResultsFromTwoSources(); } Implementation RestServiceImpl package me.rerun.karafcxfcamel.rest; import me.rerun.karafcxfcamel.model.AgePhoneResult; import me.rerun.karafcxfcamel.model.NameEmailResult; import me.rerun.karafcxfcamel.service.base.AgePhoneService; import me.rerun.karafcxfcamel.service.base.NameEmailService; import me.rerun.karafcxfcamel.service.impl.AgePhoneServiceImpl; import org.apache.camel.Exchange; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.util.List; public class RestServiceImpl implements RestService { private static Logger logger= LoggerFactory.getLogger(AgePhoneServiceImpl.class); private NameEmailService nameEmailService; private AgePhoneService agePhoneService; public RestServiceImpl(){ } //Do nothing. Camel intercepts and routes the requests public String sourceResultsFromTwoSources() { return null; } public NameEmailResult getNameEmailResult(Exchange exchange){ logger.info("Invoking getNameEmailResult from RestServiceImpl"); String name=getFirstEntrySafelyFromList(exchange.getProperty("name", List.class)); String email=getFirstEntrySafelyFromList(exchange.getProperty("email", List.class)); return nameEmailService.getNameAndEmail(name, email); } public AgePhoneResult getAgePhoneResult(Exchange exchange){ logger.info("Invoking getAgePhoneResult from RestServiceImpl"); String age=getFirstEntrySafelyFromList(exchange.getProperty("age", List.class)); String phone=getFirstEntrySafelyFromList(exchange.getProperty("phone", List.class)); return agePhoneService.getAgePhoneResult(age, phone); } public NameEmailService getNameEmailService() { return nameEmailService; } public AgePhoneService getAgePhoneService() { return agePhoneService; } public void setNameEmailService(NameEmailService nameEmailService) { this.nameEmailService = nameEmailService; } public void setAgePhoneService(AgePhoneService agePhoneService) { this.agePhoneService = agePhoneService; } private String getFirstEntrySafelyFromList(List list){ if (list!=null && !list.isEmpty()){ return list.get(0); } return null; } } Reference Camel Mailing List Question
November 17, 2013
by Arun Manivannan
· 18,393 Views
article thumbnail
Revisiting The Purpose of Manager Classes
I came across a debate about whether naming a class [Something]Manager is proper. Mainly, I came across this article, but also found other references here and here. I already had my own thoughts on this topic and decided I would share them with you. I started thinking on this a while ago when working in a Swing-powered desktop app. We decided we would make use of the Listeners model that Swing establishes for its components. To register a listener in a Swing component you say: // Some-Type = {Action, Key, Mouse, Focus, ...} component.add[Some-Type]Listener(listener); That's not a problem until you need to dynamically remove those listeners, or override them (remove and add a new listener that listens to the same event but fires a different action), or do any other 'unnatural' thing with them. This turns out to be a little difficult if you use the interface Swing provides out-of-the-box. For instance, to remove a listener, you need to pass the whole instance you want to remove; you would have to hold the reference to the listener if you intend to remove it some time later. So I decided I would take a step further and try to abstract the mess by creating a Manager class. What you could do with this class was something like this: EventsListenersManager.registerListener( "some unique name", component, listener); EventsListenersManager.overrideListener("some unique name", newListener); EventsListenerManager.disableListener("some unique name"); EventsListenerManager.enableListener("some unique name"); EventsListenerManager.unregisterListener("some unique name"); What we gained here??? We got rid of the explicit specification of [Some-Type] and have a single function for every type of listener; and we defined a unique name for the binding, which will be used as an id to remove, enable/disable, and override listeners; a.k.a. no need to hold references. Obviously, it is now convenient to always use the manager’s interface, since it reduces the need to trick our code and gives us a simple and more readable interface to achieve many things with listeners. But that’s not the only advantage of this manager class. You see, manager classes in general have one big advantage: they create an abstraction of a service and work as a centralized point to add logic of any nature, like security logic or performance logic. In the case of our EventsListenersManager, we put some extra logic to enchance the way we can use listeners by providing an interface that simplifies their use: it is easier to make a switch between two listeners now than it was before. Another thing we could have done was to restrict the number of listeners registered in one component for performance sake (Hint: listeners execution don’t run in parallel). So manager classes seem to be a good thing, but we can’t make a manager for every object we want to control in an application. This would consume time and would make us start thinking in a manager class even when we don’t need it. But we can discover a pattern for the need of defining a manager. I’ve seen them used when the resources they control are expensive, like database connections, media resources, hardware resources, etc; and as I see it, expensiveness can come in many flavors: Listeners are expensive because they may be difficult to play with and can be a bottleneck in performance (we must keep them short and fast). Connections are expensive to construct. Media resources are expensive to load. Hardware resources are expensive because they are scarce. So we create different managers for them: We create a manager for listeners to ease and control their use. We create a connections pool in order to limit and control the number of connections to a database. Games developers create a resources manager to have a single point to access resources and reduce the possibility to load them multiple times. We all know there is a driver for every piece of hardware. That’s their manager. Here we can see multiple variations of manager classes implementations, but they all attempt to solve a single thing: reducing the cost we may incur in when using expensive resources directly. I think people should have this line of thought: make a manager class for every expensive resource you detect in your application. You will be more than thankful when you want to add extra logic and you only have to go to one place. Also, enforce the use of manager objects instead of accessing the resources directly. The next time you are planning to make a manager class, ask yourself: is the 'thing' I want to control expensive in any way???
November 7, 2013
by Martín Proenza
· 10,740 Views
article thumbnail
Deep Dive into Connection Pooling
As your application grows in functionality and/or usage, managing resources becomes increasingly important. Failure to properly utilize connection pooling is one major “gotcha” that we’ve seen greatly impact MongoDB performance and trip up developers of all levels. Connection Pools Creating new authenticated connections to the database is expensive. So, instead of creating and destroying connections for each request to the database, you want to re-use existing connections as much as possible. This is where connection pooling comes in. A Connection Pool is a cache of database connections maintained by your driver so that connections can be re-used when new connections to the database are required. When properly used, connection pools allow you to minimize the frequency and number of new connections to your database. Connection Churn Used improperly however, or not at all, your application will likely open and close new database connections too often, resulting in what we call “connection churn”. In a high-throughput application this can result in a constant flood of new connection requests to your database which will adversely affect the performance of your database and your application. Opening Too Many Connections Alternately, although less common, is the problem of creating too many MongoClient objects that are never closed. In this case, instead of churn, you get a steady increase in the number of connections to your database such that you have tens of thousands of connections open when you application could almost certainly due with far fewer. Since each connection takes RAM, you may find yourself wasting a good portion of your memory on connections which will also adversely affect your application’s performance. Although every application is different and the total number of connections to your database will greatly depend on how many client processes or application servers are connected, in our experience, any connection count great than 1000 – 1500 connections should raise an eyebrow, and most of the time your application will require far fewer than that. MongoClient and Connection Pooling Most MongoDB language drivers implement the MongoClient class which, if used properly, will handle connection pooling for you automatically. The syntax differs per language, but often you do something like this to create a new connection-pool-enabled client to your database: mongoClient = new MongoClient(URI, connectionOptions); Here the mongoClient object holds your connection pool, and will give your app connections as needed. You should strive to create this object once as your application initializes and re-use this object throughout your application to talk to your database. The most common connection pooling problem we see results from applications that create a MongoClient object way too often, sometimes on each database request. If you do this you will not be using your connection pool as each MongoClient object maintains a separate pool that is not being reused by your application. Example with Node.js Let’s look at a concrete example using the Node.js driver. Creating new connections to the database using the Node.js driver is done like this: mongodb.MongoClient.connect(URI, function(err, db) { // database operations }); The syntax for using MongoClient is slightly different here than with other drivers given Node’s single-threaded nature, but the concept is the same. You only want to call ‘connect’ once during your apps initialization phase vs. on each database request. Let’s take a closer look at the difference between doing the right thing vs. doing the wrong thing. Note: If you clone the repo from here, the logger will output your logs in your console so you can follow along. Consider the following examples: var express = require('express'); var mongodb = require('mongodb'); var app = express(); var MONGODB_URI = 'mongo-uri'; app.get('/', function(req, res) { // BAD! Creates a new connection pool for every request mongodb.MongoClient.connect(MONGODB_URI, function(err, db) { if(err) throw err; var coll = db.collection('test'); coll.find({}, function(err, docs) { docs.each(function(err, doc) { if(doc) { res.write(JSON.stringify(doc) + "\n"); } else { res.end(); } }); }); }); }); // App may initialize before DB connection is ready app.listen(3000); console.log('Listening on port 3000'); The first (no pooling): calls connect() in every request handler establishes new connections for every request (connection churn) initializes the app (app.listen()) before database connections are made var express = require('express'); var mongodb = require('mongodb'); var app = express(); var MONGODB_URI = 'mongodb-uri'; var db; var coll; // Initialize connection once mongodb.MongoClient.connect(MONGODB_URI, function(err, database) { if(err) throw err; db = database; coll = db.collection('test'); app.listen(3000); console.log('Listening on port 3000'); }); // Reuse database/collection object app.get('/', function(req, res) { coll.find({}, function(err, docs) { docs.each(function(err, doc) { if(doc) { res.write(JSON.stringify(doc) + "\n"); } else { res.end(); } }); }); }); The second (with pooling): calls connect() once reuses the database/collection variable (reuses existing connections) waits to initialize the app until after the database connection is established If you run the first example and refresh your browser enough times, you’ll quickly see that your MongoDB has a hard time handling the flood of connections and will terminate. Further Consideration – Connection Pool Size Most MongoDB drivers support a parameter that sets the max number of connections (pool size) available to your application. The connection pool size can be thought of as the max number of concurrent requests that your driver can service. The default pool size varies from driver to driver, e.g. for Node it is 5, whereas for Python it is 100. If you anticipate your application receiving many concurrent or long-running requests, we recommend increasing your pool size- adjust accordingly!
November 7, 2013
by Chris Chang
· 24,439 Views · 2 Likes
article thumbnail
Data Access Module using Groovy with Spock testing
This blog is more of a tutorial where we describe the development of a simple data access module, more for fun and learning than anything else. All code can be found here for those who don’t want to type along: https://github.com/ricston-git/tododb As a heads-up, we will be covering the following: Using Groovy in a Maven project within Eclipse Using Groovy to interact with our database Testing our code using the Spock framework We include Spring in our tests with ContextConfiguration A good place to start is to write a pom file as shown here. The only dependencies we want packaged with this artifact are groovy-all and commons-lang. The others are either going to be provided by Tomcat or are only used during testing (hence the scope tags in the pom). For example, we would put the jar with PostgreSQL driver in Tomcat’s lib, and tomcat-jdbc and tomcat-dbcp are already there. (Note: regarding the postgre jar, we would also have to do some minor configuration in Tomcat to define a DataSource which we can get in our app through JNDI – but that’s beyond the scope of this blog. See here for more info). Testing-wise, I’m depending on spring-test, spock-core, and spock-spring (the latter is to get spock to work with spring-test). Another significant addition in the pom is the maven-compiler-plugin. I have tried to get gmaven to work with Groovy in Eclipse, but I have found the maven-compiler-plugin to be a lot easier to work with. With your pom in an empty directory, go ahead and mkdir -p src/main/groovy src/main/java src/test/groovy src/test/java src/main/resources src/test/resources. This gives us a directory structure according to the Maven convention. Now you can go ahead and import the project as a Maven project in Eclipse (install the m2e plugin if you don’t already have it). It is important that you do not mvn eclipse:eclipse in your project. The .classpath it generates will conflict with your m2e plugin and (at least in my case), when you update your pom.xml the plugin will not update your dependencies inside Eclipse. So just import as a maven project once you have your pom.xml and directory structure set up. Okay, so our tests are going to be integration tests, actually using a PostgreSQL database. Since that’s the case, lets set up our database with some data. First go ahead and create a tododbtest database which will only be used for testing purposes. Next, put the following files in your src/test/resources: Note, fill in your username/password: DROP TABLE IF EXISTS todouser CASCADE; CREATE TABLE todouser ( id SERIAL, email varchar(80) UNIQUE NOT NULL, password varchar(80), registered boolean DEFAULT FALSE, confirmationCode varchar(280), CONSTRAINT todouser_pkey PRIMARY KEY (id) ); insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'abc123', FALSE, 'abcdefg') insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'pass1516', FALSE, '123456') insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'anon', FALSE, 'codeA') insert into todouser (email, password, registered, confirmationCode) values ('[email protected]', 'anon2', FALSE, 'codeB') Basically, testContext.xml is what we’ll be configuring our test’s context with. The sub-division into datasource.xml and initdb.xml may be a little too much for this example… but changes are usually easier that way. The gist is that we configure our data source in datasource.xml (this is what we will be injecting in our tests), and the initdb.xml will run the schema.sql and test-data.sql to create our table and populate it with data. So lets create our test, or should I say, our specification. Spock is specification framework that allows us to write more descriptive tests. In general, it makes our tests easier to read and understand, and since we’ll be using Groovy, we might as well make use of the extra readability Spock gives us. package com.ricston.blog.sample.model.spec; import javax.sql.DataSource import org.springframework.beans.factory.annotation.Autowired import org.springframework.test.annotation.DirtiesContext import org.springframework.test.annotation.DirtiesContext.ClassMode import org.springframework.test.context.ContextConfiguration import spock.lang.Specification import com.ricston.blog.sample.model.data.TodoUser import com.ricston.blog.sample.model.dao.postgre.PostgreTodoUserDAO // because it supplies a new application context after each test, the initialize-database in initdb.xml is // executed for each test/specification @DirtiesContext(classMode=ClassMode.AFTER_EACH_TEST_METHOD) @ContextConfiguration('classpath:testContext.xml') class PostgreTodoUserDAOSpec extends Specification { @Autowired DataSource dataSource PostgreTodoUserDAO postgreTodoUserDAO def setup() { postgreTodoUserDAO = new PostgreTodoUserDAO(dataSource) } def "findTodoUserByEmail when user exists in db"() { given: "a db populated with a TodoUser with email [email protected] and the password given below" String email = '[email protected]' String password = 'anon' when: "searching for a TodoUser with that email" TodoUser user = postgreTodoUserDAO.findTodoUserByEmail email then: "the row is found such that the user returned by findTodoUserByEmail has the correct password" user.password == password } } One specification is enough for now, just to make sure that all the moving parts are working nicely together. The specification itself is easy enough to understand. We’re just exercising the findTodoUserByEmail method of PostgreTodoUserDAO – which we will be writing soon. Using the ContextConfiguration from Spring Test we are able to inject beans defined in our context (the dataSource in our case) through the use of annotations. This keeps our tests short and makes them easier to modify later on. Additionally, note the use of DirtiesContext. Basically, after each specification is executed, we cannot rely on the state of the database remaining intact. I am using DirtiesContext to get a new Spring context for each specification run. That way, the table creation and test data insertions happen all over again for each specification we run. Before we can run our specification, we need to create at least the following two classes used in the spec: TodoUser and PostgreTodoUserDAO package com.sample.data import org.apache.commons.lang.builder.ToStringBuilder class TodoUser { long id; String email; String password; String confirmationCode; boolean registered; @Override public String toString() { ToStringBuilder.reflectionToString(this); } } package com.ricston.blog.sample.model.dao.postgre import groovy.sql.Sql import javax.sql.DataSource import com.ricston.blog.sample.model.dao.TodoUserDAO import com.ricston.blog.sample.model.data.TodoUser class PostgreTodoUserDAO implements TodoUserDAO { private Sql sql public PostgreTodoUserDAO(DataSource dataSource) { sql = new Sql(dataSource) } /** * * @param email * @return the TodoUser with the given email */ public TodoUser findTodoUserByEmail(String email) { sql.firstRow """SELECT * FROM todouser WHERE email = $email""" } } package com.ricston.blog.sample.model.dao; import com.ricston.blog.sample.model.data.TodoUser; public interface TodoUserDAO { /** * * @param email * @return the TodoUser with the given email */ public TodoUser findTodoUserByEmail(String email); } We’re just creating a POGO in TodoUser, implementing its toString using common’s ToStringBuilder. In PostgreTodoUserDAO we’re using Groovy’s SQL to access the database, for now, only implementing the findTodoUserByEmail method. PostgreTodoUserDAO implements TodoUserDAO, an interface which specifies the required methods a TodoUserDAO must have. Okay, so now we have all we need to run our specification. Go ahead and run it as a JUnit test from Eclipse. You should get back the following error message: org.codehaus.groovy.runtime.typehandling.GroovyCastException: Cannot cast object '{id=3, [email protected], password=anon, registered=false, confirmationcode=codeA}' with class 'groovy.sql.GroovyRowResult' to class 'com.ricston.blog.sample.model.data.TodoUser' due to: org.codehaus.groovy.runtime.metaclass.MissingPropertyExceptionNoStack: No such property: confirmationcode for class: com.ricston.blog.sample.model.data.TodoUser Possible solutions: confirmationCode at com.ricston.blog.sample.model.dao.postgre.PostgreTodoUserDAO.findTodoUserByEmail(PostgreTodoUserDAO.groovy:23) at com.ricston.blog.sample.model.spec.PostgreTodoUserDAOSpec.findTodoUserByEmail when user exists in db(PostgreTodoUserDAOSpec.groovy:37) Go ahead and connect to your tododbtest database and select * from todouser; As you can see, our confirmationCode varchar(280), ended up as the column confirmationcode with a lower case ‘c’. In PostgreTodoUserDAO’s findTodoUserByEmail, we are getting back GroovyRowResult from our firstRow invocation. GroovyRowResult implements Map and Groovy is able to create a POGO (in our case TodoUser) from a Map. However, in order for Groovy to be able to automatically coerce the GroovyRowResult into a TodoUser, the keys in the Map (or GroovyRowResult) must match the property names in our POGO. We are using confirmationCode in our TodoUser, and we would like to stick to the camel case convention. What can we do to get around this? Well, first of all, lets change our schema to use confirmation_code. That’s a little more readable. Of course, we still have the same problem as before since confirmation_code will not map to confirmationCode by itself. (Note: remember to change the insert statements in test-data.sql too). One way to get around this is to use Groovy’s propertyMissing methods as show below: def propertyMissing(String name, value) { if(isConfirmationCode(name)) { this.confirmationCode = value } else { unknownProperty(name) } } def propertyMissing(String name) { if(isConfirmationCode(name)) { return confirmationCode } else { unknownProperty(name) } } private boolean isConfirmationCode(String name) { 'confirmation_code'.equals(name) } def unknownProperty(String name) { throw new MissingPropertyException(name, this.class) } By adding this to our TodoUser.groovy we are effectively tapping in on how Groovy resolves property access. When we do something like user.confirmationCode, Groovy automatically calls getConfirmationCode(), a method which we got for free when declared the property confirmationCode in our TodoUser. Now, when user.confirmation_code is invoked, Groovy doesn’t find any getters to invoke since we never declared the property confirmation_code, however, since we have now implemented the propertyMissing methods, before throwing any exceptions it will use those methods as a last resort when resolving properties. In our case we are effectively checking whether a get or set on confirmation_code is being made and mapping the respective operations to our confirmationCode property. It’s as simple as that. Now we can keep the auto coercion in our data access object and the property name we choose to have in our TodoUser. Assuming you’ve made the changes to the schema and test-data.sql to use confirmation_code, go ahead and run the spec file and this time it should pass. That’s it for this tutorial. In conclusion, I would like to discuss some finer points which someone who’s never used Groovy’s SQL before might not know. As you can see in PostgreTodoUserDAO.groovy, our database interaction is pretty much a one-liner. What about resource handling (e.g. properly closing the connection when we’re done), error logging, and prepared statements? Resource handling and error logging are done automatically, you just have to worry about writing your SQL. When you do write your SQL, try to stick to using triple quotes as used in the PostgreTodoUserDAO.groovy example. This produces prepared statements, therefore protecting against SQL injection and avoids us having to put ‘?’ all over the place and properly lining up the arguments to pass in to the SQL statement. Note that transaction management is something which the code using our artifact will have to take care of. Finally, note that a bunch of other operations (apart from findTodoUserByEmail) are implemented in the project on GitHub: https://github.com/ricston-git/tododb. Additionally, there is also a specification test for TodoUser, making sure that the property mapping works correctly. Also, in the pom.xml, there is some maven-surefire-plugin configuration in order to get the surefire-plugin to pick up our Spock specifications as well as any JUnit tests which we might have in our project. This allows us to run our specifications when we, for example, mvn clean package. After implementing all the operations you require in PostgreTodoUserDAO.groovy, you can go ahead and compile the jar or include in a Maven multi-module project to get a data access module you can use in other applications.
November 6, 2013
by Justin Calleja
· 21,159 Views
article thumbnail
Modeling Data in Neo4j: Bidirectional Relationships
transitioning from the relational world to the beautiful world of graphs requires a shift in thinking about data. although graphs are often much more intuitive than tables, there are certain mistakes people tend to make when modelling their data as a graph for the first time. in this article, we look at one common source of confusion: bidirectional relationships. directed relationships relationships in neo4j must have a type, giving the relationship a semantic meaning, and a direction. frequently, the direction becomes part of the relationship's meaning. in other words, the relationship would be ambiguous without it. for example, the following graph shows that the czech republic defeated sweden in ice hockey. had the direction of the relationship been reversed, the swedes would be much happier. with no direction at all, the relationship would be ambiguous, since it would not be clear who the winner was. note that the existence of this relationship implies a relationship of a different type going in the opposite direction, as the next graph illustrates. this is often the case. to give another example, the fact that pulp fiction was directed_by quentin tarantino implies that quentin tarantino is_director_of pulp fiction. you could come up with a huge number of such relationship pairs. one common mistake people often make when modelling their domain in neo4j is creating both types of relationships. since one relationship implies the other, this is wasteful, both in terms of space and traversal time. neo4j can traverse relationships in both directions. more importantly, thanks to the way neo4j organizes its data, the speed of traversal does not depend on the direction of the relationships being traversed. bidirectional relationships some relationships, on the other hand, are naturally bidirectional. a classic example is facebook or real-life friendship. this relationship is mutual - when someone is your friend, you are (hopefully) his friend, too. depending on how we look at the model, we could also say such relationship is undirected. graphaware and neo technology are partner companies. since this is a mutual relationship, we could model it as bidirectional or undirected relationship, respectively. but since none of this is directly possible in neo4j, beginners often resort to the following model, which suffers from the exact same problem as the incorrect ice hockey model: an extra unnecessary relationship. neo4j apis allow developers to completely ignore relationship direction when querying the graph, if they so desire. for example, in neo4j's own query language, cypher, the key part of a query finding all partner companies of neo technology would look something like match (neo)-[:partner]-(partner) the result would be the same as executing and merging the results of the following two different queries: match (neo)-[:partner]->(partner) and match (neo)<-[:partner]-(partner) therefore, the correct (or at least most efficient) way of modelling the partner relationships is using a single partner relationship with an arbitrary direction . conclusion relationships in neo4j can be traversed in both directions with the same speed. moreover, direction can be completely ignored. therefore, there is no need to create two different relationships between nodes, if one implies the other.
November 6, 2013
by Michal Bachman
· 28,334 Views · 2 Likes
article thumbnail
Automatically Collect and Process Visitors’ IP Addresses
(NOTE: A version of this article posted previously contained incorrect information. The below version corrects those errors. The author apologises for any inconvenience.) I’m not a programmer. I’m an art collection manager with a fierce DIY streak that has helped me to develop a database application, and build and manage a website that incorporates it. Ever since I accidentally lobotomised my first Windows 3.1 computer, I’ve taught myself how to seek out, find and apply the information I need, sometimes through long hours of trial and error, and I owe almost all of it to the Internet. If it weren’t for people’s willingness to share information for free on innumerable forums and websites like this one, I would not have been able even to scratch the surface of completing the sorts of tasks that are now behind me. In that spirit of sharing, I thought I might humbly offer the solution I’ve cobbled together from various sources, and tweaked to automate tasks related to collecting the IP addresses of visitors to my website. I’m sure it’s nothing earth-shattering to an experienced coder, but it works well for me, and I’ve never seen a complete solution like it presented anywhere on the Internet before. It is a Windows-centric solution, since that’s the platform I’ve always used. The first step is to gather and record the IP addresses of website visitors. I’ve chosen to do this using php. Insert this code into the html of each page for which you’d like to capture IP addresses, just before the closing tag: If you want to record the IP addresses for each webpage to a different file, use a different name each time for filename.txt in the above example. Next, create the blank filename.txt file(s) in the same directory of your web server in which these html files reside. Now each time a visitor loads these pages, their IP address will be written to the text file(s) you’ve indicated. Next, you’ll need a way to download the text file(s) from the server to your local machine. If you’re writing to multiple files on the server, I’ve found it’s helpful to download them separately, then combine them into one list. Also, I like to sort the list and remove duplicate entries (you’ll see why a little later). Following is a Visual Basic script to do all of that. Let’s designate it C:\Folder\Subfolder\DloadCmbnDdupe.vbs. In the script below, substitute YourWebsite.com with the domain name of your website, C:\Folder\Subfolder with the actual location on your local computer and filename*.txt with the file name(s) on the web server to which you’ve chosen to write. Option Explicit On Error Resume Next Download "http://www.YourWebsite.com/filename.txt", _ " C:\Folder\Subfolder\filename.txt " Download "http://www.YourWebsite.com/filename2.txt", _ " C:\Folder\Subfolder\filename2.txt " Download "http://www.YourWebsite.com/filename3.txt", _ " C:\Folder\Subfolder\filename3.txt " CmbnDdupe() If Err <> 0 Then Wscript.echo "Error Type = " & Err.Description End If WScript.Quit '----------------------------------------------------------------------------------------- Function Download(strURL, strPath) Dim i, objFile, objFSO, objHTTP, strFile, strMsg Const ForReading = 1, ForWriting = 2, ForAppending = 8 Set objFSO = CreateObject("Scripting.FileSystemObject") If objFSO.FolderExists(strPath) Then strFile = objFSO.BuildPath(strPath, Mid(strURL, InStrRev(strURL, "/") + 1)) ElseIf objFSO.FolderExists(Left(strPath, InStrRev(strPath, "\") - 1)) Then strFile = strPath Else WScript.Echo "ERROR: Target folder not found." Exit Function End If Set objFile = objFSO.OpenTextFile(strFile, ForWriting, True) Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1") objHTTP.Open "GET", strURL, False objHTTP.Send For i = 1 To LenB(objHTTP.ResponseBody) objFile.Write Chr(AscB(MidB(objHTTP.ResponseBody, i, 1))) Next objFile.Close() End Function '----------------------------------------------------------------------------------------- Function CmbnDdupe() Dim shell Set shell=createobject("wscript.shell") shell.run "CmbnDdupe.bat" Set shell=nothing End Function You’ll notice that Function CmbnDdupe calls a batch file, CmbnDdupe.bat, in the same directory (C:\Folder\Subfolder). Here it is, below. Again, substitute filename*.txt with the file name(s) you used at the beginning. @echo off for %%x in (filename.txt) do type %%x>>templist for %%x in (filename2.txt) do type %%x>>templist for %%x in (filename3.txt) do type %%x>>templist ren templist IPlist.txt setlocal disableDelayedExpansion set file=IPlist.txt set "sorted=%file%.sorted" set "deduped=%file%.deduped" ::Define a variable containing a linefeed character set LF=^ ::The 2 blank lines above are critical, do not remove sort "%file%" >"%sorted%" >"%deduped%" ( set "prev=" for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do ( set "ln=%%A" setlocal enableDelayedExpansion if /i "!ln!" neq "!prev!" ( endlocal (echo %%A) set "prev=%%A" ) else endlocal ) ) >nul move /y "%deduped%" "%file%" del "%sorted%" exit This routine combines the downloaded files into one (templist), then renames it to IPlist.txt. It then sorts the IP addresses into ascending order, saving to IPlist.txt.sorted, and then removes any duplicates, saving to IPlist.txt.deduped. Finally, it moves (overwrites and deletes) IPlist.txt.deduped to IPlist.txt and deletes IPlist.txt.sorted, leaving behind IPlist.txt (the sorted and de-dupe-ified list). Now, we have a list of one IP address per visitor to the pages from which we’re collecting. At this point I like to ping each IP address to collect whatever information is available about it. This is why I remove the duplicate entries, which are caused by a visitor viewing more than one of the collecting pages, or by a visitor returning to the pages. I don’t need to waste time and bandwidth pinging the same IP address more than once. If I want to see which IPs visited which pages multiple times, I can always just look at filename.txt, filename2.txt and filename3.txt. I’ve named the ping routine PingList.vbs, and put it in C:\Folder\Subfolder. Here it is: Option Explicit On Error Resume Next Dim srcFile srcFile = "IPlist.txt" PingList(srcFile) If Err <> 0 Then Wscript.echo "Error Type = " & Err.Description End If WScript.Quit '----------------------------------------------------------------------------------------- Function PingList(srcFile) Dim objFSO Dim objShell Dim strCommand Dim opnFile Dim strText Dim logFile Set objFSO = CreateObject("Scripting.FileSystemObject") Set objShell = Wscript.CreateObject("Wscript.Shell") logFile = "Log.txt" If objFSO.FileExists(srcFile) Then Set opnFile = objFSO.OpenTextFile(srcFile, 1) Do While Not opnFile.AtEndOfStream strText = opnFile.ReadLine If Trim(strText) <> "" Then strCommand = strText objShell.run "%comspec% /c ping -a -n 1 " & strText & " >> " & logFile, , True End If Loop opnFile.Close Else WScript.Echo "File '" & srcFile & "' was not found." End If End Function This script pings each IP address in IPlist.txt, resolves the hostname if possible and writes the results to Log.txt in the same directory. Go ahead and create a blank Log.txt now in C:\Folder\Subfolder. The next thing you’ll want to do is clear the contents of the text files on the server, so that they will retain only the IP addresses from new visits. Create the following file in C:\Folder\Subfolder. I’ve named it Upload.cmd. It's critical that it has the .cmd file type. @echo off echo user YourUsername> ftpcmd.dat echo YourPassword>> ftpcmd.dat echo bin>> ftpcmd.dat echo cd /YourWebDirectory/>> ftpcmd.dat echo prompt>> ftpcmd.dat echo mput %1 %2 %3>> ftpcmd.dat echo rename filename_.txt filename.txt>> ftpcmd.dat echo rename filename 2_.txt filename2.txt>> ftpcmd.dat echo rename filename 3_.txt filename3.txt>> ftpcmd.dat echo quit>> ftpcmd.dat ftp -n -s:ftpcmd.dat ftp.YourWebsite.com del ftpcmd.dat exit Substitute YourUsername and YourPassword with the username and password with which you access your website files, and YourWebDirectory with the location of your website files on the server. In C:\Folder\Subfolder, create the blank text file(s) that will overwrite the ones on the server. Give them a different name (for instance, add an underscore), as you’ll want to distinguish them from the files you downloaded at the beginning of this exercise. Hence, the blank filename_.txt will be copied to the server as filename.txt, overwriting the existing file. The number of per-cent-sign-plus-integer combinations (variables) needs to correspond with the number of files you upload and overwrite; in this case, three (%1 %2 %3 = filename_.txt, filename2_.txt and filename3_.txt). Substitute YourWebsite.com for the domain name of your website. Before moving to the final step, create a blank text file named ErrorLog.txt in C:\Folder\Subfolder. This is where we’ll record any errors encountered during the execution of the combined routines. Now to put it all together and automate it. Create the following batch file (I’ve name it DLPingUL.bat) and put it in C:\. @echo off title Download Ping Upload - Scheduled task, please wait cd "C:\Folder\Subfolder" start "" /wait CScript DloadCmbnDdupe.vbs 2>> ErrorLog.txt start "" /wait Upload.cmd filename_.txt filename2_.txt filename3_.txt 2>> ErrorLog.txt copy /d /y /a IPlist.txt NewIPlist.txt /a 2>> ErrorLog.txt start "" /wait CScript PingList.vbs 2>> ErrorLog.txt del IPlist.txt exit The reason we’ve put this in C:\ is so that Windows Task Scheduler will have no problem with permissions when running it. This routine moves to the directory in which you’ve stored all of the relevant files (C:\Folder\Subfolder, substitute with the actual location); executes the Visual Basic script that creates the list; executes the upload, passing the file names of the blank replacement files to the echo mput %1 %2 %3 command; copies IPlist.txt to NewIPlist.txt, overwriting the latter if it exists (this is so you have a list to which to refer if you want); and executes the VB Script to ping and record the results. Finally, it deletes IPlist.txt, as it needs to be created programmatically each time. Any errors are recorded in C:\Folder\Subfolder\ErrorLog.txt. The final step is to create a new task in Task Scheduler that runs C:\DLPingUL.bat at a time of your choosing. I run it every Saturday at 3:00 am, so that when I wake up I have C:\Folder\Subfolder\Log.txt waiting for me with all of its pinged IP address information. Having the hostname can be especially helpful; it can show you which bots and spiders crawled your site, or from which corporation the visit originated. The only manual task I do is any further research on those IP addresses, like running them through Whois or whatismyipaddress.com/ip-lookup . These sites help me to determine, for example, if an IP address is static or dynamic, or if it is associated with hackers or spammers, among other useful bits of information. When I’m finished, I clear the contents of Log.txt so that it is ready for next time. If you’ve read all the way down to here then you’ve been very patient with me, and for that I thank you kindly. Addendum: For best results, when copying the above code, click on "View Source" in the upper right corner of the code box, and copy and paste the source. This will ensure consistency. _____________________________________________ Phillip Schubert is the founder of Schubert & Associates www.schubertassociates.com.au
November 5, 2013
by Phillip Schubert
· 18,921 Views
article thumbnail
Service Injection in Doctrine DBAL Type
When you think of a Doctrine 2 DBAL Type you think of an atomic thing, but how can you work programmatically on this type without defining an event? A DBAL Type doesn't allow access to the Symfony 2 service container, you must use a hack. But before this let me explain the classic way (using events), why you should use this hack and why you shouldn't. The classic way is defined in the Symfony 2 Cookbook: How to Register Event Listeners and Subscribers Doctrine 2 events unlike Symfony 2 events aren't defined by the developer, the developer can only attach listeners on them. Why? Because Doctrine 2 isn't a framework that you can use for everything, persistence is its only job. When should you use this hack? When your stored object isn't a 1:1 representation of the PHP object and its elaboration can be memoizable or really fast. I use this hack for browscaps: with the BrowscapBundle I can convert from an user agent string to a stdClass object (like the get_browser function). Our object is container = $container; } public function prePersist(LifecycleEventArgs $args) { $this->doObjectToString($args); } public function postPersist(LifecycleEventArgs $args) { $this->doStringToObject($args); } public function preUpdate(LifecycleEventArgs $args) { $this->doObjectToString($args); } public function postUpdate(LifecycleEventArgs $args) { $this->doStringToObject($args); } public function postLoad(LifecycleEventArgs $args) { $this->doStringToObject($args); } private function doStringToObject($args) { $entity = $args->getEntity(); if ($entity instanceof Agent && !is_object($entity->getHeader())) { $browscap = $this->container->get('browscap'); $browser = $browscap->getBrowser($entity->getHeader()); $entity->setHeader($browser); } } private function doObjectToString($args) { $entity = $args->getEntity(); if ($entity instanceof Agent && is_object($entity->getHeader())) { $user_agent = $entity->getHeader()->browser_name; $entity->setHeader($user_agent); } } } With this code, everytime you will persist, update or extract a Agent entity from/to related storage system it'll be converted from string to object. The problem is that these callbacks will be invoked everytime and numerous events aren't recommended for your application. But with this hack I can write: services: acme.demo_bundle.event_listener.container_listener: arguments: - "@service_container" class: "Acme\DemoBundle\EventListener\ContainerListener" tags: - { name: doctrine.event_listener, event: getContainer } Doctrine ignores this event but it exists and results attached! container = $container; } public function getContainer() { return $this->container; } } This listener seems useless, but it's the only way for this hack because Doctrine 2 DBAL Type doesn't allow direct access to the service container but allows access to events listeners. getVarcharTypeDeclarationSQL($fieldDeclaration); } public function convertToPHPValue($value, AbstractPlatform $platform) { if (is_null($value)) { return null; } $listeners = $platform->getEventManager()->getListeners('getContainer'); $listener = array_shift($listeners); $container = $listener->getContainer(); return $container->get('browscap')->getBrowser($value); } public function convertToDatabaseValue($value, AbstractPlatform $platform) { if ($value instanceof Browscap) { return $value->getBrowser()->browser_name; } elseif ($value instanceof stdClass) { return $value->browser_name; } return $value; } public function getName() { return 'browscap'; } public function requiresSQLCommentHint(AbstractPlatform $platform) { return true; } } I use this hack to define only the events related to application flow (less events is better). Now that you know when you can use this, you must read why you shouldn't use it. Let me explain the reason with one simple example: imagine that one day PHP will allow external hooks in native classes constructor, how can you work without knowing what you're doing while initializing a new stdClass? The same reason here: everytime you extract a value from the database you want extract it fast (hopefully you'll extract more than one records), but how can you be sure that extraction is fast if every attribute of a single record depends on external libraries and logics? Quoting Ocramius, member of the Doctrine 2 development team: DBAL types are not designed for Dependency Injection. We explicitly avoided using DI for DBAL types because they have to stay simple. We’ve been asked many many times to change this behaviour, but doctrine believes that complex data manipulation should NOT happen within the very core of the persistence layer itself. That should be handled in your service layer.
November 2, 2013
by Emanuele Minotto
· 7,888 Views
article thumbnail
Securing Docker’s Remote API
One piece to Docker that is interesting AMAZING is the Remote API that can be used to programatically interact with docker. I recently had a situation where I wanted to run many containers on a host with a single container managing the other containers through the API. But the problem I soon discovered is that at the moment when you turn networking on it is an all or nothing type of thing… you can’t turn networking off selectively on a container by container basis. You can disable IPv4 forwarding, but you can still reach the docker remote API on the machine if you can guess the IP address of it. One solution I came up with for this is to use nginx to expose the unix socket for docker over HTTPS and utilize client-side ssl certificates to only allow trusted containers to have access. I liked this setup a lot so I thought I would share how it’s done. Disclaimer: assumes some knowledge of docker! Generate The SSL Certificates We’ll use openssl to generate and self-sign the certs. Since this is for an internal service we’ll just sign it ourselves. We also remove the password from the keys so that we aren’t prompted for it each time we start nginx. # Create the CA Key and Certificate for signing Client Certs openssl genrsa -des3 -out ca.key 4096 openssl rsa -in ca.key -out ca.key # remove password! openssl req -new -x509 -days 365 -key ca.key -out ca.crt # Create the Server Key, CSR, and Certificate openssl genrsa -des3 -out server.key 1024 openssl rsa -in server.key -out server.key # remove password! openssl req -new -key server.key -out server.csr # We're self signing our own server cert here. This is a no-no in production. openssl x509 -req -days 365 -in server.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out server.crt # Create the Client Key and CSR openssl genrsa -des3 -out client.key 1024 openssl rsa -in client.key -out client.key # no password! openssl req -new -key client.key -out client.csr # Sign the client certificate with our CA cert. Unlike signing our own server cert, this is what we want to do. openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt Another option may be to leave the passphrase in and provide it as an environment variable when running a docker container or through some other means as an extra layer of security. We’ll move ca.crt, server.key and server.crt to /etc/nginx/certs. Setup Nginx The nginx setup for this is pretty straightforward. We just listen for traffic on localhost on port 4242. We require client-side ssl certificate validation and reference the certificates we generated in the previous step. And most important of all, set up an upstream proxy to the docker unix socket. I simply overwrote what was already in /etc/nginx/sites-enabled/default. upstream docker { server unix:/var/run/docker.sock fail_timeout=0; } server { listen 4242; server localhost; ssl on; ssl_certificate /etc/nginx/certs/server.crt; ssl_certificate_key /etc/nginx/certs/server.key; ssl_client_certificate /etc/nginx/certs/ca.crt; ssl_verify_client on; access_log on; error_log /dev/null; location / { proxy_pass http://docker; proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; client_max_body_size 10m; client_body_buffer_size 128k; proxy_connect_timeout 90; proxy_send_timeout 120; proxy_read_timeout 120; proxy_buffer_size 4k; proxy_buffers 4 32k; proxy_busy_buffers_size 64k; proxy_temp_file_write_size 64k; } } One important piece to make this work is you should add the user nginx runs as to the docker group so that it can read from the socket. This could be www-data, nginx, or something else! Hack It Up! With this setup and nginx restarted, let’s first run a curl command to make sure that this setup correctly. First we’ll make a call without the client cert to double check that we get denied access then a proper one. # Is normal http traffic denied? curl -v http://localhost:4242/info # How about https, sans client cert and key? curl -v -s -k https://localhost:4242/info # And the final good request! curl -v -s -k --key client.key --cert client.crt https://localhost:4242/info For the first two we should get some run of the mill 400 http response codes before we get a proper JSON response from the final command! Woot! But wait there’s more… let’s build a container that can call the service to launch other containers! For this example we’ll simply build two containers: one that has the client certificate and key and one that doesn’t. The code for these examples are pretty straightforward and to save space I’ll leave the untrusted container out. You can view the untrusted container on github (although it is nothing exciting). First, the node.js application that will connect and display information: https = require 'https' fs = require 'fs' options = host: 172.42.1.62 port: 4242 method: 'GET' path: '/containers/json' key: fs.readFileSync('ssl/client.key') cert: fs.readFileSync('ssl/client.crt') headers: { 'Accept': 'application/json'} # not required, but being semantic here! req = https.request options, (res) -> console.log res req.end() And the Dockerfile used to build the container. Notice we add the client.crt and client.key as part of building it! FROM shykes/nodejs MAINTAINER James R. Carr ADD ssl/client* /srv/app/ssl ADD package.json /srv/app/package.json ADD app.coffee /srv/app/app.coffee RUN cd /srv/app && npm install . CMD cd /srv/app && npm start That’s about it. Run docker build . and docker run -n >IMAGE ID< and we should see a json dump to the console of the actively running containers. Doing the same in the untrusted directory should present us with some 400 error about not providing a client ssl certificate. I’ve shared a project with all this code plus a vagrant file on github for your own prusual. Enjoy!
October 31, 2013
by James Carr
· 14,313 Views
article thumbnail
How to Use MongoDB as a Pure In-memory DB (Redis Style)
The Idea There has been a growing interest in using MongoDB as an in-memory database, meaning that the data is not stored on disk at all. This can be super useful for applications like: a write-heavy cache in front of a slower RDBMS system embedded systems PCI compliant systems where no data should be persisted unit testing where the database should be light and easily cleaned That would be really neat indeed if it was possible: one could leverage the advanced querying / indexing capabilities of MongoDB without hitting the disk. As you probably know the disk IO (especially random) is the system bottleneck in 99% of cases, and if you are writing data you cannot avoid hitting the disk. One sweet design choice of MongoDB is that it uses memory-mapped files to handle access to data files on disk. This means that MongoDB does not know the difference between RAM and disk, it just accesses bytes at offsets in giant arrays representing files and the OS takes care of the rest! It is this design decision that allows MongoDB to run in RAM with no modification. How it is done This is all achieved by using a special type of filesystem called tmpfs. Linux will make it appear as a regular FS but it is entirely located in RAM (unless it is larger than RAM in which case it can swap, which can be useful!). I have 32GB RAM on this server, let’s create a 16GB tmpfs: # mkdir /ramdata # mount -t tmpfs -o size=16000M tmpfs /ramdata/ # df Filesystem 1K-blocks Used Available Use% Mounted on /dev/xvde1 5905712 4973924 871792 86% / none 15344936 0 15344936 0% /dev/shm tmpfs 16384000 0 16384000 0% /ramdata Now let’s start MongoDB with the appropriate settings. smallfiles and noprealloc should be used to reduce the amount of RAM wasted, and will not affect performance since it’s all RAM based. nojournal should be used since it does not make sense to have a journal in this context! dbpath=/ramdata nojournal = true smallFiles = true noprealloc = true After starting MongoDB, you will find that it works just fine and the files are as expected in the FS: # mongo MongoDB shell version: 2.3.2 connecting to: test > db.test.insert({a:1}) > db.test.find() { "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 } # ls -l /ramdata/ total 65684 -rw-------. 1 root root 16777216 Apr 30 15:52 local.0 -rw-------. 1 root root 16777216 Apr 30 15:52 local.ns -rwxr-xr-x. 1 root root 5 Apr 30 15:52 mongod.lock -rw-------. 1 root root 16777216 Apr 30 15:52 test.0 -rw-------. 1 root root 16777216 Apr 30 15:52 test.ns drwxr-xr-x. 2 root root 40 Apr 30 15:52 _tmp Now let’s add some data and make sure it behaves properly. We will create a 1KB document and add 4 million of them: > str = "" > aaa = "aaaaaaaaaa" aaaaaaaaaa > for (var i = 0; i < 100; ++i) { str += aaa; } > for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str});} > db.foo.stats() { "ns" : "test.foo", "count" : 4000000, "size" : 4544000160, "avgObjSize" : 1136.00004, "storageSize" : 5030768544, "numExtents" : 26, "nindexes" : 1, "lastExtentSize" : 536600560, "paddingFactor" : 1, "systemFlags" : 1, "userFlags" : 0, "totalIndexSize" : 129794000, "indexSizes" : { "_id_" : 129794000 }, "ok" : 1 } The document average size is 1136 bytes and it takes up about 5GB of storage. The index on _id takes about 130MB. Now we need to verify something very important: is the data duplicated in RAM, existing both within MongoDB and the filesystem? Remember that MongoDB does not buffer any data within its own process, instead data is cached in the FS cache. Let’s drop the FS cache and see what is in RAM: # echo 3 > /proc/sys/vm/drop_caches # free total used free shared buffers cached Mem: 30689876 6292780 24397096 0 1044 5817368 -/+ buffers/cache: 474368 30215508 Swap: 0 0 0 As you can see there is 6.3GB of used RAM of which 5.8GB is in FS cache (buffers). Why is there still 5.8GB of FS cache even after all caches were dropped?? The reason is that Linux is smart and it does not duplicate the pages between tmpfs and its cache… Bingo! That means your data exists with a single copy in RAM. Let’s access all documents and verify RAM usage is unchanged: > db.foo.find().itcount() 4000000 # free total used free shared buffers cached Mem: 30689876 6327988 24361888 0 1324 5818012 -/+ buffers/cache: 508652 30181224 Swap: 0 0 0 # ls -l /ramdata/ total 5808780 -rw-------. 1 root root 16777216 Apr 30 15:52 local.0 -rw-------. 1 root root 16777216 Apr 30 15:52 local.ns -rwxr-xr-x. 1 root root 5 Apr 30 15:52 mongod.lock -rw-------. 1 root root 16777216 Apr 30 16:00 test.0 -rw-------. 1 root root 33554432 Apr 30 16:00 test.1 -rw-------. 1 root root 536608768 Apr 30 16:02 test.10 -rw-------. 1 root root 536608768 Apr 30 16:03 test.11 -rw-------. 1 root root 536608768 Apr 30 16:03 test.12 -rw-------. 1 root root 536608768 Apr 30 16:04 test.13 -rw-------. 1 root root 536608768 Apr 30 16:04 test.14 -rw-------. 1 root root 67108864 Apr 30 16:00 test.2 -rw-------. 1 root root 134217728 Apr 30 16:00 test.3 -rw-------. 1 root root 268435456 Apr 30 16:00 test.4 -rw-------. 1 root root 536608768 Apr 30 16:01 test.5 -rw-------. 1 root root 536608768 Apr 30 16:01 test.6 -rw-------. 1 root root 536608768 Apr 30 16:04 test.7 -rw-------. 1 root root 536608768 Apr 30 16:03 test.8 -rw-------. 1 root root 536608768 Apr 30 16:02 test.9 -rw-------. 1 root root 16777216 Apr 30 15:52 test.ns drwxr-xr-x. 2 root root 40 Apr 30 16:04 _tmp # df Filesystem 1K-blocks Used Available Use% Mounted on /dev/xvde1 5905712 4973960 871756 86% / none 15344936 0 15344936 0% /dev/shm tmpfs 16384000 5808780 10575220 36% /ramdata And that verifies it! :) What about replication? You probably want to use replication since a server loses its RAM data upon reboot! Using a standard replica set you will get automatic failover and more read capacity. If a server is rebooted MongoDB will automatically rebuild its data by pulling it from another server in the same replica set (resync). This should be fast enough even in cases with a lot of data and indices since all operations are RAM only :) It is important to remember that write operations get written to a special collection called oplog which resides in the local database and takes 5% of the volume by default. In my case the oplog would take 5% of 16GB which is 800MB. In doubt, it is safer to choose a fixed oplog size using the oplogSize option. If a secondary server is down for a longer time than the oplog contains, it will have to be resynced. To set it to 1GB, use: oplogSize = 1000 What about sharding? Now that you have all the querying capabilities of MongoDB, what if you want to implement a large service with it? Well you can use sharding freely to implement a large scalable in-memory store. Still the config servers (that contain the chunk distribution) should be disk based since their activity is small and rebuilding a cluster from scratch is not fun. What to watch for RAM is a scarce resource, and in this case you definitely want the entire data set to fit in RAM. Even though tmpfs can resort to swapping the performance would drop dramatically. To make best use of the RAM you should consider: usePowerOf2Sizes option to normalize the storage buckets run a compact command or resync the node periodically. use a schema design that is fairly normalized (avoid large document growth) Conclusion Sweet, you can now use MongoDB and all its features as an in-memory RAM-only store! Its performance should be pretty impressive: during the test with a single thread / core I was achieving 20k writes per second, and it should scale linearly over the number of cores.
October 28, 2013
by Antoine Girbal
· 61,019 Views
article thumbnail
JMS-style selectors on Amazon SQS with Apache Camel
This blog post demonstrates how easy it is to use Apache Camel and its new json-path component along with the camel-sqs component to produce and consume messages on Amazon SQS. Amazon Web Services SQS is a message queuing “software as a service” (SaaS) in the cloud. To be able to use it, you need to sign up for AWS. It’s primary access mechanism is XML over HTTP through various AWS SDK clients provided by Amazon. Please check out the SQS documentation for more. And as “luck” would have it, one of the users in the Apache Camel community created a component to be able to integrate with SQS. This makes it trivial to add a producer or consumer to an SQS queue and plugs in nicely with the Camel DSL. SQS, however, is not a “one-size fits all” queueing service; you must be aware of your use case and make sure it fits (current requirements as well as somewhat into the future…). There are limitations that, if not studied and accounted for ahead of time, could come back to sink your project. An example of a viable alternative, and one that more closely fits the profile of a high performance and full featured message queue is Apache ActiveMQ. For example, one limitation to keep in mind is that unlike traditional JMS consumers, you cannot create a subscription to a queue that filters messages based on some predicate (at least not using the AWS-SQS API — you’d have to build that into your solution). Some other things to keep in mind when using SQS: The queue does not preserve FIFO messaging That is, message order is not preserved. They can arrive out of order from when they were sent. Apache Camel can help with its resequencer pattern. Bilgin Ibryam, now a colleague of mine at Red Hat, has written a great blog post about how to restore message order using the resequencer pattern. Message size is limited to 256K This is probably sufficient, but if your message sizes are variable, or contain more data that 256K, you will have to chunk them and send in smaller chunks. No selector or selective consumption If you’re familiar with JMS, you know that you can specify consumers to use a “selector” or a predicate expression that is evaluated on the broker side to determine whether or not a specific message should be dispatched to a specific consumer. For example, Durability constraints Some use cases call for the message broker to store messages until consumers return. SQS allows a limit of up to 14 days. This is most likely sufficient, but something to keep in mind. Binary payloads not allowed SQS only allows text-based messages, e.g., XML, JSON, fixed format text, etc. Binary such as Avro, Protocol Buffers, or Thrift are not allowed. For some of these limitations, you can work around them by building out the functionality yourself. I would always recommend taking a look at how an integration library like Apache Camel can help — which has out-of-the-box support for doing some of these things. Doing JMS-style selectors So the basic problem is we want to subscribe to a SQS queue, but we want to filter which messages we process. For those messages that we do not process, those should be left in the queue. To do this, we will make use of Apache Camel’s Filter EIP as well as the visibility timeouts available on the SQS queue. By default, SQS will dispatch all messages in its queue when it’s queried. We cannot change this, and thus not avoid the message being dispatched to us — we’ll have to do the filtering on our side (this is different than how a full-featured broker like ActiveMQ does it, i.e., filtering is done on the broker side so the consumer doesn’t even see the message it does not want to see). Once SQS dispatches a message, it does not remove it from the queue unless the consumer has acknowledged that it has it and is finished with it. The consumer does this by sending a DeleteMessage command. Until the DeleteMessage command is sent, the message is always in the queue, however visibility comes in to play here. When a message is dispatched to a consumer, there is a period of time which it will not be visible to other consumers. So if you browsed the queue, you would not see it (it should appear in the stats as “in-flight”). However, there is a configurable period of time you can specify for how long this “visibility timeout” should be active. So if you set the visibility to a lower time period (default is 30 seconds), you can more quickly get messages re-dispatched to consumers that would be able to handle the message. Take a look at the following Camel route which does just that: @Override public void configure() throws Exception { // every two seconds, send a message to the "demo" queue in SQS from("timer:kickoff?period=5000") .setBody().method(this, "generateJsonString") .to("aws-sqs://demo?amazonSQSClient=#sqsClient&defaultVisibilityTimeout=2"); } In the above Camel Route, we create a new message every 5 seconds and send it to an SQS queue named demo — note we set the defaultVisibilityTimeout to 2 seconds. This means that after a message gets dispatched to a consumer, SQS will wait about 2 seconds before considering it eligible to be dispatched to another consumer if it has not been deleted. On the consumer side, we take advantage of a couple Apache Camel conveniences Using JSON Path + Filter EIP Camel has an excellent new component named JSON-Path. Claus Ibsen tweeted about it when he hacked it up. This allows you to do Content-Based Routing on a JSON payload very easily by using XPath-style expressions to pick out and evaluate attributes in a JSON encoded object. So in the following example, we can test an attribute named ‘type’ to be equal to ‘LOGIN’ and use Camel’s Filter EIP to allow only those messages that match to go through and continue processing: public class ConsumerRouteBuilder extends RouteBuilder { @Override public void configure() throws Exception { from("aws-sqs://demo?amazonSQSClient=#sqsClient&deleteIfFiltered=false") .setHeader("identity").jsonpath("$['type']") .filter(simple("${header.identity} == 'login'")) .log("We have a message! ${body}") .to("file:target/output?fileName=login-message-${date:now:MMDDyy-HHmmss}.json"); } } To complete the functionality, we have to pay attention to a new configuration option added for the Camel-SQS component: deleteIfFiltered — Whether or not to send the DeleteMessage to the SQS queue if an exchange fails to get through a filter. If ‘false’ and exchange does not make it through a Camel filter upstream in the route, then don’t send DeleteMessage. By default, Camel will send the “DeleteMessage” command to SQS after a route has completed successfully (without an exception). However, in this case, we are specifying to not send the DeleteMessage command if the message had been previously filtered by Camel. This example demonstrates how easy it is to use Apache Camel and its new json-path component along with the camel-sqs component to produce and consume messages on Amazon SQS. Please take a look at the source code on my github repo to play with the live code and try it out yourself.
October 28, 2013
by Christian Posta
· 12,082 Views
article thumbnail
Extracting File Metadata with C# and the .NET Framework
How to extract extended image metadata using C# and the Windows API Code Pack, simplifying access to detailed file properties typically seen in Windows Explorer.
October 26, 2013
by Rob Sanders
· 39,920 Views · 2 Likes
article thumbnail
Examples of the Windows Azure Storage Services REST API
The examples in this post were updated in September to work with the current version of the Windows Azure Storage REST API. In the Windows Azure MSDN Azure Forum there are occasional questions about the Windows Azure Storage Services REST API. I have occasionally responded to these with some code examples showing how to use the API. I thought it would be useful to provide some examples of using the REST API for tables, blobs and queues – if only so I don’t have to dredge up examples when people ask how to use it. This post is not intended to provide a complete description of the REST API. The REST API is comprehensively documented (other than the lack of working examples). Since the REST API is the definitive way to address Windows Azure Storage Services I think people using the higher level Storage Client API should have a passing understanding of the REST API to the level of being able to understand the documentation. Understanding the REST API can provide a deeper understanding of why the Storage Client API behaves the way it does. Fiddler The Fiddler Web Debugging Proxy is an essential tool when developing using the REST (or Storage Client) API since it captures precisely what is sent over the wire to the Windows Azure Storage Services. Authorization Nearly every request to the Windows Azure Storage Services must be authenticated. The exception is access to blobs with public read access. The supported authentication schemes for blobs, queues and tables and these are described here. The requests must be accompanied by an Authorization header constructed by making a hash-based message authentication code using the SHA-256 hash. The following is an example of performing the SHA-256 hash for the Authorization header: public static String CreateAuthorizationHeader(String canonicalizedString) { String signature = String.Empty; using (HMACSHA256 hmacSha256 = new HMACSHA256( Convert.FromBase64String(storageAccountKey) )) { Byte[] dataToHmac = System.Text.Encoding.UTF8.GetBytes(canonicalizedString); signature = Convert.ToBase64String(hmacSha256.ComputeHash(dataToHmac)); } String authorizationHeader = String.Format( CultureInfo.InvariantCulture, "{0} {1}:{2}", AzureStorageConstants.SharedKeyAuthorizationScheme, AzureStorageConstants.Account, signature ); return authorizationHeader; } This method is used in all the examples in this post. AzureStorageConstants is a helper class containing various constants. Key is a secret key for Windows Azure Storage Services account specified by Account. In the examples given here, SharedKeyAuthorizationScheme is SharedKey. The trickiest part in using the REST API successfully is getting the correct string to sign. Fortunately, in the event of an authentication failure the Blob Service and Queue Service responds with the authorization string they used and this can be compared with the authorization string used in generating the Authorization header. This has greatly simplified the us of the REST API. Table Service API The Table Service API supports the following table-level operations: Create Table Delete Table Query Tables The Table Service API supports the following entity-level operations: Delete Entity Insert Entity Merge Entity Update Entity Query Entities These operations are implemented using the appropriate HTTP VERB: DELETE – delete GET – query MERGE – merge POST – insert PUT – update This section provides examples of the Insert Entity and Query Entities operations. Insert Entity The InsertEntity() method listed in this section inserts an entity with two String properties, Artist and Title, into a table. The entity is submitted as an ATOM entry in the body of a request POSTed to the Table Service. In this example, the ATOM entry is generated by the GetRequestContentInsertXml() method. The date must be in RFC 1123 format in the x-ms-date header supplied to the canonicalized resource used to create the Authorization string. Note that the storage service version is set to “2012-02-12″ which requires the DataServiceVersion and MaxDataServiceVersion to be set appropriately. public void InsertEntity(String tableName, String artist, String title) { String requestMethod = "POST"; String urlPath = tableName; String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String contentMD5 = String.Empty; String contentType = "application/atom+xml"; String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n{1}\n{2}\n{3}\n{4}", requestMethod, contentMD5, contentType, dateInRfc1123Format, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] content = utf8Encoding.GetBytes(GetRequestContentInsertXml(artist, title)); Uri uri = new Uri(AzureStorageConstants.TableEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Accept = "application/atom+xml,application/xml"; request.ContentLength = content.Length; request.ContentType = contentType; request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Headers.Add("Accept-Charset", "UTF-8"); request.Headers.Add("DataServiceVersion", "2.0;NetFx"); request.Headers.Add("MaxDataServiceVersion", "2.0;NetFx"); using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(content, 0, content.Length); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } private String GetRequestContentInsertXml(String artist, String title) { String defaultNameSpace = "http://www.w3.org/2005/Atom"; String dataservicesNameSpace = "http://schemas.microsoft.com/ado/2007/08/dataservices"; String metadataNameSpace = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"; XmlWriterSettings xmlWriterSettings = new XmlWriterSettings(); xmlWriterSettings.OmitXmlDeclaration = false; xmlWriterSettings.Encoding = Encoding.UTF8; StringBuilder entry = new StringBuilder(); using (XmlWriter xmlWriter = XmlWriter.Create(entry)) { xmlWriter.WriteProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\""); xmlWriter.WriteWhitespace("\n"); xmlWriter.WriteStartElement("entry", defaultNameSpace); xmlWriter.WriteAttributeString("xmlns", "d", null, dataservicesNameSpace); xmlWriter.WriteAttributeString("xmlns", "m", null, metadataNameSpace); xmlWriter.WriteElementString("title", null); xmlWriter.WriteElementString("updated", String.Format("{0:o}", DateTime.UtcNow)); xmlWriter.WriteStartElement("author"); xmlWriter.WriteElementString("name", null); xmlWriter.WriteEndElement(); xmlWriter.WriteElementString("id", null); xmlWriter.WriteStartElement("content"); xmlWriter.WriteAttributeString("type", "application/xml"); xmlWriter.WriteStartElement("properties", metadataNameSpace); xmlWriter.WriteElementString("PartitionKey", dataservicesNameSpace, artist); xmlWriter.WriteElementString("RowKey", dataservicesNameSpace, title); xmlWriter.WriteElementString("Artist", dataservicesNameSpace, artist); xmlWriter.WriteElementString("Title", dataservicesNameSpace, title + "\n" + title); xmlWriter.WriteEndElement(); xmlWriter.WriteEndElement(); xmlWriter.WriteEndElement(); xmlWriter.Close(); } String requestContent = entry.ToString(); return requestContent; } This generates the following request (as captured by Fiddler): POST https://STORAGE_ACCOUNT.table.core.windows.net/authors HTTP/1.1 Accept: application/atom+xml,application/xml Content-Type: application/atom+xml x-ms-date: Sun, 08 Sep 2013 06:31:12 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:w7Uu4wHZx4fFwa2bsxd/TJVZZ1AqMPwxvW+pYtoWHd0= Accept-Charset: UTF-8 DataServiceVersion: 2.0;NetFx MaxDataServiceVersion: 2.0;NetFx Host: STORAGE_ACCOUNT.table.core.windows.net Content-Length: 514 Expect: 100-continue Connection: Keep-Alive The body of the request is: 2013-09-08T07:19:07Z Beckett Molloy 2013-09-08T07:19:07.2189243Z Beckett Molloy Molloy Note that I should have URLEncoded the PartitionKey and RowKey but did not do so for simplicity. There are, in fact, some issues with the URL encoding of spaces and other symbols. Get Entity The GetEntity() method described in this section retrieves the single entity inserted in the previous section. The particular entity to be retrieved is identified directly in the URL. public void GetEntity(String tableName, String partitionKey, String rowKey) { String requestMethod = "GET"; String urlPath = String.Format("{0}(PartitionKey='{1}',RowKey='{2}')", tableName, partitionKey, rowKey); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n{2}", requestMethod, dateInRfc1123Format, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.TableEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Headers.Add("Accept-Charset", "UTF-8"); request.Accept = "application/atom+xml,application/xml"; request.Headers.Add("DataServiceVersion", "2.0;NetFx"); request.Headers.Add("MaxDataServiceVersion", "2.0;NetFx"); using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } This generates the following request (as captured by Fiddler): GET https://STORAGE_ACCOUNT.table.core.windows.net/authors(PartitionKey='Beckett',RowKey='Molloy') HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:31:14 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:1hWbr4aNq4JWCpNJY3rsLH1SkIyeFTJflbqyKMPQ1Gk= Accept-Charset: UTF-8 Accept: application/atom+xml,application/xml DataServiceVersion: 2.0;NetFx MaxDataServiceVersion: 2.0;NetFx Host: STORAGE_ACCOUNT.table.core.windows.net The Table Service generates the following response: HTTP/1.1 200 OK Cache-Control: no-cache Content-Type: application/atom+xml;charset=utf-8 ETag: W/"datetime'2013-09-08T06%3A31%3A14.1579056Z'" Server: Windows-Azure-Table/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: f4bd4c77-6fb6-42a8-8dff-81ea8d28fa2e x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:31:15 GMT Content-Length: 1108 The returned entities, in this case a single entity, are returned in ATOM entry format in the response body: https://STORAGE_ACCOUNT.table.core.windows.net/authors(PartitionKey='Beckett',RowKey='Molloy') 2013-09-08T06:31:15Z Beckett Molloy 2013-09-08T06:31:14.1579056Z Beckett Molloy Molloy Blob Service API The Blob Service API supports the following account-level operation: List Containers The Blob Service API supports the following container-level operation: Create Container Delete Container Get Container ACL Get Container Properties Get Container Metadata List Blobs Set Container ACL Set Container Metadata The Blob Service API supports the following blob-level operation: Copy Blob Delete Blob Get Blob Get Blob Metadata Get Blob Properties Lease Blob Put Blob Set Blob Metadata Set Blob Properties Snapshot Blob The Blob Service API supports the following operations on block blobs: Get Block List Put Block Put Block List The Blob Service API supports the following operations on page blobs: Get Page Regions Put Page This section provides examples of the Put Blob and Lease Blob operations. Put Blob The Blob Service and Queue Service use a different form of shared-key authentication from the Table Service so care should be taken in creating the string to be signed for authorization. The blob type, BlockBlob or PageBlob, must be specified as a request header and consequently appears in the authorization string. public void PutBlob(String containerName, String blobName) { String requestMethod = "PUT"; String urlPath = String.Format("{0}/{1}", containerName, blobName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String content = "Andrew Carnegie was born in Dunfermline"; UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] blobContent = utf8Encoding.GetBytes(content); Int32 blobLength = blobContent.Length; const String blobType = "BlockBlob"; String canonicalizedHeaders = String.Format( "x-ms-blob-type:{0}\nx-ms-date:{1}\nx-ms-version:{2}", blobType, dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, blobLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.BlobEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-blob-type", blobType); request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = blobLength; using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(blobContent, 0, blobLength); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String ETag = response.Headers["ETag"]; } } This generates the following request: PUT https://STORAGE_ACCOUNT.blob.core.windows.net/fife/dunfermline HTTP/1.1 x-ms-blob-type: BlockBlob x-ms-date: Sun, 08 Sep 2013 06:28:29 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:ntvh/lamVmikvwHhy6vRVBIh87kibkPlEOiHyLDia6g= Host: STORAGE_ACCOUNT.blob.core.windows.net Content-Length: 39 Expect: 100-continue Connection: Keep-Alive The body of the request is: Andrew Carnegie was born in Dunfermline The Blob Service generates the following response: HTTP/1.1 201 Created Transfer-Encoding: chunked Content-MD5: RYJnWGXLyt94l5jG82LjBw== Last-Modified: Sun, 08 Sep 2013 06:28:31 GMT ETag: "0x8D07A73C5704A86" Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: b74ef0a2-294d-4581-b8f1-6cda724bbdbf x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:28:30 GMT Lease Blob The Blob Service allows a user to lease a blob for a minute at a time and so acquire a write lock on it. The use case for this is the locking of a page blob used to store the VHD backing an writeable Azure Drive. The LeaseBlob() example in this section demonstrates a subtle issue with the creation of authorization strings. The URL has a query string, comp=lease. Rather than using this directly in creating the authorization string it must be converted into comp:lease with a colon replacing the equal symbol – see modifiedURL in the example. Furthermore, the Lease Blob operation requires the use of an x-ms-lease-action to indicate whether the lease is being acquired, renewed, released or broken. public void LeaseBlob(String containerName, String blobName) { String requestMethod = "PUT"; String urlPath = String.Format("{0}/{1}?comp=lease", containerName, blobName); String modifiedUrlPath = String.Format("{0}/{1}\ncomp:lease", containerName, blobName); const Int32 contentLength = 0; String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String leaseAction = "acquire"; String leaseDuration = "60"; String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-lease-action:{1}\nx-ms-lease-duration:{2}\nx-ms-version:{3}", dateInRfc1123Format, leaseAction, leaseDuration, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, modifiedUrlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, contentLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.BlobEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-lease-action", leaseAction); request.Headers.Add("x-ms-lease-duration", leaseDuration); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = contentLength; using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String leaseId = response.Headers["x-ms-lease-id"]; } } This generates the following request: PUT https://STORAGE_ACCOUNT.blob.core.windows.net/fife/dunfermline?comp=lease HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:28:31 GMT x-ms-lease-action: acquire x-ms-lease-duration: 60 x-ms-version: 2012-02-12 Authorization: SharedKey rebus:+SQ5+RFZg3hUaws5XCRHxsDgXb1ycdRIz5EKyHJWP7s= Host: rebus.blob.core.windows.net Content-Length: 0 The Blob Service generates the following response: HTTP/1.1 201 Created Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: 4b6ff77f-f885-4f74-803a-c92920d225c3 x-ms-version: 2012-02-12 x-ms-lease-id: b1320c2c-65ad-41d6-a7bd-85a4242c0ac5 Date: Sun, 08 Sep 2013 06:28:31 GMT Content-Length: 0 Queue Service API The Queue Service API supports the following queue-level operation: List Queues The Queue Service API supports the following queue-level operation: Create Queue Delete Queue Get Queue Metadata Set Queue Metadata The Queue Service API supports the following message-level operations: Clear Messages Delete Message Get Messages Peek Messages Put Message This section provides examples of the Put Message and Get Message operations. Put Message The most obvious curiosity about Put Message is that it uses the HTTP verb POST rather than PUT. The issue is presumably the interaction of the English language and the HTTP standard which states that PUT should be idempotent and that the Put Message operation is clearly not since each invocation merely adds another message to the queue. Regardless, it did catch me out when I failed to read the documentation well enough – so take that as a warning. The content of a message posted to the queue must be formatted in a specified XML schema and must then be UTF8 encoded. public void PutMessage(String queueName, String message) { String requestMethod = "POST"; String urlPath = String.Format("{0}/messages", queueName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String messageText = String.Format( "{0}", message); UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] messageContent = utf8Encoding.GetBytes(messageText); Int32 messageLength = messageContent.Length; String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-version:{1}", dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, messageLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.QueueEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = messageLength; using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(messageContent, 0, messageLength); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String requestId = response.Headers["x-ms-request-id"]; } } This generates the following request: POST https://rebus.queue.core.windows.net/revolution/messages HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:34:08 GMT x-ms-version: 2012-02-12 Authorization: SharedKey rebus:nyASTVWifnxHKnj2wXwuzzzXz5CxUBZj58SToV5QFK8= Host: rebus.queue.core.windows.net Content-Length: 76 Expect: 100-continue Connection: Keep-Alive The body of the request is: Saturday in the cafe The Queue Service generates the following response: HTTP/1.1 201 Created Server: Windows-Azure-Queue/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: 14c6e73b-15d9-480c-b251-c4c01b48e529 x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:34:09 GMT Content-Length: 0 Get Messages The Get Messages operation described in this section retrieves a single message with the default message visibility timeout of 30 seconds. public void GetMessage(String queueName) { string requestMethod = "GET"; String urlPath = String.Format("{0}/messages", queueName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-version:{1}", dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n\n\n\n\n\n\n\n\n\n{1}\n{2}", requestMethod, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.QueueEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Accept = "application/atom+xml,application/xml"; using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } This generates the following request: GET https://rebus.queue.core.windows.net/revolution/messages HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:34:11 GMT x-ms-version: 2012-02-12 Authorization: SharedKey rebus:K67XooYhokw0i0AlCzYQ4GeLLrJih1r1vSqiO9DBo0c= Accept: application/atom+xml,application/xml Host: rebus.queue.core.windows.net The Queue Service generates the following response: HTTP/1.1 200 OK Content-Type: application/xml Server: Windows-Azure-Queue/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: efb21a86-7d66-47fd-b13d-7aa74fce0568 x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:34:12 GMT Content-Length: 484 The message is returned in the response body as follows: 05fd902f-6031-4ef4-8298-ef3844ec3bc6Sun, 08 Sep 2013 06:34:11 GMTSun, 15 Sep 2013 06:34:11 GMT1AgAAAAMAAAAAAAAAAL+zgF2szgE=Sun, 08 Sep 2013 06:34:43 GMTSaturday in the cafe I noticed that some newline specifiers in strings (\n) were lost when the blog was auto-ported from Windows Live Spaces to WordPress. I have put them back in but it is possible I missed some. Consequently, in the event of a problem you should check the newlines in canonicalizedHeaders and stringToSign.
October 24, 2013
by Neil Mackenzie
· 38,782 Views
article thumbnail
PostgreSQL to SQLite: The Journey
This article will be useful if you want to support both PostgreSQL and SQLite using JDBC. It will be especially useful if you: Are already accessing values from your (PostgreSQL) database using the regular JDBC ResultSet interface, like: Date d = rs.getDate("date_field"); BigDecimal bd = rs.getBigDecimal("bigdecimal_field"); And it is creating trouble when doing the same for SQLite, but you don't want to change that code. Are already retrieving autogenerated keys in PostgreSQL with a RETURNING clause, but this won't work in SQLite. You want a unified solution that works for both databases. Thought foreign keys are enforced in SQLite by default (like in PostgreSQL) and crashed with a wall. SQLite is allowing you to delete entries from your tables even when they are referenced in another table and you have explicitly told SQLite about it with a REFERENCES table_name(field_name) clause. Are having trouble with the differences between PostgreSQL and SQLite dialects (mostly concerning data types), for example, when making query filters with boolean values. Had your own way to manage exceptions for PostgreSQL and it is not working for SQLite (obviously). You want SQLite to fit into the model you already have. Other stuff might appear if you keep up... A few months ago I wanted to migrate an app to use SQLite as a data backend. In fact, I wanted it to work with both PostgreSQL and SQLite indistinctly (but not at the same time). I wanted to switch between these two databases easily without changing any code. I did it, but along the way I had to solve some problems that might be interesting to many other people. Many solutions I found were spread across the web, but there was no single place that explained how to completely achieve what I wanted. So, the aim of this post is to try to condense my learning into one article that may be of help to others as a (semi) complete guide. This guide might be useful not only to those creating their own frameworks, but for anyone who doesn't use any and are willing to try some quirks and tricks to make their app work. THE BEGINNING There are many cross-database incompatibilities between PostgreSQL and SQLite, most notably on data types. If you want to have the same code to work for both databases, you better use a framework that manages this for you. But here's the thing: the framework I use is created by myself, and didn't (completely) take these differences into account, since I mainly use PostgreSQL as database; that's how and why my problems arose. My framework conveys many things, but I focus here in the data access part. It uses some JDBC driver to connect to the databases, but it provides more abstract ways to do it; that's pretty much the data access part of the framework. A basic DAO class for my framework would look like this: public class MyDAO extends BaseDAO { public MyDAO() { super("context_alias", new DefaultDataMappingStrategy() { @Override public Object createResultObject(ResultSet rs) throws SQLException { MyModel model = (MyModel)ObjectsFactory.getObject("my_model_alias"); model.setStringField(rs.getString("string_field")); model.setIntegerField(rs.getInt("integer_field")); model.setBigDecimalField(rs.getBigDecimal("bigdecimal_field")); model.setDateField(rs.getDate("date_field")); model.setBooleanField(rs.getBoolean("boolean_field")); return model; } }); } @Override public String getTableName() { return "table_name"; } @Override public String getKeyFields() { return "string_field|integer_field"; } @Override protected Map getInsertionMap(Object obj) { Map map = new HashMap(); MyModel model = (MyModel) obj; map.put("string_field", model.getStringField()); map.put("integer_field", model.getIntegerField()); map.put("bigdecimal_field", model.getBigDecimalField()); map.put("date_field", model.getDateField()); map.put("boolean_field", model.getBooleanField()); return map; } @Override protected Map getUpdateMap(Object obj) { Map map = new HashMap(); MyModel model = (MyModel) obj; map.put("bigdecimal_field", model.getBigDecimalField()); map.put("date_field", model.getDateField()); map.put("boolean_field", model.getBooleanField()); return map; } @Override public String getFindAllStatement() { return "SELECT * FROM :@ "; } So, that I wanted to switch between databases without changing code means that I wanted to switch without changing my DAO classes. For SQLite, I used the xerial-jdbc-sqlite driver. I talk about drivers because there are some things that might be driver-specific when solving some problems; so when I say 'SQlite does it this way', I generally mean 'xerial-jdbc-sqlite driver does it this way'. Now, let's start. WARNING: Some of the solutions I give here fit into my framework, but might not directly fit into your code. It's up to you to imagine how to adapt what I provide here. DATA TYPES Since there are some differences between PostgreSQL and SQLite regarding data types, and I wanted to continue to access database values through the regular ResultSet interface, I had to have some mechanism to intercept the call to, for instance, resultset.getDate("date_field"). So I created a ResultSetWrapper class that would redefine the methods I was interested in, like this: public class ResultSetWrapper implements ResultSet { // The wrappped ResultSet ResultSet wrapped; /* I will use this DateFormat to format dates. I'm assuming an SQLite style pattern. I should not */ SimpleDateFormat df = new SimpleDateFormat("yyyy-mm-dd"); public ResultSetWrapper(ResultSet wrapped) { this.wrapped = wrapped; } /* Lots of ResultSet methods implementations go here, but this is an example of redefining a method I'm interested in changing its behavior: */ public Date getDate(String columnLabel) throws SQLException { Object value = this.wrapped.getObject(columnLabel); return (Date)TypesInferreer.inferDate(value); } } The getDate() method in ResultSetWrapper relies on TypesInferreer to convert the value retrieved to a Date value. All data types convertions would be encapsulated inside TypesInferreer, which would have methods to convert from different data types as needed. For instance, it would have a method like this one: public static Object inferDate(Object value) { java.util.Date date; // Do convertions here (convert value and asign to date) return date; } Which tries to convert any value to a Date (I'll show the actual implementation further). Now, instead of using the original resultset retrieved from saying preparedStatement.executeQuery(), you use new ResultSetWrapper(preparedStatement.executeQuery()). That's what my framework does: it passes this new resultset to DAO objects. Now let's see some type conversions. Mixing PostgreSQL Date and SQLite Long/String You could store Date values as text in a SQLite database (eg. '2013-10-09'); this you can do manually when creating the database, but when SQLite stores a Date object, by default it converts it to a Long value. There is no problem with this when saving the value to the SQLite database, but if you try to retrieve it using resultset.getDate("date_field"), then things get messy; It simply won't work (CastException). How do you access Date values, then? You create this method in TypesInfereer, which covers both String and Long variations: public static Object inferDate(Object value) { java.util.Date date = null; if(value == null) return null; if(value instanceof String) { try { date = df.parse((String)value); } catch (ParseException ex) { // Deal with ex } } else if(value instanceof Long) { date = new java.util.Date((Long)value); } else { date = (Date)value; } return new Date(date.getTime()); } And as you saw, the getDate() function in ResultSetWrapper is redefined like this: @Override public Date getDate(String columnLabel) throws SQLException { Object value = this.wrapped.getObject(columnLabel); return (Date)TypesInferreer.inferDate(value); } Now all DAOs can retrieve Date values from both databases indistinctly, using resultset.getDate("date_field"). Mixing PostgreSQL Numeric and SQLite Integer/Double/... My SQLite driver didn't implement the getBigDecimal() function. It complained like this when I called it: java.sql.SQLException: not implemented by SQLite JDBC driver. So I had to come up with a solution that was valid for both PostgreSQL and SQlite. This is what I did in ResultSetWrapper: @Override public BigDecimal getBigDecimal(String columnLabel) throws SQLException { Object value = this.wrapped.getObject(columnLabel); return (BigDecimal)TypesInferreer.inferBigDecimal(value); } But value would get different types depending on the actual value stored in the database; it could be an Integer, or a Double, or perhaps something else. I solved all the cases by doing this in TypesInfereer: public static Object inferBigDecimal(Object value) { if(value == null) return null; if(value instanceof BigDecimal == false) { return new BigDecimal(String.valueOf(value)); } return value; } Anyway, the String constructor of BigDecimal is the recommended one, so everything's fine with this. Now you can retrieve BigDecimal values using resultset.getBigDecimal("bigdecimal_field") from both databases. Mixing PostgreSQL Boolean and SQLite Integer SQLite doesn't have boolean values. Instead, it interprets any other value as boolean by following some rules. When SQLite saves a Boolean value to the database, it saves it as 0 or 1 for false or true respectively. Also, because drivers can interpret any value as boolean, you can use resultset.getBoolean("boolean_field") and it will work as expected by the rules. But the problem I faced was when creating filters. If a value for true is stored as 1 in the SQLite database, you can't expect the clause WHERE boolean_field = true to work. You will never find a match. Instead, you should have said WHERE boolean_field = 1. In my app, I created filters like this: dao.addFilter(new FilterSimple("boolean_field", true)); Now I needed FilterSimple to infer that, for SQLite, I meant 1 instead of true. So I created what I called a DatasourceVariation. These are objects that are specific for each type of database and are used accross all data accesses, by DAOs, Filters, and other objects. These objects would take care of managing all my cross-database incompatibilities, including: The way to reference a database object: in PostgreSQL you must prepend the schema name to every database object you refer in your queries. In SQLite you don't. The way to manage exeptions: explained further in this post. The way to backup and restore data: explained further in this post. Expressing BETWEEN clauses: Explained further in this post. And also, infering boolean values. For VariationSQLite, I did this: @Override public Object getReplaceValue(Object value) { if(value instanceof Boolean) { if((Boolean)value == true) return new Integer(1); else return new Integer(0); } return value; } Now we can say dao.addFilter(new FilterSimple("boolean_field", true)) for both databases, assuming that FilterSimple uses the variation to adapt the value before constructing the clause. RETRIEVING AUTOGENERATED KEYS When you have autonumeric fields (eg. Serial), in PostgreSQL you can specify a RETURNING clause at the end of an INSERT statement to automatically retrieve the values of autogenerated fields by doing this: PreparedStatement pstm = conn.prepareStatement(queryWithReturningClause); // ex. select * from table_x returning field_x ResultSet rs = statement.executeQuery(); if(rs.next()) { // Get autogenerated fields from rs } But that won't work with SQLite. In SQLite, retrieving autogenerated fields conveys a process that goes from creating the statement, executing the query and explicitly asking for the generated values. Like this: PreparedStatement pstm = conn.prepareStatement(queryWITHOUTreturningClause, Statement.RETURN_GENERATED_KEYS); pstm.executeUpdate(); ResultSet rs = pstm.getGeneratedKeys(); if (rs != null && rs.next()) { // Get autogenerated fields from rs } The good news is that this code works both for PostgreSQL and SQLite, so I replaced my previous code for this, and didn't have to make any distinction between databases. ENFORCING FOREIGN KEYS You'd think that using a REFERENCES table_name(field_name) clause when creating a SQLite database table makes foreign keys to be checked when deleting, updating, etc. You're wrong! Foreign keys are not enforced in SQLite by default. You have to explicitly say it, and it's done when creating the connection (WARNING: This is very driver-specific): SQLiteConfig config = new SQLiteConfig(); config.enforceForeignKeys(true); Connection conn = DriverManager.getConnection("jdbc:sqlite:" + dataSourcePath, config.toProperties()); For PostgreSQL it's different, so you better have a connection pool for each type of database, and decide which one to use at runtime. My framework does exactly that. NOTE: If you are capable of getting the connection depending on the database type, then you can enforce foreign keys transparently for both databases (for PostgreSQL it happens naturally without extra code). For instance, you could have an abstract getConnection() method, and each database's connection pool would return the connection in its own way. MANAGING EXCEPTIONS I had defined some different types of database exceptions in my framework: ExceptionDBDuplicateEntry, ExceptionDBEntryReferencedElsewhere, etc, which would be thrown and raised to upper layers in my architecture. For PostgreSQL, these exceptions directly mapped to some constant codes (which normally are vendor/driver specific): UNIQUE_VIOLATION = "23505", FOREIGN_KEY_VIOLATION = "23503", etc. So, for PostgreSQL, I managed database exceptions something like this: @Override public void manageException(SQLException ex) throws ExceptionDBDuplicateEntry, ExceptionDBEntryReferencedElsewhere { if (ex.getSQLState() == null) { ex = (SQLException) ex.getCause(); } if (ex.getSQLState().equals(UNIQUE_VIOLATION)) { throw new ExceptionDBDuplicateEntry(); } else if(ex.getSQLState().equals(FOREIGN_KEY_VIOLATION)) { throw new ExceptionDBEntryReferencedElsewhere(); } else { DAOPackage.log(ex); throw new ExceptionDBUnknownError(ex); } } That won't work for SQLite, obviously! So, what I did was move the database exceptions management to the DataSourceVariation. The VariationPostgresql class would have a method similar to the one above. For VarialtionSQLite, I did sort of a hack, but it's something that has worked until now (maybe until I change my driver). @Override public void manageException(SQLException ex) throws ExceptionDBDuplicateEntry, ExceptionDBEntryReferencedElsewhere { // This is a hack (is it???) String message = ex.getMessage().toLowerCase(); if(message.contains("sqlite_constraint")) { if(message.contains("is not unique")) throw new ExceptionDBDuplicateEntry(); else if(message.contains("foreign key constraint failed")) throw new ExceptionDBEntryReferencedElsewhere(); else { DAOPackage.log(ex); throw new ExceptionDBUnknownError(ex); } } else { DAOPackage.log(ex); throw new ExceptionDBUnknownError(ex); } } Update: This technique might have some flaws. But hey, can you find a better approach right away? FIXING BETWEEN CLAUSE The problem with the BETWEEN clause appeared while using a filter like this: dao.addFilter(new FilterBetween("date_field", date1, date2)); // date1 and date2 are java.util.Date objects FilterBetween would create a BETWEEN clause by formatting Dates as Strings, normally with the format 'yyyy-MM-dd' (although this should be configurable). Since dates in SQLite are long values, we can't create a clause like date_field BETWEEN '2013-01-01' AND '2013-02-01'. It had to be something like date_field >=1357016400000 AND date_field <= 1359694800000. So, I moved the creation of BETWEEN clauses to.... that's right, to DataSourceVariation. VariationSQLite does it like this: @Override public String getBetweenExpression(String fieldName, Object d1, Object d2) { String filter = ""; try { Date dd1 = null; Date dd2 = null; SimpleDateFormat df = new SimpleDateFormat("yyyy-mm-dd"); // Remember, this should be configurable if(d1 instanceof String) dd1 = df.parse((String)d1); else dd1 = (Date)d1; if(d2 instanceof String)dd2 = df.parse((String)d2); else dd2 = (Date)d2; filter = fieldName + " >= " + dd1.getTime() + " AND " + fieldName + " <= " + dd2.getTime(); } catch (ParseException ex) { DAOPackage.log(ex); throw new ExceptionDBUnknownError(ex); } return filter; } CONCLUSIONS As you can see, there are many intricacies when making an app support multiple database types. All I did here was only to support PostgreSQL and SQLite, but who knows what is needed to support other databases at the same time too. You can't expect JDBC alone will do all the work, so be prepared to solve some problems (and another problem, and another, ...) to make a database migration. And please, share your journey.
October 21, 2013
by Martín Proenza
· 12,642 Views
  • Previous
  • ...
  • 501
  • 502
  • 503
  • 504
  • 505
  • 506
  • 507
  • 508
  • 509
  • 510
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×