Databases Resources


Creating a backup in Team Foundation Server 2010 using the Power Tools

Over the last few years the product team has been putting the finishing touches on a backup module for the Team Foundation Server Administration Console. Why, you might ask, do you need another way to back up? Surely you can just back up the bits? Well, you could, but TFS has a lot of moving parts, so creating a backup can get very complicated:

- Required permissions
- Identify databases
- Create tables in databases
- Create a stored procedure for marking tables
- Create a stored procedure for marking all tables at once
- Create a stored procedure to automatically mark tables
- Create a scheduled job to run the table-marking procedure
- Create a maintenance plan for full backups
- Create a maintenance plan for differential backups
- Create a maintenance plan for transaction backups
- Back up additional Lab Management components

(from “Back Up Team Foundation Server” on MSDN)

There are a heck of a lot of databases that, depending on your environment, might be spread over your entire network.

Figure: Deployment topologies (where is my data?) from MSDN

So, how is this problem solved? Well, the TFS team has created a tool that creates all of the backups and all of the jobs, and manages the backup location for you. This sounds fantastic, but how about in practice? Was it really that easy? Well… not really. Here is the extra stuff I found out:

- Your account must own the share. Owning the folder does not cut it (see Error #1 - TF254027).
- SQL Server must be running under a domain account or Network Service. SQL Server must also have permission to the share, and the validation will get confused if you use “LocalSystem” instead of Network Service or a domain account (see Error #2 - TF254027).
- The account running SQL Server must have permission to create SPNs. The account that is used for SQL Server must be able to both see and create service principal names in Active Directory (see Error #3: Terminating your TFS server).
Once you learn how to Google without keywords and read your server's mind, you will have a nice backup system going…

Error #1 - TF254027

I initially got an error because the accounts did not really have full control over the target location. This is a problem with the share: although I have full permission for \\fileserver1\share\tfsbackups, it is just a folder under the \\fileserver1\share\ location, and I do not have permission to change the sharing settings there.

Figure: TF254027 is caused by permission issues

[info @16:36:34.342] granting account root_company\tfssqlbox$ permission on folder \\fileserver1\share\tfsbackups
[info @16:36:34.348] system.unauthorizedaccessexception: attempted to perform an unauthorized operation.
   at system.security.accesscontrol.win32.setsecurityinfo(resourcetype type, string name, safehandle handle, securityinfos securityinformation, securityidentifier owner, securityidentifier group, genericacl sacl, genericacl dacl)
   at system.security.accesscontrol.nativeobjectsecurity.persist(string name, safehandle handle, accesscontrolsections includesections, object exceptioncontext)
   at system.security.accesscontrol.filesystemsecurity.persist(string fullpath)
   at microsoft.teamfoundation.powertools.admin.helpers.filehelper.grantfolderpermission(string account, string path)
[info @16:36:34.350] granting account root_company\tfs.services permission on folder \\fileserver1\share\tfsbackups
[info @16:36:34.352] system.unauthorizedaccessexception: attempted to perform an unauthorized operation.
   at system.security.accesscontrol.win32.setsecurityinfo(resourcetype type, string name, safehandle handle, securityinfos securityinformation, securityidentifier owner, securityidentifier group, genericacl sacl, genericacl dacl)
   at system.security.accesscontrol.nativeobjectsecurity.persist(string name, safehandle handle, accesscontrolsections includesections, object exceptioncontext)
   at system.security.accesscontrol.filesystemsecurity.persist(string fullpath)
   at microsoft.teamfoundation.powertools.admin.helpers.filehelper.grantfolderpermission(string account, string path)
[error @16:36:34.352] granting permission to account root_company\tfssqlbox$ on path \\fileserver1\share\tfsbackups failed

Figure: The log files get to the root of the problem, but not the reason

After much messing around, I found that you can't use a sub-folder of a share that you do not have permission for. You need permission on the share itself to apply permissions.

Error #2 - TF254027

Let's try this again with a share that we control. I will create a backup share on the TFS server, so that at least I control the permissions.

Figure: The next error looks the same, but it is subtly different

[info @18:12:05.813] "verify: grant backup plan permissions\root\verifybackuppathpermissionsgrantedsuccessfully(verifybackuppathpermissionsgrantedsuccessfully): exiting verification with state completed and result success"
[info @18:12:05.813] verify: grant backup plan permissions\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): starting verification
[info @18:12:05.813] verify test backup created successfully
[info @18:12:05.813] starting creating backup test validation
[error @18:12:06.132] microsoft.sqlserver.management.smo.failedoperationexception: backup failed for server 'sqlserver1'. ---> microsoft.sqlserver.management.common.executionfailureexception: an exception occurred while executing a transact-sql statement or batch. ---> system.data.sqlclient.sqlexception: cannot open backup device '\\tfsserver1\tfsbackup\temp_20111104111205.bak'. operating system error 5(access is denied.). backup database is terminating abnormally.
   at microsoft.sqlserver.management.common.connectionmanager.executetsql(executetsqlaction action, object execobject, dataset filldataset, boolean catchexception)
   at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype)
   --- end of inner exception stack trace ---
   at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype)
   at microsoft.sqlserver.management.common.serverconnection.executenonquery(stringcollection sqlcommands, executiontypes executiontype)
   at microsoft.sqlserver.management.smo.executionmanager.executenonquery(stringcollection queries)
   at microsoft.sqlserver.management.smo.backuprestorebase.executesql(server server, stringcollection queries)
   at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv)
   --- end of inner exception stack trace ---
   at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv)
   at microsoft.teamfoundation.powertools.admin.helpers.backupfactory.testbackupcreation(string path)
[error @18:12:06.184] !verify error!: account root_comapny\martin.hinshelwood failed to create backups using path \\tfsserver1\tfsbackup
[info @18:12:06.184] "verify: grant backup plan permissions\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): exiting verification with state completed and result error"
[info @18:12:06.184] !verify result!: 5 completed, 0 skipped: 4 success, 1 errors, 0 warnings
[info @18:12:06.197] verify: backup tasks verifications(vcontainer): starting verification
[info @18:12:06.197] a generic container node that does not contribute to results
[info @18:12:06.197] "verify: backup tasks verifications(vcontainer): exiting verification with state ignore and result ignore"
[info @18:12:06.197] verify: backup tasks verifications\root(vcontainer): starting verification
[info @18:12:06.197] a generic container node that does not contribute to results
[info @18:12:06.197] "verify: backup tasks verifications\root(vcontainer): exiting verification with state ignore and result ignore"
[info @18:12:06.197] verify: backup tasks verifications\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): starting verification
[info @18:12:06.197] verify test backup created successfully
[info @18:12:06.197] starting creating backup test validation
[error @18:12:06.389] microsoft.sqlserver.management.smo.failedoperationexception: backup failed for server sqlserver1'. ---> microsoft.sqlserver.management.common.executionfailureexception: an exception occurred while executing a transact-sql statement or batch. ---> system.data.sqlclient.sqlexception: cannot open backup device '\\tfsserver1\tfsbackup\temp_20111104111206.bak'. operating system error 5(access is denied.). backup database is terminating abnormally.
   at microsoft.sqlserver.management.common.connectionmanager.executetsql(executetsqlaction action, object execobject, dataset filldataset, boolean catchexception)
   at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype)
   --- end of inner exception stack trace ---
   at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype)
   at microsoft.sqlserver.management.common.serverconnection.executenonquery(stringcollection sqlcommands, executiontypes executiontype)
   at microsoft.sqlserver.management.smo.executionmanager.executenonquery(stringcollection queries)
   at microsoft.sqlserver.management.smo.backuprestorebase.executesql(server server, stringcollection queries)
   at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv)
   --- end of inner exception stack trace ---
   at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv)
   at microsoft.teamfoundation.powertools.admin.helpers.backupfactory.testbackupcreation(string path)

Figure: This time the error is misleading. It comes from SQL Server, not from the local machine as it implies

It looks like the problem is that SQL Server can't write to that folder, even though I can and the machine account can. Let's try this from the SQL Server itself, with a native backup.

Figure: SQL Server can't write to that location

Damn… so even a native SQL backup can't write to this location.

title: microsoft sql server management studio
------------------------------
backup failed for server 'sqlserver1'. (microsoft.sqlserver.smoextended)
for help, click: http://go.microsoft.com/fwlink?prodname=microsoft+sql+server&prodver=10.50.2500.0+((kj_pcu_main).110617-0038+)&evtsrc=microsoft.sqlserver.management.smo.exceptiontemplates.failedoperationexceptiontext&evtid=backup+server&linkid=20476
------------------------------
additional information:
system.data.sqlclient.sqlerror: cannot open backup device '\\tfsserver1\tfsbackup\moo.bak'. operating system error 5(access is denied.). (microsoft.sqlserver.smo)
for help, click: http://go.microsoft.com/fwlink?prodname=microsoft+sql+server&prodver=10.50.2500.0+((kj_pcu_main).110617-0038+)&linkid=20476
------------------------------
buttons:
ok
------------------------------

Figure: SQL Server errors suck even more

As it turns out, SQL Server is running under “LocalService”, which does not authenticate against our share. So we need to change the account that the SQL Server service runs under.

Error #3: Terminating your TFS server

We should always use the SQL Server Configuration Manager to change these things, so I fired it up. Since I already have a domain account for running TFS, I decided to use that one.

Figure: This is easy

When you apply the change, it will ask you to restart SQL Server, and then it should all be complete. Let's check TFS and make sure that everything is running…

Figure: OMG! What just happened!

Oh shit: I think I just broke TFS. Why can't TFS connect? Let's try SQL Server Management Studio and see.

Figure: What is an SSPI? This does not look good…

I hastily changed the service account back to the original value and made sure that this fixed TFS. Then I wanted to figure out why it broke. Usually I would just ask Shad (one of my extremely technical colleagues), but alas, he is on his honeymoon. Some googling turned up an SPN issue: the account that SQL Server runs under must be able to both read and write service principal names for itself in Active Directory. This can be set, but only by a domain admin (see “Dynamically set SPN's for SQL service accounts”).

So let's go with Network Service instead. If we change the account that SQL Server runs under to “Network Service”, then I can add permission for “root_company\sqlserver1$” to my share and get it working. Yes, servers have AD accounts as well.

November 5, 2011

by Martin Hinshelwood


Recommendation Engine Models

In a classical model of a recommendation system, there are "users" and "items". Users have associated metadata (or content), such as age, gender, race and other demographic information. Items also have metadata, such as a text description, price and weight. On top of that, there are interactions (or transactions) between users and items. For example, userA downloads or purchases movieB, or userX gives productY a rating of 5. Given all the metadata of users and items, as well as their interactions over time, can we answer the following questions?

- What is the probability that userX purchases itemY?
- What rating will userX give to itemY?
- What are the top k unseen items that should be recommended to userX?

Content-based Approach

In this approach, we use the metadata to categorize users and items and then match them at the category level. One example is recommending jobs to candidates: we can do an IR/text search to match the user's resume against the job descriptions. Another example is recommending an item that is "similar" to one the user has purchased. Similarity is measured on the items' metadata, and various distance functions can be used. The goal is to find the k nearest neighbors of the item we know the user likes.

Collaborative Filtering Approach

In this approach, we look purely at the interactions between users and items, and use those to make our recommendations. The interaction data can be represented as a matrix in which each row is a user, each column is an item, and each cell represents the interaction between that user and that item. For example, the cell can contain the rating the user gave to the item (a numeric value). Alternatively, the cell can be just a binary value indicating whether the interaction has happened (e.g. "1" if userX has purchased itemY, and "0" otherwise). The matrix is also extremely sparse, meaning that most of the cells are unfilled. We need to be careful about how we treat these unfilled cells. There are two common ways:
1. Treat the unknown cells as "0", i.e. equivalent to the user giving a rating of "0". This may or may not be a good idea, depending on your application scenario.
2. Guess what the missing value should be. For example, suppose we want to guess what userX will rate itemA, given his rating of itemB. We can look at all users (or those in the same age group as userX) who have rated both itemA and itemB, and compute their average ratings for each item. We then use the average ratings of itemA and itemB to interpolate userX's rating of itemA from his rating of itemB.

User-based Collaborative Filtering

In this model, we do the following:

1. Find a group of users that is “similar” to userX.
2. Find all movies liked by this group that haven't been seen by userX.
3. Rank these movies and recommend them to userX.

This introduces the concept of user-to-user similarity, which is basically the similarity between two row vectors of the user/item matrix. To find the K nearest neighbors of a particular user, a naive implementation computes the "similarity" to every other user and picks the top K. Different similarity functions can be used. Jaccard similarity is the number of movies both users have seen (the intersection) divided by the number of movies either of them has seen (the union); the Jaccard distance is one minus that value. Pearson similarity first normalizes each user's ratings by subtracting that user's mean rating, and then computes the cosine similarity.

There are two problems with this approach:

1. Comparing userX and userY is expensive, as they have millions of attributes.
2. Finding the top k users similar to userX requires computing the similarity between userX and every other user.

Locality Sensitive Hashing and Minhash

To resolve problem 1, we approximate the similarity using a cheap estimation function called minhash. The idea is to find a hash function h() such that the probability of h(userX) = h(userY) equals the (Jaccard) similarity of userX and userY. If we have 100 such h() functions, we can simply count the functions for which h(userX) = h(userY) to estimate how similar userX is to userY.
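The similarity functions and the minhash estimate described above can be sketched as follows. This is a minimal JavaScript sketch under my own assumptions. Users are Sets of item (row) ids for binary data, or Maps from item id to rating for rated data. The names jaccard, centeredCosine, minhashSignatures and estimateSimilarity are hypothetical. The minhash part simulates each random row permutation with a linear hash (a * row + b) mod P and walks the matrix one row at a time, as the next section describes.

```javascript
// Jaccard similarity of two users given as Sets of item ids:
// |A ∩ B| / |A ∪ B| (the Jaccard distance would be 1 minus this).
function jaccard(a, b) {
  let inter = 0;
  for (const item of a) if (b.has(item)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Pearson-style similarity of two users given as Maps itemId -> rating:
// subtract each user's mean rating, then take the cosine of the centered
// vectors. Unrated items count as 0 after centering.
function centeredCosine(ra, rb) {
  const mean = (r) => {
    let s = 0;
    for (const v of r.values()) s += v;
    return r.size === 0 ? 0 : s / r.size;
  };
  const ma = mean(ra);
  const mb = mean(rb);
  let dot = 0, na = 0, nb = 0;
  for (const [item, v] of ra) {
    na += (v - ma) ** 2;
    // Only items rated by both users contribute to the dot product.
    if (rb.has(item)) dot += (v - ma) * (rb.get(item) - mb);
  }
  for (const v of rb.values()) nb += (v - mb) ** 2;
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Minhash signatures for an array of users (Sets of row ids smaller than P).
// h_i(row) = (a_i * row + b_i) mod P simulates a random row permutation;
// signature[u][i] is the minimum h_i over the rows where user u has a 1.
const P = 1000003; // prime, so h_i is a permutation of the row ids below P
function minhashSignatures(users, numHashes, seed = 42) {
  let state = seed; // Park-Miller generator for reproducible coefficients
  const rand = () => (state = (state * 48271) % 2147483647);
  const coeffs = [];
  for (let i = 0; i < numHashes; i++) {
    coeffs.push([1 + (rand() % (P - 1)), rand() % P]);
  }
  // Invert user -> rows so the matrix can be processed one row at a time.
  const rowToUsers = new Map();
  users.forEach((rows, u) => {
    for (const row of rows) {
      if (!rowToUsers.has(row)) rowToUsers.set(row, []);
      rowToUsers.get(row).push(u);
    }
  });
  const sig = users.map(() => new Array(numHashes).fill(Infinity));
  for (const [row, owners] of rowToUsers) {
    const hashed = coeffs.map(([a, b]) => (a * row + b) % P); // once per row
    for (const u of owners) {
      for (let i = 0; i < numHashes; i++) {
        if (hashed[i] < sig[u][i]) sig[u][i] = hashed[i];
      }
    }
  }
  return sig;
}

// Fraction of hash functions on which two signatures agree:
// an estimate of the Jaccard similarity of the two users.
function estimateSimilarity(sigA, sigB) {
  let same = 0;
  for (let i = 0; i < sigA.length; i++) if (sigA[i] === sigB[i]) same++;
  return same / sigA.length;
}
```

With more hash functions the estimate gets closer to the exact Jaccard value. Comparing two signatures of, say, 100 entries is far cheaper than comparing two rows with millions of attributes.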
Conceptually, each h() randomly permutes the rows of the matrix. For a user's column c1, it returns the number of the first row (in permuted order) that contains a 1. Actually permuting the rows is expensive if the number of rows is large. Remember that the purpose of h(c1) is only to return the row number of the first row that is 1. So we can scan each row of c1, and whenever it is 1, apply a function newRowNum = hash(rowNum) to simulate the permutation, keeping the minimum newRowNum seen so far. As an optimization, instead of processing one column at a time, we can process one row at a time. For each row, we compute the hash values once and update the running minimum of every column that has a 1 in that row.

To solve problem 2, we need to avoid computing the similarity of every other user to userX. The idea is to hash users into buckets such that similar users fall into the same bucket. Then, instead of comparing userX against all users, we only compute the similarity of the users who share a bucket with userX. To do this, we horizontally partition each signature column into b bands of r rows each, and hash each band separately. By picking the parameters b and r, we control the likelihood that two users fall into the same bucket in at least one band. This likelihood is a function of their similarity: for two users with similarity s, it is 1 - (1 - s^r)^b.

Item-based Collaborative Filtering

If we transpose the user/item matrix and do the same thing, we can compute item-to-item similarity. In this model, we do the following:

1. Find the set of movies that userX likes (from the interaction data).
2. Find a group of movies that are similar to the movies we know userX likes.
3. Rank these movies and recommend them to userX.

It turns out that item-based collaborative filtering has more benefits than user-to-user similarity, for the following reasons:

- The number of items is typically smaller than the number of users.
- Users' tastes change over time, so a user similarity matrix needs to be updated more frequently. Item-to-item similarity tends to be more stable and requires fewer updates.

Singular Value Decomposition

If we look back at the matrix, we can see that matrix multiplication is equivalent to mapping an item from the item space to the user space.
In other words, we can view each existing item as an axis of the user space (notice that each user is a vector of their ratings on the existing items). Multiplying a new item by the matrix then gives a vector of the same form as a user. We can then compute the dot product of this projected new item with a user to determine their similarity. It turns out that this is equivalent to mapping the user into the item space and computing the dot product there. In other words, multiplying by the matrix maps between the item space and the user space.

Now let's imagine there is a hidden concept space in between. Instead of jumping directly from the user space to the item space, we can jump from the user space to a concept space, and then to the item space. Here we first map the user space to the concept space, and also map the item space to the concept space. Then we match users and items in the concept space. This is a generalization of our recommender.

We can use SVD to factor the matrix in this way. Let P be the m by n matrix (m rows and n columns). Then P = UDV, where:

- U is an m by m matrix whose columns are the eigenvectors of P*transpose(P).
- V is an n by n matrix whose rows are the eigenvectors of transpose(P)*P.
- D is an m by n diagonal matrix whose diagonal entries are the singular values of P. These are the square roots of the non-zero eigenvalues, which P*transpose(P) and transpose(P)*P share.

If we keep only the r non-zero singular values (U reduced to its first r columns, D to an r by r matrix, V to its first r rows), we can decompose P into U*squareroot(D), which gives the users in concept space, and squareroot(D)*V, which gives the items in concept space. Notice that D can be thought of as the strength of each "concept" in the concept space, and its values are ordered by magnitude in decreasing order. If we remove some of the weakest concepts by setting them to zero, we reduce the number of non-zero elements in D. This effectively generalizes the concept space, making it focus on the important concepts. Calculating the SVD of a matrix with large dimensions is expensive. Fortunately, if our goal is only an SVD approximation (with k non-zero diagonal values), we can use the random projection mechanism as described here.
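The idea of keeping only the k strongest concepts can be tried on a toy matrix with the sketch below. It does not implement the random projection method the text points to. Instead, it uses plain power iteration with deflation, which is fine for small dense matrices but slow and less robust for large ones. The names truncatedSvd and lowRankApprox are hypothetical. Following the notation above, U[c] is the c-th column of U, S[c] the c-th diagonal value of D, and V[c] the c-th row of V.

```javascript
// Dot product and Euclidean norm of plain number arrays.
const dot = (x, y) => x.reduce((s, xi, i) => s + xi * y[i], 0);
const norm = (x) => Math.sqrt(dot(x, x));

// Top-k singular triplets of a dense m x n matrix P (array of rows),
// found one at a time by power iteration, removing each one (deflation)
// before looking for the next.
function truncatedSvd(P, k, iters = 500) {
  const m = P.length;
  const n = P[0].length;
  const A = P.map((row) => row.slice()); // working copy, deflated below
  const U = [], S = [], V = [];
  for (let c = 0; c < k; c++) {
    // Deterministic non-zero start vector (assumed not orthogonal to the target).
    let v = Array.from({ length: n }, (_, j) => 1 + j / n);
    for (let t = 0; t < iters; t++) {
      const u = A.map((row) => dot(row, v)); // u = A v
      const w = new Array(n).fill(0); // w = transpose(A) u
      for (let i = 0; i < m; i++) {
        for (let j = 0; j < n; j++) w[j] += A[i][j] * u[i];
      }
      const len = norm(w);
      if (len === 0) break;
      v = w.map((x) => x / len);
    }
    const u = A.map((row) => dot(row, v));
    const sigma = norm(u);
    if (sigma === 0) break; // nothing left to explain
    U.push(u.map((x) => x / sigma));
    S.push(sigma);
    V.push(v);
    // Deflate: A <- A - sigma * u * transpose(v), removing this concept.
    for (let i = 0; i < m; i++) {
      for (let j = 0; j < n; j++) A[i][j] -= sigma * U[c][i] * v[j];
    }
  }
  return { U, S, V };
}

// Rebuild the rank-k approximation from the kept triplets.
function lowRankApprox({ U, S, V }, m, n) {
  const R = Array.from({ length: m }, () => new Array(n).fill(0));
  S.forEach((s, c) => {
    for (let i = 0; i < m; i++) {
      for (let j = 0; j < n; j++) R[i][j] += s * U[c][i] * V[c][j];
    }
  });
  return R;
}
```

For example, truncatedSvd([[3, 0], [0, 1]], 2) finds the singular values 3 and 1. Keeping only k = 1 rebuilds [[3, 0], [0, 0]]: the weaker concept has been set to zero, which is the generalization step described above.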
Association Rule Based

In this model, we use the market-basket association rule algorithm to discover rules like:

{item1, item2} => {item3, item4, item5}

We represent each user as a basket and each viewing as an item (notice that we ignore the rating and use a binary value). Then we use an association rule mining algorithm to detect frequent itemsets and the association rules. Finally, for each user, we match the user's previously viewed items against the set of rules to determine which other movies we should recommend.

Evaluate the recommender

Once we have a recommender, how do we evaluate its performance? The basic idea is to separate the data into a training set and a test set. For the test set, we remove certain user-to-movie interactions (change certain cells from 1 to 0), pretending the user hasn't seen those items. Then we use the training set to train a recommender and feed the test set (with the removed interactions) to it. Performance is measured by how much the recommended items overlap with the ones we removed. In other words, a good recommender should be able to recover the set of items that we removed from the test set.

Leverage tagging information on items

In some cases, items have explicit tags associated with them (we can consider the tags a user-annotated concept space added to the items). Consider each item to be described by a vector of tags. Users can then be auto-tagged based on the items they have interacted with. For example, if userX purchases itemY, which is tagged with Z1 and Z2, then the weights of Z1 and Z2 increase in her existing tag vector. We can use a time decay mechanism to update the user's tag vector as follows:

current_user_tag = alpha * item_tag + (1 - alpha) * prev_user_tag

To recommend items to the user, we simply pick the top k items by the dot product of the user tag vector and each item tag vector. This dot product equals the cosine similarity when both vectors are normalized.
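The tag-vector update, the top-k ranking and the hold-out evaluation described above can be sketched together in JavaScript. Several parts are assumptions rather than anything the text specifies: tag vectors are Maps from tag to weight, ranking uses cosine similarity, and the score is the fraction of held-out items recovered. The function names are hypothetical.

```javascript
// Time-decayed update after the user interacts with an item:
// current_user_tag = alpha * item_tag + (1 - alpha) * prev_user_tag
function updateUserTags(prevUserTags, itemTags, alpha) {
  const next = new Map();
  const tags = new Set([...prevUserTags.keys(), ...itemTags.keys()]);
  for (const t of tags) {
    next.set(t, alpha * (itemTags.get(t) || 0) + (1 - alpha) * (prevUserTags.get(t) || 0));
  }
  return next;
}

// Cosine similarity of two sparse tag vectors (Maps tag -> weight).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) {
    na += w * w;
    if (b.has(t)) dot += w * b.get(t);
  }
  for (const w of b.values()) nb += w * w;
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Top-k item ids for a user; items is a Map itemId -> tag vector.
function recommend(userTags, items, k) {
  return [...items]
    .map(([id, tags]) => ({ id, score: cosine(userTags, tags) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.id);
}

// Hold-out evaluation: fraction of the removed (held-out) items that
// appear among the recommendations.
function recallOfHeldOut(recommended, heldOut) {
  if (heldOut.size === 0) return 0;
  let hits = 0;
  for (const id of recommended) if (heldOut.has(id)) hits++;
  return hits / heldOut.size;
}
```

For example, with alpha = 0.5, a user whose vector was {Z1: 1} and who buys an item tagged {Z2: 1} ends up with {Z1: 0.5, Z2: 0.5}. A larger alpha makes recent interactions dominate the vector faster.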
Source: http://horicky.blogspot.com/2011/09/recommendation-engine.html

November 2, 2011

by Ricky Ho


Avoid Lazy JPA Collections

Hibernate (and actually JPA) has collection mappings: @OneToMany, @ManyToMany, @ElementCollection. All of these are lazy by default. This means the collections are specific implementations of the List or Set interface that hold a reference to the persistent session, and the values are loaded from the database only if the collection is accessed. That saves unnecessary database queries if you only occasionally use the collection.

However, there's a problem with that. It manifests itself through the exception that, in my observations, is the second most commonly asked about exception (after NullPointerException): the LazyInitializationException. The problem is that the session is usually open for your service layer and is closed as soon as you return the entity to the view layer. When you try to iterate the uninitialized collection in your view (a JSP, for example), the collection throws LazyInitializationException. The session it holds a reference to is already closed, so it can't fetch the items.

How is this solved? With the so-called OpenSessionInView / OpenEntityManagerInView “patterns”. In short, you make a filter that opens the session when the request starts and closes it after the view has been rendered (and not after the service layer finishes). Some people call that an anti-pattern, because it leaks persistence handling into the view layer and complicates the setup. I wouldn't say it's that bad: generally it solves the problem without introducing other problems. But in all the recent projects I've been involved in, we haven't used OpenSessionInView, and it works fine. It works fine because we aren't using lazy collections. But then, you'll rightly point out, you will be fetching “the whole world” when you load a single entity. Well, no. There are two types of *ToMany mappings:

1. Value-type mappings, where the collection logically does not hold more than a dozen elements. In most cases this is @ElementCollection, but also @*ToMany with items like “Category” or “Price”, which are just more complex value objects and do not hold any other mappings themselves. Another common feature of these collections is that they are usually displayed in the UI together with their owning entity. For example, it is most likely that you want to display the categories of an article. For this type of collection, EAGER is the better option. You'll have to fetch them anyway, so why not let Hibernate (or any JPA implementation) come up with some clever join? As I said, these collections are logically not bigger than a dozen or two, so fetching them won't be a performance hit. And, logically, they won't fetch a big object graph with them.
2. Mappings across the big, core entities. This can be “all orders made by the user”, “all users for the organization”, “all items of the supplier”, etc. You certainly don't want to fetch them eagerly. Suppose you fetch 2000 users for an organization. They in turn have 1000 orders each, an order has 3 items on average, and each item has a collection of all the people who have purchased it... You'll end up with your entire database in memory. Obviously you need lazy collections, right? Well, no. In that case you should not be using collection mappings at all. In 99% of cases, these types of relations are displayed in paged lists in the UI, or in search results. They are never (and should never be) displayed all on one screen, and should rarely be returned in one API call if your application provides something like a REST API. You have to make queries for them and use query.setMaxResults() and query.setFirstResult() (or limit them with some restrictive criteria). Furthermore, having the collections mapped means someone will try to use them at some point, which may fail. And if the object is serialized (XML, JSON, etc.), the collection contents will be fetched, which you almost certainly don't want to happen.
(A draft idea here: JPA could have a PagedList collection that would allow paged lazy fetching, thus eliminating the need for a query.)

So what did I just say? That you should never use lazy collections: use eager collections for very simple, shallow mappings, and paged queries for the bigger ones. Well, not exactly. Lazy collections exist and they have their uses, though these are rather limited. At the very least, lazy collections are used far more often than they are actually appropriate. Here's an example scenario where I found them appropriate. In my side project I have a Message entity, and it holds a collection of Picture entities. When a user uploads a picture, it is stored in that collection. A message can have no more than 10 pictures, so the collection could very well be eager. But Message is the most commonly used entity: it's fetched on virtually every request. And only some messages have pictures (how many of the tweets in your stream have a picture upload?). So I don't want Hibernate to make queries just to find out there are no pictures for a given message. Hence I store the number of pictures in a separate field and make the pictures collection lazy. Then I call Hibernate.initialize(..) on it manually only if the number of pictures is > 0.

So there are scenarios where the entity has optional collections that fall into the first category above (“small, shallow collections”). If a collection is small, shallow and optional (say, used in less than 20% of cases), then you should go with lazy to save unnecessary queries. For everything else, lazy collections will make your life harder.

From http://techblog.bozho.net/?p=645

October 28, 2011

by Bozhidar Bozhanov


Magic: The Gathering in JavaScript and HTML5

As a user interface fan, I could not miss the developments around HTML 5. So the goal of this post is to walk through a graphical application that uses JavaScript and HTML 5. Through examples, we will see one way (among others) to develop this kind of project.

- Application overview
- Tools
- The HTML 5 page
- Data gathering
- Cards loading & cache handling
- Cards display
- Mouse management
- State storage
- Animations
- Handling multi-devices
- Conclusion
- To go further

Application overview

We will produce an application that displays a Magic the Gathering © (courtesy of www.wizards.com/magic) cards collection. Users will be able to scroll and zoom using the mouse (like Bing Maps, for example).

You can see the final result here: http://bolaslenses.catuhe.com

The project source files can be downloaded here: http://www.catuhe.com/msdn/bolaslenses.zip

Cards are stored in Windows Azure Storage and served through the Azure Content Distribution Network (CDN: a service that deploys data near the end users) in order to achieve maximum performance. An ASP.NET service returns the cards list (in JSON format).

Tools

To write our application, we will use Visual Studio 2010 SP1 with the Web Standards Update. This extension adds IntelliSense support for HTML 5 pages (which is a really important thing). Our solution will contain an HTML 5 page side by side with .js files, which contain the JavaScript scripts. For debugging, you can set a breakpoint directly in the .js files in Visual Studio. You can also use the developer bar of Internet Explorer 9 (press the F12 key to display it).

Figure: Debugging with Visual Studio 2010
Figure: Debugging with Internet Explorer 9 (F12/developer bar)

So we have a modern developer environment with IntelliSense and debugging support. We are ready to start, and first of all we will write the HTML 5 page.
The HTML 5 page

Our page will be built around an HTML 5 canvas, which will be used to draw the cards:

[Page markup not preserved. Its visible text: "Cards scanned by MWSHQ Team", "Magic the Gathering official site: http://www.wizards.com/magic", "Bolas Lenses", the canvas fallback "Your browser does not support HTML5 canvas." and "Loading data..."]

If we dissect this page, we can note that it is divided into two parts:

- the header, with the title, the logo and the special mentions;
- the main part (section), which holds the canvas and the tooltips that display the status of the application.

There is also a hidden image (backimage), used as the source for cards that are not yet loaded.

To build the layout of the page, a style sheet (full.css) is applied. Style sheets are a mechanism for changing the styles of tags (in HTML, a style defines all the display options for a tag):

    html, body {
        height: 100%;
    }
    body {
        background-color: #888888;
        font-size: .85em;
        font-family: "Segoe UI", "Trebuchet MS", Verdana, Helvetica, sans-serif;
        margin: 0;
        padding: 0;
        color: #696969;
    }
    a:link {
        color: #034af3;
        text-decoration: underline;
    }
    a:visited {
        color: #505abc;
    }
    a:hover {
        color: #1d60ff;
        text-decoration: none;
    }
    a:active {
        color: #12eb87;
    }
    header, footer, nav, section {
        display: block;
    }
    table {
        width: 100%;
    }
    header, #header {
        position: relative;
        margin-bottom: 0px;
        color: #000;
        padding: 0;
    }
    #title {
        font-weight: bold;
        color: #fff;
        border: none;
        font-size: 60px !important;
        vertical-align: middle;
        margin-left: 70px;
    }
    #legal {
        text-align: right;
        color: white;
        font-size: 14px;
        width: 50%;
        position: absolute;
        top: 15px;
        right: 10px;
    }
    #leftheader {
        width: 50%;
        vertical-align: middle;
    }
    section {
        margin: 20px 20px 20px 20px;
    }
    #maincanvas {
        border: 4px solid #000000;
    }
    #cardscount {
        font-weight: bolder;
        font-size: 1.1em;
    }
    .tooltip {
        position: absolute;
        bottom: 5px;
        color: black;
        background-color: white;
        margin-right: auto;
        margin-left: auto;
        left: 35%;
        right: 35%;
        padding: 5px;
        width: 30%;
        text-align: center;
        border-radius: 10px;
        -webkit-border-radius: 10px;
        -moz-border-radius: 10px;
        box-shadow: 2px 2px 2px #333333;
    }
    #bolaslogo {
        width: 64px;
        height: 64px;
    }
    #picturecell {
        float: left;
        width: 64px;
        margin: 5px 5px 5px 5px;
        vertical-align: middle;
    }

This sheet is responsible for setting up the page's display.

Style sheets are powerful tools that allow an infinite number of displays. However, they are sometimes complicated to set up, for example when a tag is affected by a class, an identifier and its container. The developer bar of Internet Explorer 9 is particularly useful here, because it shows the hierarchy of styles applied to a tag.

For example, let's take a look at the waittext tooltip with the developer bar. To do this, press F12 in Internet Explorer and use the selector to choose the tooltip. Once the selection is done, we can see the styles hierarchy: our div receives its styles from the body tag and from the .tooltip entry of the style sheet. With this tool, you can see the effect of each style (each one can be disabled), and you can also add new styles on the fly.

Another important feature of this window is the ability to change the rendering mode of Internet Explorer 9. Indeed, we can test how, for example, Internet Explorer 8 will handle the same page. To do this, go to the [Browser Mode] menu and select the Internet Explorer 8 engine. This change will especially impact our tooltip, as it uses border-radius (rounded corners) and box-shadow, which are CSS 3 features:

Figure: Internet Explorer 9
Figure: Internet Explorer 8

Our page degrades gracefully: it still works (with no annoying visual difference) when the browser does not support all the required technologies.

Now that our interface is ready, we will take a look at the data source that provides the cards to display.
The server provides the cards list in JSON format at this URL: http://bolaslenses.catuhe.com/home/listofcards/?colorstring=0

It takes one parameter (colorstring) to select a specific color (0 = all).

When developing with JavaScript, there is a good reflex to have (a good reflex in other languages too, but really important in JavaScript): ask whether what we want to build has not already been done in an existing framework. Indeed, there is a multitude of open source JavaScript projects. One of them is jQuery, which provides a plethora of convenient services.

In our case, to connect to our server's URL and get the cards list, we could use an XMLHttpRequest and parse the returned JSON ourselves, or we can use jQuery. We will use its getJSON function, which takes care of everything for us:

function getlistofcards() {
    var url = "http://bolaslenses.catuhe.com/home/listofcards/?jsoncallback=?";
    $.getJSON(url, { colorstring: "0" }, function (data) {
        listofcards = data;
        $("#cardscount").text(listofcards.length + " cards displayed");
        $("#waittext").slideToggle("fast");
    });
}

As we can see, our function stores the cards list in the listofcards variable and calls two jQuery functions: text, which changes the text of a tag, and slideToggle, which hides (or shows) a tag by animating its height.

Each object in listofcards has two fields: id, the unique identifier of the card, and path, the relative path of the card picture (without the extension).

Note that the server URL is called with the "?jsoncallback=?" suffix. For security reasons, AJAX calls are restricted to the same origin as the calling script (the same-origin policy). However, a technique called JSONP lets us make a cooperative cross-origin call to the server (which, of course, must support it). Fortunately, jQuery handles this on its own: when the URL contains "callback=?", the request is treated as JSONP.
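To see what jQuery does on our behalf, here is a minimal sketch of both sides of a JSONP exchange. buildJsonpUrl and wrapJsonpResponse are hypothetical helpers written for illustration only (they are not jQuery functions, and the callback name cb1 is made up); jQuery generates its own callback name and inserts a script tag for the resulting URL.

```javascript
// Hypothetical sketch of the two halves of a JSONP exchange.

// Client side: turn "...?jsoncallback=?" into a concrete script URL by
// substituting the callback name and appending the query parameters.
function buildJsonpUrl(url, params, callbackName) {
  const withCallback = url.replace("=?", "=" + encodeURIComponent(callbackName));
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
  if (query === "") return withCallback;
  return withCallback + (withCallback.includes("?") ? "&" : "?") + query;
}

// Server side: instead of plain JSON, return a script that calls the callback.
// This is why the server has to be aware of the JSONP convention.
function wrapJsonpResponse(callbackName, data) {
  return callbackName + "(" + JSON.stringify(data) + ");";
}
```

The browser then loads that URL through a script element; the server replies with something like cb1([...]); and, when that script runs, the global cb1 function hands the data to the success callback passed to getJSON.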
Once we have our cards list, we can set up the loading and caching of the pictures.

Cards loading & cache handling

The main trick of our application is to draw only the cards that are actually visible on the screen. The display window is defined by a zoom level and an offset (x, y) in the overall system:

var visucontrol = { zoom: 0.25, offsetx: 0, offsety: 0 };

The overall system is made of 14819 cards spread over 200 columns and 75 rows (the last row is only partly filled). Each card is also available in three versions:

high definition: 480x680 without compression (.jpg suffix)
medium definition: 240x340 with standard compression (.50.jpg suffix)
low definition: 120x170 with strong compression (.25.jpg suffix)

Depending on the zoom level, we load the matching version to reduce network transfers. To do this, we write a function that returns an image for a given card. Each instance is configured to download one quality level, and it is linked to the next lower level so that it can return that version while the card for the current level is not yet downloaded:

function imagecache(substr, replacementcache) {
    var extension = substr;
    var backimage = document.getElementById("backimage");

    this.load = function (card) {
        var localcache = this;

        if (this[card.id] != undefined)
            return;

        var img = new Image();
        localcache[card.id] = { image: img, isloaded: false };
        currentdownloads++;

        img.onload = function () {
            localcache[card.id].isloaded = true;
            currentdownloads--;
        };

        img.onerror = function () {
            currentdownloads--;
        };

        img.src = "http://az30809.vo.msecnd.net/" + card.path + extension;
    };

    this.getreplacementfromlowercache = function (card) {
        if (replacementcache == undefined)
            return backimage;

        return replacementcache.getimageforcard(card);
    };

    this.getimageforcard = function (card) {
        var img;
        if (this[card.id] == undefined) {
            this.load(card);
            img = this.getreplacementfromlowercache(card);
        } else {
            if (this[card.id].isloaded)
                img = this[card.id].image;
            else
                img = this.getreplacementfromlowercache(card);
        }

        return img;
    };
}

An imagecache is built from its suffix and the underlying (lower-quality) cache. Two functions are important here:

load: loads the right picture and stores it in the cache (the msecnd.net URL is the Azure CDN address of the cards);
getimageforcard: returns the card picture from the cache if it is already loaded; otherwise it asks the underlying cache for its version (and so on).

To handle our three cache levels, we declare three variables:

var imagescache25 = new imagecache(".25.jpg");
var imagescache50 = new imagecache(".50.jpg", imagescache25);
var imagescachefull = new imagecache(".jpg", imagescache50);

Selecting the right cache depends only on the zoom:

function getcorrectimagecache() {
    if (visucontrol.zoom <= 0.25)
        return imagescache25;

    if (visucontrol.zoom <= 0.8)
        return imagescache50;

    return imagescachefull;
}

To give feedback to the user, we add a timer that manages a tooltip showing the number of images currently being downloaded:

function updatestats() {
    var stats = $("#stats");

    stats.html(currentdownloads + " card(s) currently downloaded.");

    if (currentdownloads == 0 && statsvisible) {
        statsvisible = false;
        stats.slideToggle("fast");
    }
    else if (currentdownloads > 1 && !statsvisible) {
        statsvisible = true;
        stats.slideToggle("fast");
    }
}

setInterval(updatestats, 200);

Again, jQuery simplifies the animations. We will now discuss the display of cards.
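The fallback chain between the three caches can be exercised without a browser. The following is a minimal, hypothetical sketch (TieredCache and pickCache are illustrative names, not part of the application): it replaces Image objects with URL strings and simulates img.onload with a markLoaded call, but it follows the same lookup order as imagecache and the same thresholds as getcorrectimagecache.

```javascript
// Hypothetical, DOM-free sketch of the imagecache fallback chain.
// URLs stand in for Image objects; markLoaded() simulates img.onload.
class TieredCache {
  constructor(suffix, lower) {
    this.suffix = suffix;     // e.g. ".25.jpg"
    this.lower = lower;       // next lower-quality cache, or undefined
    this.entries = new Map(); // card id -> { url, isLoaded }
  }

  // Simulates starting a download for this quality level.
  request(card) {
    if (!this.entries.has(card.id)) {
      this.entries.set(card.id, { url: card.path + this.suffix, isLoaded: false });
    }
  }

  // Simulates the onload event of the image.
  markLoaded(card) {
    this.request(card);
    this.entries.get(card.id).isLoaded = true;
  }

  // Best picture available right now: this level if loaded, otherwise
  // whatever the lower levels have, otherwise the card back ("back").
  getImageForCard(card) {
    const entry = this.entries.get(card.id);
    if (entry !== undefined && entry.isLoaded) return entry.url;
    this.request(card); // no-op if the download was already started
    return this.lower === undefined ? "back" : this.lower.getImageForCard(card);
  }
}

// Same thresholds as getcorrectimagecache.
function pickCache(zoom, caches) {
  if (zoom <= 0.25) return caches.low;
  if (zoom <= 0.8) return caches.medium;
  return caches.full;
}
```

Asking the full-quality cache for a card that nothing has loaded yet starts a download at every level and returns the card back; once the low-definition picture arrives, the same call returns it, and so on up the chain.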
Cards display

To draw our cards, we fill the canvas through its 2D context (which exists only if the browser supports the HTML 5 canvas):

var maincanvas = document.getElementById("maincanvas");
var drawingcontext = maincanvas.getContext('2d');

The drawing is done by the processlistofcards function (called 60 times per second):

function processlistofcards() {

    if (listofcards == undefined) {
        drawwaitmessage();
        return;
    }

    maincanvas.width = document.getElementById("center").clientWidth;
    maincanvas.height = document.getElementById("center").clientHeight;
    totalcards = listofcards.length;

    var localcardwidth = cardwidth * visucontrol.zoom;
    var localcardheight = cardheight * visucontrol.zoom;

    var effectivetotalcardsinwidth = colscount * localcardwidth;
    var rowscount = Math.ceil(totalcards / colscount);
    var effectivetotalcardsinheight = rowscount * localcardheight;

    initialx = (maincanvas.width - effectivetotalcardsinwidth) / 2.0 - localcardwidth / 2.0;
    initialy = (maincanvas.height - effectivetotalcardsinheight) / 2.0 - localcardheight / 2.0;

    // clear
    clearcanvas();

    // computing of the viewing area
    var initialoffsetx = initialx + visucontrol.offsetx * visucontrol.zoom;
    var initialoffsety = initialy + visucontrol.offsety * visucontrol.zoom;

    var startx = Math.max(Math.floor(-initialoffsetx / localcardwidth) - 1, 0);
    var starty = Math.max(Math.floor(-initialoffsety / localcardheight) - 1, 0);

    var endx = Math.min(startx + Math.floor((maincanvas.width - initialoffsetx - startx * localcardwidth) / localcardwidth) + 1, colscount);
    var endy = Math.min(starty + Math.floor((maincanvas.height - initialoffsety - starty * localcardheight) / localcardheight) + 1, rowscount);

    // getting current cache
    var imagecache = getcorrectimagecache();

    // render
    for (var y = starty; y < endy; y++) {
        for (var x = startx; x < endx; x++) {
            var localx = x * localcardwidth + initialoffsetx;
            var localy = y * localcardheight + initialoffsety;

            // clip
            if (localx > maincanvas.width)
                continue;

            if (localy > maincanvas.height)
                continue;

            if (localx + localcardwidth < 0)
                continue;

            if (localy + localcardheight < 0)
                continue;

            var card = listofcards[x + y * colscount];

            if (card == undefined)
                continue;

            // get from cache
            var img = imagecache.getimageforcard(card);

            // render
            try {
                if (img != undefined)
                    drawingcontext.drawImage(img, localx, localy, localcardwidth, localcardheight);
            } catch (e) {
                $.grep(listofcards, function (item) {
                    return item.image != img;
                });
            }
        }
    }

    // scroll bars
    drawscrollbars(effectivetotalcardsinwidth, effectivetotalcardsinheight, initialoffsetx, initialoffsety);

    // fps
    computefps();
}

This function is built around several key points.

If the cards list is not yet loaded, we display a tooltip indicating that the download is in progress:

var pointcount = 0;

function drawwaitmessage() {
    pointcount++;
    if (pointcount > 200)
        pointcount = 0;

    var points = "";

    for (var index = 0; index < pointcount / 10; index++)
        points += ".";

    $("#waittext").html("loading...please wait" + points);
}

Next, we compute the position of the display window (in terms of cards and coordinates) and clear the canvas:

function clearcanvas() {
    maincanvas.width = document.body.clientWidth - 50;
    maincanvas.height = document.body.clientHeight - 140;

    drawingcontext.fillStyle = "rgb(0, 0, 0)";
    drawingcontext.fillRect(0, 0, maincanvas.width, maincanvas.height);
}

Then we browse the cards list and call the drawImage function of the canvas context for each visible card.
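The startx/endx formulas in processlistofcards decide which columns are drawn (starty/endy do the same for rows). As a minimal sketch, they can be isolated for one axis and checked without a canvas; visibleRange is a hypothetical name, and the sizes in the example (a 1000-pixel canvas, 120-pixel cards, 200 columns) are illustrative.

```javascript
// One axis of the visible-window computation from processlistofcards.
// canvasSize:    canvas width (or height) in pixels
// initialOffset: screen position of the first column (initialoffsetx), may be negative
// cellSize:      card width (or height) at the current zoom
// count:         number of columns (or rows) in the whole grid
function visibleRange(canvasSize, initialOffset, cellSize, count) {
  // First index to consider, keeping one cell of margin, never below 0.
  const start = Math.max(Math.floor(-initialOffset / cellSize) - 1, 0);
  // Cells that fit between that index and the far edge, plus one, capped at count.
  const end = Math.min(
    start + Math.floor((canvasSize - initialOffset - start * cellSize) / cellSize) + 1,
    count
  );
  return { start, end }; // indices in [start, end) are visited
}

// visibleRange(1000, -5000, 120, 200) -> { start: 40, end: 51 }
```

In that example only 11 of the 200 columns are visited, wherever the view is; the clip tests inside the loop then skip the margin cards that still fall outside the canvas.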
The current image is provided by the active cache (which depends on the zoom):

// get from cache
var img = imagecache.getimageforcard(card);

// render
try {
    if (img != undefined)
        drawingcontext.drawImage(img, localx, localy, localcardwidth, localcardheight);
} catch (e) {
    $.grep(listofcards, function (item) {
        return item.image != img;
    });
}

Note that $.grep returns a new, filtered array and leaves listofcards unchanged; since its result is discarded here, the catch block effectively just ignores the drawing error.

We also have to draw the scroll bars, using a roundedrectangle function built on quadratic curves:

function roundedrectangle(x, y, width, height, radius) {
    drawingcontext.beginPath();
    drawingcontext.moveTo(x + radius, y);
    drawingcontext.lineTo(x + width - radius, y);
    drawingcontext.quadraticCurveTo(x + width, y, x + width, y + radius);
    drawingcontext.lineTo(x + width, y + height - radius);
    drawingcontext.quadraticCurveTo(x + width, y + height, x + width - radius, y + height);
    drawingcontext.lineTo(x + radius, y + height);
    drawingcontext.quadraticCurveTo(x, y + height, x, y + height - radius);
    drawingcontext.lineTo(x, y + radius);
    drawingcontext.quadraticCurveTo(x, y, x + radius, y);
    drawingcontext.closePath();
    drawingcontext.stroke();
    drawingcontext.fill();
}

function drawscrollbars(effectivetotalcardsinwidth, effectivetotalcardsinheight, initialoffsetx, initialoffsety) {
    drawingcontext.fillStyle = "rgba(255, 255, 255, 0.6)";
    drawingcontext.lineWidth = 2;

    // vertical
    var totalscrollheight = effectivetotalcardsinheight + maincanvas.height;
    var scaleheight = maincanvas.height - 20;
    var scrollheight = maincanvas.height / totalscrollheight;
    var scrollstarty = (-initialoffsety + maincanvas.height * 0.5) / totalscrollheight;
    roundedrectangle(maincanvas.width - 8, scrollstarty * scaleheight + 10, 5, scrollheight * scaleheight, 4);

    // horizontal
    var totalscrollwidth = effectivetotalcardsinwidth + maincanvas.width;
    var scalewidth = maincanvas.width - 20;
    var scrollwidth = maincanvas.width / totalscrollwidth;
    var scrollstartx = (-initialoffsetx + maincanvas.width * 0.5) / totalscrollwidth;
    roundedrectangle(scrollstartx * scalewidth + 10, maincanvas.height - 8, scrollwidth * scalewidth, 5, 4);
}

Finally, we compute the number of frames per second:

var previous = []; // timestamps of the most recent frames

function computefps() {
    if (previous.length > 60) {
        previous.splice(0, 1);
    }
    var start = new Date().getTime();
    previous.push(start);

    // at least two timestamps are needed to measure one frame interval
    if (previous.length < 2)
        return;

    var sum = 0;
    for (var id = 0; id < previous.length - 1; id++) {
        sum += previous[id + 1] - previous[id];
    }

    // previous.length timestamps delimit previous.length - 1 intervals
    var diff = 1000.0 / (sum / (previous.length - 1));

    $("#cardscount").text(diff.toFixed() + " fps. " + listofcards.length + " cards displayed");
}

Drawing the cards relies heavily on the browser's ability to accelerate canvas rendering. For the record, here are the results on my machine at the minimum zoom level (0.05):

Browser                                   FPS
Internet Explorer 9                       30
Firefox 5                                 30
Chrome 12                                 17
iPad (zoom level 0.8)                     7
Windows Phone Mango (zoom level 0.8)      20 (!!)

The site even works on mobile phones and tablets, as long as they support HTML 5. This shows the raw power of HTML 5 browsers, which can redraw a full screen of cards up to 30 times per second!

Mouse management

To browse our cards collection, we have to manage the mouse (including its wheel). For scrolling, we just handle the onmousemove, onmouseup and onmousedown events.
The onmouseup and onmousedown events are used to detect whether the mouse button is pressed:

var mousedown = 0;
document.body.onmousedown = function (e) {
    mousedown = 1;
    getmouseposition(e);

    previousx = posx;
    previousy = posy;
};
document.body.onmouseup = function () {
    mousedown = 0;
};

The onmousemove event is connected to the canvas and used to move the view:

var previousx = 0;
var previousy = 0;
var posx = 0;
var posy = 0;

function getmouseposition(eventargs) {
    var e;

    if (!eventargs)
        e = window.event;
    else {
        e = eventargs;
    }

    if (e.offsetX || e.offsetY) {
        posx = e.offsetX;
        posy = e.offsetY;
    }
    else if (e.clientX || e.clientY) {
        posx = e.clientX;
        posy = e.clientY;
    }
}

function onmousemove(e) {
    if (!mousedown)
        return;
    getmouseposition(e);

    mousemovefunc(posx, posy, previousx, previousy);

    previousx = posx;
    previousy = posy;
}

This function (onmousemove) computes the current position and also passes the previous one, so that the registered callback can move the offset of the display window:

function move(posx, posy, previousx, previousy) {
    currentaddx = (posx - previousx) / visucontrol.zoom;
    currentaddy = (posy - previousy) / visucontrol.zoom;
}
mousehelper.registermousemove(maincanvas, move);

Note that jQuery also provides tools to manage mouse events.

To manage the wheel, we have to adapt to the different browsers, which do not behave the same way on this point:

function wheel(event) {
    var delta = 0;
    if (event.wheelDelta) {
        delta = event.wheelDelta / 120;
        if (window.opera)
            delta = -delta;
    } else if (event.detail) { /** Mozilla case. */
        delta = -event.detail / 3;
    }
    if (delta) {
        wheelfunc(delta);
    }

    if (event.preventDefault)
        event.preventDefault();
    event.returnValue = false;
}

As we can see, every browser does its own thing :).
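The two conversions performed by these handlers can be written as pure functions and checked without a browser. In this hypothetical sketch (dragToOffset and normalizeWheel are illustrative names), event objects are plain stand-ins carrying only wheelDelta or detail, and an isOpera flag replaces the window.opera test.

```javascript
// Hypothetical pure versions of the conversions done by move() and wheel().

// A drag of (posX - previousX) screen pixels becomes an offset change in
// unscaled units, because processlistofcards multiplies offsets by the zoom.
function dragToOffset(posX, posY, previousX, previousY, zoom) {
  return {
    dx: (posX - previousX) / zoom,
    dy: (posY - previousY) / zoom,
  };
}

// Normalizes a wheel event to "notches", following the rules in wheel():
// wheelDelta is divided by 120 (sign flipped for old Opera), while the
// Mozilla detail value is divided by 3 and negated.
function normalizeWheel(event, isOpera) {
  if (event.wheelDelta) {
    const delta = event.wheelDelta / 120;
    return isOpera ? -delta : delta;
  }
  if (event.detail) {
    return -event.detail / 3;
  }
  return 0;
}
```

Dividing by the zoom expresses the mouse movement in the same unscaled units as visucontrol.offsetx and visucontrol.offsety, which processlistofcards multiplies by the zoom again when it positions the cards.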
The function that registers this handler is:

mousehelper.registerwheel = function (func) {
    wheelfunc = func;

    if (window.addEventListener)
        window.addEventListener('DOMMouseScroll', wheel, false);
    window.onmousewheel = document.onmousewheel = wheel;
};

We use it to change the zoom with the wheel:

// mouse
mousehelper.registerwheel(function (delta) {
    currentaddzoom += delta / 500.0;
});

Finally, we add a bit of inertia to mouse moves (and to the zoom) to make them feel smoother:

// inertia
var inertia = 0.92;
var currentaddx = 0;
var currentaddy = 0;
var currentaddzoom = 0;

function doinertia() {
    visucontrol.offsetx += currentaddx;
    visucontrol.offsety += currentaddy;
    visucontrol.zoom += currentaddzoom;

    var effectivetotalcardsinwidth = colscount * cardwidth;
    var rowscount = Math.ceil(totalcards / colscount);
    var effectivetotalcardsinheight = rowscount * cardheight;

    var maxoffsetx = effectivetotalcardsinwidth / 2.0;
    var maxoffsety = effectivetotalcardsinheight / 2.0;

    if (visucontrol.offsetx < -maxoffsetx + cardwidth)
        visucontrol.offsetx = -maxoffsetx + cardwidth;
    else if (visucontrol.offsetx > maxoffsetx)
        visucontrol.offsetx = maxoffsetx;

    if (visucontrol.offsety < -maxoffsety + cardheight)
        visucontrol.offsety = -maxoffsety + cardheight;
    else if (visucontrol.offsety > maxoffsety)
        visucontrol.offsety = maxoffsety;

    if (visucontrol.zoom < 0.05)
        visucontrol.zoom = 0.05;
    else if (visucontrol.zoom > 1)
        visucontrol.zoom = 1;

    processlistofcards();

    currentaddx *= inertia;
    currentaddy *= inertia;
    currentaddzoom *= inertia;

    // epsilon
    if (Math.abs(currentaddx) < 0.001)
        currentaddx = 0;
    if (Math.abs(currentaddy) < 0.001)
        currentaddy = 0;
}

A small function like this costs little to implement but adds a lot to the quality of the user experience.

State storage

To provide an even better user experience, we also save the position and zoom of the display window.
To do this, we use localStorage, which stores key/value pairs for the long term (the data survives after the browser is closed) and is accessible only to pages from the same origin:

function saveconfig() {
    if (window.localStorage == undefined)
        return;

    // zoom
    window.localStorage["zoom"] = visucontrol.zoom;

    // offsets
    window.localStorage["offsetx"] = visucontrol.offsetx;
    window.localStorage["offsety"] = visucontrol.offsety;
}

// restore data
if (window.localStorage != undefined) {
    var storedzoom = window.localStorage["zoom"];
    if (storedzoom != undefined)
        visucontrol.zoom = parseFloat(storedzoom);

    var storedoffsetx = window.localStorage["offsetx"];
    if (storedoffsetx != undefined)
        visucontrol.offsetx = parseFloat(storedoffsetx);

    var storedoffsety = window.localStorage["offsety"];
    if (storedoffsety != undefined)
        visucontrol.offsety = parseFloat(storedoffsety);
}

Since localStorage only stores strings, the values are converted back to numbers with parseFloat when they are restored.

Animations

To make our application even more dynamic, we let users double-click a card to zoom in and focus on it. Our system has to animate three values: the two offsets (x, y) and the zoom.
To do this, we use a function that animates a variable from a source value to a destination value over a given duration:

var animationhelper = function (root, name) {
    var paramname = name;
    this.animate = function (current, to, duration) {
        var offset = (to - current);
        // at least one step, so that the interval is always cleared
        var ticks = Math.max(1, Math.floor(duration / 16));
        var offsetpart = offset / ticks;
        var tickscount = 0;

        var intervalid = setInterval(function () {
            current += offsetpart;
            root[paramname] = current;
            tickscount++;

            if (tickscount == ticks) {
                clearInterval(intervalid);
                root[paramname] = to;
            }
        }, 16);
    };
};

Here is how it is used:

// prepare animations parameters
var zoomanimationhelper = new animationhelper(visucontrol, "zoom");
var offsetxanimationhelper = new animationhelper(visucontrol, "offsetx");
var offsetyanimationhelper = new animationhelper(visucontrol, "offsety");
var speed = 1.1 - visucontrol.zoom;
zoomanimationhelper.animate(visucontrol.zoom, 1.0, 1000 * speed);
offsetxanimationhelper.animate(visucontrol.offsetx, targetoffsetx, 1000 * speed);
offsetyanimationhelper.animate(visucontrol.offsety, targetoffsety, 1000 * speed);

The advantage of the animationhelper function is that it can animate as many parameters as you wish, using nothing more than setInterval!

Handling multi-devices

Finally, we make sure that our page can also be viewed on tablet PCs and even on phones. To do this, we use a CSS 3 feature: media queries. With this technology, we can apply style sheets according to queries such as a specific display size. Here, if the screen width is less than 480 pixels, the following style sheet is added:

#legal { font-size: 8px; }
#title { font-size: 30px !important; }
#waittext { font-size: 12px; }
#bolaslogo { width: 48px; height: 48px; }
#picturecell { width: 48px; }

Conclusion

HTML 5, CSS 3, JavaScript and Visual Studio 2010 make it possible to develop portable and efficient solutions (within the limits of browsers that support HTML 5, of course) with some great features such as hardware-accelerated rendering. This kind of development is also simplified by frameworks like jQuery.

I am also especially fond of JavaScript, which turns out to be a very powerful dynamic language. Of course, C# or VB.NET developers have to change their reflexes, but for developing web pages it is worth it.

In conclusion, I think the best way to be convinced is to try!

To go further

Internet Explorer Test Drive: http://ie.microsoft.com/testdrive/
Internet Explorer 9 guide for developers: http://msdn.microsoft.com/en-us/ie/ff468705
W3C site for HTML 5: http://dev.w3.org/html5/spec/overview.html
Internet Explorer site: http://msdn.microsoft.com/en-us/ie/aa740469

About the author

David Catuhe is a developer evangelist for Microsoft France in charge of user experience development tools (from XAML to DirectX/XNA and HTML5). He defines himself as a geek and likes coding anything related to graphics. Before working for Microsoft, he founded a company that developed a real-time 3D engine written with DirectX (www.vertice.fr).

Source: http://blogs.msdn.com/b/eternalcoding/archive/2011/07/25/feedback-of-a-graphic-development-using-html5-amp-javascript.aspx

October 24, 2011

by David Catuhe


RDF data in Neo4J - the Tinkerpop story

My previous blog post discussed the use of Neo4J as an RDF triple store. Michael Hunger, however, informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop's Sail implementation.

As mentioned in my previous blog post, I was recently asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data. Traditional RDF stores are not really an option, as my solution should also be able to calculate shortest paths between arbitrary subjects. Calculating shortest paths is, however, one of the strong selling points of Graph Databases, and of Neo4J in particular. Unfortunately, the neo-rdf-sail component, which suits my requirements perfectly, is no longer under active development. Tinkerpop's Sail implementation, however, fills the void with an even better alternative!

1. What is Tinkerpop?

Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered the JDBC of Graph Databases: by providing a collection of generic interfaces, it lets you develop graph-based applications without introducing explicit dependencies on concrete Graph Database implementations. Additionally, Blueprints provides concrete bindings for the Neo4J, OrientDB and Dex Graph Databases. On top of Blueprints, the Tinkerpop team developed an entire range of graph technologies, including Gremlin, a powerful domain-specific language designed for traversing graphs. Hence, once a Blueprints binding is available for a particular Graph Database, this entire range of technologies can be leveraged.

2. Tinkerpop and Sail

Last time, I talked about exposing a Neo4J Graph Database (containing RDF triples) through the Sail interface, which is part of the openrdf.org project.
By doing so, we can reuse an entire range of RDF utilities (parsers and query evaluators) that are part of the openrdf.org project. The Blueprints framework gives us a similar ability: each Graph Database binding that implements the Tinkerpop TransactionalGraph and IndexableGraph interfaces can be exposed as a GraphSail, which is Tinkerpop's implementation of the Sail interface. Once your Sail is available, storing and querying RDF works just like the code shown in my previous blog article.

// Create the sail graph database
graph = new MyNeo4jGraph("var/flights", 100000);
graph.setTransactionMode(TransactionalGraph.Mode.MANUAL);
sail = new GraphSail(graph);

// Initialize the sail store
sail.initialize();

// Get the sail repository connection
connection = new SailRepository(sail).getConnection();

// Import the data
connection.add(getResource("sneeair.rdf"), null, RDFFormat.RDFXML);

// Execute SPARQL query
TupleQuery durationquery = connection.prepareTupleQuery(QueryLanguage.SPARQL,
        "PREFIX io: " +
        "PREFIX fl: " +
        "SELECT ?number ?departure ?destination " +
        "WHERE { " +
        "?flight io:flight ?number . " +
        "?flight fl:flightFromCityName ?departure . " +
        "?flight fl:flightToCityName ?destination . " +
        "?flight io:duration \"1:35\" . " +
        "}");
TupleQueryResult result = durationquery.evaluate();

The first two lines of code require some clarification. A TransactionalGraph can run in MANUAL or AUTOMATIC transaction mode. In AUTOMATIC mode, transactions are basically ignored, in the sense that each item is persisted in the underlying Graph Database as soon as it is created. Although this fits my needs, AUTOMATIC mode is extremely slow with Neo4J because of the continuous IO access. MANUAL mode, on the other hand, is very fast: a new transaction is created when the import of the RDF data file starts, and it is only committed to the Neo4J data store once all RDF triples have been parsed and created.
Unfortunately, MANUAL mode does not scale in my specific situation either: some of my RDF data files contain over 50 million RDF triples, which cannot fit into memory (the import fails with a Java heap space error). As I needed fast imports, I extended the default Neo4J Blueprints binding to support intermediate commits, based on Neo4J's best practices for big transactions. The idea is rather simple: you specify the maximum number of items that can be kept in memory before they are committed to the Neo4J data store. Once this number is reached, the current transaction is committed and a new one is started automatically. Simple, but very effective!

public class MyNeo4jGraph extends Neo4jGraph {

    private long numberOfItems = 0;
    private long maxNumberOfItems = 1;

    public MyNeo4jGraph(final String directory, long maxNumberOfItems) {
        super(directory, null);
        this.maxNumberOfItems = maxNumberOfItems;
    }

    public MyNeo4jGraph(final String directory, final Map configuration, long maxNumberOfItems) {
        super(directory, configuration);
        this.maxNumberOfItems = maxNumberOfItems;
    }

    public Vertex addVertex(final Object id) {
        Vertex vertex = super.addVertex(id);
        commitIfRequired();
        return vertex;
    }

    public Edge addEdge(final Object id, final Vertex outVertex, final Vertex inVertex, final String label) {
        Edge edge = super.addEdge(id, outVertex, inVertex, label);
        commitIfRequired();
        return edge;
    }

    private void commitIfRequired() {
        // Check whether commit should be executed
        if (++numberOfItems % maxNumberOfItems == 0) {
            // Stop the transaction
            stopTransaction(Conclusion.SUCCESS);
            // Immediately start a new one
            startTransaction();
        }
    }
}

3. Shortest path calculation

Although Blueprints lets you abstract away the Neo4J implementation details, it still gives you access to the raw Neo4J data store when needed. Hence, one can still use the graph algorithms provided in the neo4j-graph-algo component to calculate shortest paths between arbitrary subjects.
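The actual shortest-path implementations come from neo4j-graph-algo. To illustrate what a shortest path between subjects means on RDF data, here is a minimal breadth-first search over in-memory triples. It is not the Neo4J or Blueprints API; shortestPath is a hypothetical helper, the ex: identifiers are made up, and, for simplicity, every object (including literals) becomes a node and every triple can be walked in both directions.

```javascript
// Unweighted shortest path between two RDF resources, treating every
// [subject, predicate, object] triple as an edge usable in both directions.
// Breadth-first search returns a path with the fewest hops, or null.
function shortestPath(triples, from, to) {
  const neighbours = new Map();
  const link = (a, b) => {
    if (!neighbours.has(a)) neighbours.set(a, []);
    neighbours.get(a).push(b);
  };
  for (const [subject, , object] of triples) {
    link(subject, object);
    link(object, subject);
  }

  const previous = new Map([[from, null]]); // node -> node it was reached from
  const queue = [from];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === to) {
      // Walk back from the target to rebuild the path.
      const path = [];
      for (let n = to; n !== null; n = previous.get(n)) path.unshift(n);
      return path;
    }
    for (const next of neighbours.get(node) || []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null; // the two resources are not connected
}
```

Because breadth-first search visits nodes in order of distance, the first time the target is dequeued the recorded path is a shortest one in terms of the number of triples crossed.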
The complete source code can be found on the Datablend public GitHub repository.

October 24, 2011

by Davy Suvee


How to Load or Save Image using Hibernate – MySQL

This tutorial walks you through saving an image to a database (MySQL) and loading it back using Hibernate.

Requirements

For this sample project, we are going to use:

Eclipse IDE (you can use your favorite IDE);
MySQL (you can use any other database; make sure to change the column type if required);
Hibernate jars and dependencies (you can download the sample project with all required jars);
JUnit for testing (jar also included in the sample project).

PrintScreen

(Screenshot: the finished sample project.)

Database Model

Before we get started with the sample project, we have to run this SQL script in MySQL:

DROP SCHEMA IF EXISTS `blog` ;
CREATE SCHEMA IF NOT EXISTS `blog` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci ;
USE `blog` ;

-- -----------------------------------------------------
-- Table `blog`.`BOOK`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `blog`.`BOOK` ;

CREATE TABLE IF NOT EXISTS `blog`.`BOOK` (
  `BOOK_ID` INT NOT NULL AUTO_INCREMENT ,
  `BOOK_NAME` VARCHAR(45) NOT NULL ,
  `BOOK_IMAGE` MEDIUMBLOB NOT NULL ,
  PRIMARY KEY (`BOOK_ID`) )
ENGINE = InnoDB;

This script creates the BOOK table, which we are going to use in this tutorial.

Book POJO

We are going to use a simple POJO in this project. A Book has an ID, a name and an image, which is represented by an array of bytes. As we are going to persist an image in the database, we have to use a BLOB type. MySQL has several variations of BLOBs (TINYBLOB, BLOB, MEDIUMBLOB and LONGBLOB; the MySQL documentation on data type storage requirements explains the differences). In this example, we use MEDIUMBLOB, which holds values of up to 2^24 - 1 bytes (about 16 MB) and needs L + 3 bytes of storage, where L is the length of the value. Make sure you do not forget to add the column definition to the Column annotation.
package com.loiane.model;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.Table;

@Entity
@Table(name="BOOK")
public class Book {

    @Id
    @GeneratedValue
    @Column(name="BOOK_ID")
    private long id;

    @Column(name="BOOK_NAME", nullable=false)
    private String name;

    @Lob
    @Column(name="BOOK_IMAGE", nullable=false, columnDefinition="mediumblob")
    private byte[] image;

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public byte[] getImage() {
        return image;
    }

    public void setImage(byte[] image) {
        this.image = image;
    }
}

Hibernate Config

This configuration file (hibernate.cfg.xml, which configure() loads by default) contains the information required to connect to the database: the JDBC driver com.mysql.jdbc.Driver, the connection URL jdbc:mysql://localhost/blog, the user name and password (root / root) and the dialect org.hibernate.dialect.MySQLDialect, followed by two more settings with the values 1 and true.

Hibernate Util

The HibernateUtil class creates the SessionFactory from the Hibernate configuration file.

package com.loiane.hibernate;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.AnnotationConfiguration;

import com.loiane.model.Book;

public class HibernateUtil {

    private static final SessionFactory sessionFactory;

    static {
        try {
            sessionFactory = new AnnotationConfiguration()
                        .configure()
                        .addPackage("com.loiane.model") //the fully qualified package name
                        .addAnnotatedClass(Book.class)
                        .buildSessionFactory();

        } catch (Throwable ex) {
            System.err.println("Initial SessionFactory creation failed." + ex);
            throw new ExceptionInInitializerError(ex);
        }
    }

    public static SessionFactory getSessionFactory() {
        return sessionFactory;
    }
}

DAO

In this class, we create two methods: one to save a Book instance to the database and another one to load a Book instance from the database.
package com.loiane.dao;

import org.hibernate.HibernateException;
import org.hibernate.Session;
import org.hibernate.Transaction;

import com.loiane.hibernate.HibernateUtil;
import com.loiane.model.Book;

public class BookDAOImpl {

    /**
     * Inserts a row in the BOOK table.
     * There is no need to pass the id; it will be generated.
     * @param book
     * @return an instance of the object Book
     */
    public Book saveBook(Book book) {
        Session session = HibernateUtil.getSessionFactory().openSession();
        Transaction transaction = null;
        try {
            transaction = session.beginTransaction();
            session.save(book);
            transaction.commit();
        } catch (HibernateException e) {
            // beginTransaction() itself may have failed, leaving transaction null
            if (transaction != null) {
                transaction.rollback();
            }
            e.printStackTrace();
        } finally {
            session.close();
        }
        return book;
    }

    /**
     * Loads a book from the database.
     * @param bookId id of the book to be retrieved
     * @return the book, or null if it was not found or could not be loaded
     */
    public Book getBook(Long bookId) {
        Session session = HibernateUtil.getSessionFactory().openSession();
        try {
            Book book = (Book) session.get(Book.class, bookId);
            return book;
        } catch (HibernateException e) {
            e.printStackTrace();
        } finally {
            session.close();
        }
        return null;
    }
}

Test

To test it, we first create a Book instance and set its image attribute. To do so, we load an image from the hard drive (the one located in the images folder). Then we call the DAO class to save it to the database. Finally, we load the book back and, to make sure it is the same image, we write it to the hard drive.
package com.loiane.test;

import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

import com.loiane.dao.BookDAOImpl;
import com.loiane.model.Book;

public class TestBookDAO {

    private static BookDAOImpl bookDAO;

    @BeforeClass
    public static void runBeforeClass() {
        bookDAO = new BookDAOImpl();
    }

    @AfterClass
    public static void runAfterClass() {
        bookDAO = null;
    }

    /**
     * Test method for {@link com.loiane.dao.BookDAOImpl#saveBook(Book)}.
     */
    @Test
    public void testSaveBook() {

        //File file = new File("images\\extjsfirstlook.jpg"); //windows
        File file = new File("images/extjsfirstlook.jpg");
        byte[] bFile = new byte[(int) file.length()];

        try {
            // readFully keeps reading until the array is full; a single read() may return fewer bytes
            DataInputStream inputStream = new DataInputStream(new FileInputStream(file));
            inputStream.readFully(bFile);
            inputStream.close();
        } catch (Exception e) {
            e.printStackTrace();
        }

        Book book = new Book();
        book.setName("Ext JS 4 First Look");
        book.setImage(bFile);

        bookDAO.saveBook(book);

        // id is a primitive long, so check that a key was actually generated
        assertTrue(book.getId() > 0);
    }

    /**
     * Test method for {@link com.loiane.dao.BookDAOImpl#getBook(Long)}.
     */
    @Test
    public void testGetBook() {

        Book book = bookDAO.getBook((long) 1);

        assertNotNull(book);

        try {
            //FileOutputStream fos = new FileOutputStream("images\\output.jpg"); //windows
            FileOutputStream fos = new FileOutputStream("images/output.jpg");
            fos.write(book.getImage());
            fos.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Note that testGetBook assumes a book with id 1 already exists (for example, one saved by testSaveBook), and JUnit 4 does not guarantee the order in which test methods run.

To verify that the image was really saved, let's check the BOOK table: if we right-click the image cell and choose to view it, we see the picture we just saved.

Source Code Download

You can download the complete source code (or fork/clone the project with git) from:

GitHub: https://github.com/loiane/hibernate-image-example
Bitbucket: https://bitbucket.org/loiane/hibernate-image-example/downloads

Happy Coding!

From http://loianegroner.com/2011/10/how-to-load-or-save-image-using-hibernate-mysql/

October 24, 2011

by Loiane Groner


OpenStreetMap API framework for PHP

OpenStreetMap is a global project that aims to collaboratively collect map data, and today Ken Guest has submitted his PHP package for communicating with the OSM API to the public and to the PEAR PEPr review process:

So over the last while, I’ve been working on a PHP package imaginatively named Services_Openstreetmap, for interacting with the openstreetmap API. I initially needed it so I could search for certain POIs and tabulate the results; it’s now also capable of adding data to the openstreetmap database – nodes and other elements can be created, updated and so on. It will even access the details of the user that is being used to modify that data, which is one difference between it and the other single purpose OSM frameworks. --Ken Guest

You can view the submission here, and you should definitely take a look at openstreetmap.org if you haven't already. Good news for PHP developers looking to use this project more heavily in their applications.

October 22, 2011

by Mitch Pronschinske


How to retrieve/extract metadata information from audio files using Java and Apache Tika API?

I guess I’m writing this post after a long time. This time, I’m writing about the Apache Tika API, which a friend of mine and I tried out to extract metadata from the audio files it supports: .mp3, .aiff, .au, .midi and .wav. To make it clear what we were after: Windows Vista shows this kind of information (title, artist, album, genre and so on) in an audio file's properties. We wanted to extract it using Java and, after some googling, found that Apache Tika would help. We needed this metadata to index audio files so that they are searchable in a search application we’re building with Apache Lucene. Here’s a sample Java program that extracts metadata from an MP3 file:

package singz.samples.search.audio.metadata;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.mp3.Mp3Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * @author Singaram Subramanian
 * Extract metadata of an audio file using the Apache Tika API
 */
public class AudioMetadataExtractorDemo {

    public static void main(String[] args) {
        // This audio file has metadata embedded in the XMP (Extensible Metadata Platform) standard
        // created by Adobe Systems Inc. XMP standardizes the definition, creation, and
        // processing of extensible metadata.
        String audioFileLoc = "c:\\pop\\backstreetboys_showmethemeaningofbeinglonely.mp3";

        try {
            InputStream input = new FileInputStream(new File(audioFileLoc));
            ContentHandler handler = new DefaultHandler();
            Metadata metadata = new Metadata();
            Parser parser = new Mp3Parser();
            ParseContext parseCtx = new ParseContext();
            parser.parse(input, handler, metadata, parseCtx);
            input.close();

            // List all metadata
            String[] metadataNames = metadata.names();
            for (String name : metadataNames) {
                System.out.println(name + ": " + metadata.get(name));
            }

            // Retrieve the necessary info from metadata.
            // Names (title, xmpDM:artist etc.) mentioned below may differ based
            // on the standard used for processing and storing standardized and/or
            // proprietary information relating to the contents of a file.
            System.out.println("Title: " + metadata.get("title"));
            System.out.println("Artists: " + metadata.get("xmpDM:artist"));
            System.out.println("Genre: " + metadata.get("xmpDM:genre"));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (TikaException e) {
            e.printStackTrace();
        }
    }
}

Maven pom.xml

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>singz.samples.search.audio</groupId>
  <artifactId>audiometadataextractor</artifactId>
  <version>0.0.1</version>
  <packaging>jar</packaging>
  <name>audiometadataextractor</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>0.10</version>
    </dependency>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.10</version>
    </dependency>
  </dependencies>
</project>

Output

xmpDM:releaseDate: 2001
xmpDM:audioChannelType: Stereo
xmpDM:album: Top 100 Pop
Author: Backstreet Boys
xmpDM:artist: Backstreet Boys
channels: 2
xmpDM:audioSampleRate: 44100
xmpDM:logComment: eng
xmpDM:trackNumber: 04
version: MPEG 3 Layer III Version 1
xmpDM:composer: null
xmpDM:audioCompressor: MP3
title: Show Me the Meaning of Being Lonely
samplerate: 44100
xmpDM:genre: Pop
Content-Type: audio/mpeg
Title: Show Me the Meaning of Being Lonely
Artists: Backstreet Boys
Genre: Pop

About Apache Tika

http://tika.apache.org/index.html
“The Apache Tika™ toolkit detects and extracts metadata and structured text content
from various documents using existing parser libraries.”

http://www.lucidimagination.com/devzone/technical-articles/content-extraction-tika#article.tika
“Apache Tika is a content type detection and content extraction framework. Tika provides a general application programming interface that can be used to detect the content type of a document and also parse textual content and metadata from several document formats. Tika does not try to understand the full variety of different document formats by itself but instead delegates the real work to various existing parser libraries such as Apache POI for Microsoft formats, PDFBox for Adobe PDF, NekoHTML for HTML etc. The grand idea behind Tika is that it offers a generic interface for parsing multiple formats. The Tika API hides the technical differences of the various parser implementations. This means that you don’t have to learn and consume one API for every format you use but can instead use a single API – the Tika API. Internally Tika usually delegates the parsing work to existing parsing libraries and adapts the parse result so that client applications can easily manage a variety of formats. Tika aims to be efficient in using available resources (mainly RAM) while parsing. The Tika API is stream oriented so that the parsed source document does not need to be loaded into memory all at once but only as it is needed. Ultimately, however, the amount of resources consumed is mandated by the parser libraries that Tika uses. At the time of writing this, Tika supports directly around 30 document formats. See list of supported document formats. The list of supported document formats is not limited by Tika in any way.
In the simplest case you can add support for new document formats by implementing a thin adapter that implements the Parser interface for the new document format.”

About the XMP standard

http://en.wikipedia.org/wiki/extensible_metadata_platform
“The Adobe Extensible Metadata Platform (XMP) is a standard, created by Adobe Systems Inc., for processing and storing standardized and proprietary information relating to the contents of a file. XMP standardizes the definition, creation, and processing of extensible metadata. Serialized XMP can be embedded into a significant number of popular file formats, without breaking their readability by non-XMP-aware applications. Embedding metadata avoids many problems that occur when metadata is stored separately. XMP is used in PDF, photography and photo editing applications. XMP can be used in several file formats such as PDF, JPEG, JPEG 2000, JPEG XR, GIF, PNG, HTML, TIFF, Adobe Illustrator, PSD, MP3, MP4, Audio Video Interleave, WAV, RF64, Audio Interchange File Format, PostScript, Encapsulated PostScript, and proposed for DjVu. In a typical edited JPEG file, XMP information is typically included alongside Exif and IPTC Information Interchange Model data.”

From http://singztechmusings.wordpress.com/2011/10/17/how-to-retrieveextract-metadata-information-from-audio-files-using-java-and-apache-tika-api/
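The Mp3Parser used above reads the tags for you. To show what metadata embedded in the file means at the byte level, here is a minimal stdlib-only sketch that reads an ID3v1 tag, the fixed 128-byte block that many MP3 files carry at their very end. It is an illustration, not Tika's implementation: Tika also handles ID3v2 tags and the MPEG frame headers behind fields such as samplerate and channels. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: reads the fixed 128-byte ID3v1 tag found at the end of many MP3 files.
// This is not Tika's Mp3Parser, which also reads ID3v2 tags and MPEG frame headers.
class Id3v1Reader {

    static final int TAG_SIZE = 128;

    /** Text fields of an ID3v1 tag (comment and genre byte omitted for brevity). */
    static final class Id3v1Tag {
        final String title, artist, album, year;

        Id3v1Tag(String title, String artist, String album, String year) {
            this.title = title;
            this.artist = artist;
            this.album = album;
            this.year = year;
        }
    }

    /** Returns the tag, or null when the last 128 bytes do not start with "TAG". */
    static Id3v1Tag read(byte[] file) {
        if (file.length < TAG_SIZE) {
            return null;
        }
        int start = file.length - TAG_SIZE;
        if (file[start] != 'T' || file[start + 1] != 'A' || file[start + 2] != 'G') {
            return null;
        }
        // Layout after "TAG": title(30) artist(30) album(30) year(4) comment(30) genre(1)
        return new Id3v1Tag(field(file, start + 3, 30), field(file, start + 33, 30),
                field(file, start + 63, 30), field(file, start + 93, 4));
    }

    // ID3v1 text is ISO-8859-1, padded with NUL bytes or spaces: stop at the first NUL, then trim
    private static String field(byte[] data, int offset, int length) {
        int end = offset;
        while (end < offset + length && data[end] != 0) {
            end++;
        }
        return new String(data, offset, end - offset, StandardCharsets.ISO_8859_1).trim();
    }

    // Demo helper: 16 zero bytes standing in for audio frames, followed by an ID3v1 tag
    static byte[] fakeMp3(String title, String artist, String album, String year) {
        byte[] data = new byte[16 + TAG_SIZE];
        int start = 16;
        put(data, start, "TAG", 3);
        put(data, start + 3, title, 30);
        put(data, start + 33, artist, 30);
        put(data, start + 63, album, 30);
        put(data, start + 93, year, 4);
        return data;
    }

    // Copies at most `length` bytes, so longer values are silently truncated
    private static void put(byte[] data, int offset, String value, int length) {
        byte[] bytes = value.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, data, offset, Math.min(bytes.length, length));
    }

    public static void main(String[] args) {
        byte[] mp3 = fakeMp3("Show Me the Meaning of Being Lonely", "Backstreet Boys",
                "Top 100 Pop", "2001");
        Id3v1Tag tag = read(mp3);
        System.out.println("title: " + tag.title);
        System.out.println("artist: " + tag.artist);
        System.out.println("album: " + tag.album);
        System.out.println("year: " + tag.year);
    }
}
```

Running main prints the title as "Show Me the Meaning of Being L". ID3v1 fields hold only 30 bytes, so the 35-character title is cut off. That limit is one that ID3v2 tags remove.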

October 20, 2011

by Singaram Subramanian


Handling PHP Sessions in Windows Azure

One of the challenges in building a distributed web application is handling sessions. When you have multiple instances of an application running and session data is written to local files (as is the default behavior for the session handling functions in PHP), a user session can be lost when a session is started on one instance but subsequent requests are directed (via a load balancer) to other instances. To successfully manage sessions across multiple instances, you need a common data store. In this post I’ll show you how the Windows Azure SDK for PHP makes this easy by storing session data in Windows Azure Table storage.

In the 4.0 release of the Windows Azure SDK for PHP, session handling via Windows Azure Table and Blob storage was included in the newly added SessionHandler class.

Note: The SessionHandler class supports storing session data in Table storage or Blob storage. I will focus on Table storage in this post, largely because I haven’t been able to come up with a scenario in which Blob storage would be better (or even necessary). If you have ideas about how/why Blob storage would be better, I’d love to hear them.

The SessionHandler class makes it possible to write code for handling sessions in the same way you always have, but the session data is stored in a Windows Azure Table instead of local files. To accomplish this, precede your usual session handling code with these lines:

require_once 'Microsoft/WindowsAzure/Storage/Table.php';
require_once 'Microsoft/WindowsAzure/SessionHandler.php';

$storageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net',
                                                          'your storage account name',
                                                          'your storage account key');
$sessionHandler = new Microsoft_WindowsAzure_SessionHandler($storageClient, 'sessionstable');
$sessionHandler->register();

Now you can call session_start() and other session functions as you normally would. Nicely, it just works.
Really, that’s all there is to using the SessionHandler, but I found it interesting to look at how it works. The first interesting thing to note is that the register method simply calls the session_set_save_handler function to map PHP's session handling to custom functions. Here’s what the method looks like in the source code:

public function register()
{
    return session_set_save_handler(array($this, 'open'),
                                    array($this, 'close'),
                                    array($this, 'read'),
                                    array($this, 'write'),
                                    array($this, 'destroy'),
                                    array($this, 'gc')
    );
}

Reading, writing, and deleting session data is only slightly more complicated. When session data is written, the key-value pairs that make up the data are first serialized and then base64 encoded. Serializing the data allows for lots of flexibility in what you store (i.e. you don’t have to worry about matching a schema in the data store). When storing data in a table, each entry must have a partition key and a row key that together uniquely identify it. The partition key is a string (“sessions” by default, but this can be changed in the class constructor) and the row key is the session ID. (For more information about the structure of Tables, see this post.) Finally, the data is either updated (if it already exists in the Table) or inserted as a new entry. Here’s a portion of the write function:

$serializedData = base64_encode(serialize($serializedData));
$sessionRecord = new Microsoft_WindowsAzure_Storage_DynamicTableEntity($this->_sessionContainerPartition, $id);
$sessionRecord->sessionExpires = time();
$sessionRecord->serializedData = $serializedData;
try
{
    $this->_storage->updateEntity($this->_sessionContainer, $sessionRecord);
}
catch (Microsoft_WindowsAzure_Exception $unknownRecord)
{
    $this->_storage->insertEntity($this->_sessionContainer, $sessionRecord);
}

Not surprisingly, when session data is read from the table, it is retrieved by session ID, base64 decoded, and unserialized.
Again, here’s a snippet that shows what is happening:

$sessionRecord = $this->_storage->retrieveEntityById(
    $this->_sessionContainer,
    $this->_sessionContainerPartition,
    $id
);
return unserialize(base64_decode($sessionRecord->serializedData));

As you can see, the SessionHandler class makes good use of the storage APIs in the SDK. To learn more about the SessionHandler class (and the storage APIs), check out the documentation on CodePlex. You can, of course, get the complete source code here: http://phpazure.codeplex.com/SourceControl/list/changesets.

As I investigated session handling in the Windows Azure SDK for PHP, I noticed that the absence of support for SQL Azure as a session store was conspicuous. I’m curious how many people would prefer SQL Azure over Azure Tables as a session store. If you have an opinion on this, please let me know in the comments.
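To make the write/read flow concrete, here is a rough Java analogue of it, assuming Java 8 (for java.util.Base64). An in-memory map stands in for Windows Azure Table storage, and Java serialization plus Base64 play the role of PHP's serialize()/base64_encode(). The class TableSessionStore and its methods are hypothetical and not part of the SDK. Returning an empty map for an unknown session is a choice made in this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: partition key -> (row key -> entity properties) stands in for a table.
class TableSessionStore {

    private final Map<String, Map<String, Map<String, String>>> table = new HashMap<>();
    private final String partition;

    TableSessionStore(String partition) {
        this.partition = partition; // the SDK's default partition name is "sessions"
    }

    // Like a table update: fails when no entity with this (partition, row) key exists
    private void updateEntity(String rowKey, Map<String, String> entity) {
        Map<String, Map<String, String>> rows = table.get(partition);
        if (rows == null || !rows.containsKey(rowKey)) {
            throw new NoSuchElementException("no entity for " + rowKey);
        }
        rows.put(rowKey, entity);
    }

    private void insertEntity(String rowKey, Map<String, String> entity) {
        table.computeIfAbsent(partition, p -> new HashMap<>()).put(rowKey, entity);
    }

    /** Serializes and Base64-encodes the data, then stores it under (partition, sessionId). */
    void write(String sessionId, HashMap<String, String> data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(data);
        }
        Map<String, String> entity = new HashMap<>();
        entity.put("serializedData", Base64.getEncoder().encodeToString(bytes.toByteArray()));
        entity.put("sessionExpires", Long.toString(System.currentTimeMillis() / 1000));
        try {
            updateEntity(sessionId, entity);     // same try-update-then-insert order as the PHP code
        } catch (NoSuchElementException unknownRecord) {
            insertEntity(sessionId, entity);
        }
    }

    /** Looks the entity up by session ID, Base64-decodes and deserializes it. */
    @SuppressWarnings("unchecked")
    HashMap<String, String> read(String sessionId) throws IOException, ClassNotFoundException {
        Map<String, Map<String, String>> rows = table.get(partition);
        Map<String, String> entity = rows == null ? null : rows.get(sessionId);
        if (entity == null) {
            return new HashMap<>();                // unknown session: start with empty data
        }
        byte[] raw = Base64.getDecoder().decode(entity.get("serializedData"));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (HashMap<String, String>) in.readObject();
        }
    }

    void destroy(String sessionId) {
        Map<String, Map<String, String>> rows = table.get(partition);
        if (rows != null) {
            rows.remove(sessionId);
        }
    }

    public static void main(String[] args) throws Exception {
        TableSessionStore store = new TableSessionStore("sessions");
        HashMap<String, String> session = new HashMap<>();
        session.put("user", "brian");
        store.write("abc123", session);            // first write: update fails, so insert
        session.put("cart", "3 items");
        store.write("abc123", session);            // second write: update succeeds
        System.out.println(store.read("abc123"));
    }
}
```

Because every instance talks to the same store, any instance behind the load balancer can read the session written by another. That shared store is the point of moving sessions out of local files.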

October 19, 2011

by Brian Swan


The Goal of software development

The Goal by Eli Goldratt is a business book in the form of a novel, whose protagonist must save his factory from closing because of very low productivity. The Goal is not limited to managing a large organization (not even to for-profit companies): you simply have to define different units of measurement, like goal units instead of money, making money being the default goal. In fact, judging from the applications of the Theory of Constraints in our field, I think it applies to software development too. What follows is my translation of the themes of The Goal to our field.

The Goal is...

Making money, of course. For those of you with some knowledge of accounting, the original measurements are: raising throughput, the amount of items sold (not produced) per unit of time; lowering investment/inventory, all the money tied up in the system in the form of assets that could be sold or products that sit in a warehouse; lowering operational expense, all the money that we spend as support and that cannot be recovered.

How do these measurements apply to software development? A team does not always have an impact on contract negotiation, so talking about money is often far from everyday reality (kudos to you if you can apply that point of the Agile Manifesto). In my opinion, the goal for software development can be translated to: raising throughput, the amount of features delivered (deployed, not just implemented or tested) per unit of time, which you can measure in story points since features vary in size; lowering investment/inventory, all the time tied up in the system in the form of undeployed or untested features that clutter the code base (and, to a minor extent, investment in hardware, which is far less important than the team's time); lowering operational expense, the time developers spend every day supporting the development. Automation is a kind of time investment that will bring more time (and quality) in the future, lowering operational expense.
As with material products, WIP has storage and opportunity costs, which go into the operational expense. Kanban is a tool that tries to reduce WIP in order to foster the latter two points.

Throughput accounting

The novel emphasizes this kind of throughput accounting over cost accounting, in which each developer (ehm, factory worker) has to be busy all the time, even if the work he is doing isn't moving towards the goal: never-ending refactoring; specifying a feature which cannot be implemented for two months and will have to be rewritten; implementing features which won't be merged into the main branch any time soon. With Test-Driven Development, we are getting good at moving a feature from implemented to tested directly in the same commit. Yet the missing step is getting the feature to the users: maybe that's also what Continuous Deployment is all about...

Dependent events

Dependent events and statistical fluctuations are production-system topics that can bring a balanced plant close to bankruptcy; however, we're not at the point where we can model our team as precisely as a factory. The basic point is that a plant in which everyone is working all the time is inefficient: when an early stage (like defining a specification or implementing a feature) gets delayed, downstream steps such as deployment are delayed too. Conversely, when an upstream step finishes early, the downstream stage is already working at maximum efficiency and cannot process the intermediate result any faster.

I wonder if this applies to software development too. In a factory, workers are specialized and can do just a few jobs across the plant. Since workers and machines have different production rates, there will be just one bottleneck: the slowest one. If products have to pass through the bottleneck, anyone producing faster than the bottleneck will just accumulate WIP in front of it.
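The effect of dependent events and statistical fluctuations can be seen in a small simulation, loosely in the spirit of the matches-and-dice game in the novel. In this sketch each station's daily capacity is a die roll, but a station can only process what the previous one has already handed over. The class name and all numbers are illustrative.

```java
import java.util.Random;

// Sketch of dependent events: each day every station's capacity is a die roll, but a
// downstream station can only move items already waiting in front of it.
class DependentEvents {

    /**
     * rolls[day][station] is that station's capacity on that day.
     * Returns queue sizes: queue[s] is the WIP in front of station s (queue[0] is unused,
     * because the first station has unlimited input); the last element counts finished items.
     */
    static int[] run(int[][] rolls) {
        int stations = rolls[0].length;
        int[] queue = new int[stations + 1];
        for (int[] day : rolls) {
            queue[1] += day[0];                          // first station always uses its full roll
            for (int s = 1; s < stations; s++) {
                int moved = Math.min(day[s], queue[s]);  // cannot move more than is waiting
                queue[s] -= moved;
                queue[s + 1] += moved;
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        // Two stations, each averaging 3.5 items/day over two days, yet only 2 items finish
        int[] q = run(new int[][] { {1, 6}, {6, 1} });
        System.out.println("delivered " + q[2] + ", WIP " + q[1]);   // delivered 2, WIP 5

        // Five stations over 100 days with random rolls (the seed keeps the run repeatable)
        Random die = new Random(42);
        int days = 100, stations = 5;
        int[][] rolls = new int[days][stations];
        for (int[] day : rolls) {
            for (int s = 0; s < stations; s++) {
                day[s] = 1 + die.nextInt(6);
            }
        }
        int[] result = run(rolls);
        System.out.println("delivered " + result[stations] + " (a naive average would say "
                + (int) (3.5 * days) + ")");
    }
}
```

The two-day example shows the asymmetry. A bad roll upstream starves the next station, but a good roll upstream only piles up WIP when the next station rolls low, so the two effects do not average out.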
Continuing with our example, if the analyst or domain expert is churning out specifications for new features every day, most of them just become WIP in front of the development team. Once there is an established buffer, additional specifications won't raise throughput any further; instead, they will raise the inventory (partial features) and the time spent managing it. I think this is not always true in the more technical phases of development, though. For example, in a small team a developer may be moved to testing or refactoring, to setting up Continuous Integration or to evaluating a new library. Unless you have a DBA who only manages databases, your developers are not fixed to one stage of the system.

The bottleneck

The previous example featured a bottleneck, the most famous concept of the Theory of Constraints introduced in the book. This translation to the software development case is mine and could be incomplete. A feature (or a user story) has to pass through a series of stations where different people work on it to make it real: specification by a domain expert; implementation by a technical team; extended testing with optional customer validation; deployment, which should be fast but at the same time must not kill the current version of the application. Each station has an average velocity. By definition, one station is the slowest and can process the fewest story points per unit of time. This is the bottleneck. (This is not always true, as velocity may vary greatly over time with the addition of new people or hidden lines discovered in a feature. Becoming good at estimation and stabilizing a team are two objectives that help make this assumption hold.) You can identify a bottleneck by looking at where the WIP is: it will accumulate in front of it. If you already have a kanban board, this step is simpler...
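Identifying the bottleneck by its WIP can be sketched with a deterministic line of stations. The capacities below are hypothetical story points per day for the four stations listed above. The simulation shows three things: WIP accumulating in front of the slowest station, the delivered rate settling at that station's capacity, and a new bottleneck appearing once its capacity is raised.

```java
// Sketch: a line of stations with fixed daily capacities (hypothetical story points per day).
class Bottleneck {

    /** Index of the station with the lowest capacity. */
    static int slowest(int[] capacity) {
        int min = 0;
        for (int s = 1; s < capacity.length; s++) {
            if (capacity[s] < capacity[min]) {
                min = s;
            }
        }
        return min;
    }

    /** Runs the line for `days` days; queue[s] is the WIP in front of station s, last element = delivered. */
    static int[] run(int[] capacity, int days) {
        int n = capacity.length;
        int[] queue = new int[n + 1];
        for (int d = 0; d < days; d++) {
            queue[1] += capacity[0];                       // first station has unlimited input
            for (int s = 1; s < n; s++) {
                int moved = Math.min(capacity[s], queue[s]);
                queue[s] -= moved;
                queue[s + 1] += moved;
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        String[] names = {"specification", "implementation", "testing", "deployment"};
        int[] capacity = {8, 5, 6, 10};
        int[] q = run(capacity, 10);
        System.out.println("bottleneck: " + names[slowest(capacity)]);   // implementation
        System.out.println("delivered in 10 days: " + q[4]);             // 50, i.e. 5 per day
        System.out.println("WIP in front of implementation: " + q[1]);   // 30

        // Raise the bottleneck's capacity and look again: testing becomes the new bottleneck
        capacity[1] = 7;
        q = run(capacity, 10);
        System.out.println("bottleneck: " + names[slowest(capacity)]);   // testing
        System.out.println("delivered in 10 days: " + q[4]);             // 60
    }
}
```

Extra specification capacity (8 instead of 5) only grows the queue in front of implementation. Delivery improves only when the slowest station gets faster.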
Once the bottleneck is identified, the throughput of the system can only be improved by raising its capacity until it is no longer a bottleneck. You can move people to it (keeping an eye on communication costs), and make sure it is used at maximum efficiency by freeing its specialized developers from other, mundane tasks. Then you can start again and find the new bottleneck...

Conclusions

This is just a short introduction to the themes of the Theory of Constraints and Goldratt's teachings; I don't pretend to explain the whole book in an article. Doing so would also go against the Socratic method: you should reach the answers yourself, and these are just examples from my experience. There is more to Goldratt and The Goal than bottlenecks and throughput, such as continuous improvement. If you're working in or managing a software development team, I suggest you read this book if you have the opportunity. Even for freelancers, it is an eye-opener about moving towards a Goal instead of doing busywork, and its teaching method is never boring.

October 4, 2011

by Giorgio Sironi


Hibernate by Example - Part 1 (Orphan removal)

So I thought I'd do a series of Hibernate examples showing various features of Hibernate. In the first part I want to show the Delete Orphan feature and how it may be used, with the help of a story line. So let us begin :)

Prerequisites:

In order to try out the following example you will need the JAR files listed below:

org.springframework.aop-3.0.6.RELEASE.jar
org.springframework.asm-3.0.6.RELEASE.jar
org.springframework.aspects-3.0.6.RELEASE.jar
org.springframework.beans-3.0.6.RELEASE.jar
org.springframework.context.support-3.0.6.RELEASE.jar
org.springframework.context-3.0.6.RELEASE.jar
org.springframework.core-3.0.6.RELEASE.jar
org.springframework.jdbc-3.0.6.RELEASE.jar
org.springframework.orm-3.0.6.RELEASE.jar
org.springframework.transaction-3.0.6.RELEASE.jar
org.springframework.expression-3.0.6.RELEASE.jar
commons-logging-1.0.4.jar
log4j.jar
aopalliance-1.0.jar
dom4j-1.1.jar
hibernate-commons-annotations-3.2.0.Final.jar
hibernate-core-3.6.4.Final.jar
hibernate-jpa-2.0-api-1.0.0.Final.jar
javax.persistence-2.0.0.jar
jta-1.1.jar
javassist-3.1.jar
slf4j-api-1.6.2.jar
mysql-connector-java-5.1.13-bin.jar
commons-collections-3.0.jar

For anyone who wants the Eclipse project to try this out, you can download it with the above-mentioned JAR dependencies here.

Introduction:

It's the year 2011. The Justice League has grown out of proportion and is searching for a developer to help create a super hero registration system. A developer competent in Hibernate and ORM is ready to build the system and handle the persistence layer using Hibernate. For simplicity, he will use a simple standalone application to persist super heroes. This is how the example is laid out:

Table design
Domain classes and Hibernate mappings
DAO & Service classes
Spring configuration for the application
A simple main class to show how it all works

Let the journey begin...
Table Design:

The design consists of three simple tables: SuperHero, JusticeLeague and the join table JUSTICE_LEAGUE_SUPER_HERO. As you can see, it's a simple one-to-many relationship linked by a join table. Hibernate uses the join table to fill the super hero list in the domain class, which we will see next.

Domain classes and Hibernate mappings:

There are only two domain classes, as the join table is linked with the primary owning entity, which is the Justice League entity. So let us see how the domain classes are constructed with annotations:

package com.justice.league.domain;

import java.io.Serializable;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.Type;

@Entity
@Table(name = "SuperHero")
public class SuperHero implements Serializable {

    private static final long serialVersionUID = -6712720661371583351L;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "super_hero_id")
    private Long superHeroId;

    @Column(name = "super_hero_name")
    private String name;

    @Column(name = "power_description")
    private String powerDescription;

    // Stored as 'Y' / 'N' in a char column
    @Type(type = "yes_no")
    @Column(name = "isAwesome")
    private boolean isAwesome;

    public Long getSuperHeroId() {
        return superHeroId;
    }

    public void setSuperHeroId(Long superHeroId) {
        this.superHeroId = superHeroId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getPowerDescription() {
        return powerDescription;
    }

    public void setPowerDescription(String powerDescription) {
        this.powerDescription = powerDescription;
    }

    public boolean isAwesome() {
        return isAwesome;
    }

    public void setAwesome(boolean isAwesome) {
        this.isAwesome = isAwesome;
    }
}

As I am using MySQL as the primary database, I have used GenerationType.AUTO as the GeneratedValue strategy, which does the auto-incrementing whenever a new super hero is created. All other mappings should be familiar, with the exception of the last variable, where we map a boolean to a char field in the database. We use Hibernate's @Type annotation to represent true and false as Y and N within the database field. Hibernate has many type implementations, which you can read about in the Hibernate reference documentation; in this instance we have used yes_no.

Ok, now that we have our class to represent the Super Heroes, let's see what our Justice League domain class looks like. It keeps tabs on all super heroes who have pledged allegiance to the League.

package com.justice.league.domain;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.JoinTable;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name = "JusticeLeague")
public class JusticeLeague implements Serializable {

    private static final long serialVersionUID = 763500275393020111L;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "justice_league_id")
    private Long justiceLeagueId;

    @Column(name = "justice_league_moto")
    private String justiceLeagueMoto;

    @Column(name = "number_of_members")
    private Integer numberOfMembers;

    @OneToMany(cascade = { CascadeType.ALL }, fetch = FetchType.EAGER, orphanRemoval = true)
    @JoinTable(name = "JUSTICE_LEAGUE_SUPER_HERO",
               joinColumns = { @JoinColumn(name = "justice_league_id") },
               inverseJoinColumns = { @JoinColumn(name = "super_hero_id") })
    private List<SuperHero> superHeroList = new ArrayList<SuperHero>(0);

    public Long getJusticeLeagueId() {
        return justiceLeagueId;
    }

    public void setJusticeLeagueId(Long justiceLeagueId) {
        this.justiceLeagueId = justiceLeagueId;
    }

    public String getJusticeLeagueMoto() {
        return justiceLeagueMoto;
    }

    public void setJusticeLeagueMoto(String justiceLeagueMoto) {
        this.justiceLeagueMoto = justiceLeagueMoto;
    }

    public Integer getNumberOfMembers() {
        return numberOfMembers;
    }

    public void setNumberOfMembers(Integer numberOfMembers) {
        this.numberOfMembers = numberOfMembers;
    }

    public List<SuperHero> getSuperHeroList() {
        return superHeroList;
    }

    public void setSuperHeroList(List<SuperHero> superHeroList) {
        this.superHeroList = superHeroList;
    }
}

The important thing to note here is the annotation @OneToMany(cascade = { CascadeType.ALL }, fetch = FetchType.EAGER, orphanRemoval = true). Here we have set orphanRemoval = true. So what does that do exactly? Say that you have a group of Super Heroes in your League, and one Super Hero goes haywire, so we need to remove him/her from the League. With JPA cascade alone this is not possible, because cascading does not detect orphan records: removing the Super Hero from the collection only removes the link, and you wind up with the removed Super Hero(s) still in the database even though your collection no longer references them. Prior to JPA 2.0 there was no orphanRemoval support, and the only way to delete orphan records was to use the following Hibernate-specific (or ORM-specific) annotation, which is now deprecated:

@org.hibernate.annotations.Cascade(org.hibernate.annotations.CascadeType.DELETE_ORPHAN)

With the introduction of the orphanRemoval attribute, we can now handle the deletion of orphan records through JPA.

DAO & Service classes:

To keep with good design standards, I have separated the DAO (Data Access Object) layer and the service layer. So let us see the DAO interface and implementation. Note that I have used HibernateTemplate through HibernateDaoSupport so as to keep Hibernate-specific details out and access everything in a unified manner using Spring.
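The difference orphanRemoval makes can be illustrated without a database. The following is a hypothetical in-memory model of the SuperHero table and the join table, assuming Java 8. It only reproduces which rows remain after Batman is removed from the collection, not Hibernate's actual flush logic. The class and method names are illustrative.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of the SuperHero table and the join table; it mimics the resulting
// rows of an update with and without orphan removal, not Hibernate's implementation.
class OrphanRemovalModel {

    final Map<Long, String> superHeroTable = new TreeMap<>();   // super_hero_id -> name
    final Map<Long, Set<Long>> joinTable = new TreeMap<>();     // league id -> hero ids

    /** Writes the league's current collection to the tables, as saving the league would. */
    void update(long leagueId, Map<Long, String> currentHeroes, boolean orphanRemoval) {
        Set<Long> orphans = new HashSet<>(joinTable.getOrDefault(leagueId, Collections.emptySet()));
        orphans.removeAll(currentHeroes.keySet());

        // Cascade saves/updates the heroes still in the collection
        superHeroTable.putAll(currentHeroes);
        // The join rows are rewritten from the collection, so removed heroes lose their link
        joinTable.put(leagueId, new TreeSet<>(currentHeroes.keySet()));
        // Only with orphan removal are the unlinked SuperHero rows deleted as well
        if (orphanRemoval) {
            superHeroTable.keySet().removeAll(orphans);
        }
    }

    public static void main(String[] args) {
        for (boolean orphanRemoval : new boolean[] {false, true}) {
            OrphanRemovalModel db = new OrphanRemovalModel();
            Map<Long, String> league = new LinkedHashMap<>();
            league.put(1L, "Clark Kent");
            league.put(2L, "Bruce Wayne");
            db.update(1L, league, orphanRemoval);   // first save: two heroes, two links
            league.remove(2L);                      // Batman leaves the League
            db.update(1L, league, orphanRemoval);
            System.out.println("orphanRemoval=" + orphanRemoval + ": SuperHero rows "
                    + db.superHeroTable.keySet() + ", join rows " + db.joinTable.get(1L));
        }
    }
}
```

Running it prints `orphanRemoval=false: SuperHero rows [1, 2], join rows [1]` and then `orphanRemoval=true: SuperHero rows [1], join rows [1]`. Without orphan removal the link disappears, but Bruce Wayne's row stays behind.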
package com.justice.league.dao;

import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

import com.justice.league.domain.JusticeLeague;

@Transactional(propagation = Propagation.REQUIRED, readOnly = false)
public interface JusticeLeagueDAO {

    public void createOrUpdateJuticeLeagure(JusticeLeague league);

    public JusticeLeague retrieveJusticeLeagueById(Long id);
}

In the interface I have defined the transaction propagation as REQUIRED. A method that does not need a transaction can override this at the method level; in most situations you will need a transaction, with the exception of data retrieval methods. According to the JPA spec you need a valid transaction for insert/update/delete operations. So let's take a look at the DAO implementation:

package com.justice.league.dao.hibernate;

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.orm.hibernate3.support.HibernateDaoSupport;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

import com.justice.league.dao.JusticeLeagueDAO;
import com.justice.league.domain.JusticeLeague;

@Qualifier(value = "justiceLeagueHibernateDAO")
public class JusticeLeagueHibernateDAOImpl extends HibernateDaoSupport implements JusticeLeagueDAO {

    @Override
    public void createOrUpdateJuticeLeagure(JusticeLeague league) {
        if (league.getJusticeLeagueId() == null) {
            // No generated id yet: this is a new league
            getHibernateTemplate().persist(league);
        } else {
            getHibernateTemplate().update(league);
        }
    }

    // Retrieval does not need a transaction
    @Transactional(propagation = Propagation.NOT_SUPPORTED, readOnly = false)
    public JusticeLeague retrieveJusticeLeagueById(Long id) {
        return getHibernateTemplate().get(JusticeLeague.class, id);
    }
}

Here I have defined an @Qualifier to let Spring know that this is the Hibernate implementation of the DAO class. Note the package name, which ends with hibernate.
As I see it, this is a good design practice: separate your implementation(s) into separate packages to keep the design clean. Ok, let's move on to the service layer implementation. The service layer in this instance just acts as a mediation layer that calls the DAO methods, but in a real-world application you will probably have other validations, security-related procedures etc. handled within the service layer.

package com.justice.league.service;

import com.justice.league.domain.JusticeLeague;

public interface JusticeLeagureService {

    public void handleJusticeLeagureCreateUpdate(JusticeLeague justiceLeague);

    public JusticeLeague retrieveJusticeLeagueById(Long id);
}

package com.justice.league.service.impl;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

import com.justice.league.dao.JusticeLeagueDAO;
import com.justice.league.domain.JusticeLeague;
import com.justice.league.service.JusticeLeagureService;

@Component("justiceLeagueService")
public class JusticeLeagureServiceImpl implements JusticeLeagureService {

    @Autowired
    @Qualifier(value = "justiceLeagueHibernateDAO")
    private JusticeLeagueDAO justiceLeagueDAO;

    @Override
    public void handleJusticeLeagureCreateUpdate(JusticeLeague justiceLeague) {
        justiceLeagueDAO.createOrUpdateJuticeLeagure(justiceLeague);
    }

    public JusticeLeague retrieveJusticeLeagueById(Long id) {
        return justiceLeagueDAO.retrieveJusticeLeagueById(id);
    }
}

A few things to note here. First of all, @Component registers this service implementation under the name justiceLeagueService within the Spring context, so that we can refer to it as the bean with the id justiceLeagueService. We have also autowired the JusticeLeagueDAO and defined an @Qualifier so that it will be bound to the Hibernate implementation.
The value of the @Qualifier should be the same name we gave the class-level @Qualifier within the DAO implementation class. And lastly, let us look at the Spring configuration that wires all of this together.

Spring configuration for the application:

[spring-context.xml listing; the recoverable values are: packages to scan com.justice.league.**.*, Hibernate dialect org.hibernate.dialect.MySQLDialect, JDBC driver com.mysql.jdbc.Driver, URL jdbc:mysql://localhost:3306/my_test, username root, password password, and a property set to true]

Note that I have used the HibernateTransactionManager in this instance because I am running the application standalone. If you are running it within an application server, you will almost always use a JTA transaction manager. I have also let Hibernate create the tables automatically, for simplicity. The packagesToScan property instructs Spring to scan all sub-packages (including nested packages within them) under the root package com.justice.league.**.* for @Entity-annotated classes. We have also bound the session factory to the justiceLeagueDAO so that we can work with the HibernateTemplate. For testing purposes you can initially set the hibernate.hbm2ddl.auto property to create and let Hibernate create the tables for you.
OK, now that we have seen the building blocks of the application, let's see how it all works, starting by creating some superheroes within the Justice League.

A simple main class to show how it all works:

As the first example, let's see how we persist the Justice League with a couple of superheroes:

package com.test;

import java.util.ArrayList;
import java.util.List;

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import com.justice.league.domain.JusticeLeague;
import com.justice.league.domain.SuperHero;
import com.justice.league.service.JusticeLeagureService;

public class TestSpring {

    public static void main(String[] args) {
        ApplicationContext ctx = new ClassPathXmlApplicationContext(
                "spring-context.xml");
        JusticeLeagureService service = (JusticeLeagureService) ctx
                .getBean("justiceLeagueService");

        JusticeLeague league = new JusticeLeague();
        List<SuperHero> superHeroList = getSuperHeroList();

        league.setSuperHeroList(superHeroList);
        league.setJusticeLeagueMoto("Guardians of the Galaxy");
        league.setNumberOfMembers(superHeroList.size());

        service.handleJusticeLeagureCreateUpdate(league);
    }

    private static List<SuperHero> getSuperHeroList() {
        List<SuperHero> superHeroList = new ArrayList<SuperHero>();

        SuperHero superMan = new SuperHero();
        superMan.setAwesome(true);
        superMan.setName("Clark Kent");
        superMan.setPowerDescription("Faster than a speeding bullet");
        superHeroList.add(superMan);

        SuperHero batMan = new SuperHero();
        batMan.setAwesome(true);
        batMan.setName("Bruce Wayne");
        batMan.setPowerDescription("I just have some cool gadgets");
        superHeroList.add(batMan);

        return superHeroList;
    }
}

If we go to the database and check, we will see the following output:

mysql> select * from superhero;
+---------------+-----------+-----------------+-------------------------------+
| super_hero_id | isAwesome | super_hero_name | power_description             |
+---------------+-----------+-----------------+-------------------------------+
|             1 | Y         | Clark Kent      | Faster than a speeding bullet |
|             2 | Y         | Bruce Wayne     | I just have some cool gadgets |
+---------------+-----------+-----------------+-------------------------------+

mysql> select * from justiceleague;
+-------------------+-------------------------+-------------------+
| justice_league_id | justice_league_moto     | number_of_members |
+-------------------+-------------------------+-------------------+
|                 1 | Guardians of the Galaxy |                 2 |
+-------------------+-------------------------+-------------------+

So, as you can see, we have persisted two superheroes and linked them to the Justice League. Now let us see how delete-orphan works with the example below:

package com.test;

import java.util.List;

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import com.justice.league.domain.JusticeLeague;
import com.justice.league.domain.SuperHero;
import com.justice.league.service.JusticeLeagureService;

public class TestSpring {

    public static void main(String[] args) {
        ApplicationContext ctx = new ClassPathXmlApplicationContext(
                "spring-context.xml");
        JusticeLeagureService service = (JusticeLeagureService) ctx
                .getBean("justiceLeagueService");

        JusticeLeague league = service.retrieveJusticeLeagueById(1L);
        List<SuperHero> superHeroList = league.getSuperHeroList();
        /**
         * Here we remove Batman (a.k.a. Bruce Wayne) from the Justice League
         * cos he ain't cool no more
         */
        for (int i = 0; i < superHeroList.size(); i++) {
            SuperHero superHero = superHeroList.get(i);
            if (superHero.getName().equalsIgnoreCase("Bruce Wayne")) {
                superHeroList.remove(i);
                break;
            }
        }

        service.handleJusticeLeagureCreateUpdate(league);
    }
}

Here we first retrieve the Justice League record by its primary key.
Then we loop through the list, remove Batman from the League and call the createOrUpdate method again. Because we have orphan removal defined, any superhero that is in the database but no longer in the list will be deleted. If we query the database again, we will see that Batman has now been removed:

mysql> select * from superhero;
+---------------+-----------+-----------------+-------------------------------+
| super_hero_id | isAwesome | super_hero_name | power_description             |
+---------------+-----------+-----------------+-------------------------------+
|             1 | Y         | Clark Kent      | Faster than a speeding bullet |
+---------------+-----------+-----------------+-------------------------------+

So that's it: the story of how the Justice League used Hibernate to remove Batman automatically, without being bothered to do it themselves. Next up, look forward to how Captain America used Hibernate Criteria to build flexible queries in order to locate possible enemies. Watch out!

Have a great day, people, and thank you for reading! If you have any suggestions or comments, please do leave them.

From http://dinukaroshan.blogspot.com/2011/09/hibernate-by-example-part-1-orphan.html
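To make the delete-orphan behaviour concrete without a database, here is a framework-free sketch; it is an illustration, not Hibernate code, and the OrphanRemovalDemo class and orphans() helper are hypothetical. The names still stored are compared with the names left in the parent's collection, and whatever is missing is what orphan removal would delete. It also shows that the index-based loop in the article can be written with List.removeIf (Java 8+), with one difference: removeIf removes every match, whereas the original loop stops after the first one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, database-free illustration of delete-orphan semantics.
class OrphanRemovalDemo {

    // Children that are still persisted but no longer referenced by the parent.
    static Set<String> orphans(Set<String> persistedNames, List<String> currentNames) {
        Set<String> toDelete = new HashSet<>(persistedNames);
        toDelete.removeAll(currentNames);
        return toDelete;
    }

    public static void main(String[] args) {
        Set<String> persisted = new HashSet<>(Arrays.asList("Clark Kent", "Bruce Wayne"));
        List<String> league = new ArrayList<>(Arrays.asList("Clark Kent", "Bruce Wayne"));

        // Equivalent of the index-based removal loop, except that every match is removed.
        league.removeIf(name -> name.equalsIgnoreCase("Bruce Wayne"));

        System.out.println("rows to delete: " + orphans(persisted, league)); // prints rows to delete: [Bruce Wayne]
    }
}
```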

September 24, 2011

by Dinuka Arseculeratne


Practical PHP Refactoring: Replace Record with Data Class

We often find ourselves tempted by the shortcut of directly using a record-like data structure provided by the language or a framework. There are many scenarios where a record emerges: when you use a database for persistence (not only a relational one), there can be a data structure containing the results of a query; when you use associative arrays, they often have a small number of fixed keys. Record-like structures are the equivalent of C structs, or Ruby hashes.

This refactoring is a generalization of Replace Array with Object: in this case the starting point is not just an array which always has the same number and type of fields, but any data structure homogeneous to it, such as Zend_Db_Table_Row, which implements a Row Data Gateway and gives access to a row in the database, or stdClass instances (or arrays) fetched by PDOStatement.

In some languages (see C) the record is a particular data structure which is not an object; in PHP, it is always an array or an object of some vendor class. Even ORMs based on Active Record are more advanced than this kind of usage: they usually let you add methods on the model classes, which are populated with data by the ORM itself. In this refactoring, we are talking about a data structure managed by the language or the persistence layer, whose code you cannot modify.

Why replace a record-like structure?

Generic classes do not let you add methods to manipulate their data: when you directly use a Zend_Db_Table_Row or an associative array to store the result of a query, you have to resort continuously to foreign methods. Each time you need new logic, you have to compromise the encapsulation of the object itself, and these methods get duplicated across various pieces of client code.

Solutions

There are different ways to eliminate the coupling to a record-like structure. The first is to refactor to subclassing, where the data structure becomes an Active Record.
You extend the vendor class with your own: this option is only available if the structure is defined as an object. Moreover, the name of the class to instantiate must be configurable in the persistence mechanism. Zend_Db_Table_Record supports this kind of usage.

A second option is to refactor to composition: Zend_Db examples exist for this case too. This approach can be used to avoid large hierarchies: your model classes compose the Zend_Db objects and hide the database; methods can be added once, on your own models.

A third and final alternative is to use hydration: data is copied into your model objects, and the records are thrown away afterwards. Doctrine 2 and Data Mappers in general choose this approach.

Steps

I will describe a short procedure for refactoring to hydration, since it is the most complex approach and it is always applicable. The others are instead specific to the particular data structure (for example, with Zend_Db you have to write some subclasses and configure some protected fields to contain the right class names).

1. Create a new class, as one of your models. Its state should be represented by one row (or more rows joined into one) in the database.
2. This class should gain a private field for each of the record's fields, usually with getters and setters.
3. This class should accept an instance of the record in its constructor or in a Factory Method, so that it can produce a new instance.

If you want to decouple from the persistence, or you want to go both ways (also saving, not only displaying, since this is not just a presentation model), look for a Data Mapper such as Doctrine 2, which will even hide all the record structures from you and manage associations where objects compose other ones.

Example

In the example, the data of a single user is returned in an array. I chose an array as the record-like structure to minimize the external dependencies of this code.
find(42);
        $this->assertEquals('Giorgio', $giorgio['name']);
    }
}

/**
 * This is a Fake Table Data Gateway. The machinery for making it work with
 * a database will be distracting for our purposes, so they will be omitted.
 */
class UsersTable
{
    /**
     * @return mixed the returned value can be a Zend_Db_Table_Row,
     *         an Active Record, a stdClass, an associative array...
     *         It should just represent a single entity.
     */
    public function find($id)
    {
        // execute a PDOStatement and fetch the data
        return array('id' => 42, 'name' => 'Giorgio');
    }
}

After the refactoring, we have a User class where we can add all the methods we need:

find(42);
        $this->assertEquals('Giorgio', $giorgio->getName());
    }
}

/**
 * This is a Fake Table Data Gateway. The machinery for making it work with
 * a database will be distracting for our purposes, so they will be omitted.
 */
class UsersTable
{
    /**
     * @return mixed the returned value can be a Zend_Db_Table_Row,
     *         an Active Record, a stdClass, an associative array...
     *         It should just represent a single entity.
     */
    public function find($id)
    {
        // execute a PDOStatement and fetch the data
        return User::fromRecord(array('id' => 42, 'name' => 'Giorgio'));
    }
}

class User
{
    private $id;
    private $name;

    public static function fromRecord(array $record)
    {
        $object = new self();
        $object->id = $record['id'];
        $object->name = $record['name'];
        return $object;
    }

    public function getName()
    {
        return $this->name;
    }
}
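The example above uses hydration. For comparison, the second option, composition, can be sketched too. This sketch is in Java, like most code on this page, and everything in it is hypothetical: a Map<String, Object> stands in for the vendor record (a Zend_Db_Table_Row or an associative array), and getDisplayName() is an invented example of logic that would otherwise become a foreign method in client code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical composition sketch: the model wraps the vendor record
// and adds behaviour, instead of copying the fields out of it.
final class UserRecordWrapper {
    private final Map<String, Object> record;

    UserRecordWrapper(Map<String, Object> record) {
        this.record = record;
    }

    String getName() {
        return (String) record.get("name");
    }

    // Logic that lives on the model instead of being duplicated in client code.
    String getDisplayName() {
        return getName() + " (#" + record.get("id") + ")";
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 42);
        row.put("name", "Giorgio");
        System.out.println(new UserRecordWrapper(row).getDisplayName()); // prints Giorgio (#42)
    }
}
```

The trade-off versus hydration: the wrapper keeps the record alive (and stays coupled to its key names), while hydration copies the data once and lets the record be discarded.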

September 19, 2011

by Giorgio Sironi


EC2 Interview – AWS Interview – Cloud Interview – 8 Questions

If you're looking for a cloud expert, specifically someone who knows Amazon Web Services and EC2, you'll want to have a battery of questions to assess their knowledge.

September 15, 2011

by Sean Hull


Memory Barriers/Fences

In this article I'll discuss the most fundamental technique in concurrent programming, known as memory barriers, or fences, which make the memory state within a processor visible to other processors.

CPUs have employed many techniques to try to accommodate the fact that CPU execution unit performance has greatly outpaced main memory performance. In my “Write Combining” article I touched on just one of these techniques. The most common technique employed by CPUs to hide memory latency is to pipeline instructions and then spend significant effort, and resources, on trying to re-order these pipelines to minimise stalls related to cache misses.

When a program is executed it does not matter if its instructions are re-ordered, provided the same end result is achieved. For example, within a loop it does not matter when the loop counter is updated if no operation within the loop uses it. The compiler and CPU are free to re-order the instructions to best utilise the CPU, provided the counter is updated by the time the next iteration is about to commence. Also, over the execution of a loop this variable may be stored in a register and never pushed out to cache or main memory, so it is never visible to another CPU.

CPU cores contain multiple execution units. For example, a modern Intel CPU contains 6 execution units which can do a combination of arithmetic, conditional logic, and memory manipulation. Each execution unit can do some combination of these tasks. These execution units operate in parallel, allowing instructions to be executed in parallel. This introduces another level of non-determinism to program order as observed from another CPU.

Finally, when a cache miss occurs, a modern CPU can make an assumption about the result of a memory load and continue executing based on this assumption until the load returns the actual data.

Provided “program order” is preserved, the CPU and compiler are free to do whatever they see fit to improve performance.

Figure 1.

Loads and stores to the caches and main memory are buffered and re-ordered using the load, store, and write-combining buffers. These buffers are associative queues that allow fast lookup. This lookup is necessary when a later load needs to read the value of a previous store that has not yet reached the cache. Figure 1 above depicts a simplified view of a modern multi-core CPU. It shows how the execution units can use the local registers and buffers to manage memory while it is being transferred back and forth from the cache sub-system.

In a multi-threaded environment, techniques need to be employed to make program results visible in a timely manner. I will not cover cache coherence in this article. Just assume that once memory has been pushed to the cache, a protocol of messages will occur to ensure all caches are coherent for any shared data. The techniques for making memory visible from a processor core are known as memory barriers or fences.

Memory barriers provide two properties. Firstly, they preserve externally visible program order by ensuring all instructions on either side of the barrier appear in the correct program order if observed from another CPU; secondly, they make the memory visible by ensuring the data is propagated to the cache sub-system.

Memory barriers are a complex subject. They are implemented very differently across CPU architectures. At one end of the spectrum there is the relatively strong memory model on Intel CPUs, which is simpler than, say, the weak and complex memory model on a DEC Alpha with its partitioned caches in addition to cache layers. Since x86 CPUs are the most common for multi-threaded programming, I'll try to simplify to this level.

Store Barrier

A store barrier, the “sfence” instruction on x86, forces all store instructions prior to the barrier to happen before the barrier and have the store buffers flushed to cache for the CPU on which it is issued.
This will make the program state visible to other CPUs so they can act on it if necessary. A good example of this in action is the following simplified code from the BatchEventProcessor in the Disruptor. When the sequence is updated, other consumers and producers know how far this consumer has progressed and thus can take appropriate action. All previous updates to memory that happened before the barrier are now visible.

private volatile long sequence = RingBuffer.INITIAL_CURSOR_VALUE;

// from inside the run() method

T event = null;
long nextSequence = sequence + 1L;
while (running) {
    try {
        final long availableSequence = dependencyBarrier.waitFor(nextSequence);
        while (nextSequence <= availableSequence) {
            event = dependencyBarrier.getEvent(nextSequence);
            eventHandler.onEvent(event, nextSequence == availableSequence);
            nextSequence++;
        }

        sequence = event.getSequence(); // store barrier inserted here !!!
    } catch (final Exception ex) {
        exceptionHandler.handle(ex, event);
        sequence = event.getSequence(); // store barrier inserted here !!!
        nextSequence = event.getSequence() + 1L;
    }
}

Load Barrier

A load barrier, the “lfence” instruction on x86, forces all load instructions after the barrier to happen after the barrier and then wait on the load buffer to drain for that CPU. This makes program state exposed from other CPUs visible to this CPU before making further progress. A good example of this is when the BatchEventProcessor sequence referenced above is read by producers, or consumers, in the corresponding barriers of the Disruptor.

Full Barrier

A full barrier, the "mfence" instruction on x86, is a composite of both load and store barriers happening on a CPU.

Java Memory Model

In the Java Memory Model a volatile field has a store barrier inserted after a write to it and a load barrier inserted before a read of it.
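The volatile rule above is what makes the classic stop-flag idiom work. In this minimal sketch (the VolatileStopFlag class and its methods are hypothetical), a worker thread spins on a volatile field; the write in stop() and the read in the loop are paired by the barriers just described, so the worker observes the change and exits. Without volatile, the JIT may keep the field in a register, much like the loop counter discussed earlier, and the loop may never terminate.

```java
// Minimal sketch of the volatile stop-flag idiom (names are hypothetical).
class VolatileStopFlag {
    // volatile: per the JMM rules above, a store barrier follows each write
    // and a load barrier precedes each read.
    private volatile boolean running = true;

    void runLoop() {
        while (running) {
            // busy-wait; the volatile field is re-read on every iteration
        }
    }

    void stop() {
        running = false;
    }

    // Starts a worker, flips the flag, and reports whether the worker exited.
    static boolean stopsPromptly() {
        VolatileStopFlag flag = new VolatileStopFlag();
        Thread worker = new Thread(flag::runLoop);
        worker.setDaemon(true); // never keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);
            flag.stop();
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsPromptly());
    }
}
```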
Qualified final fields of a class have a store barrier inserted after their initialisation to ensure these fields are visible once the constructor completes when a reference to the object is available.

Atomic Instructions and Software Locks

Atomic instructions, such as the “lock ...” instructions on x86, are effectively a full barrier, as they lock the memory sub-system to perform an operation and have guaranteed total order, even across CPUs. Software locks usually employ memory barriers, or atomic instructions, to achieve visibility and preserve program order.

Performance Impact of Memory Barriers

Memory barriers prevent a CPU from using many of the techniques it employs to hide memory latency; therefore they have a significant performance cost which must be considered. To achieve maximum performance it is best to model the problem so the processor can do units of work, then have all the necessary memory barriers occur on the boundaries of these work units. Taking this approach allows the processor to optimise the units of work without restriction. There is an advantage to grouping necessary memory barriers, in that buffers flushed after the first one will be less costly because no work will be under way to refill them.

From http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html
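Atomic instructions and the advice to put barriers on work-unit boundaries can both be illustrated with java.util.concurrent.atomic.AtomicLong. This is a sketch, not a benchmark, and the AtomicCounterDemo class is hypothetical: perIncrement does one atomic read-modify-write per increment (on x86, HotSpot typically emits a lock-prefixed instruction for this), while perBatch counts in a local variable and publishes once per thread, at the end of its unit of work. Both return the same total; only the number of barrier-carrying operations differs.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounterDemo {

    // Runs the same work on `threads` threads and waits for all of them.
    private static void runAll(int threads, Runnable work) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(work);
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join also makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
    }

    // One atomic operation (and its implied full barrier) per increment.
    static long perIncrement(int threads, int increments) {
        AtomicLong counter = new AtomicLong();
        runAll(threads, () -> {
            for (int i = 0; i < increments; i++) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    // Work happens on a thread-local variable; the atomic operation occurs once,
    // at the boundary of each thread's unit of work.
    static long perBatch(int threads, int increments) {
        AtomicLong counter = new AtomicLong();
        runAll(threads, () -> {
            long local = 0;
            for (int i = 0; i < increments; i++) {
                local++;
            }
            counter.addAndGet(local);
        });
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(perIncrement(4, 100_000)); // prints 400000
        System.out.println(perBatch(4, 100_000));     // prints 400000
    }
}
```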

September 12, 2011

by Martin Thompson


On DTOs

DTOs, or data-transfer objects, are commonly used. What is not so commonly known is that they originate from DDD (Domain-Driven Design). There they make a lot of sense: domain objects have state, identity and business logic, while DTOs have only state. But many projects today use the anemic data model approach (my opinion) and still use DTOs. They are used whenever an object “leaves” the service layer or “leaves” the system (through web services, RMI, etc.). There are three approaches:

The first approach: every entity has at least one corresponding DTO, usually more than one, for different scenarios in the view layer. When you display a user in a list you have one DTO; when you display it in a “user details” window you need a more extended DTO. I am not in favour of this approach because in too many cases the DTO and the domain object have exactly the same structure, and as a result there's a lot of duplicated code plus redundant mapping. Another problem is the many slight variations between DTOs: even when they differ from the entity, they differ from one another by only one or two fields. Why is duplication a bad thing? Because changes have to be made in two places, issues are harder to trace when data passes through multiple objects, and because... it is duplication. Copy & paste within the same project is a sin.

The second approach: DTOs are only created when their structure differs significantly from that of the entity. In all other cases the entity itself is used. Cases where you don't want to show some fields (especially when exposing them via web services to third parties) exist, but are not that common. This can sometimes be handled via the serialization mechanism (mark the fields with @JsonIgnore or @XmlTransient, for example), but in other cases the structures are just different. In those cases a DTO is due. For example, you have a User and UserDetails, where UserDetails holds the details plus the relations of the currently logged-in user to the given user. The latter has nothing to do with the entity, so you create a DTO.
However, in the case of a DirectMessage you have sender, recipient, text and datetime both in the DB and in the UI, so there is no need for a DTO. There is one caveat with this approach (as well as with the next one). Anemic entities usually come with an ORM (JPA in the case of Java). Whenever they exit the service layer they may be invalid, because of lazy collections that require an open session. You have two options here:

Use OpenSessionInView / OpenEntityManagerInView, so that your session stays open until you have finished preparing the response. This is easy to configure but is not my preferred option: it violates layer boundaries in a subtle way, and this sometimes leads to problems, especially for novice developers.

Don't use lazy collections. Lazy collections are unneeded. Either make them eager, if they are supposed to hold a small list of items (for example, the list of roles for a user), or, if the data is likely to grow, use queries. After all, you are not going to show 1000 records in one go anyway; you will have to page them. Without lazy associations (@*ToOne are eager by default) you won't have invalid objects when the session is closed.

The third approach: don't use DTOs at all. This is applicable as long as there aren't significantly varying structures. For smaller projects this is usually a good way to go. Everything mentioned in the point above applies here.

So my preferred approach is the “middle way”. But it requires a lot of consideration in each case, which may not be practical for bigger and/or less experienced teams, so one of the two “extremes” should be picked. Since the “no DTOs” approach also requires consideration (what to make @Transient, how lazy collections affect the flow, etc.), the “all DTOs” approach is usually chosen. But even though it is seemingly the safest approach, it has many pitfalls. First, how do you map from DTOs to entities and vice versa?
The main options are:

Dedicated mapper classes.

Constructors: the DTO constructor takes the entity and fills itself, and vice versa (remember to also provide a default constructor).

Declarative mapping (e.g. Dozer). This is practically the same as the first option, since it externalizes the mapping. It can even be used together with a dedicated mapper class.

Mapping them in-line (whenever needed). This can generate unmaintainable code and is not preferred.

I prefer the constructor approach, if only because fewer classes are created. But they are essentially the same (DTOs are not famous for encapsulation, so all of your properties are exposed anyway). Here is a list of guidelines for using DTOs with any of the mapping approaches:

Don't generate too much redundant code. If two scenarios require slightly different DTOs, reuse. There is no need to create a new DTO for a difference of one or two fields.

Don't put presentation logic in mappers/constructors. For example, code like if (entity.isActive()) dto.setStatus("Active"); should live in the view layer.

Don't sneak entities in together with DTOs. DTOs should not have members that are entities. Generally, entities should not be used outside the service layer (this is a bit extreme, but if we use DTOs everywhere we should be consistent and stick to that practice).

Don't use the mappers/entity-to-DTO constructors in controllers; use them in the service layer. The reason DTOs are used in the first place is that entities may be ORM-bound, and they may not be valid outside a session (i.e. outside the service layer).

If using mappers, prefer static mapper methods. Mappers don't have state, so there is no need to instantiate them (and they don't have to be mocked, wrapped, etc.).

If using mappers, there's no need for a separate mapper for each entity (plus its multiple DTOs). Related entities can be grouped in one mapper; for example, Company, CompanyProfile and CompanySubsidiary can use the same mapper class.

Just make sure you make all these decisions at the beginning of a project and figure out which is applicable in your scenario (team size and experience, project size, domain complexity).

From http://techblog.bozho.net/?p=427
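A minimal sketch of the static-mapper guidelines, assuming hypothetical Company and CompanySubsidiary entities whose fields are invented for illustration: the mapper has no state and no instances, related entities share one mapper class, nested entities are mapped to nested DTOs so no entity leaks into a DTO, and presentation logic stays out of the mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical entities; in a real project these would be JPA-mapped.
class CompanySubsidiary {
    final String name;
    final String country;

    CompanySubsidiary(String name, String country) {
        this.name = name;
        this.country = country;
    }
}

class Company {
    final long id;
    final String name;
    final String internalNotes; // deliberately not exposed through the DTO
    final List<CompanySubsidiary> subsidiaries;

    Company(long id, String name, String internalNotes, List<CompanySubsidiary> subsidiaries) {
        this.id = id;
        this.name = name;
        this.internalNotes = internalNotes;
        this.subsidiaries = subsidiaries;
    }
}

// DTOs hold only state and never reference entities.
class CompanySubsidiaryDto {
    String name;
    String country;
}

class CompanyDto {
    long id;
    String name;
    List<CompanySubsidiaryDto> subsidiaries = new ArrayList<>();
}

// One stateless mapper for a group of related entities: static methods only.
final class CompanyMapper {
    private CompanyMapper() { }

    static CompanySubsidiaryDto toDto(CompanySubsidiary entity) {
        CompanySubsidiaryDto dto = new CompanySubsidiaryDto();
        dto.name = entity.name;
        dto.country = entity.country;
        return dto;
    }

    static CompanyDto toDto(Company entity) {
        CompanyDto dto = new CompanyDto();
        dto.id = entity.id;
        dto.name = entity.name;
        for (CompanySubsidiary subsidiary : entity.subsidiaries) {
            dto.subsidiaries.add(toDto(subsidiary)); // nested entity -> nested DTO
        }
        return dto;
    }

    public static void main(String[] args) {
        Company acme = new Company(1L, "Acme", "do not show",
                Arrays.asList(new CompanySubsidiary("Acme GmbH", "DE")));
        CompanyDto dto = toDto(acme);
        System.out.println(dto.name + " has " + dto.subsidiaries.size() + " subsidiary DTO(s)");
    }
}
```

Per the guidelines, a service-layer method would call CompanyMapper.toDto(...) before returning, so controllers only ever see CompanyDto.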

September 10, 2011

by Bozhidar Bozhanov


JSON data migration

The JSON data format is simple and yet powerful. Nowadays you encounter many more web applications communicating in JSON than a couple of years ago. It is simple for a developer to read, it is efficient for a web browser to parse, and there are databases using it as their primary data format. But what happens when the data structure changes? You need to migrate. And that is where acris-json-migration might help you!

Example situation

Let's shed some light on it and assume we have data like this:

{
    "firstName": "John",
    "secondName": "Doe",
    "street": "Over the rainbow",
    "streetNr": 21
}

Such data can be represented by the following Java domain object:

public class Person {
    String firstName;
    String secondName;
    String street;
    Integer streetNr;

    // ... and getters and setters...
}

Well, this looks like data about a person named John Doe. We stored it in a database, and you can clearly see that secondName is probably not the field name we would really like to have. But a developer made a mistake, and in the second version of our domain model we are going to fix it:

{
    "firstName": "John",
    "surname": "Doe",
    "street": "Over the rainbow",
    "streetNr": 21
}

Now you can see the point: thousands of records are stored in the format defined by version #1 of the Person class, but our program communicates in version #2, in which secondName was changed to surname. Clients may well wonder why they don't see surnames, won't they? ;)

One thing to remember (for the following context): the class Person changed, and only the version #2 Person class exists.

Simple migration script

In this situation I would like to write a script:

public class PersonV1toV2Script extends JacksonTransformationScript {
    @Override
    public void process(ObjectNode node) {
        rename(node, "secondName", "surname");
    }
}

From the above example it is clear that the script will do the job.
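Jackson's ObjectNode is not part of the JDK, so here is a dependency-free sketch of what a rename helper does, with a LinkedHashMap standing in for the JSON object. The JsonRenameSketch class is hypothetical and is not acris-json-migration's implementation; its one design choice is to leave the node untouched when the old field is missing or the new one already exists, so running the migration twice is harmless.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a rename(node, from, to) helper, using a Map
// instead of Jackson's ObjectNode so it runs with the JDK alone.
class JsonRenameSketch {

    static void rename(Map<String, Object> node, String from, String to) {
        // Only move the value if there is something to move and nothing to overwrite.
        if (node.containsKey(from) && !node.containsKey(to)) {
            node.put(to, node.remove(from));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("firstName", "John");
        person.put("secondName", "Doe");
        person.put("street", "Over the rainbow");
        person.put("streetNr", 21);

        rename(person, "secondName", "surname");
        System.out.println(person);
        // prints {firstName=John, street=Over the rainbow, streetNr=21, surname=Doe}
    }
}
```

The renamed field moves to the end of the map; that is harmless because JSON object members are unordered.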
And you can do pretty much anything with the whole tree of JSON data (adding new nodes, removing existing ones, transforming here and there), all thanks to Jackson's tree model.

How can I execute it?

There is a Transformer abstract class representing a transformer responsible for passing JSON data to a script and writing it back. Currently there are two kinds of transformers: Jackson-based and JSONT-based. The Jackson-based one is preferred and is more developed than the JSONT-based one. So to execute a transformation on a data set you only have to write two lines of code:

JacksonTransformer t = new JacksonTransformer(input, output);
t.transform(PersonV1toV2Script.class.getName());

... where input and output represent directories. All files in the input directory are treated as files containing JSON data; they are transformed and written to the output directory. For a detailed test you can look at TransformerTest in the project.

Conclusion

The script's helper API is evolving and provides you with handy methods like removeIfExists or addNonExistent. We would like to hear about your use cases that are not yet handled by acris-json-migration, so that the project can serve the general purpose of JSON data migration.
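The directory-to-directory behaviour of the transformer can be sketched with java.nio.file. This is an illustration under assumptions, not the library's JacksonTransformer: the DirectoryTransformSketch class, its transformAll method and the UnaryOperator<String> "script" are hypothetical, and the naive string replacement in main only stands in for a real tree-based script.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Hypothetical sketch of an input-directory -> output-directory transformation loop.
class DirectoryTransformSketch {

    // Applies `script` to every regular file in `input` and writes the result
    // under the same file name in `output`. Returns the number of files processed.
    static int transformAll(Path input, Path output, UnaryOperator<String> script) throws IOException {
        Files.createDirectories(output);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(input)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // only regular files are treated as JSON data
                }
                String json = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                byte[] migrated = script.apply(json).getBytes(StandardCharsets.UTF_8);
                Files.write(output.resolve(file.getFileName()), migrated);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempDirectory("json-in");
        Path out = Files.createTempDirectory("json-out");
        Files.write(in.resolve("john.json"), "{\"secondName\":\"Doe\"}".getBytes(StandardCharsets.UTF_8));

        // Naive textual rename for the demo; a real script works on a parsed tree.
        int n = transformAll(in, out, s -> s.replace("\"secondName\"", "\"surname\""));
        String result = new String(Files.readAllBytes(out.resolve("john.json")), StandardCharsets.UTF_8);
        System.out.println(n + " file(s) migrated: " + result);
    }
}
```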

September 9, 2011

by Ladislav Gažo


Testing Databases with JUnit and Hibernate Part 1: One to Rule them

There is little support for testing the database of your enterprise application. We will describe some problems and possible solutions based on Hibernate and JUnit.

September 6, 2011

by Jens Schauder


Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS

The integration framework Apache Camel already supports several important cloud services (see my overview article at http://www.kai-waehner.de/blog/2011/07/09/cloud-computing-heterogeneity-will-require-cloud-integration-apache-camel-is-already-prepared for more details). This article describes the combination of Apache Camel and the Amazon Web Services (AWS) interfaces of Simple Storage Service (S3), Simple Queue Service (SQS) and Simple Notification Service (SNS). Thus, the concept of Infrastructure as a Service (IaaS) is used to access messaging systems and data storage without any need for configuration.

Registration with AWS and Setup of Camel

First, you have to register with Amazon Web Services (for free). Most AWS services include a free monthly quota, which is absolutely sufficient to play around and develop some simple applications. As its name states, AWS uses technology-independent web services. Besides, APIs for several different programming languages are available to ease development. By the way, Camel uses the AWS SDK for Java (http://aws.amazon.com/sdkforjava), of course. The documentation is detailed and easy to understand, including tutorials, screenshots and code examples.

Hint 1: You should read the introductions to S3, SQS and SNS (go to http://aws.amazon.com and click on “Products”) and play around with the AWS Management Console (http://aws.amazon.com/console) before you continue. This step is very easy and takes less than one hour. Then you will have a much better understanding of AWS and where Camel can help you!

Hint 2: It really helps to look at the source code of the camel-aws component to understand how Camel uses the AWS Java API internally. If you want to write tests, you can do it the same way. In the past, I was afraid of looking at “complex” source code of open source frameworks. But there is no need to be scared! The camel-aws component (and most other Camel components) consists of only a few classes.
Everything is easy to understand. It helps you to understand Camel internals and the AWS API, and to spot and solve errors caused by exceptions in your code.

At the time of writing, the current Camel version 2.8 supports three AWS services: S3, SQS and SNS. All of them use similar concepts. Therefore, they are included in one single Camel component: “camel-aws”. You have to add the libraries to your existing Camel project. As always, the simplest way is to use Maven and add the following dependency to the pom.xml:

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-aws</artifactId>
    <version>${camel-version}</version>
</dependency>

Configuration of the Camel Endpoint

The implementation and configuration of all three services are very similar. The URI looks like this (the example shows the SQS service):

aws-sqs://queue-name[?options]

There are two alternatives for configuring your endpoint.

Using Parameters

The easy way is to use two parameters in the URI of your endpoint: “accessKey” and “secretKey” (you receive both after your AWS registration).

aws-sqs://unique-queue-name?accessKey=INSERT_ME&secretKey=INSERT_ME

Be aware of the following problem, which can result in a strange, non-descriptive exception (thanks to Brendan Long):

You'll need to URL-encode any +'s in your secret key (otherwise, they'll be treated as spaces). + = %2B, so if your secret key was “my+secret\key”, your Camel URL should have “secretKey=my%2Bsecret\key”.

“Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.” Source: W3C URI Recommendations

Adding a configured AmazonClient to the Registry

If you need more configuration (e.g. because your system is behind a firewall), you have to add an AmazonClient object to your registry. The following code shows an example using SQS, but SNS and S3 use exactly the same concept.
@Override
protected JndiRegistry createRegistry() throws Exception {
    JndiRegistry registry = super.createRegistry();

    AWSCredentials awsCredentials = new BasicAWSCredentials("INSERT_ME", "INSERT_ME");

    ClientConfiguration clientConfiguration = new ClientConfiguration();
    clientConfiguration.setProxyHost("myProxyHost");
    clientConfiguration.setProxyPort(8080);

    AmazonSQSClient client = new AmazonSQSClient(awsCredentials, clientConfiguration);
    registry.bind("amazonSQSClient", client);

    return registry;
}

This example overrides the createRegistry() method of a JUnit test (extending CamelTestSupport). You can also add this information to your runtime Camel application, of course.

Apache Camel and the Simple Storage Service (S3)

Simple Storage Service (S3) is a key-value store. You can store anything from small to very large data. Usage is very easy: you create buckets and put key-value data into these buckets. You can also create folders within buckets to organize your data. That's it. You can monitor your buckets using the AWS Management Console, an intuitive GUI supporting most AWS services.

The following example shows both alternatives for accessing the Amazon services (as described above): parameters and the AmazonClient.
// Transfer data from your file inbox to the AWS S3 service
from("file:files/inbox")
    // This is the key of your key-value data
    .setHeader(S3Constants.KEY, simple("This is a static key"))
    // Using parameters for accessing the AWS service
    .to("aws-s3://camel-integration-bucket-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME&region=eu-west-1");

// Transfer data from the AWS S3 service to your file outbox
from("aws-s3://camel-integration-bucket-mwea-kw?amazonS3Client=#amazonS3Client&region=eu-west-1")
    .to("file:files/outbox");

The second route assumes an AmazonS3Client has been bound in the registry under the name "amazonS3Client", analogous to the SQS registry example above. There are some additional parameters. For instance, you can specify the desired AWS region or delete data after receiving it (see http://camel.apache.org/aws-s3.html and the corresponding SQS and SNS pages for more details about parameters and message headers). As you can see in the code, you can use the AWS-S3 endpoint both for producing and for consuming messages. Each bucket name must be unique, so you have to add some specific information, such as your company name, to it. Hint: if a bucket does not exist, Camel creates it automatically (as the AWS API does). The same concept applies to SQS queues and SNS topics.

Apache Camel and the Simple Queue Service (SQS)

The Simple Queue Service (SQS) is similar to a JMS provider such as WebSphere MQ or ActiveMQ (but with some differences). You create queues and send messages to them, and consumers receive the messages. Unlike most other AWS services, you cannot monitor queues directly in the AWS Management Console. You have to use the service "CloudWatch" (http://aws.amazon.com/cloudwatch) and start an EC2 instance to monitor queues and their content.
As you can see in the following code example, the syntax and concepts are almost the same as for the S3 service:

from("file:inbox")
    .to("aws-sqs://camel-integration-queue-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME");

from("aws-sqs://camel-integration-queue-mwea-kw?amazonSQSClient=#amazonSQSClient")
    .to("file:outbox?fileName=sqs-${date:now:yyyy.MM.dd-hh:mm:ss:SS}");

Again, you can use the AWS-SQS endpoint both for producing and for consuming messages. Each queue name must be unique. There are two important differences from JMS (quoted from the AWS documentation):

Q: How many times will I receive each message?
Amazon SQS is engineered to provide "at least once" delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies.

Q: Why are there separate ReceiveMessage and DeleteMessage operations?
When Amazon SQS returns a message to you, that message stays in the queue, whether or not you actually received the message. You are responsible for deleting the message; the delete request acknowledges that you're done processing the message. If you don't delete the message, Amazon SQS will deliver it again on another receive request.

Apache Camel and the Simple Notification Service (SNS)

The Simple Notification Service (SNS) acts like JMS topics. You create a topic, and consumers subscribe to the topic and then receive notifications. Several transport protocols are supported: HTTP(S), Email and SQS. Further interfaces will be added in the future, e.g. the Short Message Service (SMS) for mobile phones. Unlike S3 and SQS, Camel only offers a producer endpoint for this AWS service, so you can only create topics and send messages via Camel. The reason is simple: Camel already offers endpoints for consuming these messages, since HTTP, Email and SQS are already available.
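The "at least once" delivery quoted in the SQS section above means a consumer may receive the same message twice. The following framework-free sketch shows one way to make processing idempotent. It assumes each message carries a unique ID (SQS assigns a MessageId to every message). The class and method names are hypothetical and are not part of camel-aws. Inside a Camel route, Camel's own Idempotent Consumer EIP serves the same purpose.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of camel-aws: processes each message ID only once.
public class IdempotentProcessor {
    // Thread-safe set of IDs already handled (in memory only: lost on restart).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    // Counts how often the real work was actually performed.
    private final AtomicInteger workDone = new AtomicInteger();

    /** Returns true if the message was processed, false if it was a redelivery. */
    public boolean process(String messageId, String body) {
        // add() is atomic and returns false if the ID was already present,
        // so two concurrent consumers cannot both process the same message.
        if (!processedIds.add(messageId)) {
            return false;
        }
        handle(body);
        return true;
    }

    // Stand-in for the real side effect (e.g. writing the body to a file).
    private void handle(String body) {
        workDone.incrementAndGet();
    }

    public int workDoneCount() {
        return workDone.get();
    }

    public static void main(String[] args) {
        IdempotentProcessor processor = new IdempotentProcessor();
        processor.process("id-1", "hello");
        processor.process("id-1", "hello"); // redelivered copy: skipped
        processor.process("id-2", "world");
        System.out.println(processor.workDoneCount()); // prints 2
    }
}
```

This sketch records the ID before doing the work. If handle() failed, the message would therefore be treated as done and dropped. A real system would record the ID after successful processing, ideally in a persistent store, and only then delete the message from the queue.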
There is one tradeoff: at the moment, a consumer cannot subscribe to topics using Camel, so the AWS Management Console has to be used. A very interesting discussion on the Camel JIRA issue addresses the following questions: Should Camel be able to subscribe to topics? Should the producer contain this feature, or should there be a consumer? In my opinion, there should be a consumer which is able to subscribe to topics. Otherwise Camel is missing a key part of the AWS SNS service! Please read the discussion and contribute your opinion: https://issues.apache.org/jira/browse/CAMEL-3476.

Apache Camel is already ready for the Cloud Computing Era

AWS offers many more services for the cloud. It probably does not make sense to integrate every one of them into Camel, but more AWS services will be supported in the future. For instance, SimpleDB and the Relational Database Service (RDS) are already planned and make sense, too: http://camel.apache.org/aws.html.

The conclusion is easy: Apache Camel is already ready for the cloud computing era. Several important cloud services are already supported. Cloud integration will become very important in the future, so Camel is on the right track. Hopefully, we will see more cloud components soon. I will continue to write articles about other Camel cloud components (and new AWS add-ons, of course). For instance, a component for the Platform as a Service (PaaS) product Google App Engine (GAE) is already available.

If you have any additional important information, questions or other feedback, please write a comment. Thank you in advance…

Best regards,
Kai Wähner (Twitter: @KaiWaehner)

[Content from my Blog: Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS]
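The + pitfall from the "Using Parameters" section can also be handled in code before building the endpoint URI. The following is a minimal sketch. The class and method names are hypothetical and are not part of camel-aws. Following the article, it only encodes +; a key containing other reserved characters such as & or % would need those encoded as well.

```java
// Hypothetical helper, not part of camel-aws: builds an aws-sqs URI with the
// secret key's '+' characters encoded as %2B so they are not read as spaces.
public final class AwsUriHelper {
    private AwsUriHelper() {
    }

    /** Replaces every '+' with %2B, as recommended for the secretKey parameter. */
    public static String encodeSecretKey(String secretKey) {
        return secretKey.replace("+", "%2B");
    }

    /** Builds an SQS endpoint URI; the access key is assumed to need no encoding. */
    public static String sqsUri(String queueName, String accessKey, String secretKey) {
        return "aws-sqs://" + queueName
                + "?accessKey=" + accessKey
                + "&secretKey=" + encodeSecretKey(secretKey);
    }

    public static void main(String[] args) {
        // The Java literal "my+secret\\key" is the text my+secret\key
        System.out.println(encodeSecretKey("my+secret\\key")); // prints my%2Bsecret\key
        System.out.println(sqsUri("unique-queue-name", "INSERT_ME", "a+b"));
    }
}
```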

August 30, 2011

by Kai Wähner

Concurrency Pattern: Producer and Consumer

In my career spanning 15 years, I have come across the Producer and Consumer problem only a few times. In most programming cases, we perform functions in a synchronous fashion, and the JVM or the web container handles the complexities of multi-threading on its own. However, certain kinds of use cases do require it. Last week, I came across one such use case, which took me back 3 years to when I last did this. The way it was done last time was very different. When I first heard the problem statement, I knew instantly what was needed. However, my approach this time was going to be different from last time. It simply had to do with how I view technology in my life today. I will not go into the non-technical side and will jump straight into the problem and its solution. I started to look at what existed in the market and came across a couple of posts that helped me channel my thoughts in the right way.

Problem Statement

We need a solution for a batch migration. We are migrating data from System 1 to System 2, and in the process we need to do three tasks:

1. Load data from the database based on groups
2. Process the data
3. Update the records loaded in step 1 with the modifications

We have to handle hundreds of groups, and each group will have around 40K records. You can imagine how long it would take to perform this exercise synchronously. The image here explains this problem effectively.

Producer Consumer: The Problem

Producer and Consumer Pattern

Let us take a look at the Producer Consumer pattern to begin with. If you refer to the problem statement above and look at the image, you see that there are many entities that are ready with their part of the data. However, there are not enough workers to process all the data. Hence, as the producers continue to line up, the queue just keeps growing.
We see that the systems start to hog threads and take a lot of time.

Intermediate Solution

Producer Consumer: The Intermediate Approach

We do have an intermediate solution. Refer to the image and you will immediately notice that the producers pile up their work in a filing cabinet, and the worker picks it up as he finishes the previous task. However, this approach has some glaring shortcomings:

- There is still only one worker who has to do all the work. The external systems may be happy, but the task will remain until the worker has completed all of it.
- The producers pile up their data in a queue, and holding it requires resources. Just as the cabinet in this example can fill up, the same can happen with JVM resources. We need to be careful about how much data we place in memory, and in some cases that may not be much.

The Solution

Producer Consumer: The Solution

The solution is what we see every day in many places, like the cinema ticket queue or petrol pumps. Many people come in to book a ticket, and the more people come in, the more staff are added to issue tickets. Essentially, refer to the image here and you will notice that producers keep adding their jobs to the cabinet and we have more workers to handle the workload.

Java provides the concurrency package (java.util.concurrent) to solve this issue. Until now, I had always worked on threading at a much lower level, and this was the first time I was going to work with this package. As I started to explore the web and read what fellow bloggers had to say, I came across one very good article. It helped me understand how to use BlockingQueue effectively. However, the solutions provided by Dhruba would not have given me the high throughput I needed. So I started to explore ArrayBlockingQueue for the same purpose.
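The concern that "the cabinet can fill up" is what a bounded queue is for. The following minimal sketch uses an ArrayBlockingQueue as the cabinet; the class and method names are hypothetical. It shows that offer() refuses new work once capacity is reached and succeeds again after a worker removes an item. put() would instead block the producer until space frees up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class: an ArrayBlockingQueue as a fixed-capacity "filing cabinet".
public class BoundedCabinetDemo {

    /** Tries to file `items` jobs into a cabinet of `capacity`; returns how many fit. */
    public static int fillCabinet(int capacity, int items) {
        BlockingQueue<Integer> cabinet = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < items; i++) {
            // offer() returns false immediately when the cabinet is full;
            // put() would instead block the producer until space frees up.
            if (cabinet.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    /** Shows that a full cabinet accepts work again once a worker removes a job. */
    public static boolean roomAfterWorkerTakesJob() {
        BlockingQueue<Integer> cabinet = new ArrayBlockingQueue<>(1);
        boolean first = cabinet.offer(1);  // true: there was room
        boolean second = cabinet.offer(2); // false: the cabinet is full
        cabinet.poll();                    // a worker removes job 1
        boolean third = cabinet.offer(2);  // true: room again
        return first && !second && third;
    }

    public static void main(String[] args) {
        System.out.println(fillCabinet(3, 5));         // prints 3
        System.out.println(roomAfterWorkerTakesJob()); // prints true
    }
}
```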
The Controller

This is the first class, where the contract between the producers and consumers is managed. The controller sets up 1 thread for the producer and 2 threads for the consumers. We can create as many threads as we need, and can even read the number from a properties file or do some dynamic magic. For now, we will keep this simple.

package com.kapil.techieforever.producerconsumer;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TestProducerConsumer {
    public static void main(String[] args) {
        try {
            Broker broker = new Broker();
            ExecutorService threadPool = Executors.newFixedThreadPool(3);
            threadPool.execute(new Consumer("1", broker));
            threadPool.execute(new Consumer("2", broker));
            Future<?> producerStatus = threadPool.submit(new Producer(broker));
            // this will wait for the producer to finish its execution.
            producerStatus.get();
            // stop accepting new tasks; the consumers end once the broker is drained
            threadPool.shutdown();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I am using an ExecutorService to create and manage a thread pool. This is more effective than using the basic Thread implementation, because the pool manages the worker threads for you and replaces any that die. You will also notice that I am using the Future class to get the status of the producer thread. Its get() method blocks the calling thread until the producer has finished, so the program does not continue before then. This is a nice replacement for calling join() on the threads. Note: I am not using Future very effectively in this example, so you may want to experiment with it as you see fit. Also note the Broker class, which is used as the filing cabinet between the producers and consumers. We will see its implementation in just a little while.

The Producer

This class is responsible for producing the data that needs to be worked upon.
package com.kapil.techieforever.producerconsumer;

public class Producer implements Runnable {
    private Broker broker;

    public Producer(Broker broker) {
        this.broker = broker;
    }

    @Override
    public void run() {
        try {
            for (int i = 1; i <= 5; ++i) {
                System.out.println("Producer produced: " + i);
                Thread.sleep(100);
                broker.put(i);
            }
            this.broker.continueProducing = Boolean.FALSE;
            System.out.println("Producer finished its job; terminating.");
        } catch (InterruptedException ex) {
            ex.printStackTrace();
        }
    }
}

This class does the simplest thing it can: it adds integers to the broker. Some key areas to note are:

1. There is a property on Broker which the producer updates at the end, when it is done producing. It plays the role of a "final" or "poison" signal that tells the consumers no more data is coming. (Strictly speaking, a poison pill is a sentinel value placed in the queue itself; here it is a flag on the broker.)
2. I have used Thread.sleep to simulate that some producers may take more time to produce the data. You can tweak this value and watch how the consumers react.

The Consumer

This class is responsible for reading the data from the broker and doing its job.

package com.kapil.techieforever.producerconsumer;

public class Consumer implements Runnable {
    private String name;
    private Broker broker;

    public Consumer(String name, Broker broker) {
        this.name = name;
        this.broker = broker;
    }

    @Override
    public void run() {
        try {
            Integer data = broker.get();
            // keep going while the producer is still active or data was returned
            while (broker.continueProducing || data != null) {
                // get() returns null when the poll times out; nothing to process then
                if (data != null) {
                    Thread.sleep(1000);
                    System.out.println("Consumer " + this.name + " processed data from broker: " + data);
                }
                data = broker.get();
            }
            System.out.println("Consumer " + this.name + " finished its job; terminating.");
        } catch (InterruptedException ex) {
            ex.printStackTrace();
        }
    }
}

This is again a simple class that reads the Integer and prints it to the console. However, key points to note are:

1. The loop that processes data runs as long as either of two conditions holds: the producer is still producing, or the broker returned some data.
2. Again, Thread.sleep is used to create different scenarios.

The Broker

package com.kapil.techieforever.producerconsumer;

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class Broker {
    public ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(100);
    // volatile, so the consumer threads reliably see the producer's update
    public volatile Boolean continueProducing = Boolean.TRUE;

    public void put(Integer data) throws InterruptedException {
        this.queue.put(data);
    }

    public Integer get() throws InterruptedException {
        return this.queue.poll(1, TimeUnit.SECONDS);
    }
}

The very first thing to note is that we are using ArrayBlockingQueue as the data holder. I am not going to describe everything it does, and I encourage you to read its Javadoc. However, the producers place data in the queue and the consumers fetch it from the queue in FIFO order. If the producers are slow, the consumers wait for data to come in. If the array is full, the producers wait until space frees up.

Also note that I am using the queue's poll method instead of the blocking take. This ensures that the consumers do not wait forever: each wait times out after one second (poll(1, TimeUnit.SECONDS)). This helps with inter-thread communication and lets the consumers terminate when all the data is processed. (Note: try replacing poll with take and you will see some interesting outputs.)

Code

I have the code on Google Project Hosting. Feel free to go there and download it. It is essentially an Eclipse (Spring STS) project. Depending on when you download it, you may also get additional packages and classes. Feel free to look into those too and share your comments:
- You can browse the source code in the SVN browser, or
- You can download it from the project itself.

From http://scratchpad101.com/2011/08/22/concurrency-pattern-producer-consumer/
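The Broker above uses a continueProducing flag plus a one-second poll timeout. An alternative is to stop the consumers with real poison pills. These are sentinel values put into the queue after the real data, one per consumer, so each consumer can block on take() and exit when it sees its pill. The sketch assumes Integer.MIN_VALUE never occurs as real data. The class name and the single-class structure are illustrative, not the author's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: one producer, several consumers, stopped by poison pills.
public class PoisonPillDemo {
    // Sentinel value; assumes real data never contains Integer.MIN_VALUE.
    private static final Integer POISON = Integer.MIN_VALUE;

    /** Produces 1..items, consumes them with `consumers` threads, returns what was consumed. */
    public static List<Integer> run(int items, int consumers) {
        try {
            return produceAndConsume(items, consumers);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    private static List<Integer> produceAndConsume(int items, int consumers)
            throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(100);
        List<Integer> consumed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Integer data = queue.take(); // blocks; no timeout needed
                        if (data.equals(POISON)) {
                            return; // this consumer's pill: stop
                        }
                        consumed.add(data); // stand-in for real processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The producer runs on the calling thread to keep the sketch short;
        // put() blocks whenever the bounded queue is full.
        for (int i = 1; i <= items; i++) {
            queue.put(i);
        }
        // One pill per consumer: each consumer exits after taking exactly one.
        for (int c = 0; c < consumers; c++) {
            queue.put(POISON);
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new ArrayList<>(consumed);
    }

    public static void main(String[] args) {
        System.out.println(run(5, 2).size()); // prints 5
    }
}
```

Because the queue is FIFO and every pill is enqueued after the last real item, no consumer can stop while data is still waiting. No shared volatile flag is needed.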

August 29, 2011

by Kapil Viren Ahuja

Java NIO vs. IO

When studying both the Java NIO and IO APIs, a question quickly pops into mind: when should I use IO and when should I use NIO? In this text I will try to shed some light on the differences between Java NIO and IO, their use cases, and how they affect the design of your code.

Main Differences Between Java NIO and IO

The table below summarizes the main differences between Java NIO and IO. I will get into more detail about each difference in the sections following the table.

IO                 | NIO
Stream oriented    | Buffer oriented
Blocking IO        | Non-blocking IO
                   | Selectors

Stream Oriented vs. Buffer Oriented

The first big difference between Java NIO and IO is that IO is stream oriented, whereas NIO is buffer oriented. So, what does that mean?

Java IO being stream oriented means that you read one or more bytes at a time from a stream. What you do with the read bytes is up to you. They are not cached anywhere. Furthermore, you cannot move back and forth in the data in a stream. If you need to move back and forth in the data read from a stream, you will need to cache it in a buffer first.

Java NIO's buffer oriented approach is slightly different. Data is read into a buffer from which it is later processed. You can move back and forth in the buffer as you need to. This gives you a bit more flexibility during processing. However, you also need to check whether the buffer contains all the data you need in order to fully process it. And you need to make sure that when reading more data into the buffer, you do not overwrite data in the buffer that you have not yet processed.

Blocking vs. Non-blocking IO

Java IO's various streams are blocking. That means that when a thread invokes a read() or write(), that thread is blocked until there is some data to read, or the data is fully written. The thread can do nothing else in the meantime.
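The buffer-oriented model described above can be made concrete with a small sketch. It writes some bytes into a ByteBuffer, flips the buffer for reading, and then rewinds to read the same bytes a second time. A plain InputStream cannot do this without you caching the data yourself. The class and method names are hypothetical. The sketch assumes the text fits in the 48-byte buffer; otherwise put() throws BufferOverflowException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: re-reading data from a buffer by moving its position.
public class BufferRereadDemo {

    public static String readTwice(String text) {
        ByteBuffer buffer = ByteBuffer.allocate(48);
        buffer.put(text.getBytes(StandardCharsets.US_ASCII)); // fill the buffer
        buffer.flip(); // switch to reading: limit = bytes written, position = 0

        byte[] first = new byte[buffer.remaining()];
        buffer.get(first); // position is now at the limit
        buffer.rewind();   // move back to the start of the data
        byte[] second = new byte[buffer.remaining()];
        buffer.get(second); // the same bytes, read again

        return new String(first, StandardCharsets.US_ASCII)
                + "|" + new String(second, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(readTwice("Name: Anna")); // prints Name: Anna|Name: Anna
    }
}
```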
Java NIO's non-blocking mode enables a thread to request reading data from a channel and only get what is currently available, or nothing at all if no data is currently available. Rather than remain blocked until data becomes available for reading, the thread can go on with something else.

The same is true for non-blocking writing. A thread can request that some data be written to a channel, but not wait for it to be fully written. The thread can then go on and do something else in the meantime.

What threads spend their idle time on when not blocked in IO calls is usually performing IO on other channels. That is, a single thread can now manage multiple channels of input and output.

Selectors

Java NIO's selectors allow a single thread to monitor multiple channels of input. You can register multiple channels with a selector, then use a single thread to "select" the channels that have input available for processing, or select the channels that are ready for writing. This selector mechanism makes it easy for a single thread to manage multiple channels.

How NIO and IO Influence Application Design

Whether you choose NIO or IO as your IO toolkit may impact the following aspects of your application design:

- The API calls to the NIO or IO classes.
- The processing of data.
- The number of threads used to process the data.

The API Calls

Of course, the API calls when using NIO look different than when using IO. This is no surprise. Rather than just reading the data byte by byte from e.g. an InputStream, the data must first be read into a buffer, and then be processed from there.

The Processing of Data

The processing of the data is also affected when using a pure NIO design vs. an IO design.

In an IO design you read the data byte by byte from an InputStream or a Reader. Imagine you were processing a stream of line based textual data.
For instance:

Name: Anna
Age: 25
Email: [email address]
Phone: 1234567890

This stream of text lines could be processed like this:

InputStream input = ... ; // get the InputStream from the client socket

BufferedReader reader = new BufferedReader(new InputStreamReader(input));

String nameLine  = reader.readLine();
String ageLine   = reader.readLine();
String emailLine = reader.readLine();
String phoneLine = reader.readLine();

Notice how the processing state is determined by how far the program has executed. In other words, once the first reader.readLine() method returns, you know for sure that a full line of text has been read, because readLine() blocks until a full line is read. You also know that this line contains the name. Similarly, when the second readLine() call returns, you know that this line contains the age, etc.

As you can see, the program progresses only when there is new data to read, and for each step you know what that data is. Once the executing thread has progressed past reading a certain piece of data in the code, it does not go backwards in the data (mostly not). This principle is also illustrated in this diagram:

Java IO: Reading data from a blocking stream.

A NIO implementation would look different. Here is a simplified example:

ByteBuffer buffer = ByteBuffer.allocate(48);

int bytesRead = inChannel.read(buffer);

Notice the second line, which reads bytes from the channel into the ByteBuffer. When that method call returns, you don't know if all the data you need is inside the buffer. All you know is that the buffer contains some bytes. This makes processing somewhat harder.

Imagine if, after the first read(buffer) call, all that was read into the buffer was half a line, for instance "Name: An". Can you process that data? Not really. You need to wait until at least a full line of data has been read into the buffer before it makes sense to process any of the data at all.
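For newline-terminated records like the example above, one way to deal with a half-read line is to scan the buffer for '\n' and consume only complete lines. The remainder is then kept with compact(), so the next channel read appends after it. This is a sketch of that idea, not the article's bufferFull() approach. The class name is hypothetical, and the sketch assumes a single line always fits in the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts only complete '\n'-terminated lines from a
// buffer in "write mode" (its state right after channel.read(buffer)).
// A trailing partial line such as "Name: An" stays in the buffer.
public class LineAssembler {

    public static List<String> drainCompleteLines(ByteBuffer buffer) {
        List<String> lines = new ArrayList<>();
        buffer.flip(); // read mode: unread bytes are in [position, limit)
        int lineStart = buffer.position();
        for (int i = buffer.position(); i < buffer.limit(); i++) {
            // Absolute get(i) inspects the data without moving the position.
            if (buffer.get(i) == '\n') {
                byte[] bytes = new byte[i - lineStart];
                for (int j = 0; j < bytes.length; j++) {
                    bytes[j] = buffer.get(lineStart + j);
                }
                lines.add(new String(bytes, StandardCharsets.UTF_8));
                lineStart = i + 1;
            }
        }
        buffer.position(lineStart); // mark the complete lines as consumed
        buffer.compact();           // keep the partial line; back to write mode
        return lines;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(48);
        // The first "read" delivers one full line and half of the next one.
        buffer.put("Name: Anna\nAge: 2".getBytes(StandardCharsets.UTF_8));
        System.out.println(drainCompleteLines(buffer)); // prints [Name: Anna]
        // The second "read" completes the pending line.
        buffer.put("5\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(drainCompleteLines(buffer)); // prints [Age: 25]
    }
}
```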
So how do you know if the buffer contains enough data for it to make sense to process it? Well, you don't. The only way to find out is to look at the data in the buffer. As a result, you may have to inspect the data in the buffer several times before you know if all the data is in there. This is both inefficient and can become messy in terms of program design. For instance:

ByteBuffer buffer = ByteBuffer.allocate(48);

int bytesRead = inChannel.read(buffer);

// bufferFull() stands for your own "is enough data in the buffer?" check
while(! bufferFull(bytesRead) ) {
    bytesRead = inChannel.read(buffer);
}

The bufferFull() method has to keep track of how much data is read into the buffer, and return either true or false, depending on whether the buffer is full. In other words, if the buffer is ready for processing, it is considered full.

The bufferFull() method scans through the buffer, but must leave the buffer in the same state as before the bufferFull() method was called. If not, the next data read into the buffer might not be read in at the correct location. This is not impossible, but it is yet another issue to watch out for.

If the buffer is full, it can be processed. If it is not full, you might be able to partially process whatever data is there, if that makes sense in your particular case. In many cases it doesn't.

The is-data-in-buffer-ready loop is illustrated in this diagram:

Java NIO: Reading data from a channel until all needed data is in buffer.

Summary

NIO allows you to manage multiple channels (network connections or files) using only a single thread (or a few), but the cost is that parsing the data might be somewhat more complicated than when reading data from a blocking stream.

If you need to manage thousands of open connections simultaneously, which each only send a little data, for instance a chat server, implementing the server in NIO is probably an advantage. Similarly, if you need to keep a lot of open connections to other computers, e.g. in a P2P network, using a single thread to manage all of your outbound connections might be an advantage. This one thread, multiple connections design is illustrated in this diagram:

Java NIO: A single thread managing multiple connections.

If you have fewer connections with very high bandwidth, sending a lot of data at a time, a classic IO server implementation might be the best fit. This diagram illustrates a classic IO server design:

Java IO: A classic IO server design - one connection handled by one thread.

From http://tutorials.jenkov.com/java-nio/nio-vs-io.html
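The single-thread, multiple-channels design from the summary can be sketched with a Selector. In-process Pipe channels stand in for network connections, an assumption made so the example runs without sockets. A server would register SocketChannels in the same way. The class name is hypothetical. The sketch also assumes each short message arrives in a single read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: one thread services several channels via a Selector.
public class SingleThreadSelectorDemo {

    public static List<String> readFromAll(int channelCount) {
        try {
            return select(channelCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<String> select(int channelCount) throws IOException {
        List<String> received = new ArrayList<>();
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < channelCount; i++) {
                Pipe pipe = Pipe.open();
                pipe.source().configureBlocking(false); // required before register()
                // Attach a label so we know which "connection" a key belongs to.
                pipe.source().register(selector, SelectionKey.OP_READ, "channel-" + i);
                pipes.add(pipe);
            }
            // Each "connection" sends one short message.
            for (int i = 0; i < channelCount; i++) {
                pipes.get(i).sink().write(
                        ByteBuffer.wrap(("msg" + i).getBytes(StandardCharsets.US_ASCII)));
            }
            ByteBuffer buffer = ByteBuffer.allocate(64);
            while (received.size() < channelCount) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove(); // the selector never removes selected keys itself
                    buffer.clear();
                    // Non-blocking read: returns whatever is available right now.
                    int n = ((Pipe.SourceChannel) key.channel()).read(buffer);
                    if (n > 0) {
                        buffer.flip();
                        received.add(key.attachment() + ":"
                                + StandardCharsets.US_ASCII.decode(buffer));
                    }
                }
            }
        } finally {
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(readFromAll(3)); // order depends on the selector
    }
}
```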

August 28, 2011

by Jakob Jenkov
