DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
Database Interaction with DAO and DTO Design Patterns
Learn what is a DAO and how to create a DAO, as well as the significance of creating Data Access Objects.
May 13, 2011
by Sandeep Bhandari
· 165,677 Views · 5 Likes
article thumbnail
Practical PHP Testing Patterns: Transaction Rollback Teardown
Maintaining isolation of tests when they have a database as Shared Fixture is not a trivial task. An important constraint is not having the headache of keeping track what manipulations on the database has your code done; in that case the rollback may not even be performed in case of a regression. An alternative way to resetting the database via DELETE and TRUNCATE queries is to roll back a transaction which has been started in the setup phase during the teardown. Implementation The phases of a database test involving Transaction Rollback Teardown are roughly the following: begin transaction, usually in setUp(). arrange, act, assert actions in the various Test Methods. rollback of the transaction in teardown(). The active transaction is never committed. An issue with using this pattern is that code that already uses transaction is prone to generate errors, and ultimately should never be tested with this technique. The rules for your safety are simple: the SUT should never start a transaction or committing it. Some databases support nested transaction levels, but it's very brittle to use them for testing purposes, and in case of any failure the whole suite will blow up as test executes teardowns at the wrong level of nesting. This pattern safety is also difficult to ensure, as DDL statements like CREATE/DELETE or other commands may commit the current transaction automatically. Check the documentation of your testing database. The advantage of this pattern is great performance: rollback is faster than every other command, including TRUNCATE. Moreover, if you encapsulate transactions well in your production code, most of it won't commit them (typically leaving the control over the transaction to an upper layer). Doctrine 2 In a sense, we already use this pattern with an UnitOfWork ORM such as Doctrine 2 when we do not flush() the ORM in our code. The flow is: The database is ready by setup. Exercise code. Check results as persisted or removed entities. Instead of calling flush() over the Entity Manager, call clear(). In this case, the database never sees a transaction, as Doctrine 2 keeps everything in the Unit Of Work until you say to flush it. Even when your code is calling flush(), you can explicitly use beginTransaction() and rollback() over the connection object: in this other scenario, the testing database sees an open transaction, but it's never committed and can be discarded in teardown() like the pattern prescribes. Example The code sample is the same test case shown in the Table Truncation Teardown article, which now uses transactions to encapsulate the single tests. The various tests check the tables content is restored, along with the AUTOINCREMENT next value. exec('CREATE TABLE users ( id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(255) )'); } $this->connection = self::$sharedConnection; $this->connection->beginTransaction(); } public function teardown() { $this->connection->rollback(); } public function testTableCanBePopulated() { $this->connection->exec('INSERT INTO users (name) VALUES ("Giorgio")'); $this->assertEquals(1, $this->howManyUsers()); } public function testTableRestartsFrom1() { $this->assertEquals(0, $this->howManyUsers()); $this->connection->exec('INSERT INTO users (name) VALUES ("Isaac")'); $stmt = $this->connection->query('SELECT name FROM users WHERE id=1'); $result = $stmt->fetch(); $this->assertEquals('Isaac', $result['name']); } public function testTableIsEmpty() { $this->assertEquals(0, $this->howManyUsers()); } private function howManyUsers() { $stmt = $this->connection->query('SELECT COUNT(*) AS number FROM users'); $result = $stmt->fetch(); return $result['number']; } }
May 11, 2011
by Giorgio Sironi
· 7,738 Views
article thumbnail
Practical PHP Testing Patterns: Stored Procedure Test
It happened in the day before the advent of DDD and the Hexagonal architecture, that you had code that lived inside the database, such as Stored Procedures, constraints, and triggers. Back in the day, the relational database was considered the single source of truth instead of a Domain Model written in a language like PHP or Java. Today the picture is different - but there are still scenarios where pushing code in the database make sense. One of the reasons for having logic expressed as SQL and in other database languages is their power, and their performance. SQL operators, especially when augmented by proprietary extensions, let you declare pieces of logic that you would instead have to code by hand. SQL that is executed directly on the database can accomplish operations too onerous to perform over a reconstituted object graph with a subsequent saving. In fact, every decent ORM include a language for batch updates that translates to SQL, like Doctrine with DQL; and also a mechanism for providing hints for the underlying database, like indexes definitions. The problem with SQL derivates and other database-specific embedded logic is that we cannot execute it and test it in isolation - we need a real copy of the database to perform our tests. Thus the Stored Procedure Test is an umbrella term for tests that encompasses database code, even when they're not actually stored procedures. When I'll use the term stored procedure in this article, it will be to signify any database-specific code, such as complex queries, triggers and so on. Implementation The pattern prescribes to write unit tests for the stored procedure, to test it in isolation from the rest of application a first simplification. These tests cover nontrivial logic in database code - probably you don't need them for indexes definition, but more for queries with aggregate functions. In the PHP world, Sqlite often suffices for testing queries - as long as you have an intermediate layer like Doctrine DBAL (part of Doctrine 2) which smooths out the differences between vendors. You use MySQL in production, Sqlite in the test suite, and you can write queries in Doctrine's DQL being confident that it will translate them to the right SQL dialect. These tests should be executed in a sandbox - a database with just enough structure and data to test the stored procedure at hand. This sandbox should run by definition on the production dbms. The most difficult aspect of the pattern is integrating with the dbms: it should be running and listening on the right port. A sandbox should be created in setUp() or setUpBeforeClass(), and destroyed during teardown. In case the database is not available, the tests should be marked as skipped or incomplete. Variations In In-Database Stored Procedure Tests, the test is written in the same language as the database code. I cannot imagine something more boring for a PHP programmer. In Remoted Stored Procedure Tests, which is the variation of interest for us, the tests are written in PHPUnit and integrated with the suite (slowing it down a bit). The logic is that whatever SQL logic you're going to add to your application, is already encapsulated in some PHP class: for example, complex queries are encapsulated in Repositories or DAO. So it's going to be feasible to build a sandbox via PHP code, and test the stored procedure as a black box. It will be encapsulated for a unique execution, like schema creation, or for executing multiple times in case of queries. Example The example shows you how to test a query with a real database - supposing a surrogate database does not support all the needed functions - from inside a test suite. I thought it would be difficult to write this test, but instead it required less than a Pomodoro. connection = new PDO("mysql:host=localhost;dbname=sandbox", 'root', ''); $this->connection->exec("CREATE TABLE users (name VARCHAR(255) NOT NULL PRIMARY KEY, year YEAR)"); $this->repository = new UserRepository($this->connection, 2011); } public function testAverageAgeIsCalculated() { $this->insertUser('Giorgio', 1942); $this->insertUser('Isaac', 1920); $this->assertEquals(80, $this->repository->getAverageAge()); } private function insertUser($name, $year) { $stmt = $this->connection->prepare("INSERT INTO users (name, year) VALUES (:name, :year)"); $stmt->bindValue('name', $name, PDO::PARAM_STR); $stmt->bindValue('year', $year, PDO::PARAM_INT); return $stmt->execute(); } public function tearDown() { $this->connection->exec('DROP TABLE users'); } } class UserRepository { private $connection; private $currentYear; public function __construct(PDO $connection, $currentYear) { $this->connection = $connection; $this->currentYear = $currentYear; } /** * We suppose AVG() cannot be correctly implemented by Sqlite or * another surrogate database (substitute another vendor feature * for the same effect). * We also suppose reconstituting millions of User objects to calculate * their average age isn't feasible: that's why we used SQL directly. */ public function getAverageAge() { $stmt = $this->connection->prepare('SELECT AVG(:year - year) AS average_age FROM users'); $stmt->bindValue('year', $this->currentYear, PDO::PARAM_INT); $stmt->execute(); $row = $stmt->fetch(); return $row['average_age']; } }
May 4, 2011
by Giorgio Sironi
· 2,790 Views
article thumbnail
Reasons for Slow Database Performance
Usually there are scenarios where the application does not perform as expected. A simple web page which fetches data from database and displays optimizes it for mobiles should be fast and turnaround times should be less than 30 seconds on a good network connection. But still there are cases where these kinds of applications suffer the most performance issues. This is because the database in these cases is not designed by giving proper attention to the application requirements. You can change one application design even after delivery but changing the database design once a number of application have been integrated with it is like explosion. Here I am giving some points which should be kept in mind while designing application and database. Only basic idea is being provided. For details one can search each topic on the web as each point expands well to multiple articles. 1) Bind Variables: When a SQL query is sent to the database engine for processing and sending the result, it is compiled by the database compiler to get the tokens of the query. This involves parsing, optimizing and identifying the query. After a number of steps, the SQL query is passed to the database engine for processing. In a small application with a user base of less than 500, it is usually the same query which is executed more often than others. The use of bind variables helps in storing the compiled query once and executing it with different data at different times. For using bind variables, one needs to use PreparedStatement objects in Java. 2) Query is not well formed: Usually the same SQL query can be written in multiple ways. There are ways by which a query can be optimized to give the best performance. The corresponding SQL construct should be chosen depending upon requirement. I have scenarios where people have used WHERE clause instead of GROUP BY and are complaining of poor response times. Similarly Sub queries and Joins complement each other. 3) Database structure is not well defined/normalized: This is probably known to everybody that the database tables should be properly normalized as this is part of every DBMS course at graduation level. If the tables are not properly designed and normalized, anomalies set in. 4) Proper caching is not in place: Many applications make use of temporary caches on the application server to store the reference data or frequently accessed data as memory is less of an issue than the time with new generation servers. 5) Number of rows in the table too large: If the table itself has too much of data then the queries will take time to execute. Partitioning a table into multiple tables is recommended in these situations. For example: If a table has employee records of 1000000 employees then it could be split into 5 small tables each having 200000 rows. The advantage is we know beforehand in which smaller table to look for a particular employee code as the division of large table can be done on the employee id column. 6) Connections are not being pooled: If connections are not pooled then the each time a new connection is requested for a request to database. Maintaining a connection pool is much better than creating and destroying the connection for executing every SQL query. Of course, there are frameworks like Hibernate which take care of creating the connection pools and also allow the customization of these pools 7) Connections not closed/returned to pool in case of exceptions: When an exception occurs while performing database operations, it ought to be caught. Usually catching the exception is not the issue because SQLException is a checked exception but closing the connection is something that most of the times is left out. If the connection is not released, the same connection cannot be used for any other purpose till the connection is timed out. 8) Stored procedures for complex computations on database: Stored procedures are a good way to perform database intensive operations. This is because they are already compiled and there is less network trips for getting the same results as compared to SQL queries. From http://extreme-java.blogspot.com/2011/04/reasons-for-slow-database-performance.html
April 30, 2011
by Sandeep Bhandari
· 59,803 Views · 1 Like
article thumbnail
Hammurabi - A Scala Rule Engine
One of the most common reasons why software projects fail, or suffer unbearable delays, is the misunderstandings between the analysts who define the business rules of the domain for which the software is going to be written and the developers who have to code these rules. The latter write those rules in a language that is completely obscure for the first ones. In this way the business analysts don't have a chance to read, understand and validate what the programmers developed and then they can only empirically test the final software behavior, hardly covering all the possible corner cases and often recognizing mistakes only when it is too late. What Hammurabi is Hammurabi is a rule engine written in Scala that tries to leverage the features of this language making it particularly suitable to implement extremely readable internal Domain Specific Languages. Indeed, what actually makes Hammurabi different from all other rule engines is that it is possible to write and compile its rules directly in the host language. Anyway the Hammurabi's rules also have the important property of being readable even by non technical person. As usual a practical example worth more than a thousand words. The golfers problem This logical puzzle has been taken from the first chapter of the Jess in Action book written by Ernest Friedman-Hill and published by Manning. It is described there as it follows: A foursome of golfers is standing at a tee, in a line from left to right. Each golfer wears different colored pants; one is wearing red pants. The golfer to Fred’s immediate right is wearing blue pants. Joe is second in line. Bob is wearing plaid pants. Tom isn’t in position one or four, and he isn’t wearing the hideous orange pants. In what order will the four golfers tee off, and what color are each golfer’s pants?” The Jess solution Jess is written in Java and is one of the most popular rule engine on the market. The solution to the golfers problems presented in the book mentioned above is the following: first it is necessary to define the data structures representing the problem (deftemplate pants-color (slot of) (slot is)) (deftemplate position (slot of) (slot is)) A deftemplate is a bit like a class declaration in Java and in this case is used to write a first rule that in turns creates the facts representing each of the possible combinations of golfers, pants-color and positions: (defrule generate-possibilities => (foreach ?name (create$ Fred Joe Bob Tom) (foreach ?color (create$ red blue plaid orange) (assert (pants-color (of ?name)(is ?color))) ) (foreach ?position (create$ 1 2 3 4) (assert (position (of ?name)(is ?position))) ) ) ) After that it is possible to translate the sentences of the problem in the corresponding Jess rule: (defrule find-solution ;; There is a golfer named Fred, whose position is ?p1 ;; and pants color is ?c1 (position (of Fred) (is ?p1)) (pants-color (of Fred) (is ?c1)) ;; The golfer to immediate right of Fred ;; is wearing blue pants. (position (of ?n&~Fred)(is ?p&:(eq ?p (+ ?p1 1)))) (pants-color (of ?n&~Fred)(is blue&~?c1)) ;; Joe is in position #2 (position (of Joe) (is ?p2&2&~?p1)) (pants-color (of Joe) (is ?c2&~?c1)) ;; Bob is wearing the plaid pants (position (of Bob)(is ?p3&~?p1&~?p&~?p2)) (pants-color (of Bob&~?n)(is plaid&?c3&~?c1&~?c2)) ;; Tom is not in position 1 or 4 ;; and is not wearing orange (position (of Tom&~?n)(is ?p4&~1&~4&~?p1&~?p2&~?p3)) (pants-color (of Tom)(is ?c4&~orange&~blue&~?c1&~?c2&~?c3)) => (printout t Fred " " ?p1 " " ?c1 crlf) (printout t Joe " " ?p2 " " ?c2 crlf) (printout t Bob " " ?p3 " " ?c3 crlf) (printout t Tom " " ?p4 " " ?c4 crlf) ) where the rows starting with ;; are just comments. In this way if you enter the code for the problem into Jess and then run it, you get the answer directly: Fred 1 orange Joe 2 blue Bob 4 plaid Tom 3 red Note that the facts that the golfers are in different positions and wear pants of different colors is not expressed in an explicit rule but need to be spread and repeated in many statements. This solution is clearly difficult to be maintained and doesn't scale well as underlined by the last condition statement stating that the position ?p4 of Tom is ?p4&~1&~4&~?p1&~?p2&~?p3 where ~ means not in the Jess language. In other words it says that the Tom's position is not only different from the position 1 and 4 but it is also different from the positions of all the other golfers (named one by one) formerly defined. Actually the needs to describe a golfer's position also as the negation of the positions of all the other golfers implies something even worse: it is not possible to translate each sentence of the problem in a different rule, but they have to be combined together in a single big rule. After this huge if part, its then section (the one after the => symbol) prints out a table containing the set of variables ?p1…?p4 and ?c1…?c4 that solves the problem. The Hammurabi solution As done while presenting the Jess solution, also with Hammurabi the first thing to do is to define the domain of the problem. In order to do that, since the Hammurabi rules are valid Scala statements, it is sufficient to create a plain Scala Person class having as attributes the name, the position and the color of the pants of the golfer that it represents: class Person(n: String) { val name = n var pos: Int = _ var color: String = _ } Then we can model the fact that all the golfers must have different position and pants color by putting them in 2 different Set: var availablePos = (1 to 4).toSet var availableColors = Set("blue", "plaid", "red", "orange") and write two small methods that pull off them from the corresponding set once they have been assigned to a specific golfer: val assign = new { def color(color: String) = new { def to(person: Person) = { person.color = color availableColors = availableColors - color } } def position(pos: Int) = new { def to(person: Person) = { person.pos = pos availablePos = availablePos - pos } } } Those methods are written in a quite weird way just to make even more readable the DSL that will be used to define the rules and to stress the idea that it is possible to write the rules in a valid Scala that could be easily understood by a non-technical person. Of course it is easy to go even further by encapsulating other concepts in some convenient methods as it has been done above. Now everything is ready to write the set of rules describing the golfers problem: val ruleSet = Set( rule ("Unique positions") let { val p = any(kindOf[Person]) when { (availablePos.size equals 1) and (p.pos equals 0) } then { assign position availablePos.head to p } }, rule ("Unique colors") let { val p = any(kindOf[Person]) when { (availableColors.size equals 1) and (p.color == null) } then { assign color availableColors.head to p } }, rule ("Joe is in position 2") let { val p = any(kindOf[Person]) when { p.name equals "Joe" } then { assign position 2 to p } }, rule ("Person to Fred’s immediate right is wearing blue pants") let { val p1 = any(kindOf[Person]) val p2 = any(kindOf[Person]) when { (p1.name equals "Fred") and (p2.pos equals p1.pos + 1) } then { assign color "blue" to p2 } }, rule ("Fred isn't in position 4") let { val possibleFredPos = availablePos - 4 val p = any(kindOf[Person]) when { (p.name equals "Fred") and (possibleFredPos.size == 1) } then { assign position possibleFredPos.head to p } }, rule ("Tom isn't in position 1 or 4") let { val possibleTomPos = availablePos - 1 - 4 val p = any(kindOf[Person]) when { (p.name equals "Tom") and (possibleTomPos.size equals 1) } then { assign position possibleTomPos.head to p } }, rule ("Bob is wearing plaid pants") let { val p = any(kindOf[Person]) when { p.name equals "Bob" } then { assign color "plaid" to p } }, rule ("Tom isn't wearing orange pants") let { val possibleTomColors = availableColors - "orange" val p = any(kindOf[Person]) when { (p.name equals "Tom") and (possibleTomColors.size equals 1) } then { assign color possibleTomColors.head to p } } ) Here the first 2 rules explicitly leverage the uniqueness of positions and pants colors by assigning the last available of them to the only person who still doesn't have one. The other rules just match one by one the sentences of the problem as it has been defined. Now it is possible to make Hammurabi solve the problem by creating the four golfers: val tom = new Person("Tom") val joe = new Person("Joe") val fred = new Person("Fred") val bob = new Person("Bob") add them to a working memory (the set of objects against which the rule engine will evaluate and fire the rules): val workingMemory = WorkingMemory(tom, joe, fred, bob) and letting the rule engine, initialized with the formerly defined set of rules, to work on it: RuleEngine(ruleSet) execOn workingMemory Working with immutable data structures Immutability is probably not something that should be enforced at all costs in Hammurabi, for a very simple reason: the largest part of the execution time is spent by Hammurabi, like all other rule engines, looking for rules that can actually be executed (fired), i.e. the ones for which the when condition is true. During this phase data are only read and never written, so immutability doesn't matter at all, and all the rules that can be fired are put in an agenda. During the subsequent phase the rules in the agenda MUST be fired one by one, since the execution of one of them could make false the when condition of another one. It means that lack of immutability shouldn't prevent the rule engine to safely run in parallel during the discovery phase. That said, it is also possible to obtain the same result working with immutable data structures. For example having an immutable person: case class Person(name: String, pos: Int = 0, color: String = null) it's enough to rewrite the methods that assign the position and pants color to the golfers as it follows: val assign = new { def color(color: String) = new { def to(person: Person) = { remove(person) produce(person.copy(color = color)) availableColors = availableColors - color } } def position(pos: Int) = new { def to(person: Person) = { remove(person) produce(person.copy(pos = pos)) availablePos = availablePos - pos } } } leaving all the rules unchanged. In this way the old version of the Person is removed from the working memory and a brand new one, with its position or color set accordingly to the fired rule, is produced and then added to the working memory itself. The methods remove and produce can be indeed used respectively to remove objects from the working memory and produce new objects, that could be then used to evaluate and fire other rules. Hammurabi internals In Hammurabi all the rules are evaluated (but not fired) in parallel basically by assigning each rule to a different Scala actor. The replacement of this actor implementation with one based on the upcoming Scala parallel collections is currently under evaluation but I decided to wait until this technology will be stable. As anticipated, the evaluation of the rule in parallel is safe because is a read only process. While the actors discover set of variables that can fire the rules they are evaluating, they add them to the rule engine's agenda. At the end of this evaluation process the rules are fired sequentially one by one after a re-evaluation of their application condition, because the execution of one of them could change the result of that condition for a subsequent one. All the rules are fired in no particular order unless a different priority has been specified for some of them. Indeed since sometimes there could be the need to treat some rules as special cases they have an optional property called salience that acts as a priority setting for that rule in order to allow activated rules with the highest salience to always fire first, followed by rules of lower salience. By default all rules have salience 0 but you can alter it in the rule definition as it follows: rule ("Important rule") withSalience 10 let { ... } rule ("Negligible rule") withSalience -5 let { ... } Each actor also records the set of values on which the rule it is responsible for has been already executed, in order to don't fire again the same rule on the same values. The evaluation phase and the firing one are then executed again and again until either there is no rule that can be fired or one of the rule during its firing phase invokes one of the methods exitWith, making the rule engine gracefully finishing returning a value representing the result of the whole evaluation process or failWith that cause the rule engine to terminate by throwing an exception. Of course if the rule engine stops just because there are no longer rules that can be fired, the result of the evaluation is represented by the whole working set (as in the former example) and you can read from there the values you are interested in. Further implementations and improvements At the moment it is only possible to take from the working memory the object(s) against which a given rule will be evaluated only selecting them by type, as shown in the statement: val p = any(kindOf[Person]) Further mechanisms to categorize and select those objects in a more precise way are under evaluation, even if it is already possible to limit them with a Boolean function as in the following example: val p = kindOf[Person] having (_.name == "Joe") This is useful also under a performance point of view because it dramatically lowers the number of combination that the rule engine needs to check before to find a rule that can be fired. For example the "Person to Fred’s immediate right is wearing blue pants" rule could be rewritten as it follows bringing the number of combination that has to be tried from 16 to 4: rule ("Person to Fred’s immediate right is wearing blue pants") let { val p1 = kindOf[Person] having (_.name == "Fred") val p2 = any(kindOf[Person]) when { p2.pos equals p1.pos + 1 } then { assign color "blue" to p2 } } I am also evaluating of directly feeding the working memory with a NoSQL database. In other words, with this solution, the data present in the db could represent the working memory itself. At the moment I am experimenting with MongoDB since is the one I know best, but if somebody has some other good idea or even better wants to collaborate with this project I'd be very glad of it. The version 0.1 of the Hammurabi rule engine has been just released and it is available here.
April 18, 2011
by Mario Fusco
· 27,823 Views
article thumbnail
Eradicating Non-Determinism in Tests
An automated regression suite can play a vital role on a software project, valuable both for reducing defects in production and essential for evolutionary design. In talking with development teams I've often heard about the problem of non-deterministic tests - tests that sometimes pass and sometimes fail. Left uncontrolled, non-deterministic tests can completely destroy the value of an automated regression suite. In this article I outline how to deal with non-deterministic tests. Initially quarantine helps to reduce their damage to other tests, but you still have to fix them soon. Therefore I discuss treatments for the common causes for non-determinism: lack of isolation, asynchronous behavior, remote services, time, and resource leaks. I've enjoyed watching ThoughtWorks tackle many difficult enterprise applications, bringing successful deliveries to many clients who have rarely seen success. Our experiences have been a great demonstration that agile methods, deeply controversial and distrusted when we wrote the manifesto a decade ago, can be used successfully. There are many flavors of agile development out there, but in what we do there is a central role for automated testing. Automated testing was a core approach to Extreme Programming from the beginning, and that philosophy has been the biggest inspiration to our agile work. So we've gained a lot of experience in using automated testing as a core part of software development. Automated testing can look easy when presented in a text book. And indeed the basic ideas are really quite simple. But in the pressure-cooker of a delivery project, trials come up that are often not given much attention in texts. As I know too well, authors have a habit of skimming over many details in order to get a core point across. In my conversations with our delivery teams, one recurring problem that we've run into is tests which have become unreliable, so unreliable that people don't pay much attention to whether they pass or fail. A primary cause of this unreliability is that some tests have become non-deterministic. A test is non-deterministic when it passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment. Such tests fail, then you re-run them and they pass. Test failures for such tests are seemingly random. Non-determinism can plague any kind of test, but it's particularly prone to affect tests with a broad scope, such as acceptance or functional tests. Why non-deterministic tests are a problem Non-deterministic tests have two problems, firstly they are useless, secondly they are a virulent infection that can completely ruin your entire test suite. As a result they need to be dealt with as soon as you can, before your entire deployment pipeline is compromised. I'll start with expanding on their uselessness. The primary benefit of having automated tests is that they provide bug detection mechanism by acting as regression tests[1]. When a regression test goes red, you know you've got an immediate problem, often because a bug has crept into the system without you realizing. Having such a bug detector has huge benefits. Most obviously it means that you can find and fix bugs just after they are introduced. Not just does this give you the warm fuzzies because you kill bugs quickly, it also makes it easier to remove them since you know the bug got in with the last set of changes that are fresh in your mind. As a result you know where to look for the bug, which is more than half the battle in squashing it. The second level of benefit is that as you gain confidence in your bug detector, you gain the courage to make big changes knowing that when you goof, the bug detector will go off and you can fix the mistake quickly. [2] Without this teams are frightened to make the changes code needs in order to be kept clean, which leads to a rotting of the code base and plummeting development speed. The trouble with non-deterministic tests is that when they go red, you have no idea whether its due to a bug, or just part of the non-deterministic behavior. Usually with these tests a non-deterministic failure is relatively common, so you end up shrugging your shoulders when these tests go red. Once you start ignoring a regression test failure, then that test is useless and you might as well throw it away. Indeed you really ought to throw a non-deterministic test away, since if you don't it has an infectious quality. If you have a suite of 100 tests with 10 non-deterministic tests in them, than that suite will often fail. Initially people will look at the failure report and notice that the failures are in non-deterministic tests, but soon they'll lose the discipline to do that. Once that discipline is lost, then a failure in the healthy deterministic tests will get ignored too. At that point you've lots the whole game and might as well get rid of all the tests. Quarantine My principal aim in this article is to outline common cases of non-deterministic tests and how to eliminate the non-determinism. But before I get there I offer one piece of essential advice: quarantine your non-deterministic tests. If you have non-deterministic tests keep them in a different test suite to your healthy tests. That way you'll you can continue to pay attention to what's going on with your healthy tests and get good feedback from them. Place any non-deterministic test in a quarantined area. (But fix quarantined tests quickly.) Then the question is what to do with the quarantined test suites. They are useless as regression tests, but they do have a future as work items for cleaning up. You should not abandon such tests, since any tests you have in quarantine are not helping you with your regression coverage. A danger here is that tests keep getting thrown into quarantine and forgotten, which means your bug detection system is eroding. As a result it's worthwhile to have a mechanism that ensures that tests don't stay in quarantine too long. I've come across various ways to do this. One is a simple numeric limit: e.g. only allow 8 tests in quarantine. Once you hit the limit you must spend time to clear all the tests out. This has the advantage of batching up your test-cleaning if that's how you like to do things. Another route is to put a time limit on how long a test may be in quarantine, such as no longer than a week. The general approach with quarantine is to take the quarantined tests out of the main deployment pipeline so that you still get your regular build process. However a good team can be more aggressive. Our Mingle team puts its quarantine suite into the deployment pipeline one stage after its healthy tests. That way it can get the feedback from the healthy tests but is also forced to ensure that it sorts out the quarantined tests quickly. [3] Lack of Isolation In order to get tests to run reliably, you must have clear control over the environment in which they run, so you have a well-known state at the beginning of the test. If one test creates some data in the database and leaves it lying around, it can corrupt the run of another test which may rely on a different database state. Therefore I find it's really important to focus on keeping tests isolated. Properly isolated tests can be run in any sequence. As you get to larger operational scope of functional tests, it gets progressively harder to keep tests isolated. When you are tracking down a non-determinism, lack of isolation is a common and frustrating cause. Keep your tests isolated from each other, so that execution of one test will not affect any others. There are a couple of ways to get isolation - either always rebuild your starting state from scratch, or ensure that each test cleans up properly after itself. In general I prefer the former, as it's often easier - and in particular easier to find the source of a problem. If a test fails because it didn't build up the initial state properly, then it's easy to see which test contains the bug. With clean-up, however, one test will contain the bug, but another test will fail - so it's hard to find the real problem. Starting from a blank state is usually easy with unit tests, but can be much harder with functional tests [4] - particularly if you have a lot of data in a database that needs to be there. Rebuilding the database each time can add a lot of time to test runs, so that argues for switching to a clean-up strategy. One trick that's handy when you're using databases, is to conduct your tests inside a transaction, and then to rollback the transaction at the end of the test. That way the transaction manager cleans up for you, reducing the chance of errors[5]. Another approach is to do a single build of a mostly-immutable starting fixture before running a group of tests. Then ensure that the tests don't change that initial state (or if they do, they reverse the changes in tear-down). This tactic is more error-prone than rebuilding the fixture for each test, but it may be worthwhile iff it takes too long to build the fixture each time. Although databases are a common cause for isolation problems, there are plenty of times you can get these in-memory too. In particular be aware with static data and singletons. A good example for this kind of problem is contextual environment, such as the currently logged in user. If you have an explicit tear-down in a test, be wary of exceptions that occur during the tear-down. If this happens the test can pass, but cause isolation failures for subsequent tests. So ensure that if you do get a problem in a tear-down, it makes a loud noise. Some people prefer to put less emphasis on isolation and more on defining clear dependencies to force tests to run in a specified order. I prefer isolation because it gives you more flexibility in running subsets of tests and parallelizing tests. Asynchronous Behavior Asynchrony is a boon that allows you to keep software responsive while taking on long term tasks. Ajax calls allow a browser to stay responsive while going back to the server for more data, asynchronous message allow a server process to communicate with other system without being tied to their laggardly latency. But in testing, asynchrony can be curse. The common mistake here is to throw in a sleep: //pseudo-code makeAsyncCall; sleep(aWhile); readResponse; This can bite you two ways. First off you'll want to set the sleep time to long enough that it gives plenty of time to get the response. But that means that you'll spend a lot of time idly waiting for the response, thus slowing down your tests. The second bite is that, however long you sleep, sometimes it won't be enough. There will be some change in environment that will cause you to exceed the sleep - and you'll get false failure. As a result I strongly urge you to never use bare sleeps like this. Never use bare sleeps to wait for asynchonous responses: use a callback or polling. There are basically two tactics you can do for testing an asynchronous response. The first is for the asynchronous service to take a callback which it can call when done. This is the best since it means you'll never have to wait any longer than you need to [6]. The biggest problem with this is that the environment needs to be able to do this and then the service provider needs to do it. This is one of the advantages of having the development team integrated with testing - if they can provide a callback then they will. The second option is to poll on the answer. This is more than just looking once, but looking regularly, something like this //pseudo-code makeAsyncCall startTime = Time.now; while(! responseReceived) { if (Time.now - startTime > waitLimit) throw new TestTimeoutException; sleep (pollingInterval); } readResponse The point of this approach is that you can set the pollingInterval to a pretty small value, and know that that's the maximum amount of dead time you'll lose to waiting for a response. This means you can set the waitLimit very high, which minimizes the chance of hitting it unless something serious has gone wrong. [7] Make sure you use a clear exception class that indicates this is a test timeout that's failing. This will help make it clear what's gone wrong should it happen, and perhaps allow a more sophisticated test harness to take account of this information in its display. The time values, in particular the waitLimit, should never be literal values. Make sure they are always values that can be easily set in bulk, either by using constants or set through the runtime environment. That way if you need to tweak them (and you will) you can tweak them all quickly. All this advice is handy for async calls where you expect a response from the provider, but how about those where there is no response. These are calls where we invoke a command on something and expect it to happen without any acknowledgment. This is the trickiest case since you can test for your expected response, but there's nothing to do to detect a failure other than timing-out. If the provider is something you're building you can handle this by ensuring the provider implements some way of indicating that it's done - essentially some form of callback. Even if only the testing code uses it, it's worth it - although often you'll find this kind of functionality is valuable for other purposes too[8]. If the provider is someone else's work, you can try persuasion, but otherwise may be stuck. Although this is also a case when using Test Doubles for remote services is worthwhile (which I'll discuss more in the next section). If you have a general failure in something asynchronous, such that it's not responding at all, then you'll always be waiting for timeouts and your test suite will take a long time to fail. To combat this it's a good idea to use a smoke test to check that the asynchronous service is responding at all and stop the test run right away if it isn't. Gerard Meszaros's book, xUnit Test Patterns, contains lots of good patterns for constructing tests. You can also often side-step the asynchrony completely. Gerard Meszaros's Humble Object pattern says that whenever you have some logic that's in a hard-to-test environment, you should isolate the logic you need to test from that environment. In this case it means put most of the logic you need to test in a place where you can test it synchronously. The asynchronous behavior should be as minimal (humble) as possible, that way you don't need that much testing of it. Remote Services Sometimes I'm asked if ThoughtWorks does any integration work, which I find somewhat amusing since there's hardly any project we do that doesn't involve a fair bit of integration. By their nature, enterprise applications involve a great deal of combining data from different systems. These systems are maintained by other teams operating to their own schedules, teams that often use a very different software philosophy to our heavily test-driven agile approach. Testing with such remote systems brings a number of problems, and non-determinism is high on the list. Often remote systems don't have test system we can call, which means hitting a live system. If there is a test system, it may not be stable enough to provide deterministic responses. In this situation it's vital to ensure determinism, so it's time to reach for a Test Double - a component that looks like the remote service, but is really just a pretend version that mimics the remote system's behavior. The double needs to be setup so that provides the right kind of response in interaction with our system, but in a way we control. In this manner we can ensure determinism. Using a double has a downside, in particular when we are testing across a broad scope. How can we be sure that the double behaves in the same way that remote system does? We can tackle this again using tests, a form of test that I call Integration Contract Tests. These run the same interaction with the remote system and the double, and check that the two match. In this case 'match' may not mean coming up with the same result (due to the non-determinisms), but results that share the same essential structure. Integration Contract Tests need to be run frequently, but not part of our system's deployment pipeline. Periodic running based on the rate of the change of the remote system is usually best. For writing these kinds of test doubles, I'm a big fan of Self Initializing Fakes - since these are very simple to manage. Some people are firmly against using Test Doubles in functional tests, believing that you must test with real connection in order to ensure end-to-end behavior. While I sympathize with their argument, automated tests are useless if they are non-deterministic. So any advantage you gain by talking to the real system is overwhelmed by the need to stamp out non-determinism[9]. Time Few things are more non-deterministic than a call to the system clock. Each time you call it, you get a new result, and any tests that depend on it can thus change. Ask for all the todos due in the next hour, and you regularly get a different answer[10]. The most important thing here is to ensure that you always wrap the system clock with routines that can be replaced with a seeded value for testing. A clock stub can be set to particular time and frozen at that time, allowing your tests to have complete control over its movements. That way you can synchronize your test data to the values in the seeded clock.[11][12] Always wrap the system clock, so it can be easily substituted for testing. One thing to watch with this, is that eventually your test data might start having problems because it's too old, and you get conflicts with other time based factors in your application. In this case you can move the data, and your clock seeds to new values. When you do this, ensure that this is the only thing you do. That way you can be sure that any tests that fail are due to time-movement in the test data. Another area where time can be a problem is when you rely on other behaviors from the clock. I once saw a system that generated random keys based on clock values. This systems started failing when it was moved to a faster machine that could allocate multiple ids within a single clock tick.[13] I've heard so many problems due to direct calls to the system clock that I'd argue for finding a way to use code analysis to detect any direct calls to the system clock and failing the build right there. Even a simple regex check might save you a frustrating debugging session after a call at an ungodly hour. Resource Leaks If your application has some kind of resource leak, this will lead to random tests failing, since it's just which test causes the resource leak to go over a limit that gets the failure. This case is awkward because any test can fail intermittently due to this problem. If it isn't a case of one test being non-deterministic then resource leaks are a good candidate to investigate. By resource leak, I mean any resource that the application has to manage by acquiring and releasing. In non-memory-managed environments, the obvious example is memory. Memory-management did much to remove this problem, but other resources still need to be managed, such as database connections. Usually the best way to handle these kind of resources is through a Resource Pool. If you do this then a good tactic is to configure the pool to a size of 1 and make it throw an exception should it get a request for a resource when it has none left to give. That way the first test to request a resource after the leak will fail - which makes it a lot easier to find the problem test. This idea of limiting resource pool sizes, is about increasing constraints to make errors more likely to crop up in tests. This is good because we want errors to show in tests so we can fix them before they manifest themselves in production. This principle can be used in other ways too. One story I heard was of a system which generated randomly named temporary files, didn't clean them up properly, and crashed on a collision. This kind of bug is very hard to find, but one way to manifest it is to stub the randomizer for testing so it always returns the same value. That way you can surface the problem more quickly.
April 14, 2011
by Martin Fowler
· 6,643 Views · 1 Like
article thumbnail
Solr + Hadoop = Big Data Love
Bixo Labs shows how to use Solr as a NoSQL solution for big data Many people use the Hadoop open source project to process large data sets because it’s a great solution for scalable, reliable data processing workflows. Hadoop is by far the most popular system for handling big data, with companies using massive clusters to store and process petabytes of data on thousands of servers. Since it emerged from the Nutch open source web crawler project in 2006, Hadoop has grown in every way imaginable – users, developers, associated projects (aka the “Hadoop ecosystem”). Starting at roughly the same time, the Solr open source project has become the most widely used search solution on planet Earth. Solr wraps the API-level indexing and search functionality of Lucene with a RESTful API, GUI, and lots of useful administrative and data integration functionality. The interesting thing about combining these two open source projects is that you can use Hadoop to crunch the data, and then serve it up in Solr. And we’re not talking about just free-text search; Solr can be used as a key-value store (i.e. a NoSQL database) via its support for range queries. Even on a single server, Solr can easily handle many millions of records (“documents” in Lucene lingo). Even better, Solr now supports sharding and replication via the new, cutting-edge SolrCloud functionality. Background I started using Hadoop & Solr about five years ago, as key pieces of the Krugle code search startup I co-founded in 2005. Back then, Hadoop was still part of the Nutch web crawler we used to extract information about open source projects. And Solr was fresh out of the oven, having just been released as open source by CNET. At Bixo Labs we use Hadoop, Solr, Cascading, Mahout, and many other open source technologies to create custom data processing workflows. The web is a common source of our input data, which we crawl using the Bixo open source project. The Problem During a web crawl, the state of the crawl is contained in something commonly called a “crawl DB”. For broad crawls, this has to be something that works with billions of records, since you need one entry for each known URL. Each “record” has the URL as the key, and contains important state information such as the time and result of the last request. For Hadoop-based crawlers such as Nutch and Bixo, the crawl DB is commonly kept in a set of flat files, where each file is a Hadoop “SequenceFile”. These are just a packed array of serialized key/value objects. Sometimes we need to poke at this data, and here’s where the simple flat-file structure creates a problem. There’s no easy way run queries against the data, but we can’t store it in a traditional database since billions of records + RDBMS == pain and suffering. Here is where scalable NoSQL solutions shine. For example, the Nutch project is currently re-factoring this crawl DB layer to allow plugging in HBase. Other options include Cassandra, MongoDB, CouchDB, etc. But for simple analytics and exploration on smaller datasets, a Solr-based solution works and is easier to configure. Plus you get useful and surprising fun functionality like facets, geospatial queries, range queries, free-form text search, and lots of other goodies for free. Architecture So what exactly would such a Hadoop + Solr system look like? As mentioned previously, in this example our input data comes from a Bixo web crawler’s CrawlDB, with one entry for each known URL. But the input data could just as easily be log files, or records from a traditional RDBMS, or the output of another data processing workflow. The key point is that we’re going to take a bunch of input data, (optionally) munge it into a more useful format, and then generate a Lucene index that we access via Solr. Hadoop For the uninitiated, Hadoop implements both a distributed file system (aka “HDFS”) and an execution layer that supports the map-reduce programming model. Typically data is loaded and transformed during the map phase, and then combined/saved during the reduce phase. In our example, the map phase reads in Hadoop compressed SequenceFiles that contain the state of our web crawl, and our reduce phase write out Lucene indexes. The focus of this article isn’t on how to write Hadoop map-reduce jobs, but I did want to show you the code that implements the guts of the job. Note that it’s not typical Hadoop key/value manipulation code, which is painful to write, debug, and maintain. Instead we use Cascading, which is an open source workflow planning and data processing API that creates Hadoop jobs from shorter, more representative code. The snippet below reads SequenceFiles from HDFS, and pipes those records into a sink (output) that stores them using a LuceneScheme, which in turn saves records as Lucene documents in an index. Tap source = new Hfs(new SequenceFile(CRAWLDB_FIELDS), inputDir); Pipe urlPipe = new Pipe("crawldb urls"); urlPipe = new Each(urlPipe, new ExtractDomain()); Tap sink = new Hfs(new LuceneScheme(SOLR_FIELDS, STORE_SETTINGS, INDEX_SETTINGS, StandardAnalyzer.class, MAX_FIELD_LENGTH), outputDir, true); FlowConnector fc = new FlowConnector(); fc.connect(source, sink, urlPipe).complete(); We defined CRAWLDB_FIELDS and SOLR_FIELDS to be the set of input and output data elements, using names like “url” and “status”. We take advantage of the Lucene Scheme that we’ve created for Cascading, which lets us easily map from Cascading’s view of the world (records with fields) to Lucene’s index (documents with fields). We don’t have a Cascading Scheme that directly supports Solr (wouldn’t that be handy?), but we can make-do for now since we can do simple analysis for this example. We indexed all of the fields so that we can perform queries against them. Only the status message contains normal English text, so that’s the only one we have to analyze (i.e., break the text up into terms using spaces and other token delimiters). In addition, the ExtractDomain operation pulls the domain from the URL field and builds a new Solr field containing just the domain. This will allow us to do queries against the domain of the URL as well as the complete URL. We could also have chosen to apply a custom analyzer to the URL to break it into several pieces (i.e., protocol, domain, port, path, query parameters) that could have been queried individually. Running the Hadoop Job For simplicity and pay-as-you-go, it’s hard to beat Amazon’s EC2 and Elastic Mapreduce offerings for running Hadoop jobs. You can easily spin up a cluster of 50 servers, run your job, save the results, and shut it down – all without needing to buy hardware or pay for IT support. There are many ways to create and configure a Hadoop cluster; for us, we’re very familiar with the (modified) EC2 Hadoop scripts that you can find in the Bixo distribution. Step-by-step instructions are available at http://openbixo.org/documentation/running-bixo-in-ec2/ The code for this article is available via GitHub, at http://github.com/bixolabs/hadoop2solr. The README displayed on that page contains step-by-step instructions for building and running the job. After the job is done, we’ll copy the resulting index out of the Hadoop distributed file system (HDFS) and onto the Hadoop cluster’s master server, then kill off the one slave we used. The Hadoop master is now ready to be configured as our Solr server. Solr On the Solr side of things, we need to create a schema that matches the index we’re generating. The key section of our schema.xml file is where we define the fields. These fields have a one-to-one correspondence with the SOLR_FIELDS we defined in our Hadoop workflow. They also need to use the same Lucene settings as what we defined in the static IndexWorkflow.java STORE_SETTINGS and INDEX_SETTINGS. Once we have this defined, all that’s left is to set up a server that we can use. To keep it simple, we’ll use the single EC2 instance in Amazon’s cloud (m1.large) that we used as our master for the Hadoop job, and run the simple Solr search server that relies on embedded Jetty to provide the webapp container. Similar to the Hadoop job, step-by-step instructions are in the README for the hadoop2solr project on GitHub. But in a nutshell, we’ll copy and unzip a Solr 1.4.1 setup on the EC2 server, do the same for our custom Solr configuration, create a symlink to the index, and then start it running with: Giving it a Try Now comes the interesting part. Since we opened up the default Jetty port used by Solr (8983) on this EC2 instance, we can directly access Solr’s handy admin console by pointing our browser at http://:8983/solr/admin % cd solr % java -Dsolr.solr.home=../solr-conf -Dsolr.data.dir=../solr-data -jar start.jar From here we can run queries against Solr: We can also use curl to talk to the server via HTTP requests: curl http://:8983/solr/select/?q=-status%3AFETCHED+and+-status%3AUNFETCHED The response is XML by default. Below is an example of the response from the above request, where we found 2,546 matches in 94ms. Now here’s what I find amazing. For an index of 82 million documents, running on a fairly wimpy box (EC2 m1.large = 2 virtual cores), the typical response time for a simple query like “status:FETCHED” is only 400 milliseconds, to find 9M documents. Even a complex query such as (status not FETCHED and not UNFETCHED) only takes six seconds. Scaling Obviously we could use beefier boxes. If we switched to something like m1.xlarge (15GB of memory, 4 virtual cores) then it’s likely we could handle upwards of 200M “records” in our Solr index and still get reasonable response times. If we wanted to scale beyond a single box, there are a number of solutions. Even out of the box Solr supports sharding, where your HTTP request can specify multiple servers to use in parallel. More recently, the Solr trunk has support for SolrCloud. This uses the ZooKeeper open source project to simplify coordination of multiple Solr servers. Finally, the Katta open source project supports Lucene-level distributed search, with many of the features needed for production quality distributed search that have not yet been added to SolrCloud. Summary The combination of Hadoop and Solr makes it easy to crunch lots of data and then quickly serve up the results via a fast, flexible search & query API. Because Solr supports query-style requests, it’s suitable as a NoSQL replacement for traditional databases in many situations, especially when the size of the data exceeds what is reasonable with a typical RDBMS. Solr has some limitations that you should be aware of, specifically: · Updating the index works best as a batch operation. Individual records can be updated, but each commit (index update) generates a new Lucene segment, which will impact performance. · Current support for replication, fail-over, and other attributes that you’d want in a production-grade solution aren’t yet there in SolrCloud. If this matters to you, consider Katta instead. · Many SQL queries can’t be easily mapped to Solr queries. The code for this article is available via GitHub, at http://github.com/bixolabs/hadoop2solr. The README displayed on that page contains additional technical details.
April 4, 2011
by Ken Krugler
· 119,523 Views
article thumbnail
Java Access to SQL Azure via the JDBC Driver for SQL Server
I’ve written a couple of posts (here and here) about Java and the JDBC Driver for SQL Server with the promise of eventually writing about how to get a Java application running on the Windows Azure platform. In this post, I’ll deliver on that promise. Specifically, I’ll show you two things: 1) how to connect to a SQL Azure Database from a Java application running locally, and 2) how to connect to a SQL Azure database from an application running in Windows Azure. You should consider these as two ordered steps in moving an application from running locally against SQL Server to running in Windows Azure against SQL Azure. In both steps, connection to SQL Azure relies on the JDBC Driver for SQL Server and SQL Azure. The instructions below assume that you already have a Windows Azure subscription. If you don’t already have one, you can create one here: http://www.microsoft.com/windowsazure/offers/. (You’ll need a Windows Live ID to sign up.) I chose the Free Trial Introductory Special, which allows me to get started for free as long as keep my usage limited. (This is a limited offer. For complete pricing details, see http://www.microsoft.com/windowsazure/pricing/.) After you purchase your subscription, you will have to activate it before you can begin using it (activation instructions will be provided in an email after signing up). Connecting to SQL Azure from an application running locally I’m going to assume you already have an application running locally and that it uses the JDBC Driver for SQL Server. If that isn’t the case, then you can start from scratch by following the steps in this post: Getting Started with the SQL Server JDBC Driver. Once you have an application running locally, then the process for running that application with a SQL Azure back-end requires two steps: 1. Migrate your database to SQL Azure. This only takes a couple of minutes (depending on the size of your database) with the SQL Azure Migration Wizard - follow the steps in the Creating a SQL Azure Server and Creating a SQL Azure Database sections of this post. 2. Change the database connection string in your application. Once you have moved your local database to SQL Azure, you only have to change the connection string in your application to use SQL Azure as your data store. In my case (using the Northwind database), this meant changing this… String connectionUrl = "jdbc:sqlserver://serverName\\sqlexpress;" + "database=Northwind;" + "user=UserName;" + "password=Password"; …to this… String connectionUrl = "jdbc:sqlserver://xxxxxxxxxx.database.windows.net;" + "database=Northwind;" + "user=UserName@xxxxxxxxxx;" + "password=Password"; (where xxxxxxxxxx is your SQL Azure server ID). Connecting to SQL Azure from an application running in Windows Azure The heading for this section might be a bit misleading. Once you have a locally running application that is using SQL Azure, then all you have to do is move your application to Windows Azure. The connecting part is easy (see above), but moving your Java application to Windows Azure takes a bit more work. Fortunately, Ben Lobaugh has written a great post that that shows how to use the Windows Azure Starter Kit for Java to get a Java application (a JSP application, actually) running in Windows Azure: Deploying a Java application to Windows Azure with Command-Line Ant. (If you are using Eclipse, see Ben’s related post: Deploying a Java application to Windows Azure with Eclipse.) I won’t repeat his work here, but I will call out the steps I took in modifying his instructions to deploy a simple JSP page that connects to SQL Azure. 1. Add the JDBC Driver for SQL Server to the Java archive. One step in Ben’s tutorial (see the Select the Java Runtime Environment section) requires that you create a .zip file from your local Java installation and add it to your Java/Azure application. Most likely, your local Java installation references the JDBC driver by setting the classpath environment variable. When you create a .zip file from your java installation, the JDBC driver will not be included and the classpath variable will not be set in the Azure environment. I found the easiest way around this was to simply add the sqljdbc4.jar file (probably located in C:\Program Files\Microsoft SQL Server JDBC Driver\sqljdbc_3.0\enu) to the \lib\ext directory of my local Java installation before creating the .zip file. Note: You can put the JDBC driver in a separate directory, include it when you create the .zip folder, and set the classpath environment variable in the startup.bat script. But, I found the above approach to be easier. 2. Modify the JSP page. Instead of the code Ben suggests for the HelloWorld.jsp file (see the Prepare your Java Application section), use code from your locally running application. In my case, I just used the code from this post after changing the connection string and making a couple minor JSP-specific changes: Northwind Customers That’s it!. To summarize the steps… Migrate your database to SQL Azure with the SQL Azure Migration Wizard. Change the database connection in your locally running application. Use the Windows Azure Starter Kit for Java to move your application to Windows Azure. (You’ll need to follow instructions in this post and instructions above.) Thanks. -Brian
March 30, 2011
by Brian Swan
· 18,884 Views
article thumbnail
IBatis (MyBatis): Working with Dynamic Queries (SQL)
this tutorial will walk you through how to setup ibatis ( mybatis ) in a simple java project and will present how to work with dynamic queries (sql). pre-requisites for this tutorial i am using: ide: eclipse (you can use your favorite one) database: mysql libs/jars: mybatis , mysql conector and junit (for testing) this is how your project should look like: sample database please run the script into your database before getting started with the project implementation. you will find the script (with dummy data) inside the sql folder. 1 – article pojo i represented the pojo we are going to use in this tutorial with a uml diagram, but you can download the complete source code in the end of this article. the goal of this tutorial is to demonstrate how to retrieve the article information from database using dynamic sql to filter the data. 2 – article mapper – xml one of the most powerful features of mybatis has always been its dynamic sql capabilities. if you have any experience with jdbc or any similar framework, you understand how painful it is to conditionally concatenate strings of sql together, making sure not to forget spaces or to omit a comma at the end of a list of columns. dynamic sql can be downright painful to deal with. while working with dynamic sql will never be a party, mybatis certainly improves the situation with a powerful dynamic sql language that can be used within any mapped sql statement. the dynamic sql elements should be familiar to anyone who has used jstl or any similar xml based text processors. in previous versions of mybatis, there were a lot of elements to know and understand. mybatis 3 greatly improves upon this, and now there are less than half of those elements to work with. mybatis employs powerful ognl based expressions to eliminate most of the other elements. if choose (when, otherwise) trim (where, set) foreach let’s explain each one with examples. 1 – first scenario : we want to retrieve all the articles from database with an optional filter: title. in other words, if user specify an article title, we are going to retrieve the articles that match with the title, otherwise we are going to retrieve all the articles from database. so we are going to implement a condition (if) : select id, title, author from article where id_status = 1 and title like #{title} 2 – second scenario : now we have two optional filters: article title and author. the user can specify both, none or only one filter. so we are going to implement two conditions: select id, title, author from article where id_status = 1 and title like #{title} and author like #{author} 3 – third scenario : now we want to give the user only one option: the user will have to specify only one of the following filters: title, author or retrieve all the articles from ibatis category. so we are going to use a choose element: select id, title, author from article where id_status = 1 and title like #{title} and author like #{author} and id_category = 3 4 – fourth scenario : take a look at all three statements above. they all have a condition in common: where id_status = 1. it means we are already filtering the active articles let’s remove this condition to make it more interesting. select id, title, author from article where title like #{title} and author like #{author} what if both title and author are null? we are going to have the following statement: select id, title, authorfrom articlewhere and what if only the author is not null? we are going to have the fololwing statement: select id, title, authorfrom articlewhereand author like #{author} and both fails! how to fix it? 5 – fifth scenario : we want to retrieve all the articles with two optional filters: title and author. to avoid the 4th scenatio, we are going to use a where element: select id, title, author from article title like #{title} and author like #{author} mybatis has a simple answer that will likely work in 90% of the cases. and in cases where it doesn’t, you can customize it so that it does. the where element knows to only insert “where” if there is any content returned by the containing tags. furthermore, if that content begins with “and” or “or”, it knows to strip it off. if the where element does not behave exactly as you like, you can customize it by defining your own trim element. for example,the trim equivalent to the where element is: select id, title, author from article title like #{title} and author like #{author} the overrides attribute takes a pipe delimited list of text to override, where whitespace is relevant. the result is the removal of anything specified in the overrides attribute, and the insertion of anything in the with attribute. you can also use the trim element with set. 6 – sixth scenario : the user will choose all the categories an article can belong to. so in this case, we have a list (a collection), and we have to interate this collection and we are going to use a foreach element: select id, title, author from article title like #{title} and author like #{author} the foreach element is very powerful, and allows you to specify a collection, declare item and index variables that can be used inside the body of the element. it also allows you to specify opening and closing strings, and add a separator to place in between iterations. the element is smart in that it won’t accidentally append extra separators. note : you can pass a list instance or an array to mybatis as a parameter object. when you do, mybatis will automatically wrap it in a map, and key it by name. list instances will be keyed to the name “list” and array instances will be keyed to the name “array”. the complete article.xml file looks like this: select id, title, author from article where id_status = 1 and title like #{title} select id, title, author from article where id_status = 1 and title like #{title} and author like #{author} select id, title, author from article where id_status = 1 and title like #{title} and author like #{author} and id_category = 3 select id, title, author from article title like #{title} and author like #{author} select id, title, author from article title like #{title} and author like #{author} select id, title, author from article where id_category in #{category} download if you want to download the complete sample project, you can get it from my github account: https://github.com/loiane/ibatis-dynamic-sql if you want to download the zip file of the project, just click on download: there are more articles about ibatis to come. stay tooned! happy coding! from http://loianegroner.com/2011/03/ibatis-mybatis-working-with-dynamic-queries-sql/
March 23, 2011
by Loiane Groner
· 72,581 Views
article thumbnail
IBatis (MyBatis): Discriminator Column Example – Inheritance Mapping Tutorial
This tutorial will walk you through how to setup iBatis (MyBatis) in a simple Java project and will present an example using a discriminator column, in another words it is a inheritance mapping tutorial. Pre-Requisites For this tutorial I am using: IDE: Eclipse (you can use your favorite one) DataBase: MySQL Libs/jars: Mybatis, MySQL conector and JUnit (for testing) This is how your project should look like: Sample Database Please run the script into your database before getting started with the project implementation. You will find the script (with dummy data) inside the sql folder. 1 – POJOs – Beans I represented the beans here with a UML model, but you can download the complete source code in the end of this article. As you can see on the Data Modeling Diagram and the UML diagram above, we have a class Employee and two subclasses: Developer and Manager. The goal of this tutorial is to retrieve all the Employees from the database, but these employees can be an instance of Developer or Manager, and we are going to use a discriminator column to see which class we are going to instanciate. 2 – Employee Mapper – XML Sometimes a single database query might return result sets of many different (but hopefully somewhat related) data types. The discriminator element was designed to deal with this situation, and others, including class inheritance hierarchies. The discriminator is pretty simple to understand, as it behaves much like a switch statement in Java. A discriminator definition specifies column and javaType attributes. The column is where MyBatis will look for the value to compare. The javaType is required to ensure the proper kind of equality test is performed (although String would probably work for almost any situation). SELECT id, name, employee_type, manager_id, info, developer_id, product FROM employee E left join manager M on M.employee_id = E.id left join developer D on D.employee_id = E.id In this example, MyBatis would retrieve each record from the result set and compare its employee type value. If it matches any of the discriminator cases, then it will use the resultMap specified by the case. This is done exclusively, so in other words, the rest of the resultMap is ignored (unless it is extended, which we talk about in a second). If none of the cases match, then MyBatis simply uses the resultMap as defined outside of the discriminator block. So, if the managerResult was declared as follows: Now all of the properties from both the managerResult and developerResult will be loaded. Once again though, some may find this external definition of maps somewhat tedious. Thereforethere’s an alternative syntax for those that prefer a more concise mapping style. For example: SELECT id, name, employee_type, manager_id, info, developer_id, product FROM employee E left join manager M on M.employee_id = E.id left join developer D on D.employee_id = E.id Remember that these are all Result Maps, and if you don’t specify any results at all, then MyBatis willautomatically match up columns and properties for you. So most of these examples are more verbosethan they really need to be. That said, most databases are kind of complex and it’s unlikely that we’ll beable to depend on that for all cases. 3 - Employee Mapper – Annotations We did the configuration in XML, now let’s try to use annotations to do the same thing we did using XML. This is the code for EmployeeMapper.java: package com.loiane.data; import java.util.List; import org.apache.ibatis.annotations.Case;import org.apache.ibatis.annotations.Result;import org.apache.ibatis.annotations.Select;import org.apache.ibatis.annotations.TypeDiscriminator; import com.loiane.model.Developer;import com.loiane.model.Employee;import com.loiane.model.Manager; public interface EmployeeMapper { final String SELECT_EMPLOYEE = "SELECT id, name, employee_type, manager_id, info, developer_id, product " + "FROM employee E left join manager M on M.employee_id = E.id " + "left join developer D on D.employee_id = E.id "; /** * Returns the list of all Employee instances from the database. * @return the list of all Employee instances from the database. */ @Select(SELECT_EMPLOYEE) @TypeDiscriminator(column = "employee_type", cases = { @Case (value="1", type = Manager.class, results={ @Result(property="managerId", column="manager_id"), @Result(property="info"), }), @Case (value="2", type = Developer.class, results={ @Result(property="developerId", column="developer_id"), @Result(property="project", column="product"), }) }) List getAllEmployeesAnnotation();} If you are reading this blog lately, you are already familiar with the @Select and @Result annotations. So let’s skip it. Let’s talk about the @TypeDiscriminator and @Case annotations. @TypeDiscriminator A group of value cases that can be used to determine the result mapping to perform. Attributes: column, javaType, jdbcType, typeHandler, cases. The cases attribute is an array of Cases. @Case A single case of a value and its corresponding mappings. Attributes: value, type, results. The results attribute is an array of Results, thus this Case Annotation is similar to an actual ResultMap, specified by the Results annotation below. In this example: We set the column atribute for @TypeDiscriminator to determine which column MyBatis will look for the value to compare. And we set an array of @Case. For each @Case we set the value, so if the column matches the value, MyBatis will instanciate a object of type we set and we also set an array of @Result to match column with class atribute. Note one thing: using XML we set the id and name properties. We did not set these properties using annotations. It is not necessary, because the column matches the atribute name. But if you need to set, it is going to look like this: @Select(SELECT_EMPLOYEE)@Results(value = { @Result(property="id"), @Result(property="name")})@TypeDiscriminator(column = "employee_type", cases = { @Case (value="1", type = Manager.class, results={ @Result(property="managerId", column="manager_id"), @Result(property="info"), }), @Case (value="2", type = Developer.class, results={ @Result(property="developerId", column="developer_id"), @Result(property="project", column="product"), })})List getAllEmployeesAnnotation(); 4 – EmployeeDAO In the DAO, we have two methods: the first one will call the select statement from the XML and the second one will call the annotation method. Both returns the same result. package com.loiane.dao; import java.util.List; import org.apache.ibatis.session.SqlSession;import org.apache.ibatis.session.SqlSessionFactory; import com.loiane.data.EmployeeMapper;import com.loiane.model.Employee; public class EmployeeDAO { /** * Returns the list of all Employee instances from the database. * @return the list of all Employee instances from the database. */ @SuppressWarnings("unchecked") public List selectAll(){ SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { List list = session.selectList("Employee.getAllEmployees"); return list; } finally { session.close(); } } /** * Returns the list of all Employee instances from the database. * @return the list of all Employee instances from the database. */ public List selectAllUsingAnnotations(){ SqlSessionFactory sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); SqlSession session = sqlSessionFactory.openSession(); try { EmployeeMapper mapper = session.getMapper(EmployeeMapper.class); List list = mapper.getAllEmployeesAnnotation(); return list; } finally { session.close(); } } The output if you call one of these methods and print: Employee ID = 1 Name = Kate Manager ID = 1 Info = info KateEmployee ID = 2 Name = Josh Developer ID = 1 Project = webEmployee ID = 3 Name = Peter Developer ID = 2 Project = desktopEmployee ID = 4 Name = James Manager ID = 2 Info = info JamesEmployee ID = 5 Name = Susan Developer ID = 3 Project = web Download If you want to download the complete sample project, you can get it from my GitHub account: https://github.com/loiane/ibatis-discriminator If you want to download the zip file of the project, just click on download: There are more articles about iBatis to come. Stay tooned! From http://loianegroner.com/2011/03/ibatis-mybatis-discriminator-column-example-inheritance-mapping-tutorial/
March 22, 2011
by Loiane Groner
· 34,126 Views
article thumbnail
Practical PHP Testing Patterns: Fake Object
The purpose of a Fake Object, a kind of Test Double, is to replace a collaborator with a functional copy. While Mocks prefer a specification of the behavior to check, Fake Objects are really a simplified version of the production object they substitute. A good Fake Object, however, will be very lightweight, easy to create and throw away to ensure test isolation. Usually it won't satisfy some other functional or non-functional requirements, otherwise we would use him instead of the real object. For example a UserRepository, which normally stores users in a database, may be substitued by a Fake that: does not send mails when an user is added. Doesn't really save anything in the database, but keeps a list in an array. When some complex method is called, just throws a NonImplementedException. Utility Of course performance and less brittleness of tests are the main selling points of a Fake Objects: think about testing with a real database and without one; however this is true for all Test Doubles and I don't need to repeat it. Often the Fake is used when the contract between the SUT and the collaborator is too complex to effectively build a Stub or Mock: many methods, or many calls to the same method are made. Return parameters are difficult to setup. A Fake avoids overspecifying a test for a very invasive contract, like the order of calls and precise parameters; a real implementation is more robust to changes in the contract. For example if the Fake is a collection, you can store and retrieve objects of another type, and still not change the Fake; expectations and configured return value instead would change (together). As an example, an embryonal Fake is PHPUnit $this->returnArgument() for the will() Mock expectation. It's a piece of functionality which substitutes an hard-coded expectation, in order to return one of the arguments of the call instead of adding the argument to the expectation itself. Note that many times we don't have to create new code to leverage Fakes: we can use the existing one with a different configuration (like a Cache maximumCachedItems=0, or with an adapter that use an in-memory cache instead of APC or Memcache) or something provided for us, like an sqlite in-memory database created by PDO. NullObjects are sometimes Fake, but are used in production more than in testing. Implementation Several steps are necessary for building an hand-rolled Fake: create the Fake class by hand, and define the simplest implementation that could do the job for this test. Remember, you're not testing the Fake, you're testing the SUT, so the Fake doesn't have to be perfect, but just passable for the test case at end. Instance an object from the Fake class: the constructor may be different from the real collaborator's one as it's not part of the contract. Install the Fake into the SUT, and proceed as normal. The steps for generating a Fake with PHPUnit are a little different; however, you will still have to write its businss logic: create the "mock" as always via getMock() or getMockBuilder(). Define expectations for its methods with will($this->returnCallback()) and some anonymous functions (or via self-shunting like we did for "Stubs"). Often objects passed in this closures (such as ArrayObject instances) can aid in making the Fake methods communicate with each other. Install the Fake and proceed with the test. The second approach is invaluable when you have a single method on the Fake: it's very fast. Variations Fake Database: an alternative implementation of the Database Facade that lets you test classes that depend on the database without actually using it. For exaple, a DAO which internally use an SplObjectStorage instead of the connection. In-Memory Database: sqlite3 in-memory database used with ORMs for tests that isolate you from the real database, but not from using PDO. High Return On Investment. Fake Web Service: mimics the interface of some web service like Google Analytics, when you have to interact also in write and not only in read. Fake Service Layer: simplified implementations of Service Layer classes, which avoid checking concurrency, authorization, authentication and so on in order to simplify functional testing. Example In this code sample, the test target a ForumManager class which needs to manipulate Posts. It's not a simple Facade: it moves Post around, merges threads and so on. So we inject in it a FakePostDao: ForumManager will call it instead of the database. When you I have many tests of this type, the time for writing the Fake implementation is well spent. array( new Post('Hello'), new Post('Hello!'), new Post('') ), 2 => array( new Post('Hi'), new Post('Hi!') ), 3 => array( new Post('Good morning.') ) )); $forumManager = new ForumManager($dao); $forumManager->mergeThreadsByIds(1, 2); $thread = $dao->getThread(1); $this->assertEquals(5, count($thread)); } } /** * The SUT. */ class ForumManager { private $dao; public function __construct(PostsDao $dao) { $this->dao = $dao; } public function mergeThreadsByIds($originalId, $toBeMergedId) { $original = $this->dao->getThread($originalId); $toBeMerged = $this->dao->getThread($toBeMergedId); $newOne = array_merge($original, $toBeMerged); $this->dao->removeThread($originalId); $this->dao->removeThread($toBeMergedId); $this->dao->addThread($originalId, $newOne); } } /** * Interface for the collaborator to substitute with the Test Double. */ interface PostsDao { public function getThread($id); public function removeThread($id); public function addThread($id, array $thread); } /** * Fake implementation. */ class FakePostDao implements PostsDao { private $threads; public function __construct(array $initialState) { $this->threads = $initialState; } public function getThread($id) { return $this->threads[$id]; } public function removeThread($id) { unset($this->threads[$id]); } /** * We model Thread as array of Posts for simplicity. */ public function addThread($id, array $thread) { $this->threads[$id] = $thread; } } /** * Again a Dummy object: minimal implementation, to make this test pass. */ class Post { }
March 16, 2011
by Giorgio Sironi
· 3,998 Views
article thumbnail
SQL Server Driver for PHP Connection Options: Encrypt
This short post adds to my series on connection options in the SQL Server Driver for PHP. I’ll go into a bit more detail on the Encrypt and TrustServerCertificate options than the driver documentation does. I’ll start with three important points related to these options, then I’ll go into a couple of hypothetical situations that should shed more light on what these options actually do. The first thing to note is that these two options, Encrypt and TrustServerCertificate, are often used together. The Encrypt option is used to specify whether or not the connection to the server is encrypted (the default is false). The TrustServerCertificate option is used to indicate whether the certificate on the SQL Server instance should be trusted (the default is false). Note: Setting the Encrypt option does not mean that data is stored in an encrypted form. It simply means that data is encrypted while in transport to the server. Also note that this setting does not apply to authentication credentials such as a SQL Server password – authentication credentials are always encrypted. For information about storing encrypted data, see Encryption Hierarchy in the SQL Server documentation. The second thing to note is that, by default, when SQL Server is installed it creates a self-signed certificate that it will use to encrypt connections. Of course, self-signed certificates are not ideal for secure connections (they are vulnerable to man-in-the-middle attacks), so it is best to replace this certificate with one from a certificate authority (CA). The third thing to know is how to use these options when connecting to SQL Server. If you are using the SQLSRV API, your connection code might look something like this: $serverName = "serverName"; $connectionInfo = array( "Database"=>"DbName", "UID"=>"UserName", "PWD"=>"Password", "Encrypt"=>true, "TrustServerCertificate"=>false); $conn = sqlsrv_connect( $serverName, $connectionInfo); If you are using the PDO_SQLSRV API, your connection code might look something like this: $serverName = "serverName"; $conn = new PDO("sqlsrv:Server = $serverName; Database = DBName; Encrypt = true; TrustServerCertificate = false", "UserName", "Password"); Now, with those three things in mind, let’s look at a couple of examples. First suppose the following: You did not replace the self-signed certificate created when SQL Server was installed. You set Encrypt = true. You set TrustServerCertificate = false. In this scenario, your connection will fail. When you set TrustServerCertificate = false, you are asking for some 3rd party verification of the certificate. Because this is a self-signed certificate, there is no 3rd party to do the verification. However, if you set TrustServerCertificate = true, then your connection will succeed because you are trusting the certificate. (Note that, as mentioned earlier, this connection would be vulnerable to man-in-the-middle attacks.) Now consider the following: You replaced the self-signed certificate created when SQL Server was installed with a certificate from a certificate authority (CA). You set Encrypt = true. In this case (assuming there are no problems with your certificate), regardless of your setting for TrustServerCertificate, your connection will succeed. However, for a more secure connection (one not vulnerable to man-in-the-middle attacks), you should set TrustServerCertificate = false. Doing so will force third party verification of the certificate. For information about installing a certificate on SQL Server, see How to: Enable Encrypted Connections to the Database Engine. Note that that topic references setting the Force Encryption option on the database server. Setting this option to Yes on the server does the same thing as setting Encryption = true on the client – it forces the connection to be encrypted using the server’s certificate. That’s it for today. Thanks. -Brian
March 14, 2011
by Brian Swan
· 12,648 Views
article thumbnail
Persisting Entity Classes using XML in JPA
Preface This document explains the working of a JPA application purely based on XML configurations. This explains how to create entity classes without using any annotations in Java classes. Entire configuration is done using xml files. Introduction Persistence is one of the major challenges for enterprise applications. JPA is the persistence standard from Sun Microsystems. JPA supports two ways of configuring persistence information- through annotations and XML files. This article discusses the XML based approach towards embedding persistence information for the entity classes. Setting up the environment JPA is light-weight persistence model. It works for both JavaSE and JavaEE applications. All the required class files are present in the JavaEE.jar file which should be added to the classpath. Along with the JavaEE.jar file, we need to add the Persistence Provider jar file as well to the classpath. The provider can be Toplink/ Hibernate/ App server specific persistence provider. In this application, we’ve used Toplink as the persistence provider. So we need to add toplink-essentials.jar file as well in the class path. To summarize, the jar files required are: 1.JavaEE.jar 2.ToplinkEssentials.jar (Replace with the jar file for Persistence Provider, in case you want to use other persistence provider) 3.derbyClient.jar (For JavaDB (Derby) Database, replace this with the jar file of your database) Instead of adding the configuration details in the entity class in the form of annotations, we’ll add the configuration information in the orm.xml file. Before we look into the code, let us discuss the orm.xml file in detail. orm.xml orm.xml file contains all the configuration details required for mapping a class to a relational database table. These details include the primary key of the entity class and the various constraints/ rules to be applied for the primary key. Other details about the entity class include the various attributes of the entity class and columns to which the attributes should be mapped. We can also specify multiple constraints for the same attribute. Other configurable things include information about the various relationships the entity may have with other entities (one-to-one, many-to-one, many-to-many or one-to-many), embedded attributes, information about the version and transient attributes. One application can have multiple entities. The configuration details about all these entity classes, embeddable classes go inside the same orm.xml file. Name of this configuration file can be anything, not mandatorily orm.xml. It should be placed in META-INF subdirectory along with the persistence.xml file. Persistence.xml file should include the orm.xml file. Please find below sample orm.xml and persistence.xml file. The xml schema definitions are highlighted for both. Also, please note the entry required in persistence.xml for including the orm.xml file. PFB the entries made in a sample orm.xml, which’ll persist employee instances to the database. My First JPA XML Application entity Different properties in orm.xml file The various properties in the orm.xml in the order defined in xml schema definition are discussed below: Root tag is It contains the following four types of elements: The persistence-unit-metadata element contains metadata for the entire persistence unit. It is undefined if this element occurs in multiple mapping files within the same persistence unit. The package, schema, catalog and access elements apply to all the entity, mapped-superclass and embeddable elements defined in the same file in which they occur. The sequence-generator, table-generator, named-query, named-native-query and sql-result-set-mapping elements are global to the persistence unit. The entity, mapped-superclass and embeddable elements each define the mapping information for a managed persistent class. The mapping information contained in these elements may be complete or it may be partial. One line string information about the entity classes in the application. specifies the package of the classes listed within the sub elements and attributes of the same mapping file only. classAttribute defines the fully qualified class name of the entity class name: Attribute defines the name of the entity entity: tag can be repeated to embed mapping information for all the entity classes in the application. The Sub-Tags of entity tag are maps the database table to which the entity class shall be persisted overrides or creates a new id-class setting overrides or creates a new inheritance setting overrides or creates a new discriminator value setting, useful in Single_Table Inheritance strategy. overrides or creates a new discriminator column setting, used while configuring Super class in Single_Table inheritance strategy. A sequence-generator is unique by name A table-generator is unique by name used to define named-query for the entity class. Can be repeated to define multiple named queries for the entity class. used to define native named query for the entity class. creates or overrides a pre-persist setting i.e. the entity listener method to be invoked before persisting the entity instance. creates or overrides a post-persist setting i.e. the entity listener method to be invoked after persisting the entity instance. creates or overrides a pre-remove setting i.e. the entity listener method to be invoked before removing the entity instance. creates or overrides a post-remove setting i.e. the entity listener method to be invoked after removing the entity instance. creates or overrides a pre-update setting i.e. the entity listener method to be invoked before updating the entity instance. creates or overrides a post-update setting i.e. the entity listener method to be invoked before updating the entity instance. creates or overrides a post-load setting i.e. the entity listener method to be invoked after loading the entity instance state from database defines the attributes of the entity class which shall be persisted in the database table. The sub tags of the attributes are: :- this tag defines the id column of the entity class. Cannot be repeated for an entity. An entity class can have only one Id attribute. :-this is used to define the ID generation strategy to be used for the primary key column. :-this tag is used to map the entity columns to the columns in the database table. Should be repeated to provide configurations for all the attributes of the entity class. :- This tag is used to add the various column-level constraints on the entity attributes. E.g. unique, insertable, updatable, length, precision etc. :- maps the Version attribute of the entity class. Development Environment We’ve used NetBeans IDE 6.5.1 for creating this JavaSE application for persistence through XML files. NetBeans has persistence support for JPA as well as Hibernate. So, we can choose either Hibernate or Toplink Essentials as the Persistence Provider. Let’s name the application as “JPAEntity”. The name of the entity class is “Employee” and is placed in the package “entity”. “EmpClient.java” is the client class. The xml configuration files are put in META-INF sub directory. The directory structure of the application is as follows:- First JPA META-INF orm.xml persistence.xml entity Employee.java EmpClient.java Developing the Application Step1:Start the NetBeans IDE. Create a new Java Application Step 2:Add the jar files for the Persistence Provider (Toplink Essentials in our case), JavaEE.jar and the database driver(DerbyClient.jar in our case) to the classpath. Step 3: Add the entity class “Employee.java” and client class “EmpClient.java” in the package "jpaentity" This is the code of Employee.java package jpaentity; public class Employee { private int empId; private String empName; private double empSalary; public Employee() { } public Employee(int empId, String empName, double empSalary) { this.empId = empId; this.empName = empName; this.empSalary = empSalary; } public int getEmpId() { return empId; } public void setEmpId(int empId) { this.empId = empId; } public String getEmpName() { return empName; } public void setEmpName(String empName) { this.empName = empName; } public double getEmpSalary() { return empSalary; } public void setEmpSalary(double empSalary) { this.empSalary = empSalary; } @Override public String toString() { return "Employee Id:="+empId+ +" Employee Name:="+empName+" Employee Salary:="+empSalary; } }//End of Employee.java This is the code of EmpClient.java package jpaentity; import java.util.List; import javax.persistence.EntityManager; import javax.persistence.EntityManagerFactory; import javax.persistence.Persistence; public class EmpClient { private EntityManager em; private EntityManagerFactory emf; public void initEmfAndEm() { emf=Persistence.createEntityManagerFactory("JPAEntityPU"); em=emf.createEntityManager(); } public void cleanup() { em.close(); } public void insertAndRetrieve() { System.out.println("-------------------Creating the Objects---------------------"); Employee empObj1=new Employee(1, "Anu", 1000.0); Employee empObj2=new Employee(2, "Rahul", 1500.0); System.out.println("-------------------Starting the transaction---------------------"); em.getTransaction().begin(); em.persist(empObj1); em.persist(empObj2); System.out.println("-------------------Committing the transaction---------------------"); em.getTransaction().commit(); System.out.println("-------------------Objects saved successfully--------------------"); System.out.println("*******************************************************************"); System.out.println("------------------- Reading Objects--------------------"); List emps=em.createQuery("select p from Employee p").getResultList(); for (Employee current:emps) System.out.println(current); System.out.println("-------------------Finished Reading Objects--------------------"); } public static void main(String args[]) { EmpClient myClient=new EmpClient(); System.out.println("-------------------Starting the Client---------------------"); myClient.initEmfAndEm(); myClient.insertAndRetrieve(); myClient.cleanup(); System.out.println("---------------Shutting down the Client---------------------"); } }//End of EmpClient.java Step 4: Set up the database connection. Go to Services Tab in the NetBeans IDE and expand the Databases node. Right click on JavaDB node and select Create Database. Give the database name, username and password and click on OK. The database is created and ready to accept connections. Step 5: Add the persistence unit to the application. Also, add the orm.xml file. The name of the orm.xml file can be changed. We have named it mapping.xml. Following is the code of mapping.xml file My First JPA XML Application entity Add the following code to the persistence.xml file oracle.toplink.essentials.PersistenceProvider \META-INF\orm.xml jpaentity.Employee Steps for execution Compile the source files Build the project Start the Database Server Execute the “EmpClient” class. Connect to the database & verify the table has been created & data is also added in the table. Advantages and Disadvantages of Using XML for Configuration Advantages No coupling between the metadata and the source code Compatible with pre EJB3.0 development process Support from IDEs like NetBeans, Eclipse etc Easy to modify with the help of good editors. Disadvantages Complexity Difficulty in debugging in absence of editors Wrap Up This article helps in understanding entity configuration with XML files as an alternative to embedding annotations in Java code for configuring persistence details.
March 6, 2011
by Anu Bakshi
· 131,217 Views · 1 Like
article thumbnail
IBatis (MyBatis): Handling Joins: Advanced Result Mapping, Association, Collections, N+1 Select Problem
This tutorial will present examples using advanced result mappings, how to handle mappings with association, collections, the n+1 problem, and more.
March 2, 2011
by Loiane Groner
· 121,312 Views · 3 Likes
article thumbnail
5 Key Events in the history of Cloud Computing
While we have been evaluating in our blog posts the various features available on popular Cloud Computing platforms today, I thought it might be a good idea to understand when and how all this started and look back at where this began and trace some of the key events in the progress of cloud computing. Amazon like all other Internet companies in the period of the dot com bubble were left with large amounts of underutilized computing infrastructure, reports suggest less than 10% of the server infrastructure of many companies were being used. Amazon may have use cloud computing as a way to provide this unused resources as utility computing service when they launched S3 as the first true cloud computing service in March 2006. 1. Launch of Amazon Web Services in July 2002 The initial version of AWS in 2002 was focused more on making information available from Amazon to partners through a web services model with programmatic and developer support and was very focused on Amazon as a retailer. While this set the stage for the next steps the launch of S3 was the true step towards building a cloud platform. Amazon Press Release 2. S3 Launches in March 2006 Here are some interesting articles on the launch of S3 in 2006. The real breakthrough however was the pricing model for S3 which defined the model of 'pay-per-use' which has now become the defacto standard for cloud pricing. Also the launch of S3 really defined the shift of Amazon from being just a retailer to a strong player in the technology space. Techcrunch Post on S3 on March 14th, 2006 Read Write Web Post on S3 and EC2 on Nov 3rd, 2006 Business Week Article on Jeff Bezos vision on cloud computing on Nov 13th, 2006 3. EC2 Launches in August 2006 EC2 had a much quieter launch in August 2006 but i would think had the bigger impact by making core computing infrastructure available. This completed the loop on enabling a more complete cloud infrastructure being available. In fact at that time analysts had some difficulty in understanding what the big deal is, and thought it looks similar to other hosting services available online only with a different pricing model. Some interesting articles from that time on the launch: Technologyevangelist Blog Virtualization Info 4. Launch of Google App Engine in April 2008 The launch of Google App Engine in 2008 was the entry of the first pure play technology company into the Cloud Computing market. Google a dominant Internet company entering into this market was clearly a major step towards wide spread adoption of cloud computing. As with all their other products they introduced radical pricing models with a free entry level plan and extremely low cost computing and storage services which are currently among the lowest in the market. Techcrunch post on App Engine Launch Google App Engine Launch Post 5. Windows Azure launches Beta in Nov 2009 The entry of Microsoft into Cloud Computing is a clear indication of the growth of the space. Microsoft for long has not accepted the Internet and the web as a significant market and has continued to focus on the desktop market for all these years. I think this is a realization that a clear shift is taking place. The launch of Azure is a key event in the history of cloud computing with the largest software company making a small but significant shift to the web. Launch of Azure Beta Azure General Availability - Feb 2010 You might also like: Cloud Computing, Google App Engine: How big is the market Really ? Comparing Google App Engine with Amazon EC2 Comparing Amazon EC2 and Microsoft Azure Languages Supported by Google App Engine Cloud Computing: What is it really ?
February 26, 2011
by Kaushik Raghupathi
· 48,031 Views
article thumbnail
How to remove getters and setters
Getters and setters are one of the first abstraction step that is thought over public fields in object-oriented programming. However, the paradigm was never about encapsulating properties by providing a special reading/writing mechanism via methods, but about objects responding to messages. In other words, encapsulation is (not only, but also) about being capable of changing private fields. If you write getters and setters, you introduce a leaky abstraction over private fields, since the names and number of your public methods are influenced coupled to them. They aren't really private anymore: for example in service classes I pass dependencies for private methods in the constructor, and the client code is never affected when these dependencies change. With getters and setters, a field addition, removal or renaming affects the contract of the class. In this article, we're going to take this User Entity class and explore the many ways we have for removing getters and setters. The choice between them depends mainly on what you need them for: it's the client code that should decide what contracts wants from these classes. nickname = $nickname; } public function getNickname() { return $nickname; } public function setPassword($password) { $this->password = $password; } public function getPassword() { return $this->password; } // ... you get the picture: other 8 getters or setters } The various techniques are ordered by complexity. As always, I express use cases for the User class via a test case. All the code is on Github. Constructor First, we can pass some fields in the constructor. If we do not expose the field then, the value will be immutable and the client code will never even know that this value exists. For example, our User has an immutable nickname, which serves also as a primary key: public function testPassWriteOnlyDataInTheConstructor() { $user = new User('giorgiosironi'); // should not explode //removed the setter as it cannot be changed } class User { public function __construct($nickname, $activationKey = '') { $this->nickname = $nickname; $this->activationKey = $activationKey; } } Information Expert Next, we have the Information Expert pattern: assign an operation to the object with the greatest knowledge for accomplishing it. If you do so, you won't need to expose private fields via getters and setters, since you will model some behavior as code in that class, which can see private fields. For example, when we register an User we want to activate it via email verification, so we send an activation key via mail. But to check it, we don't need to extract the value from the User object. // Information Expert /** * @expectedException InvalidArgumentException */ public function testAnUserIsActivated() { $user = new User('giorgiosironi', 'ABC'); $user->activate('AB'); } class User { public function activate($key) { if ($this->activationKey === $key) { $this->active = true; return; } throw new InvalidArgumentException('Key for activation is incorrect.'); } } This style is an example of the Tell, Don't Ask principle: we tell our User to do something instead of asking it for information and doing it ourselves. Double Dispatch Putting behavior inside an Entity class is nice, but sometimes the operation needs some external dependency to work, like a login mechanism needing a storage for the identity of the user (usually the session). We have two types of coupling to solve here: static: the User class should not depend on any other infrastructure class, which in turn refers the database or the session storage. runtime: the User class cannot hold a reference to an infrastructure object, for instance because we want to serialize it or to create it simply with a new operator, or our ORM does not support injection of collaborators. The first issues is solved by introducing an interface implemented by the infrastructure class; the second by passing in the dependency via Double Dispatch instead of via the constructor like we do with service classes. public function testUsersLogin() { $user = new User('giorgiosironi'); $user->setPassword('gs'); // will be removed in next tests // in reality, we would use a SessionLoginAdapter or something like that $loginAdapterMock = $this->getMock('LoginAdapter'); $loginAdapterMock->expects($this->once()) ->method('storeIdentity') ->with('giorgiosironi'); $user->login('gs', $loginAdapterMock); } class User { public function login($password, LoginAdapter $loginAdapter) { if ($this->password == $password) { $loginAdapter->storeIdentity($this->nickname); return true; } else { return false; } } } Still an example of Tell, Don't Ask, but more real world now. Command and changesets You really have to put data inside this object: the user has just compiled a form and you have to get thos input values in. So how do we define that operation? As an atomic method call, by passing in what I call a Changeset but it's a specialization of a Command (not Command pattern but CommandQueryResponsibilitySegregation). In the simplest cases, it's just a Value Object or a Data Transfer Object with no behavior. public function testCommandForChangingPassword() { $user = new User('giorgiosironi'); $passwordChange = new ChangeUserPassword('', 'gs'); $user->handle($passwordChange); $this->assertEquals('gs', $user->getPassword()); //deprecated, will be removed in next tests } class User { public function handle($command) { if ($command instanceof ChangeUserPassword) { $this->handleChangeUserPassword($command); } if ($command instanceof SetUserDetails) { $this->handleSetUserDetails($command); } // support other commands here... } private function handleChangeUserPassword(ChangeUserPassword $command) { if ($command->getOldPassword() == $this->password) { $this->password = $command->getNewPassword(); } else { throw new Exception('The old password is not correct.'); } } } Think about it: you will have to put these getters and setters somewhere; it's best to put them on an object which is a data structure than that on your Entity. This way: you will be coupled to the current fields only on this particular operation and not when you pass an User around. you will make it clear that you support only a full update operation, and it's not ok to call an isolate setter. Actually in PHP you could just use an array as a Changeset, but a class provides a stricter contract. Also public fields are not viable for a contract as no error will be raised by PHP if you assign a non-existent field on a Changeset object. Rendering on a canvas In the Growing Object-Oriented Software mailing list, there has been recently a discussion on how to emulate getters via callbacks. This solution is the specular of our Changeset argument used for extracting data instead of updating it. public function testCanvasForRenderingAnObject() { $user = new User('giorgiosironi'); $detailsSet = new SetUserDetails('Italy', 'Pizza'); //THIS may have set/get $user->handle($detailsSet); // canvas can also be a form, or xml, or json... $canvas = new HtmlCanvas('{{location}{{favoriteFood}'); $user->render($canvas); $this->assertEquals('ItalyPizza', (string) $canvas); } class User { public function render(Canvas $canvas) { $canvas->nickname = $this->nickname; $canvas->location = $this->location; $canvas->favoriteFood = $this->favoriteFood; } } Again, the canvas is hidden behind an interface and can be anything: an HTML view, a form, a JSON or RSS feed generator... CQRS In Command-Query Responsibility Segregation, you use the ORM for mapping your objects in the database, and fill your report screens by querying it directly, or even by querying another store which is continuously rebuilt from your master database. I don't know of any implementations of CQRS in PHP, but this mechanism promises to at least eliminate getters, as your domain objects will be write-only. Conclusion The full code is in this Github repository, as always. You have no excuses now: go and ditch one of your getters and setters immediately. Your code will breath a bit of fresh air.
February 22, 2011
by Giorgio Sironi
· 22,166 Views
article thumbnail
Getting Started with iBatis (MyBatis): Annotations
This tutorial will walk you through how to setup iBatis (MyBatis) in a simple Java project and will present examples using simple insert, update, select and delete statements using annotations. This is the third tutorial of the iBatis/MyBatis series, you can read the first 2 tutorials on the following links: Introduction to iBatis (MyBatis), An alternative to Hibernate and JDBC Getting Started with iBatis (MyBatis): XML Configuration iBatis/ MyBatis 3 offers a new feature: annotations. But it lacks examples and documentation about annotations. I started to explore and read the MyBatis mailing list archive to write this tutorial. Another thing you will notice is the limitation related to annotations. I am going to demonstrate some examples that you can do with annotations in this tutorial. All the power of iBatis is in its XMl configuration. So let’s play a little bit with annotations. It is much simpler and you can use it for simple queries in small projects. As I already mentioned, if you want something more complex, you will have to use XML configuration. One more thing before we get started: this tutorial is the same as the previous one, but instead of XML, we are going to use annotations. Pre-Requisites For this tutorial I am using: IDE: Eclipse (you can use your favorite one) DataBase: MySQL Libs/jars: Mybatis, MySQL conector and JUnit (for testing) This is how your project should look like: Sample Database Please run this script into your database before getting started with the project implementation: I will not post the sample database here again. You can get it from the previous post about iBatis or download this sample project. The files are the same. 1 – Contact POJO We will create a POJO class first to respresent a contact with id, name, phone number and email address – same as previous post. 2 – ContactMapper In this file, we are going to set up all the queries using annotations. It is the MyBatis-Interface for the SQLSessionFactory. A mapper class is simply an interface with method definitions that match up against the SqlSession methods. Mapper interfaces do not need to implement any interface or extend any class. As long as the method signature can be used to uniquely identify a corresponding mapped statement. Mapper interfaces can extend other interfaces. Be sure that you have the statements in the appropriate namespace when using XML binding to Mapper interfaces. Also, the only limitation is that you cannot have the same method signature in two interfaces in a hierarchy (a bad idea anyway). A mapperclass is simply an interface with method definitions that match up against the SqlSession methods. You can create some strings with the SQL code. Remember it has to be the same code as in XML Configuration. package com.loiane.data; import java.util.List; import org.apache.ibatis.annotations.Delete;import org.apache.ibatis.annotations.Insert;import org.apache.ibatis.annotations.Options;import org.apache.ibatis.annotations.Param;import org.apache.ibatis.annotations.Result;import org.apache.ibatis.annotations.Results;import org.apache.ibatis.annotations.Select;import org.apache.ibatis.annotations.Update; import com.loiane.model.Contact; public interface ContactMapper { final String SELECT_ALL = "SELECT * FROM CONTACT"; final String SELECT_BY_ID = "SELECT * FROM CONTACT WHERE CONTACT_ID = #{id}"; final String UPDATE = "UPDATE CONTACT SET CONTACT_EMAIL = #{email}, CONTACT_NAME = #{name}, CONTACT_PHONE = #{phone} WHERE CONTACT_ID = #{id}"; final String UPDATE_NAME = "UPDATE CONTACT SET CONTACT_NAME = #{name} WHERE CONTACT_ID = #{id}"; final String DELETE = "DELETE FROM CONTACT WHERE CONTACT_ID = #{id}"; final String INSERT = "INSERT INTO CONTACT (CONTACT_EMAIL, CONTACT_NAME, CONTACT_PHONE) VALUES (#{name}, #{phone}, #{email})"; /** * Returns the list of all Contact instances from the database. * @return the list of all Contact instances from the database. */ @Select(SELECT_ALL) @Results(value = { @Result(property="id", column="CONTACT_ID"), @Result(property="name", column="CONTACT_NAME"), @Result(property="phone", column="CONTACT_PHONE"), @Result(property="email", column="CONTACT_EMAIL") }) List selectAll(); /** * Returns a Contact instance from the database. * @param id primary key value used for lookup. * @return A Contact instance with a primary key value equals to pk. null if there is no matching row. */ @Select(SELECT_BY_ID) @Results(value = { @Result(property="id"), @Result(property="name", column="CONTACT_NAME"), @Result(property="phone", column="CONTACT_PHONE"), @Result(property="email", column="CONTACT_EMAIL") }) Contact selectById(int id); /** * Updates an instance of Contact in the database. * @param contact the instance to be updated. */ @Update(UPDATE) void update(Contact contact); /** * Updates an instance of Contact in the database. * @param name name value to be updated. * @param id primary key value used for lookup. */ void updateName(@Param("name") String name, @Param("id") int id); /** * Delete an instance of Contact from the database. * @param id primary key value of the instance to be deleted. */ @Delete(DELETE) void delete(int id); /** * Insert an instance of Contact into the database. * @param contact the instance to be persisted. */ @Insert(INSERT) @Options(useGeneratedKeys = true, keyProperty = "id") void insert(Contact contact);} @Select The @Select annotation is very simple. Let’s take a look at the first select statment of this class: selectAll. Note that you don’t need to use the mapping if your database table columns mach the name of the class atributes. I used different names to use the @Result annotation. If table columns match with atribute names, you don’t need to use the @Result annotation. Not let’s take a look on the second method: selectById. Notice that we have a parameter. It is a simple parameter – easy to use when you have a single parameter. @Update Let’s say you want to update all the columns. You can pass the object as parameter and iBatis will do all the magic for you. Remember to mach the parameter name with the atribute name, otherwise iBatis can get confused, Now let’s say you want to use 2 or 3 paramaters, and they don’t belong to an object. If you take a look at iBatis XML configuration you will see you have to set the option parameterType (remember?) and specify you parameter type in it, in another words, you can use only one parameter. If you are using annotation, you can use more than one parameter using the @Param annotation. @Delete The @Delete annotation is also very simple. It follows the previous rules related to parameters. @Insert The @Insert annotation also follows the rules related to parameters. You can use an object or use the @Param annotation to specify more than one parameter. What about the generation key? Well, if your database supports auto generation key, you can set it up using annotations with the @Options annotation. You will need to specify the option useGeneratedKeys and keyProperty. If your database does not support auto generation key, sorry, but I still did not figure out how to do it (in last case, use can do it manually and then pass as parameter to your insert query). 3 – MyBatisConnectionFactory Every MyBatis application centers around an instance of SqlSessionFactory. A SqlSessionFactory instance can be acquired by using the SqlSessionFactoryBuilder. SqlSessionFactoryBuilder can build a SqlSessionFactory instance from an XML configuration file, of from a custom prepared instance of the Configuration class. An observation about this file: on the previous example, we set the mappers on the iBatis Configuration XML file. Using annotations, we will set the mapper manually. In a future example, I’ll show you how to work with XML and Annotations together. package com.loiane.dao; import java.io.FileNotFoundException;import java.io.IOException;import java.io.Reader; import org.apache.ibatis.io.Resources;import org.apache.ibatis.session.SqlSessionFactory;import org.apache.ibatis.session.SqlSessionFactoryBuilder; import com.loiane.data.ContactMapper; public class MyBatisConnectionFactory { private static SqlSessionFactory sqlSessionFactory; static { try { String resource = "SqlMapConfig.xml"; Reader reader = Resources.getResourceAsReader(resource); if (sqlSessionFactory == null) { sqlSessionFactory = new SqlSessionFactoryBuilder().build(reader); sqlSessionFactory.getConfiguration().addMapper(ContactMapper.class); } } catch (FileNotFoundException fileNotFoundException) { fileNotFoundException.printStackTrace(); } catch (IOException iOException) { iOException.printStackTrace(); } } public static SqlSessionFactory getSqlSessionFactory() { return sqlSessionFactory; } } 4 – ContactDAO Now that we set up everything needed, let’s create our DAO. To call the sql statments, we need to do one more configuration, that is to set and get the mapper from the SqlSessionFactory. Then we just need to call the mapper method. It is a little bit different, but it does the same thing. package com.loiane.dao; import java.util.List; import org.apache.ibatis.session.SqlSession;import org.apache.ibatis.session.SqlSessionFactory; import com.loiane.data.ContactMapper;import com.loiane.model.Contact; public class ContactDAO { private SqlSessionFactory sqlSessionFactory; public ContactDAO(){ sqlSessionFactory = MyBatisConnectionFactory.getSqlSessionFactory(); } /** * Returns the list of all Contact instances from the database. * @return the list of all Contact instances from the database. */ public List selectAll(){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); List list = mapper.selectAll(); return list; } finally { session.close(); } } /** * Returns a Contact instance from the database. * @param id primary key value used for lookup. * @return A Contact instance with a primary key value equals to pk. null if there is no matching row. */ public Contact selectById(int id){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); Contact list = mapper.selectById(id); return list; } finally { session.close(); } } /** * Updates an instance of Contact in the database. * @param contact the instance to be updated. */ public void update(Contact contact){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.update(contact); session.commit(); } finally { session.close(); } } /** * Updates an instance of Contact in the database. * @param name name value to be updated. * @param id primary key value used for lookup. */ public void updateName(String name, int id){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.updateName(name, id); session.commit(); } finally { session.close(); } } /** * Insert an instance of Contact into the database. * @param contact the instance to be persisted. */ public void insert(Contact contact){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.insert(contact); session.commit(); } finally { session.close(); } } /** * Delete an instance of Contact from the database. * @param id primary key value of the instance to be deleted. */ public void delete(int id){ SqlSession session = sqlSessionFactory.openSession(); try { ContactMapper mapper = session.getMapper(ContactMapper.class); mapper.delete(id); session.commit(); } finally { session.close(); } } 5 – Mapper Configuration File The MyBatis XML configuration file contains settings and properties that have a dramatic effect on how MyBatis behaves. This time, we do not need to configure alias or xml config files. We did all the magic with annotations, so we have a simple config file with only the information about the database we want to connect with. Download I suggest you to take a look at the org.apache.ibatis.annotations package and try to find out what each annotation can do. Unfortunatelly, you won’t find much documentation or examples on MyBatis website. I also created a TestCase class. If you want to download the complete sample project, you can get it from my GitHub account: https://github.com/loiane/ibatis-annotations-helloworld If you want to download the zip file of the project, just click on download: There are more articles about iBatis to come. Stay tooned! In next articles, I’m going to demonstrate how to implement the feature using XML and then Annotations (when it is possible). Happy Coding! From http://loianegroner.com/2011/02/getting-started-with-ibatis-mybatis-annotations/
February 22, 2011
by Loiane Groner
· 72,144 Views
article thumbnail
Getting Started with iBatis (MyBatis): XML Configuration
This tutorial will walk you through how to setup iBatis (MyBatis) in a simple Java project and will present examples using simple insert, update, select and delete statements.
February 18, 2011
by Loiane Groner
· 112,572 Views
article thumbnail
Solve Foreign-key Problems in DBUnit Test Data
If you create small per-test datasets, as DBUnit advises, you’ll get intermittent build failures due to foreign-key violations. This post explains (1) why this happens, (2) why small per-test datasets are still a good idea, and (3) one simple way to get around the problem. NB When I searched for solutions to this problem, I discovered that other kinds of foreign-key problem come up with DBUnit. Some people have circular dependencies in their relational database schemas, which stops DBUnit from loading the test data. If such is your case, I’m sorry to say that this post won’t help you with it, and your best option is probably to just take yourself outside and shoot yourself now. (Although some people seem to chosen instead to disable foreign key checking during test runs.) What causes the foreign-key violations The cause of the problem is simple, and illustrated by a trivial example. Suppose you have two entity classes, HitchHiker and SpaceShip. The HitchHiker table has a foreign key that references SpaceShip. The test data for HitchHikerDaoTest contains lines from both tables, whereas the test data for SpaceShipDaoTest contains only lines from SpaceShip. DBUnit’s default setup operation, CLEAN_INSERT, wipes data from every table occurring in the test dataset and then inserts the lines listed in that dataset. When SpaceShipDaoTest runs, DBUnit will start by deleting everything in the SpaceShip table. If any HitchHikers are currently riding in the SpaceShips that are about to be deleted, the database will object to their untimely eviction (I’m not sure whether the error message will read like Vogon poetry, though). If you start from an empty database, and execute SpaceShipDaoTest and then HitchHikerDaoTest, you’ll be fine; but if you do it in the other order, your build will fail. It’s that second-worst kind of bug, the unpredictable kind, since you don’t (usually) specify the order in which tests run. After all, they’re supposed to be independent! So you may well find that you have no problems for months on end, until one day you get an error running individual tests in a particular sequence, or Maven changes the order in which it runs your tests on the CI server, and BOOM! Why you should still use small independent datasets It’s tempting to circumvent the problem by using a single monolithic dataset for all your integration tests. I’ve tried this, and I advise against it. A big data file is hard to work with: you waste a lot of time scrolling around looking for the line you need, and it’s very hard to follow and understand foreign-key relations. Worse still: by modifying the data to make one test pass, you can easily accidentally break another one. The larger the dataset and the test suite become, the more fragile they get, and the more painstaking it becomes to modify them. How to avoid the foreign-key problem with small independent datasets One working but unsatisfactory solution would be to pad out every XML dataset with the list of all tables touched in the test suite. It’s unsatisfactory because the only way to add a table into a FlatXmlDataSet is to list a line of that table — a FlatXmlDataSet can’t contain empty tables — and there’s no justification for polluting the test data with lines from tables that are not part of the test. The solution I found was to use a DTD to clean tables before tests. Every XML file has different contents, but they all reference a single DTD which lists all the tables involved in the test suite. The DTD is easy to generate from the database schema, and useful for auto-complete and catching typos in column names, so you should probably already be using one. The code to exploit its contents is very simple: private IDataSet loadTestDataWithDtdTableList(String dtdFilename) throws IOException, DataSetException, SQLException { Reader dtdReader = new FileReader(new ClassPathResource(dtdFilename).getFile()); IDataSet dtdDataset = new FlatDtdDataSet(dtdReader); FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder(); builder.setMetaDataSet(new DatabaseDataSet(dbUnitConnection, false)); IDataSet xmlDataset = builder.build(asFile(xmlFilename)); return new CompositeDataSet(dtdDataset, xmlDataset);} How it works: DBUnit provides a facility to load a dataset from a DTD. This dataset contains all the tables listed in the DTD, but of course empty of data. The DTD dataset is then combined with a FlatXmlDataSet representing your test data. The graphic below illustrates the composite dataset that would be produced for the SpaceShip example. If you have dictionary tables whose contents never change, you can and should leave them out of the DTD as well as out of the XML datasets, to improve test performance a little. One further detail: you should close the FileReader after test setup. I couldn’t find a hook into the end of the test setup operation (short of writing my own DatabaseOperation), so I saved the reference as a member variable and hooked the close() call into the tear-down phase of the test. NB For a more complete code example, see this Gist snippet of a base class for TestNG+Spring+DBUnit tests that adds the above-described DBUnit setup operation to Spring’s TestNG helper class. Happy database testing! From http://www.andrewspencer.net/2011/solve-foreign-key-problems-in-dbunit-test-data/
February 16, 2011
by Andrew Spencer
· 27,847 Views
article thumbnail
Table sorting & pagination with jQuery and Razor in ASP.NET MVC
Introduction jQuery enjoys living inside pages which are built on top of ASP.NET MVC Framework. The ASP.NET MVC is a place where things are organized very well and it is quite hard to make them dirty, especially because the pattern enforces you on purity (you can still make it dirty if you want so ;) ). We all know how easy is to build a HTML table with a header row, footer row and table rows showing some data. With ASP.NET MVC we can do this pretty easy, but, the result will be pure HTML table which only shows data, but does not includes sorting, pagination or some other advanced features that we were used to have in the ASP.NET WebForms GridView. Ok, there is the WebGrid MVC Helper, but what if we want to make something from pure table in our own clean style? In one of my recent projects, I’ve been using the jQuery tablesorter and tablesorter.pager plugins that go along. You don’t need to know jQuery to make this work… You need to know little CSS to create nice design for your table, but of course you can use mine from the demo… So, what you will see in this blog is how to attach this plugin to your pure html table and a div for pagination and make your table with advanced sorting and pagination features. Demo Project Resources The resources I’m using for this demo project are shown in the following solution explorer window print screen: Content/images – folder that contains all the up/down arrow images, pagination buttons etc. You can freely replace them with your own, but keep the names the same if you don’t want to change anything in the CSS we will built later. Content/Site.css – The main css theme, where we will add the theme for our table too Controllers/HomeController.cs – The controller I’m using for this project Models/Person.cs – For this demo, I’m using Person.cs class Scripts – jquery-1.4.4.min.js, jquery.tablesorter.js, jquery.tablesorter.pager.js – required script to make the magic happens Views/Home/Index.cshtml – Index view (razor view engine) the other items are not important for the demo. ASP.NET MVC 1. Model In this demo I use only one Person class which defines Person entity with several properties. You can use your own model, maybe one which will access data from database or any other resource. Person.cs public class Person { public string Name { get; set; } public string Surname { get; set; } public string Email { get; set; } public int? Phone { get; set; } public DateTime? DateAdded { get; set; } public int? Age { get; set; } public Person(string name, string surname, string email, int? phone, DateTime? dateadded, int? age) { Name = name; Surname = surname; Email = email; Phone = phone; DateAdded = dateadded; Age = age; } } 2. View In our example, we have only one Index.chtml page where Razor View engine is used. Razor view engine is my favorite for ASP.NET MVC because it’s very intuitive, fluid and keeps your code clean. 3. Controller Since this is simple example with one page, we use one HomeController.cs where we have two methods, one of ActionResult type (Index) and another GetPeople() used to create and return list of people. HomeController.cs public class HomeController : Controller { // // GET: /Home/ public ActionResult Index() { ViewBag.People = GetPeople(); return View(); } public List GetPeople() { List listPeople = new List(); listPeople.Add(new Person("Hajan", "Selmani", "[email protected]", 070070070,DateTime.Now, 25)); listPeople.Add(new Person("Straight", "Dean", "[email protected]", 123456789, DateTime.Now.AddDays(-5), 35)); listPeople.Add(new Person("Karsen", "Livia", "[email protected]", 46874651, DateTime.Now.AddDays(-2), 31)); listPeople.Add(new Person("Ringer", "Anne", "[email protected]", null, DateTime.Now, null)); listPeople.Add(new Person("O'Leary", "Michael", "[email protected]", 32424344, DateTime.Now, 44)); listPeople.Add(new Person("Gringlesby", "Anne", "[email protected]", null, DateTime.Now.AddDays(-9), 18)); listPeople.Add(new Person("Locksley", "Stearns", "[email protected]", 2135345, DateTime.Now, null)); listPeople.Add(new Person("DeFrance", "Michel", "[email protected]", 235325352, DateTime.Now.AddDays(-18), null)); listPeople.Add(new Person("White", "Johnson", null, null, DateTime.Now.AddDays(-22), 55)); listPeople.Add(new Person("Panteley", "Sylvia", null, 23233223, DateTime.Now.AddDays(-1), 32)); listPeople.Add(new Person("Blotchet-Halls", "Reginald", null, 323243423, DateTime.Now, 26)); listPeople.Add(new Person("Merr", "South", "[email protected]", 3232442, DateTime.Now.AddDays(-5), 85)); listPeople.Add(new Person("MacFeather", "Stearns", "[email protected]", null, DateTime.Now, null)); return listPeople; } } TABLE CSS/HTML DESIGN Now, lets start with the implementation. First of all, lets create the table structure and the main CSS. 1. HTML Structure @{ Layout = null; } value value value So, this is the main structure you need to create for each of your tables where you want to apply the functionality we will create. Of course the scripts are referenced once ;). As you see, our table has class tablesorter and also we have a div with id pager. In the next steps we will use both these to create the needed functionalities. The complete Index.cshtml coded to get the data from controller and display in the page is: NameSurnameEmailPhoneDate Added @{ foreach (var p in ViewBag.People) { @[email protected]@[email protected]@p.DateAdded } } NameSurnameEmailPhoneDate Added 510203040 So, mainly the structure is the same. I have added @Razor code to create table with data retrieved from the ViewBag.People which has been filled with data in the home controller. 2. CSS Design The CSS code I’ve created is: /* DEMO TABLE */ body { font-size: 75%; font-family: Verdana, Tahoma, Arial, "Helvetica Neue", Helvetica, Sans-Serif; color: #232323; background-color: #fff; } table { border-spacing:0; border:1px solid gray;} table.tablesorter thead tr .header { background-image: url(images/bg.png); background-repeat: no-repeat; background-position: center right; cursor: pointer; } table.tablesorter tbody td { color: #3D3D3D; padding: 4px; background-color: #FFF; vertical-align: top; } table.tablesorter tbody tr.odd td { background-color:#F0F0F6; } table.tablesorter thead tr .headerSortUp { background-image: url(images/asc.png); } table.tablesorter thead tr .headerSortDown { background-image: url(images/desc.png); } table th { width:150px; border:1px outset gray; background-color:#3C78B5; color:White; cursor:pointer; } table thead th:hover { background-color:Yellow; color:Black;} table td { width:150px; border:1px solid gray;} PAGINATION AND SORTING Now, when everything is ready and we have the data, lets make pagination and sorting functionalities 1. jQuery Scripts referencing 2. jQuery Sorting and Pagination script So, with only two lines of code, I’m using both tablesorter and tablesorterPager plugins, giving some options to both these. Options added: tablesorter - widthFixed: true – gives fixed width of the columns tablesorter - sortList[[0,0]] – An array of instructions for per-column sorting and direction in the format: [[columnIndex, sortDirection], ... ] where columnIndex is a zero-based index for your columns left-to-right and sortDirection is 0 for Ascending and 1 for Descending. A valid argument that sorts ascending first by column 1 and then column 2 looks like: [[0,0],[1,0]] (source: http://tablesorter.com/docs/) tablesorterPager – container: $(“#pager”) – tells the pager container, the div with id pager in our case. tablesorterPager – size: the default size of each page, where I get the default value selected, so if you put selected to any other of the options in your select list, you will have this number of rows as default per page for the table too. END RESULTS 1. Table once the page is loaded (default results per page is 5 and is automatically sorted by 1st column as sortList is specified) 2. Sorted by Phone Descending 3. Changed pagination to 10 items per page 4. Sorted by Phone and Name (use SHIFT to sort on multiple columns) 5. Sorted by Date Added 6. Page 3, 5 items per page ADDITIONAL ENHANCEMENTS We can do additional enhancements to the table. We can make search for each column. I will cover this in one of my next blogs. Stay tuned. DEMO PROJECT You can download demo project source code from HERE. CONCLUSION Once you finish with the demo, run your page and open the source code. You will be amazed of the purity of your code. Working with pagination in client side can be very useful. One of the benefits is performance, but if you have thousands of rows in your tables, you will get opposite result when talking about performance. Hence, sometimes it is nice idea to make pagination on back-end. So, the compromise between both approaches would be best to combine both of them. I use at most up to 500 rows on client-side and once the user reach the last page, we can trigger ajax postback which can get the next 500 rows using server-side pagination of the same data. I would like to recommend the following blog post http://weblogs.asp.net/gunnarpeipman/archive/2010/09/14/returning-paged-results-from-repositories-using-pagedresult-lt-t-gt.aspx, which will help you understand how to return page results from repository. I hope this was helpful post for you. Wait for my next posts ;). Please do let me know your feedback. Best Regards, Hajan
February 14, 2011
by Hajan Selmani
· 77,316 Views
  • Previous
  • ...
  • 516
  • 517
  • 518
  • 519
  • 520
  • 521
  • 522
  • 523
  • 524
  • 525
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×