DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
Profiling MySQL Memory Usage With Valgrind Massif
This post comes from Roel Van de Paar at the MySQL Performance Blog. There are times where you need to know exactly how much memory the mysqld server (or any other program) is using, where (i.e. for what function) it was allocated, how it got there (a backtrace, please!), and at what point in time the allocation happened. For example; you may have noticed a sharp memory increase after executing a particular query. Or, maybe mysqld is seemingly using too much memory overall. Or again, maybe you noticed mysqld’s memory profile slowly growing overtime, indicating a possible memory bug. Whatever the reason, there is a simple but powerful way to profile MySQL memory usage; the Massif tool from Valgrind. An excerpt from the Massif manual page (Heap memory being simply the allotted pool of memory for use by programs); Massif tells you not only how much heap memory your program is using, it also gives very detailed information that indicates which parts of your program are responsible for allocating the heap memory. Firstly, we need to get the Valgrind program. Though you could use the latest version which comes with your OS (think yum or apt-get install Valgrind), I prefer to obtain & compile the latest release (3.8.1 at the moment): sudo yum remove valgrind* # or apt-get etc. sudo yum install wget make gcc gcc-c++ libtool libaio-devel bzip2 glibc* wget http://valgrind.org/downloads/valgrind-3.8.1.tar.bz2 # Or newer tar -xf valgrind-3.8.1.tar.bz2 cd valgrind-3.8.1 ./configure make sudo make install valgrind --version # check version to be same as what was downloaded (3.8.1 here) There are several advantages to self-compiling: When using the latest version of Valgrind, even compiled ‘out of the box’ (i.e. with no changes), you will likely see less issues then with earlier versions. For example, earlier versions may have too-small Valgrind-internal memory tracking allocations hardcoded. In other words; you may not be able to run your huge-buffer-pool under Valgrind without it complaining quickly. If you self compile, and those Valgrind-internal limits are still too small, you can easily change them before compiling. An often bumped up setting is VG_N_SEGMENTS in coregrind/m_aspacemgr/aspacemgr-linux.c (when you see ‘Valgrind: FATAL: VG_N_SEGMENTS is too low’) Newer releases [better] support newer hardware and software. Once ‘valgrind –version’ returns the correct installed version, you’re ready to go. In this example, we’ll write the output to /tmp/massif.out. If you prefer to use another location (and are therefore bound to set proper file rights etc.) use: $ touch /your_location/massif.out $ chown user:group /your_location/massif.out # Use the user mysqld will now run under (check 'user' setting in my.cnf also) (see here if this is not clear) Now, before you run mysqld under Valgrind, make sure debug symbols are present. Debug symbols are present when the binary is not stripped of them (downloaded ‘GA’ [generally available] packages may contain optimized or stripped binaries, which are optimized for speed rather than debugging). If the binaries you have are stripped, you have a few options to get a debug build of mysqld to use for memory profiling purposes: Download the appropriate debuginfo packages (these may not be available for all releases). Download debug binaries of the same server version as you are currently using, and simply use the debug mysqld as a drop-in replacement for your current mysqld (i.e. shutdown, mv mysqld mysqld.old, cp /debug_bin_path/mysqld ./mysqld, startup). If you have (through download or from past storage) the source code available (of the same server version as you are currently using) then simply debug-compile the source and use the mysqld binary as a drop-in replacement as shown in the last point. (For example, Percona Server 5.5 source can be debug-compiled by using ‘./build/build-binary –debug ..’). Valgrind Massif needs the debug symbol information to be present, so that it can print stack traces that show where memory is consumed. Without debug symbols available, you would not be able to see the actual function call responsible for memory usage. If you’re not sure if you have stripped binaries, simply test the procedure below and see what output you get. Once you’re all set with debug symbols, shutdown your mysqld server using your standard shutdown procedure, and then re-start it manually under Valgrind using the Massif tool: $ valgrind --tool=massif --massif-out-file=/tmp/massif.out /path/to/mysqld {mysqld options...} Note that ‘{mysqld options}’ could for instance include –default-file=/etc/my.cnf (if this is where your my.cnf file is located) in order to point mysqld to your settings file etc. After mysqld is properly started (check if you can login with your mysql client), you would execute whatever steps you think are necessary to increase memory usage/trigger the memory problem. You could also just leave the server running for some time (for example, if you have experienced memory increase over time). Once you’ve done that, shutdown mysqld (again using your normal shutdown procedure), and then use the ms_print tool on the masif.out file to output a textual graph of memory usage: ms_print /tmp/massif.out An partial example output from a recent customer problem we worked on: 96.51% (68,180,839B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->50.57% (35,728,995B) 0x7A3CB0: my_malloc (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | ->10.10% (7,135,744B) 0x7255BB: Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->10.10% (7,135,744B) 0x728DAA: Log_event::read_log_event(st_io_cache*, st_mysql_mutex*, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->10.10% (7,135,744B) 0x5300A8: ??? (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->10.10% (7,135,744B) 0x5316EC: handle_slave_sql (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->10.10% (7,135,744B) 0x3ECF60677B: start_thread (in /lib64/libpthread-2.5.so) | | | ->10.10% (7,135,744B) 0x3ECEAD325B: clone (in /lib64/libc-2.5.so) [...] And, a few snapshots later: 92.81% (381,901,760B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->84.91% (349,404,796B) 0x7A3CB0: my_malloc (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | ->27.73% (114,084,096B) 0x7255BB: Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->27.73% (114,084,096B) 0x728DAA: Log_event::read_log_event(st_io_cache*, st_mysql_mutex*, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->27.73% (114,084,096B) 0x5300A8: ??? (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->27.73% (114,084,096B) 0x5316EC: handle_slave_sql (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->27.73% (114,084,096B) 0x3ECF60677B: start_thread (in /lib64/libpthread-2.5.so) | | | ->27.73% (114,084,096B) 0x3ECEAD325B: clone (in /lib64/libc-2.5.so) As you can see, a fair amount of (and in this case ‘too much’) memory is being allocated to the Log_event::read_log_event function. You can also see the memory allocated to the function grow significantly accross the snapshots. This example helped to pin down a memory leak bug on a filtered slave (read more in the actual bug report). Besides running Valgrind Massif in the way above, you can also change Massif’s snapshot options and other cmd line options to match the snapshot frequency etc. to your specific requirements. However, you’ll likely find that the default options will perform well in most scenario’s. For the technically advanced, you can take things one step further: use Valgrind’s gdbserver to obtain Massif snapshots on demand (i.e. you can command-line initiate Massif snapshots just before, during and after executing any commands which may alter memory usage significantly). Conclusion: using Valgrind Massif, and potentially Valgrind’s gdbserver (which was not used in the resolution of the example bug discussed), will help you to analyze the ins and outs of mysqld’s (or any other programs) memory usage. Credits: Staff @ a Percona customer, Ovais, Laurynas, Sergei, George, Vladislav, Raghavendra, Ignacio, myself & others at Percona all combined efforts leading to the information you can read above.
January 26, 2013
by Peter Zaitsev
· 4,757 Views
article thumbnail
Building SOLID Databases: Single Responsibility and Normalization
Introduction This instalment will cover the single responsibility principle in object-relational design, and its relationship both to data normalization and object-oriented application programming. While single responsibility is a fairly easy object-oriented principle to apply here, I think it is critical to explore in depth because it helps provide a clearer framework to address object-relational design. As in later instalments I will be using snippets of code developed elsewhere for other areas. These will not be full versions of what was written, but versions sufficient to show the basics of data structure and interface. Relations and Classes: Similarities Objects and classes, in the surface, look deceptively similar, to the point where one can look at relations as sets of classes, and in fact this equivalence is the basis of object-relational database design. Objects are data structures used to store state, which have identity and are tightly bound to interface. Relations are data structures which store state, and if they meet second normal form, have identity in the form of a primary key. Object-relational databases then provide interface and thus in an object-relational database, relations are classes, and contain sets of objects of a certain class. Relations and Classes: Differences So similar are objects and classes in structure that a very common mistake is to simply use a relational database management system as a simple object store. This tends to result in brittle database leading to a relatively brittle application. Consequently many of us see this approach as something of an anti-pattern. The reason why this doesn't work terribly well is not that the basic equivalence is bad but that relations and classes are used in very different ways. On the application layer, classes are used to model (and control) behavior, while in the database, relations and tuples are used to model information. Thus tying database structures to application classes in this way essentially overloads the data structures, turning the structures into reporting objects as well as behavior objects. Relations thus need to be seen not only as classes but as specialized classes used for persistent storage and reporting. Thus they have fundamentally different requirements than the behavior classes in the application and thus they have different reasons to change. An application class typically changes when there is a need for a change in behavior, while a relation should only change when there is a change in data retention and reporting. Relations have traditionally tended to be divorced from interface and this provides a great deal of power. While classes tend to be fairly opaque, relations tend to be very transparent. The reason here is that while both represent state information whether it is application state or other facts, objects traditionally encapsulate behavior (and thus act as building blocks of behavior), relations always encapsulate information and are building blocks of information. Thus the data structures of relations must be transparent while object-oriented design tends to push for less transparency and more abstraction. It is worth noting then that because these systems are designed to do different things, there are many DBA's who suggest encapsulating the entire database behind an API, defined by stored procedures. The typical problem with this approach is that loose coupling of the application to the interface is difficult (but see one of the first posts on this blog for a solution). When the db interface is tightly coupled to the application, then typically one ends up with problems on several levels, and it tends to sacrifice good application design for good relational database design. Single Responsibility Principle in Application Programming The single responsibility principle holds that every class should have a single responsibility which it should entirely encapsulate. A responsibility is defined as a reason to change. The canonical example of a violation of this principle is a class which might format and print a report. Because both data changes and cosmetic changes may require a change to the class, this would be a violation of the principle at issue. In an ideal world, we'd separate out the printing and the formatting so that cosmetic changes do not require changes when data changes are made and vice versa. The problem of course with the canonical example is that it is not self-contained. If you change the data in the report, it will almost certainly require cosmetic changes. You can try to automate those changes but only within limits, and you can abstract interfaces (dependency inversion) but in the end if you change the data in the report enough, cosmetic changes will become necessary. Additionally a "reason to change" is epistemologically problematic. Reasons foreseen are rarely if ever atomic, and so there is a real question as far as how far one pushes this. In terms of formatting a report, do we want to break out the class that adjusts to paper size so that if we want to move from US Letter to A4 we no longer have to change the rest of the cosmetic layout? Perfect separation of responsibilities in that example is thus impossible, as it probably always is --- you can only change business rules to a certain point before interfaces must change, and when that happens the cascading flow of necessary changes can be quite significant. The database is, however, quite different in that that responsibility of database-level code (including DDL and DML) is limited to the proposition that we should construct answers from known facts. This makes a huge difference in terms of single responsibility, and it is possible to develop mathematical definitions for single responsibility. Not only is this possible but it has been done. All of the normal forms from third on up address single responsibility. The Definition of Third Normal Form Quoting Wikipedia, Codd's definition states that a table is in 3NF if and only if both of the following conditions hold: The relation R (table) is in second normal form (2NF) Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R. A non-prime attribute is an attribute not part of a superkey. In essence what third normal form states is that every relation must contain a superkey and values functionally and directly dependent on that superkey. This will become more important as we look at how data anomilies dovetail with single responsibility. Normalization and Single Responsibility The process of database normalization is an attempt to create relational databases where data anomalies do not exist. Data anomalies occur where modifying data either requires modifying other data to maintain accuracy (where no independent fact changes are recorded), or where existing data may project current or historical facts not in existence (join anomilies). This process occurs by breaking down keys and superkeys, and their dependencies, such that data is tracked in smaller, self-contained units. Beginning at third normal form, one can see relations are forming single responsibilities of managing data directly dependent on their superkeys. From this point forward, relations' structures would change (assuming no decision to further decompose a relation into a higher normal form) if and only if a change is made to what data is tracked that is directly dependent on a superkey. The responsibility of the database layer is the storage of facts and the synthesis of answers. Since the storage relations themselves handle the first, normalization is a prerequisite to good object-relational design. The one major caveat here however is that first normal form's atomicity requirement must be interpreted slightly differently in object-relational setups because more complex data structures can be atomic compared to a purely relational design. In a purely relational database, the data types that can be used are relatively minor and therefore facts must be maximally decomposed. For example we might store an IP address plus network mask as 4 ints for the address and an int for the network mask, or we might store as a single 32-bit int plus another int for the network mask but the latter poses problems of display that the former does not. In an object-relational database, however, we might store the address as an array of 4 ints for IP v4 or, if we need better performance we might build a custom type. If storage is not a problem but ease of maintenance is, we might even define relations, domains, and such to hold IP addresses, and then store the tuple in a column with appropriate functional interfaces. None of these approaches necessarily violate first normal form, as long as the data type involved properly and completely encapsulates the required behavior. Where such encapsulation is problematic, however, they would violate 1NF because they can no longer be treated as atomic values. In all cases, the specific value has a 1:1 correspondence to an IP address. Additionally where the needs are different, storage, application interface, and reporting classes should be different (this can be handled with updateable views, object-relational interfaces, and the like). Object-Relational Interfaces and Single Responsibility For purely relational databases, normalization is sufficient to address single responsibility. Object-relational designs bring some additional complexity because some behavior may be encapsulated in the object interfaces. There are two fundamental cases where this may make a difference, namely in terms of compositional patterns and in terms of encapsulated data within columns. A compositional pattern in PostgreSQL typically would occur when we use table inheritance to manage commonly co-occuring fields which occur in ways which are functionally dependent on many other fields in a database. For example, we might have a notes abstract table, and then have various tables which inherit this, possibly as part of other larger tables. A common case where composition makes a big difference is in managing notes. People may want to attach notes to all kinds of other data in a database, and so one cannot say that the text or subject of a note is mutually dependent, A typical purely relational approach is to either have many independently managed notes tables or have a single global note table which stores notes for everything, and then have multiple join tables to add join dependencies. The problem with this is that the note data is usually dependent logically, if not mathematically, on the join dependency, and so there is no reasonable way of expressing this without a great deal of complexity in the database design. An object-relational approach might be to have multiple notes tables, but have them inherit the table structure of a common notes table. This table can then be expanded, interfaces added as needed, and it should fill the single responsibility principle even though we might not be able to say that there is a natural functional dependency within the table itself. The second case has to do with storing complex information in columns. Here stability and robustness of code is especially important, and traditional approaches of the single responsibility principle apply directly to the contained data type. Example: Machine Configuration Database and SMTP configuration One of my current projects is building a network configuration database for a LedgerSMBhosting business I am helping to found (more on this soon). For competitive reasons I cannot display my whole code here. However, what I would like to do is show a very abbreviated version here as I used to solve a very specific issue. One of the basic challenges in a network configuration database is that the direct functional dependencies for a given machine may become quite complex when we assume that a given piece of network software is not likely to be running more than once on a given machine. Additionally we often want to ensure that certain sorts of software are set to be configured for certain types of machines, and so constraints can exist that force wider tables. The width and complexity of some configuration tables can possibly pose a management problem over time for the reason that they may not be obviously broken into manageable chunks of columns. One possible solution is to decompose the storage class into smaller mix-ins, each of which expresses a set of functional dependencies on a specific key, fully encapsulating a single responsibility. The overall storage class then exists to manage cross-mixin constraints and handle the actual physical storage. The data can then be presented as a unified table, or as multiple joined tables (and this works even where views would add significant complexity). In this way the smaller sub-tables can be given the responsibility of managing the configuration of specific pieces of software. We might therefore have tables like: -- abstract table, contains no data CREATE TABLE mi_smtp_config ( mi_id bigint, smtp_hostname text, smtp_forward_to text ); CREATE TABLE machine_instance ( mi_id bigserial, mi_name text not null, inservice_date date not null, retire_date date. .... ) INHERITS (mi_smtp_config, ...); The major advantage to this approach is that we can easily check and add which fields are set up to configure which software, without looking through a much larger, wider table. This also provides additional interfaces for related data, and the like. For example, "select * from mi_smtp_config" is directly equivalent of "select (mi::mi_smtp_config).* from machine_instance mi; Conclusions When we think of relations as specialized "fact classes" as opposed to "behavior classes" in the application world, the idea of the single responsibility principle works quite well with relational databases, particularly when paired with other encapsulation processes like stored procedures and views. In object-relational designs, the principle can be used as a general guide for further decomposing relations into mix-in classes, or creating intelligent data types for attributes, and it becomes possible to solve a number of problems in this regard without breaking normalization rules.
January 25, 2013
by Chris Travers
· 7,892 Views
article thumbnail
Fun with the MySQL pager command
This post comes from Stephane Combaudon at the MySQL Performance Blog. Last time I wrote about a few tips that can make you more efficient when using the command line on Unix. Today I want to focus more on pager. The most common usage of pager is to set it to a Unix pager such as less. It can be very useful to view the result of a command spanning over many lines (for instance SHOW ENGINE INNODB STATUS): mysql> pager less PAGER set to 'less' mysql> show engine innodb status\G [...] Now you are inside less and you can easily navigate through the result set (use q to quit, space to scroll down, etc). Reminder: if you want to leave your custom pager, this is easy, just run pager: mysql> pager Default pager wasn't set, using stdout. Or \n: mysql> \n PAGER set to stdout But the pager command is not restricted to such basic usage! You can pass the output of queries to most Unix programs that are able to work on text. We have discussed the topic, but here are a few more examples. Discarding the result set Sometimes you don’t care about the result set, you only want to see timing information. This can be true if you are trying different execution plans for a query by changing indexes. Discarding the result is possible with pager: mysql> pager cat > /dev/null PAGER set to 'cat > /dev/null' # Trying an execution plan mysql> SELECT ... 1000 rows in set (0.91 sec) # Another execution plan mysql> SELECT ... 1000 rows in set (1.63 sec) Now it’s much easier to see all the timing information on one screen. Comparing result sets Let’s say you are rewriting a query and you want to check if the result set is the same before and after rewrite. Unfortunately, it has a lot of rows: mysql> SELECT ... [..] 989 rows in set (0.42 sec) Instead of manually comparing each row, you can calculate a checksum and only compare the checksum: mysql> pager md5sum PAGER set to 'md5sum' # Original query mysql> SELECT ... 32a1894d773c9b85172969c659175d2d - 1 row in set (0.40 sec) # Rewritten query - wrong mysql> SELECT ... fdb94521558684afedc8148ca724f578 - 1 row in set (0.16 sec) Hmmm, checksums don’t match, something is wrong. Let’s retry: # Rewritten query - correct mysql> SELECT ... 32a1894d773c9b85172969c659175d2d - 1 row in set (0.17 sec) Checksums are identical, the rewritten query is much likely to produce the same result as the original one. Cleaning up SHOW PROCESSLIST If you have lots of connections on your MySQL, it’s very difficult to read the output of SHOW PROCESSLIST. For instance, if you have several hundreds of connections and you want to know how many connections are sleeping, manually counting the rows from the output of SHOW PROCESSLIST is probably not the best solution. With pager, it is straightforward: mysql> pager grep Sleep | wc -l PAGER set to 'grep Sleep | wc -l' mysql> show processlist; 337 346 rows in set (0.00 sec) This should be read as ’337 out 346 connections are sleeping’. Slightly more complicated now: you want to know the number of connections for each status: mysql> pager awk -F '|' '{print $6}' | sort | uniq -c | sort -r PAGER set to 'awk -F '|' '{print $6}' | sort | uniq -c | sort -r' mysql> show processlist; 309 Sleep 3 2 Query 2 Binlog Dump 1 Command Astute readers will have noticed that these questions could have been solved by querying INFORMATION_SCHEMA. For instance, counting the number of sleeping connections can be done with: mysql> SELECT COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND='Sleep'; +----------+ | COUNT(*) | +----------+ | 320 | +----------+ and counting the number of connection for each status can be done with: mysql> SELECT COMMAND,COUNT(*) TOTAL FROM INFORMATION_SCHEMA.PROCESSLIST GROUP BY COMMAND ORDER BY TOTAL DESC; +-------------+-------+ | COMMAND | TOTAL | +-------------+-------+ | Sleep | 344 | | Query | 5 | | Binlog Dump | 2 | +-------------+-------+ True, but: It’s nice to know several ways to get the same result Some of you may feel more comfortable with writing SQL queries, while others will prefer command line tools Conclusion As you can see, pager is your friend! It’s very easy to use and it can solve problems in an elegant and very efficient way. You can even write your custom script (if it is too complicated to fit in a single line) and pass it to the pager.
January 24, 2013
by Peter Zaitsev
· 14,204 Views
article thumbnail
How to Publish Maven Site Docs to BitBucket or GitHub Pages
In this post we will Utilize GitHub and/or BitBucket's static web page hosting capabilities to publish our project's Maven 3 Site Documentation. Each of the two SCM providers offer a slightly different solution to host static pages. The approach spelled out in this post would also be a viable solution to "backup" your site documentation in a supported SCM like Git or SVN. This solution does not directly cover site documentation deployment covered by the maven-site-plugin and the Wagon library (scp, WebDAV or FTP). There is one main project hosted on GitHub that I have posted with the full solution. The project URL is https://github.com/mike-ensor/clickconcepts-master-pom/. The POM has been pushed to Maven Central and will continue to be updated and maintained. com.clickconcepts.project master-site-pom 0.16 GitHub Pages GitHub hosts static pages by using a special branch "gh-pages" available to each GitHub project. This special branch can host any HTML and local resources like JavaScript, images and CSS. There is no server side development. To navigate to your static pages, the URL structure is as follows: http://.github.com/ An example of the project I am using in this blog post: http://mike-ensor.github.com/clickconcepts-master-pom/ where the first bold URL segment is a username and the second bold URL segment is the project. GitHub does allow you to create a base static hosted static site for your username by creating a repository with your username.github.com. The contents would be all of your HTML and associated static resources. This is not required to post documentation for your project, unlike the BitBucket solution. There is a GitHub Site plugin that publishes site documentation via GitHub's object API but this is outside the scope of this blog post because it does not provide a single solution for GitHub and BitBucket projects using Maven 3. BitBucket BitBucket provides a similar service to GitHub in that it hosts static HTML pages and their associated static resources. However, there is one large difference in how those pages are stored. Unlike GitHub, BitBucket requires you to create a new repository with a name fitting the convention. The files will be located on the master branch and each project will need to be a directory off of the root. mikeensor.bitbucket.org/ /some-project +index.html +... /css /img /some-other-project +index.html +... /css /img index.html .git .gitignore The naming convention is as follows: .bitbucket.org An example of a BitBucket static pages repository for me would be: http://mikeensor.bitbucket.org/. The structure does not require that you create an index.html page at the root of the project, but it would be advisable to avoid 404s. Generating Site Documentation Maven provides the ability to post documentation for your project by using the maven-site-plugin. This plugin is difficult to use due to the many configuration options that oftentimes are not well documented. There are many blog posts that can help you write your documentation including my post on maven site documentation. I did not mention how to use "xdoc", "apt" or other templating technologies to create documentation pages, but not to fear, I have provided this in my GitHub project. Putting it all Together The Maven SCM Publish plugin (http://maven.apache.org/plugins/maven-scm-publish-plugin/ publishes site documentation to a supported SCM. In our case, we are going to use Git through BitBucket or GitHub. Maven SCM Plugin does allow you to publish multi-module site documentation through the various properties, but the scope of this blog post is to cover single/mono module projects and the process is a bit painful. Take a moment to look at the POM file located in the clickconcepts-master-pom project. This master POM is rather comprehensive and the site documentation is only one portion of the project, but we will focus on the site documentation. There are a few things to point out here, first, the scm-publish plugin and the idiosyncronies when implementing the plugin. In order to create the site documentation, the "site" plugin must first be run. This is accomplished by running site:site. The plugin will generate the documentation into the "target/site" folder by default. The SCM Publish Plugin, by default, looks for the site documents to be in "target/staging" and is controlled by the content parameter. As you can see, there is a mismatch between folders. NOTE: My first approach was to run the site:stage command which is supposed to put the site documents into the "target/staging" folder. This is not entirely correct, the site plugin combines with the distributionManagement.site.url property to stage the documents, but there is very strange behavior and it is not documented well. In order to get the site plugin's site documents and the SCM Publish's location to match up, use the content property and set that to the location of the Site Plugin output (). If you are using GitHub, there is no modification to the siteOutputDirectory needed, however, if you are using BitBucket, you will need to modify the property to add in a directory layer into the site documentation generation (see above for differences between GitHub and BitBucket pages). The second property will tell the SCM Publish Plugin to look at the root "site" folder so that when the files are copied into the repository, the project folder will be the containing folder. The property will look like: ${project.build.directory}/site/ ${project.artifactId} ${project.build.directory} /site Next we will take a look at the custom properties defined in the master POM and used by the SCM Publish Plugin above. Each project will need to define several properties to use the Master POM that are used within the plugins during the site publishing. Fill in the variables with your own settings. BitBucket ... ... master scm:git:[email protected]:mikeensor/mikeensor.bitbucket.org.git ${project.build.directory}/site/${project.artifactId} ${project.build.directory}/site ${changelog.bitbucket.fileUri} ${changelog.revision.bitbucket.fileUri} ... ... GitHub ... ... gh-pages scm:git:[email protected]:mikeensor/clickconcepts-master-pom.git ${changelog.github.fileUri} ${changelog.revision.github.fileUri} ... ... NOTE: changelog parameters are required to use the Master POM and are not directly related to publishing site docs to GitHub or BitBucket How to Generate If you are using the Master POM (or have abstracted out the Site Plugin and the SCM Plugin) then to generate and publish the documentation is simple. mvn clean site:site scm-publish:publish-scm mvn clean site:site scm-publish:publish-scm -Dscmpublish.dryRun=true Gotchas In the SCM Publish Plugin documentation's "tips" they recommend creating a location to place the repository so that the repo is not cloned each time. There is a risk here in that if there is a git repository already in the folder, the plugin will overwrite the repository with the new site documentation. This was discovered by publishing two different projects and having my root repository wiped out by documentation from the second project. There are ways to mitigate this by adding in another folder layer, but make sure you test often! Another gotcha is to use the -Dscmpublish.dryRun=true to test out the site documentation process without making the SCM commit and push Project and Documentation URLs Here is a list of the fully working projects used to create this blog post: Master POM with Site and SCM Publish plugins &ndash https://github.com/mike-ensor/clickconcepts-master-pom. Documentation URL: http://mike-ensor.github.com/clickconcepts-master-pom/ Child Project using Master Pom &ndash http://mikeensor.bitbucket.org/fest-expected-exception. Documentation URL: http://mikeensor.bitbucket.org/fest-expected-exception/
January 23, 2013
by Mike Ensor
· 13,405 Views
article thumbnail
Spring Data JDBC Generic DAO Implementation: Most Lightweight ORM Ever
I am thrilled to announce first version of my Spring Data JDBC repository project. The purpose of this open source library is to provide generic, lightweight and easy to use DAO implementation for relational databases based on JdbcTemplate from Spring framework, compatible with Spring Data umbrella of projects. Design objectives Lightweight, fast and low-overhead. Only a handful of classes, no XML, annotations, reflection This is not full-blown ORM. No relationship handling, lazy loading, dirty checking, caching CRUD implemented in seconds For small applications where JPA is an overkill Use when simplicity is needed or when future migration e.g. to JPA is considered Minimalistic support for database dialect differences (e.g. transparent paging of results) Features Each DAO provides built-in support for: Mapping to/from domain objects through RowMapper abstraction Generated and user-defined primary keys Extracting generated key Compound (multi-column) primary keys Immutable domain objects Paging (requesting subset of results) Sorting over several columns (database agnostic) Optional support for many-to-one relationships Supported databases (continuously tested): MySQL PostgreSQL H2 HSQLDB Derby ...and most likely most of the others Easily extendable to other database dialects via SqlGenerator class. Easy retrieval of records by ID API Compatible with Spring Data PagingAndSortingRepository abstraction, all these methods are implemented for you: public interface PagingAndSortingRepository extends CrudRepository { T save(T entity); Iterable save(Iterable entities); T findOne(ID id); boolean exists(ID id); Iterable findAll(); long count(); void delete(ID id); void delete(T entity); void delete(Iterable entities); void deleteAll(); Iterable findAll(Sort sort); Page findAll(Pageable pageable); } Pageable and Sort parameters are also fully supported, which means you get paging and sorting by arbitrary properties for free. For example say you have userRepository extending PagingAndSortingRepository interface (implemented for you by the library) and you request 5th page of USERS table, 10 per page, after applying some sorting: Page page = userRepository.findAll( new PageRequest( 5, 10, new Sort( new Order(DESC, "reputation"), new Order(ASC, "user_name") ) ) ); Spring Data JDBC repository library will translate this call into (PostgreSQL syntax): SELECT * FROM USERS ORDER BY reputation DESC, user_name ASC LIMIT 50 OFFSET 10 ...or even (Derby syntax): SELECT * FROM ( SELECT ROW_NUMBER() OVER () AS ROW_NUM, t.* FROM ( SELECT * FROM USERS ORDER BY reputation DESC, user_name ASC ) AS t ) AS a WHERE ROW_NUM BETWEEN 51 AND 60 No matter which database you use, you'll get Page object in return (you still have to provide RowMapper yourself to translate from ResultSet to domain object. If you don't know Spring Data project yet, Page is a wonderful abstraction, not only encapsulating List , but also providing metadata such as total number of records, on which page we currently are, etc. Reasons to use You consider migration to JPA or even some NoSQL database in the future. Since your code will rely only on methods defined in PagingAndSortingRepository and CrudRepository from Spring Data Commons umbrella project you are free to switch from JdbcRepository implementation (from this project) to: JpaRepository, MongoRepository, GemfireRepository or GraphRepository. They all implement the same common API. Of course don't expect that switching from JDBC to JPA or MongoDB will be as simple as switching imported JAR dependencies - but at least you minimize the impact by using same DAO API. You need a fast, simple JDBC wrapper library. JPA or even MyBatis is an overkill You want to have full control over generated SQL if needed You want to work with objects, but don't need lazy loading, relationship handling, multi-level caching, dirty checking... You need CRUD and not much more You want to by DRY You are already using Spring or maybe even JdbcTemplate, but still feel like there is too much manual work You have very few database tables Getting started For more examples and working code don't forget to examine project tests. Prerequisites Maven coordinates: com.blogspot.nurkiewicz jdbcrepository 0.1 Unfortunately the project is not yet in maven central repository. For the time being you can install the library in your local repository by cloning it: $ git clone git://github.com/nurkiewicz/spring-data-jdbc-repository.git $ git checkout 0.1 $ mvn javadoc:jar source:jar install In order to start your project must have DataSource bean present and transaction management enabled. Here is a minimal MySQL configuration: @EnableTransactionManagement @Configuration public class MinimalConfig { @Bean public PlatformTransactionManager transactionManager() { return new DataSourceTransactionManager(dataSource()); } @Bean public DataSource dataSource() { MysqlConnectionPoolDataSource ds = new MysqlConnectionPoolDataSource(); ds.setUser("user"); ds.setPassword("secret"); ds.setDatabaseName("db_name"); return ds; } } Entity with auto-generated key Say you have a following database table with auto-generated key (MySQL syntax): CREATE TABLE COMMENTS ( id INT AUTO_INCREMENT, user_name varchar(256), contents varchar(1000), created_time TIMESTAMP NOT NULL, PRIMARY KEY (id) ); First you need to create domain object User mapping to that table (just like in any other ORM): public class Comment implements Persistable { private Integer id; private String userName; private String contents; private Date createdTime; @Override public Integer getId() { return id; } @Override public boolean isNew() { return id == null; } //getters/setters/constructors/... } Apart from standard Java boilerplate you should notice implementing Persistable where Integer is the type of primary key. Persistable is an interface coming from Spring Data project and it's the only requirement we place on your domain object. Finally we are ready to create our CommentRepository DAO: @Repository public class CommentRepository extends JdbcRepository { public CommentRepository() { super(ROW_MAPPER, ROW_UNMAPPER, "COMMENTS"); } public static final RowMapper ROW_MAPPER = //see below private static final RowUnmapper ROW_UNMAPPER = //see below @Override protected Comment postCreate(Comment entity, Number generatedId) { entity.setId(generatedId.intValue()); return entity; } } First of all we use @Repository annotation to mark DAO bean. It enables persistence exception translation. Also such annotated beans are discovered by CLASSPATH scanning. As you can see we extend JdbcRepository which is the central class of this library, providing implementations of all PagingAndSortingRepository methods. Its constructor has three required dependencies: RowMapper , RowUnmapper and table name. You may also provide ID column name, otherwise default "id" is used. If you ever used JdbcTemplate from Spring, you should be familiar with RowMapper interface. We need to somehow extract columns from ResultSet into an object. After all we don't want to work with raw JDBC results. It's quite straightforward: public static final RowMapper ROW_MAPPER = new RowMapper () { @Override public Comment mapRow(ResultSet rs, int rowNum) throws SQLException { return new Comment( rs.getInt("id"), rs.getString("user_name"), rs.getString("contents"), rs.getTimestamp("created_time") ); } }; RowUnmapper comes from this library and it's essentially the opposite of RowMapper : takes an object and turns it into a Map . This map is later used by the library to construct SQL CREATE / UPDATE queries: private static final RowUnmapper ROW_UNMAPPER = new RowUnmapper () { @Override public Map mapColumns(Comment comment) { Map mapping = new LinkedHashMap (); mapping.put("id", comment.getId()); mapping.put("user_name", comment.getUserName()); mapping.put("contents", comment.getContents()); mapping.put("created_time", new java.sql.Timestamp(comment.getCreatedTime().getTime())); return mapping; } }; If you never update your database table (just reading some reference data inserted elsewhere) you may skip RowUnmapper parameter or use MissingRowUnmapper. Last piece of the puzzle is the postCreate() callback method which is called after an object was inserted. You can use it to retrieve generated primary key and update your domain object (or return new one if your domain objects are immutable). If you don't need it, just don't override postCreate() . Check out JdbcRepositoryGeneratedKeyTest for a working code based on this example. By now you might have a feeling that, compared to JPA or Hibernate, there is quite a lot of manual work. However various JPA implementations and other ORM frameworks are notoriously known for introducing significant overhead and manifesting some learning curve. This tiny library intentionally leaves some responsibilities to the user in order to avoid complex mappings, reflection, annotations... all the implicitness that is not always desired. This project is not intending to replace mature and stable ORM frameworks. Instead it tries to fill in a niche between raw JDBC and ORM where simplicity and low overhead are key features. Entity with manually assigned key In this example we'll see how entities with user-defined primary keys are handled. Let's start from database model: CREATE TABLE USERS ( user_name varchar(255), date_of_birth TIMESTAMP NOT NULL, enabled BIT(1) NOT NULL, PRIMARY KEY (user_name) ); ...and User domain model: public class User implements Persistable { private transient boolean persisted; private String userName; private Date dateOfBirth; private boolean enabled; @Override public String getId() { return userName; } @Override public boolean isNew() { return !persisted; } public User withPersisted(boolean persisted) { this.persisted = persisted; return this; } //getters/setters/constructors/... } Notice that special persisted transient flag was added. Contract of CrudRepository.save() from Spring Data project requires that an entity knows whether it was already saved or not ( isNew() ) method - there are no separate create() and update() methods. Implementing isNew() is simple for auto-generated keys (see Comment above) but in this case we need an extra transient field. If you hate this workaround and you only insert data and never update, you'll get away with return true all the time from isNew() . And finally our DAO, UserRepository bean: @Repository public class UserRepository extends JdbcRepository { public UserRepository() { super(ROW_MAPPER, ROW_UNMAPPER, "USERS", "user_name"); } public static final RowMapper ROW_MAPPER = //... public static final RowUnmapper ROW_UNMAPPER = //... @Override protected User postUpdate(User entity) { return entity.withPersisted(true); } @Override protected User postCreate(User entity, Number generatedId) { return entity.withPersisted(true); } } "USERS" and "user_name" parameters designate table name and primary key column name. I'll leave the details of mapper and unmapper (see source code). But please notice postUpdate() and postCreate() methods. They ensure that once object was persisted, persisted flag is set so that subsequent calls to save() will update existing entity rather than trying to reinsert it. Check out JdbcRepositoryManualKeyTest for a working code based on this example. Compound primary key We also support compound primary keys (primary keys consisting of several columns). Take this table as an example: CREATE TABLE BOARDING_PASS ( flight_no VARCHAR(8) NOT NULL, seq_no INT NOT NULL, passenger VARCHAR(1000), seat CHAR(3), PRIMARY KEY (flight_no, seq_no) ); I would like you to notice the type of primary key in Peristable : public class BoardingPass implements Persistable { private transient boolean persisted; private String flightNo; private int seqNo; private String passenger; private String seat; @Override public Object[] getId() { return pk(flightNo, seqNo); } @Override public boolean isNew() { return !persisted; } //getters/setters/constructors/... } Unfortunately we don't support small value classes encapsulating all ID values in one object (like JPA does with @IdClass), so you have to live with Object[] array. Defining DAO class is similar to what we've already seen: public class BoardingPassRepository extends JdbcRepository { public BoardingPassRepository() { this("BOARDING_PASS"); } public BoardingPassRepository(String tableName) { super(MAPPER, UNMAPPER, new TableDescription(tableName, null, "flight_no", "seq_no") ); } public static final RowMapper ROW_MAPPER = //... public static final RowUnmapper UNMAPPER = //... } Two things to notice: we extend JdbcRepository and we provide two ID column names just as expected: "flight_no", "seq_no" . We query such DAO by providing both flight_no and seq_no (necessarily in that order) values wrapped by Object[] : BoardingPass pass = repository.findOne(new Object[] {"FOO-1022", 42}); No doubts, this is cumbersome in practice, so we provide tiny helper method which you can statically import: import static com.blogspot.nurkiewicz.jdbcrepository.JdbcRepository.pk; //... BoardingPass foundFlight = repository.findOne(pk("FOO-1022", 42)); Check out JdbcRepositoryCompoundPkTest for a working code based on this example. Transactions This library is completely orthogonal to transaction management. Every method of each repository requires running transaction and it's up to you to set it up. Typically you would place @Transactional on service layer (calling DAO beans). I don't recommend placing @Transactional over every DAO bean. Caching Spring Data JDBC repository library is not providing any caching abstraction or support. However adding @Cacheable layer on top of your DAOs or services using caching abstraction in Spring is quite straightforward. See also: @Cacheable overhead in Spring. Contributions ..are always welcome. Don't hesitate to submit bug reports and pull requests. Biggest missing feature now is support for MSSQL and Oracle databases. It would be terrific if someone could have a look at it. Testing This library is continuously tested using Travis (). Test suite consists of 265 tests (53 distinct tests each run against 5 different databases: MySQL, PostgreSQL, H2, HSQLDB and Derby. When filling bug reports or submitting new features please try including supporting test cases. Each pull request is automatically tested on a separate branch. Building After forking the official repository building is as simple as running: $ mvn install You'll notice plenty of exceptions during JUnit test execution. This is normal. Some of the tests run against MySQL and PostgreSQL available only on Travis CI server. When these database servers are unavailable, whole test is simply skipped: Results : Tests run: 265, Failures: 0, Errors: 0, Skipped: 106 Exception stack traces come from root AbstractIntegrationTest. Design Library consists of only a handful of classes, highlighted in the diagram below: JdbcRepository is the most important class that implements all PagingAndSortingRepository methods. Each user repository has to extend this class. Also each such repository must at least implement RowMapper and RowUnmapper (only if you want to modify table data). SQL generation is delegated to SqlGenerator. PostgreSqlGenerator. and DerbySqlGenerator are provided for databases that don't work with standard generator. License This project is released under version 2.0 of the Apache License (same as Spring framework).
January 22, 2013
by Tomasz Nurkiewicz
· 76,606 Views · 2 Likes
article thumbnail
ActiveMQ: Securing the ActiveMQ Web Console in Tomcat
This post will demonstrate how to secure the ActiveMQ WebConsole with a username and password when deployed in the Apache Tomcat web server. The Apache ActiveMQ documentation on the Web Console provides a good example of how this is done for Jetty, which is the default web server shipped with ActiveMQ, and this post will show how this is done when deploying the web console in Tomcat. To demonstrate, the first thing you will need to do is grab the latest distribution of ActiveMQ. For the purpose of this demonstration I will be using the 5.5.1-fuse-09-16 release which can be obtained via the Red Hat Support Portal or via the FuseSource repository: Red Hat Support Portal ActiveMQ ActiveMQ Web Console Tomcat mysql-connector-java-5.1.18-bin.jar Once you have the distributions, extract and start the broker. If you don't already have Tomcat installed you can grab it from the link above as well. I am using Tomcat 6.0.36 in this demonstration. Next, create a directory called activemq-console in the Tomcat webapps directory and extract the ActiveMQ Web Console war by using the jar -xf command. With all the binaries installed and our broker running we can begin configuring our web app and Tomcat to secure the Web Console. First, open the ActiveMQ Web Console's web descriptor, this can be found in the following location: activemq-console/WEB-INF/web.xml, and add the following configuration: Authenticate entire app /* GET POST activemq NONE BASIC This configuration enables the security constraint on the entire application as noted with /* url-pattern. Another point to notice is the auth-constraint which has been set to the activemq role, we will define this shortly. And lastly, note that this is configured for basic authentication. This means the username password are base64 encoded but not truly encrypted. To improve the security further you could enable a secure transport such as SSL. Now lets configure the Tomcat server to validate our activemq role we just specified in the web app. Out-of-the-box Tomcat is configured to use the UserDataBaseRealm. This is configured in [TOMCAT_HOME]/conf/server.xml. This instructs the web server to validate against the tomcat-users.xml file which can be found in [TOMCAT_HOME]/conf as well. Open the tomcat-users.xml file and add the following: This defines our activemq role and configures a user with that role. The last thing we need to do before starting our Tomcat server is add the required configuration to communicate with the broker. First, copy the activemq-all jar into the Tomcat lib directory. Next, open the catalina.sh/catalina.bat startup script and add the following configuration to initialize the JAVA_OPTS variable: JAVA_OPTS="-Dwebconsole.jms.url=tcp://localhost:61616 -Dwebconsole.jmx.url=service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi -Dwebconsole.jmx.user= -Dwebconsole.jmx.password=" Now we are ready to start the Tomcat server. Once started, you should be able to access the ActiveMQ Web Console at the following URL: http://localhost:8080/activemq-console. You should be prompted with something similar to this dialog: Once you enter the user name and password you should get logged into the ActiveMQ Web Console. As I mentioned before the user name and password are base64 encoded and each request is authenticated against the UserDataBaseRealm. The browser will retain your username and password in memory so you will need to exit the browser to end the session. What you have seen so far is a simple authentication using the UserDataBaseRealm which contains a list of users in a text file. Next we will look at configuring the ActiveMQ Web Console to use a JDBCRealm which will authenticate against users stored in a database. Lets first create a new database as follows using a MySQL database: mysql> CREATE DATABASE tomcat_users; Query OK, 1 row affected (0.00 sec) mysql> Provide the appropriate permissions for this database to a database user: mysql> GRANT ALL ON tomcat_users.* TO 'activemq'@'localhost'; Query OK, 0 rows affected (0.02 sec) mysql> Then you can login to the database and create the following tables: mysql> USE tomcat_users; Database changed mysql> CREATE TABLE tomcat_users ( -> user_name varchar(20) NOT NULL PRIMARY KEY, -> password varchar(32) NOT NULL -> ); Query OK, 0 rows affected (0.10 sec) mysql> CREATE TABLE tomcat_roles ( -> role_name varchar(20) NOT NULL PRIMARY KEY -> ); Query OK, 0 rows affected (0.05 sec) mysql> CREATE TABLE tomcat_users_roles ( -> user_name varchar(20) NOT NULL, -> role_name varchar(20) NOT NULL, -> PRIMARY KEY (user_name, role_name), -> CONSTRAINT tomcat_users_roles_foreign_key_1 FOREIGN KEY (user_name) REFERENCES tomcat_users (user_name), -> CONSTRAINT tomcat_users_roles_foreign_key_2 FOREIGN KEY (role_name) REFERENCES tomcat_roles (role_name) -> ); Query OK, 0 rows affected (0.06 sec) mysql> Next seed the tables with the user and role information: mysql> INSERT INTO tomcat_users (user_name, password) VALUES ('admin', 'dbpass'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO tomcat_roles (role_name) VALUES ('activemq'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO tomcat_users_roles (user_name, role_name) VALUES ('admin', 'activemq'); Query OK, 1 row affected (0.00 sec) mysql> Now we can verify the information in our database: mysql> select * from tomcat_users; +-----------+----------+ | user_name | password | +-----------+----------+ | admin | dbpass | +-----------+----------+ 1 row in set (0.00 sec) mysql> select * from tomcat_users_roles; +-----------+-----------+ | user_name | role_name | +-----------+-----------+ | admin | activemq | +-----------+-----------+ 1 row in set (0.00 sec) mysql> If you left the Tomcat server running from the first part of this demonstration shut it down at this time so we can change the configuration to use the JDBCRealm. In the server.xml file, located in [TOMCAT_HOME]/conf, we need to comment out the existing UserDataBaseRealm and add the JDBCRealm: Looking at the JDBCRealm, you can see we are using the mysql JDBC driver, the connection URL is configured to connect to the tomcat_users database using the specified credentials, and the table and column names used in our database have been specified. Now the Tomcat server can be started again. This time when you login to the ActiveMQ Web Console use the username and password specified when loading the database tables. That's all there is to it, you now know how to configure the ActiveMQ Web Console to use Tomcat's UserDatabaseRealm and JDBCRealm. The following sites were helpful in gathering this information: http://activemq.apache.org/web-console.html http://www.avajava.com/tutorials/lessons/how-do-i-use-a-jdbc-realm-with-tomcat-and-mysql.html?page=1 http://oreilly.com/pub/a/java/archive/tomcat-tips.html?page=1
January 21, 2013
by Jason Sherman
· 12,208 Views
article thumbnail
Assign a Fixed IP to an AWS EC2 Instance
as described in my previous post the ip (and dns) of your running ec2 ami will change after a reboot of that instance. of course this makes it very hard to make your applications on that machine available for the outside world, like in this case our wordpress blog. that is where elastic ip comes to the rescue. with this feature you can assign a static ip to your instance. assign one to your application as follows: click on the elastic ips link in the aws console allocate a new address associate the address with a running instance right click to associate the ip with an instance: pick the instance to assign this ip to: note the ip being assigned to your instance if you go to the ip address you were assigned then you see the home page of your server: and the nicest thing is that if you stop and start your instance you will receive a new public dns but your instance is still assigned to the elastic ip address: one important note: as long as an elastic ip address is associated with a running instance, there is no charge for it. however an address that is not associated with a running instance costs $0.01/hour. this prevents users from ‘reserving’ addresses while they are not being used.
January 20, 2013
by Eric Genesky
· 22,907 Views
article thumbnail
Why is MongoDB Wildly Popular? It’s a Data Structure Thing.
Curator's Note: The content of this article was originally published over at the MongoLab blog . “Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” - Eric Raymond, in The Cathedral and the Bazaar, 1997 Linguistic innovation The fundamental task of programming is telling a computer how to do something. Because of this, much of the innovation in the field of software development has been linguistic innovation; that is, innovation in the ease and effectiveness with which a programmer is able to instruct a computer system. While machines operate in binary, we don’t talk to them that way. Every decade has introduced higher-level programming languages, and with each, an advancement in the ability of programmers to express themselves. These advancements include improvements in how we express data structures as well as how we express algorithms. The Object-Relational impedance mismatch Almost all modern programming languages support OO, and when we model entities in our code, we usually model them using a composition of primitive types (ints, strings, etc…), arrays, and objects. While each language might handle the details differently, the idea of nested object structures has become our universal language for describing ‘things’. The data structures we use to persist data have not evolved at the same rate. For the past 30 years the primary data structure for persistent data has been the Table – a set of Rows comprised of Columns containing scalar values (ints, strings, etc…). This is the world of the relational database, popularized in the 1980′s by its transactionality, speedy queries, space efficiency over other contemporary database systems, and a meat-eating ORCL salesforce. The difference between the way we model things in code, via objects, and the way they are represented in persistent storage, via tables, has been the source of much difficulty for programmers. Millennia of man-effort have been put against solving the problem of changing the shape of data from the object form to the relational form and back. Tools called Object-Relational Mapping systems (ORMs) exist for every object-oriented language in existence, and even with these tools, almost any programmer will complain that doing O/R mapping in any meaningful way is a time-consuming chore. Ted Neward hit it spot on when he said: “Object-Relational mapping is the Vietnam of our industry” There were attempts made at object databases in the 90s, but there was no technology that ever became a real alternative to the relational database. The document database, and in particular MongoDB, is the first successful Web-era object store, and because of that, represents the first big linguistic innovation in persistent data structures in a very long time. Instead of flat, two-dimensional tables of records, we have collections of rich, recursive, N-dimensional objects (a.k.a. documents) for records. An Example: the Blog Post Consider the blog post. Most likely you would have a class / object structure for modeling blog posts in your code, but if you are using a relational database to store your blog data, each entry would be spread across a handful of tables. As a developer you, need to get know how to convert the each ‘BlogPost’ object to and from the set of tables that house them in the relational model. A different approach Using MongoDB, your blog posts can be stored in a single collection, with each entry looking like this: { _id: 1234, author: { name: "Bob Davis", email : "[email protected]" }, post: "In these troubled times I like to …", date: { $date: "2010-07-12 13:23UTC" }, location: [ -121.2322, 42.1223222 ], rating: 2.2, comments: [ { user: "[email protected]", upVotes: 22, downVotes: 14, text: "Great point! I agree" }, { user: "[email protected]", upVotes: 421, downVotes: 22, text: "You are a moron" } ], tags: [ "Politics", "Virginia" ] } With a document database your data is stored almost exactly as it is represented in your program. There is no complex mapping exercise (although one often chooses to bind objects to instances of particular classes in code). What’s MongoDB good for? MongoDB is great for modeling many of the entities that back most modern web-apps, either consumer or enterprise: Account and user profiles: can store arrays of addresses with ease CMS: the flexible schema of MongoDB is great for heterogeneous collections of content types Form data: MongoDB makes it easy to evolve structure of form data over time Blogs / user-generated content: can keep data with complex relationships together in one object Messaging: vary message meta-data easily per message or message type without needing to maintain separate collections or schemas System configuration: just a nice object graph of configuration values, which is very natural in MongoDB Log data of any kind: structured log data is the future Graphs: just objects and pointers – a perfect fit Location based data: MongoDB understands geo-spatial coordinates and natively supports geo-spatial indexing Looking forward: the data is the interface There is a famous quote by Eric Raymond, in The Cathedral and the Bazaar (rephrasing an earlier quote by Fred Brooks from the famous The Mythical Man-Month): “Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” Data structures embody the essence of our programs and our ideas. Therefore, as programmers, we are constantly inviting innovation in the ease with which we can define expressive data structures to model our application domain. People often ask me why MongoDB is so wildly popular. I tell them it’s a data structure thing. While MongoDB may have ridden onto the scene under the banner of scalability with the rest of the NoSQL database technologies, the disproportionate success of MongoDB is largely based on its innovation as a data structure store that lets us more easily and expressively model the ‘things’ at the heart of our applications. For this reason MongoDB, or something very like it, will become the dominant database paradigm for operational data storage, with relational databases filling the role of a specialized tool. Having the same basic data model in our code and in the database is the superior method for most use-cases, as it dramatically simplifies the task of application development, and eliminates the layers of complex mapping code that are otherwise required. While a JSON-based document database may in retrospect seem obvious (if it doesn’t yet, it will), doing it right, as the folks at 10gen have, represents a major innovation.
January 16, 2013
by Eric Genesky
· 6,654 Views
article thumbnail
Reading Hive Tables from MapReduce
This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. This is part two of a two part blog series on how to read/write Apache Hive data from MapReduce jos. Part one (Writing Hive Tables from MapReduce) is here. So just as sometimes you need to write data to Hive with a custom MapReduce job, sometimes you need to read that data back from Hive with a custom MapReduce job. As covered in part one, Hive is a layer that sits on HDFS and imposes a standard convention on the structure of the files so it can interpret them as columns and rows. Reading data out of Hive is just a matter of parsing the files correctly. Recall that files processed by MapReduce (and by extension, Hive) are output as key value pairs. Hive ignores the keys (read as a BytesWritable with a value of null) and reads/writes the values as Text objects. The value of the Text object for each row is the concatenation of all the column values delimited by the delimiter of the table (which Hive defaults to the "char 1" ASCII character). Seems like a simple problem, so my first thought was to just using String.split() in the map() method of the MapReduce job. String SEPARATOR_FIELD = new String(new char[] {1}); String[] rowColumns = new String (rowTextObject.getBytes()).split(SEPARATOR_FIELD); In theory this should have worked perfectly, but unfortunately I have found that String.split() actually consumes repeated delimiters. This is a problem if any of the values in the row are blank, since split() will shift the positions of your columns and you will be unable to match up what values belong with which columns. An alternative would be to create a String from the Text object and iterate through it using indexOf(). This approach however requires extra object creation and depending on the scale of your MapReduce job and the size of your rows, may slow you down needlessly. So an alternative is to use the Text object's find() method. String SEPARATOR_FIELD = new String(new char[] {1}); String[] rowColumns = new String[NUMBER_OF_COLUMNS_IN_YOUR_HIVE_TABLE]; int start = 0; int end = 0; for (int i = 0; i < rowColumns.length; ++i) { end = rowTextObject.find(SEPARATOR_FIELD, start); if (end == -1) { end = rowString.getLength(); } rowColumns[i] = new String(rowTextObject.getBytes(), start, end-start); start = end + 1; } This will parse out each value into the appropriately index of the rowColumns array. Blank values will also be handled correctly and result in blank strings being inserted into the rowColumns array.
January 11, 2013
by Scott Leberknight
· 6,597 Views · 1 Like
article thumbnail
Bash Magic: List Hive Table Sizes in GB
How to list the sizes of Hive tables in Hadoop in GBs.
January 10, 2013
by Jakub Holý
· 42,821 Views · 3 Likes
article thumbnail
Chunk Oriented Processing in Spring Batch
Big Data Sets’ Processing is one of the most important problem in the software world. Spring Batch is a lightweight and robust batch framework to process the data sets. Spring Batch Framework offers ‘TaskletStep Oriented’ and ‘Chunk Oriented’ processing style. In this article, Chunk Oriented Processing Model is explained. Also, TaskletStep Oriented Processing in Spring Batch Article is definitely suggested to investigate how to develop TaskletStep Oriented Processing in Spring Batch. Chunk Oriented Processing Feature has come with Spring Batch v2.0. It refers to reading the data one at a time, and creating ‘chunks’ that will be written out, within a transaction boundary. One item is read from an ItemReader, handed to an ItemProcessor, and written. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed. Basically, this feature should be used if at least one data item’ s reading and writing is required. Otherwise, TaskletStep Oriented processing can be used if the data item’ s only reading or writing is required. Chunk Oriented Processing model exposes three important interface as ItemReader, ItemProcessor and ItemWriter via org.springframework.batch.item package. ItemReader : This interface is used for providing the data. It reads the data which will be processed. ItemProcessor : This interface is used for item transformation. It processes input object and transforms to output object. ItemWriter : This interface is used for generic output operations. It writes the datas which are transformed by ItemProcessor. For example, the datas can be written to database, memory or outputstream (etc). In this sample application, we will write to database. Let us take a look how to develop Chunk Oriented Processing Model. Used Technologies : JDK 1.7.0_09 Spring 3.1.3 Spring Batch 2.1.9 Hibernate 4.1.8 Tomcat JDBC 7.0.27 MySQL 5.5.8 MySQL Connector 5.1.17 Maven 3.0.4 Step 1 : Create Maven Project A maven project is created as below. (It can be created by using Maven or IDE Plug-in). Step 2: Libraries A new USER Table is created by executing below script: CREATE TABLE ONLINETECHVISION.USER ( id int(11) NOT NULL AUTO_INCREMENT, name varchar(45) NOT NULL, surname varchar(45) NOT NULL, PRIMARY KEY (`id`) ); Step 3: Libraries Firstly, dependencies are added to Maven’ s pom.xml. 3.1.3.RELEASE 2.1.9.RELEASE org.springframework spring-core ${spring.version} org.springframework spring-context ${spring.version} org.springframework spring-tx ${spring.version} org.springframework spring-orm ${spring.version} org.springframework.batch spring-batch-core ${spring-batch.version} org.hibernate hibernate-core 4.1.8.Final org.apache.tomcat tomcat-jdbc 7.0.27 mysql mysql-connector-java 5.1.17 log4j log4j 1.2.16 maven-compiler-plugin(Maven Plugin) is used to compile the project with JDK 1.7 org.apache.maven.plugins maven-compiler-plugin 3.0 1.7 1.7 The following Maven plugin can be used to create runnable-jar, org.apache.maven.plugins maven-shade-plugin 2.0 package shade 1.7 1.7 com.onlinetechvision.exe.Application META-INF/spring.handlers META-INF/spring.schemas Step 4 : Create User Entity User Entity is created. This entity will be stored after processing. package com.onlinetechvision.user; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.Table; /** * User Entity * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ @Entity @Table(name="USER") public class User { private int id; private String name; private String surname; @Id @GeneratedValue(strategy=GenerationType.AUTO) @Column(name="ID", unique = true, nullable = false) public int getId() { return id; } public void setId(int id) { this.id = id; } @Column(name="NAME", unique = true, nullable = false) public String getName() { return name; } public void setName(String name) { this.name = name; } @Column(name="SURNAME", unique = true, nullable = false) public String getSurname() { return surname; } public void setSurname(String surname) { this.surname = surname; } @Override public String toString() { StringBuffer strBuff = new StringBuffer(); strBuff.append("id : ").append(getId()); strBuff.append(", name : ").append(getName()); strBuff.append(", surname : ").append(getSurname()); return strBuff.toString(); } } Step 5 : Create IUserDAO Interface IUserDAO Interface is created to expose data access functionality. package com.onlinetechvision.user.dao; import java.util.List; import com.onlinetechvision.user.User; /** * User DAO Interface * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public interface IUserDAO { /** * Adds User * * @param User user */ void addUser(User user); /** * Gets User List * */ List getUsers(); } Step 6 : Create UserDAO IMPL UserDAO Class is created by implementing IUserDAO Interface. package com.onlinetechvision.user.dao; import java.util.List; import org.hibernate.SessionFactory; import com.onlinetechvision.user.User; /** * User DAO * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class UserDAO implements IUserDAO { private SessionFactory sessionFactory; /** * Gets Hibernate Session Factory * * @return SessionFactory - Hibernate Session Factory */ public SessionFactory getSessionFactory() { return sessionFactory; } /** * Sets Hibernate Session Factory * * @param SessionFactory - Hibernate Session Factory */ public void setSessionFactory(SessionFactory sessionFactory) { this.sessionFactory = sessionFactory; } /** * Adds User * * @param User user */ @Override public void addUser(User user) { getSessionFactory().getCurrentSession().save(user); } /** * Gets User List * * @return List - User list */ @SuppressWarnings({ "unchecked" }) @Override public List getUsers() { List list = getSessionFactory().getCurrentSession().createQuery("from User").list(); return list; } } Step 7 : Create IUserService Interface IUserService Interface is created for service layer. package com.onlinetechvision.user.service; import java.util.List; import com.onlinetechvision.user.User; /** * * User Service Interface * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public interface IUserService { /** * Adds User * * @param User user */ void addUser(User user); /** * Gets User List * * @return List - User list */ List getUsers(); } Step 8 : Create UserService IMPL UserService Class is created by implementing IUserService Interface. package com.onlinetechvision.user.service; import java.util.List; import org.springframework.transaction.annotation.Transactional; import com.onlinetechvision.user.User; import com.onlinetechvision.user.dao.IUserDAO; /** * * User Service * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ @Transactional(readOnly = true) public class UserService implements IUserService { IUserDAO userDAO; /** * Adds User * * @param User user */ @Transactional(readOnly = false) @Override public void addUser(User user) { getUserDAO().addUser(user); } /** * Gets User List * */ @Override public List getUsers() { return getUserDAO().getUsers(); } public IUserDAO getUserDAO() { return userDAO; } public void setUserDAO(IUserDAO userDAO) { this.userDAO = userDAO; } } Step 9 : Create TestReader IMPL TestReader Class is created by implementing ItemReader Interface. This class is called in order to read items. When read method returns null, reading operation is completed. The following steps explains with details how to be executed firstJob. The commit-interval value of firstjob is 2 and the following steps are executed : 1) firstTestReader is called to read first item(firstname_0, firstsurname_0) 2) firstTestReader is called again to read second item(firstname_1, firstsurname_1) 3) testProcessor is called to process first item(FIRSTNAME_0, FIRSTSURNAME_0) 4) testProcessor is called to process second item(FIRSTNAME_1, FIRSTSURNAME_1) 5) testWriter is called to write first item(FIRSTNAME_0, FIRSTSURNAME_0) to database 6) testWriter is called to write second item(FIRSTNAME_1, FIRSTSURNAME_1) to database 7) first and second items are committed and the transaction is closed. 8) firstTestReader is called to read third item(firstname_2, firstsurname_2) 9) maxIndex value of firstTestReader is 3. read method returns null and item reading operation is completed. 10) testProcessor is called to process third item(FIRSTNAME_2, FIRSTSURNAME_2) 11) testWriter is called to write first item(FIRSTNAME_2, FIRSTSURNAME_2) to database 12) third item is committed and the transaction is closed. firstStep is completed with COMPLETED status and secondStep is started. secondJob and thirdJob are executed in the same way. package com.onlinetechvision.item; import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.NonTransientResourceException; import org.springframework.batch.item.ParseException; import org.springframework.batch.item.UnexpectedInputException; import com.onlinetechvision.user.User; /** * TestReader Class is created to read items which will be processed * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class TestReader implements ItemReader { private int index; private int maxIndex; private String namePrefix; private String surnamePrefix; /** * Reads items one by one * * @return User * * @throws Exception * @throws UnexpectedInputException * @throws ParseException * @throws NonTransientResourceException * */ @Override public User read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException { User user = new User(); user.setName(getNamePrefix() + "_" + index); user.setSurname(getSurnamePrefix() + "_" + index); if(index > getMaxIndex()) { return null; } incrementIndex(); return user; } /** * Increments index which defines read-count * * @return int * */ private int incrementIndex() { return index++; } public int getMaxIndex() { return maxIndex; } public void setMaxIndex(int maxIndex) { this.maxIndex = maxIndex; } public String getNamePrefix() { return namePrefix; } public void setNamePrefix(String namePrefix) { this.namePrefix = namePrefix; } public String getSurnamePrefix() { return surnamePrefix; } public void setSurnamePrefix(String surnamePrefix) { this.surnamePrefix = surnamePrefix; } } Step 10 : Create FailedCaseTestReader IMPL FailedCaseTestReader Class is created in order to simulate the failed job status. In this sample application, when thirdJob is processed at fifthStep, failedCaseTestReader is called and exception is thrown so its status will be FAILED. package com.onlinetechvision.item; import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.NonTransientResourceException; import org.springframework.batch.item.ParseException; import org.springframework.batch.item.UnexpectedInputException; import com.onlinetechvision.user.User; /** * FailedCaseTestReader Class is created in order to simulate the failed job status. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class FailedCaseTestReader implements ItemReader { private int index; private int maxIndex; private String namePrefix; private String surnamePrefix; /** * Reads items one by one * * @return User * * @throws Exception * @throws UnexpectedInputException * @throws ParseException * @throws NonTransientResourceException * */ @Override public User read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException { User user = new User(); user.setName(getNamePrefix() + "_" + index); user.setSurname(getSurnamePrefix() + "_" + index); if(index >= getMaxIndex()) { throw new Exception("Unexpected Error!"); } incrementIndex(); return user; } /** * Increments index which defines read-count * * @return int * */ private int incrementIndex() { return index++; } public int getMaxIndex() { return maxIndex; } public void setMaxIndex(int maxIndex) { this.maxIndex = maxIndex; } public String getNamePrefix() { return namePrefix; } public void setNamePrefix(String namePrefix) { this.namePrefix = namePrefix; } public String getSurnamePrefix() { return surnamePrefix; } public void setSurnamePrefix(String surnamePrefix) { this.surnamePrefix = surnamePrefix; } } Step 11 : Create TestProcessor IMPL TestProcessor Class is created by implementing ItemProcessor Interface. This class is called to process items. User item is received from TestReader, processed and returned to TestWriter. package com.onlinetechvision.item; import java.util.Locale; import org.springframework.batch.item.ItemProcessor; import com.onlinetechvision.user.User; /** * TestProcessor Class is created to process items. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class TestProcessor implements ItemProcessor { /** * Processes items one by one * * @param User user * @return User * @throws Exception * */ @Override public User process(User user) throws Exception { user.setName(user.getName().toUpperCase(Locale.ENGLISH)); user.setSurname(user.getSurname().toUpperCase(Locale.ENGLISH)); return user; } } Step 12 : Create TestWriter IMPL TestWriter Class is created by implementing ItemWriter Interface. This class is called to write items to DB, memory etc… package com.onlinetechvision.item; import java.util.List; import org.springframework.batch.item.ItemWriter; import com.onlinetechvision.user.User; import com.onlinetechvision.user.service.IUserService; /** * TestWriter Class is created to write items to DB, memory etc... * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class TestWriter implements ItemWriter { private IUserService userService; /** * Writes items via list * * @throws Exception * */ @Override public void write(List userList) throws Exception { for(User user : userList) { getUserService().addUser(user); } System.out.println("User List : " + getUserService().getUsers()); } public IUserService getUserService() { return userService; } public void setUserService(IUserService userService) { this.userService = userService; } } Step 13 : Create FailedStepTasklet Class FailedStepTasklet is created by implementing Tasklet Interface. It illustrates business logic in failed step. package com.onlinetechvision.tasklet; import org.apache.log4j.Logger; import org.springframework.batch.core.StepContribution; import org.springframework.batch.core.scope.context.ChunkContext; import org.springframework.batch.core.step.tasklet.Tasklet; import org.springframework.batch.repeat.RepeatStatus; /** * FailedStepTasklet Class illustrates a failed job. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class FailedStepTasklet implements Tasklet { private static final Logger logger = Logger.getLogger(FailedStepTasklet.class); private String taskResult; /** * Executes FailedStepTasklet * * @param StepContribution stepContribution * @param ChunkContext chunkContext * @return RepeatStatus * @throws Exception * */ public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception { logger.debug("Task Result : " + getTaskResult()); throw new Exception("Error occurred!"); } public String getTaskResult() { return taskResult; } public void setTaskResult(String taskResult) { this.taskResult = taskResult; } } Step 14 : Create BatchProcessStarter Class BatchProcessStarter Class is created to launch the jobs. Also, it logs their execution results. package com.onlinetechvision.spring.batch; import org.apache.log4j.Logger; import org.springframework.batch.core.Job; import org.springframework.batch.core.JobExecution; import org.springframework.batch.core.JobParametersBuilder; import org.springframework.batch.core.JobParametersInvalidException; import org.springframework.batch.core.launch.JobLauncher; import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException; import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException; import org.springframework.batch.core.repository.JobRepository; import org.springframework.batch.core.repository.JobRestartException; /** * BatchProcessStarter Class launches the jobs and logs their execution results. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class BatchProcessStarter { private static final Logger logger = Logger.getLogger(BatchProcessStarter.class); private Job firstJob; private Job secondJob; private Job thirdJob; private JobLauncher jobLauncher; private JobRepository jobRepository; /** * Starts the jobs and logs their execution results. * */ public void start() { JobExecution jobExecution = null; JobParametersBuilder builder = new JobParametersBuilder(); try { getJobLauncher().run(getFirstJob(), builder.toJobParameters()); jobExecution = getJobRepository().getLastJobExecution(getFirstJob().getName(), builder.toJobParameters()); logger.debug(jobExecution.toString()); getJobLauncher().run(getSecondJob(), builder.toJobParameters()); jobExecution = getJobRepository().getLastJobExecution(getSecondJob().getName(), builder.toJobParameters()); logger.debug(jobExecution.toString()); getJobLauncher().run(getThirdJob(), builder.toJobParameters()); jobExecution = getJobRepository().getLastJobExecution(getThirdJob().getName(), builder.toJobParameters()); logger.debug(jobExecution.toString()); } catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException | JobParametersInvalidException e) { logger.error(e); } } public Job getFirstJob() { return firstJob; } public void setFirstJob(Job firstJob) { this.firstJob = firstJob; } public Job getSecondJob() { return secondJob; } public void setSecondJob(Job secondJob) { this.secondJob = secondJob; } public Job getThirdJob() { return thirdJob; } public void setThirdJob(Job thirdJob) { this.thirdJob = thirdJob; } public JobLauncher getJobLauncher() { return jobLauncher; } public void setJobLauncher(JobLauncher jobLauncher) { this.jobLauncher = jobLauncher; } public JobRepository getJobRepository() { return jobRepository; } public void setJobRepository(JobRepository jobRepository) { this.jobRepository = jobRepository; } } Step 15 : Create dataContext.xml jdbc.properties, is created. It defines data-source informations and is read via dataContext.xml jdbc.db.driverClassName=com.mysql.jdbc.Driver jdbc.db.url=jdbc:mysql://localhost:3306/onlinetechvision jdbc.db.username=root jdbc.db.password=root jdbc.db.initialSize=10 jdbc.db.minIdle=3 jdbc.db.maxIdle=10 jdbc.db.maxActive=10 jdbc.db.testWhileIdle=true jdbc.db.testOnBorrow=true jdbc.db.testOnReturn=true jdbc.db.initSQL=SELECT 1 FROM DUAL jdbc.db.validationQuery=SELECT 1 FROM DUAL jdbc.db.timeBetweenEvictionRunsMillis=30000 Step 16 : Create dataContext.xml Spring Configuration file, dataContext.xml, is created. It covers dataSource, sessionFactory and transactionManager definitions. com.onlinetechvision.user.User org.hibernate.dialect.MySQLDialect true Step 17 : Create jobContext.xml Spring Configuration file, jobContext.xml, is created. It covers jobRepository, jobLauncher, item reader, item processor, item writer, tasklet and job definitions. Step 18 : Create applicationContext.xml Spring Configuration file, applicationContext.xml, is created. It covers bean definitions. Step 19 : Create Application Class Application Class is created to run the application. package com.onlinetechvision.exe; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import com.onlinetechvision.spring.batch.BatchProcessStarter; /** * Application Class starts the application. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class Application { /** * Starts the application * * @param String[] args * */ public static void main(String[] args) { ApplicationContext appContext = new ClassPathXmlApplicationContext("applicationContext.xml"); BatchProcessStarter batchProcessStarter = (BatchProcessStarter)appContext.getBean("batchProcessStarter"); batchProcessStarter.start(); } } Step 20 : Build Project After OTV_SpringBatch_Chunk_Oriented_Processing Project is built, OTV_SpringBatch_Chunk_Oriented_Processing-0.0.1-SNAPSHOT.jar will be created. STEP 21 : RUN PROJECT After created OTV_SpringBatch_Chunk_Oriented_Processing-0.0.1-SNAPSHOT.jar file is run, the following database and console output logs will be shown : Database screenshot : First Job’ s console output : 16.12.2012 19:30:41 INFO (SimpleJobLauncher.java:118) - Job: [FlowJob: [name=firstJob]] launched with the following parameters: [{}] 16.12.2012 19:30:41 DEBUG (AbstractJob.java:278) - Job execution starting: JobExecution: id=0, version=0, startTime=null, endTime=null, lastUpdated=Sun Dec 16 19:30:41 GMT 2012, status=STARTING, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=0, version=0, JobParameters=[{}], Job=[firstJob]] User List : [id : 181, name : FIRSTNAME_0, surname : FIRSTSURNAME_0, id : 182, name : FIRSTNAME_1, surname : FIRSTSURNAME_1, id : 183, name : FIRSTNAME_2, surname : FIRSTSURNAME_2, id : 184, name : SECONDNAME_0, surname : SECONDSURNAME_0, id : 185, name : SECONDNAME_1, surname : SECONDSURNAME_1, id : 186, name : SECONDNAME_2, surname : SECONDSURNAME_2] 16.12.2012 19:30:42 DEBUG (BatchProcessStarter.java:43) - JobExecution: id=0, version=2, startTime=Sun Dec 16 19:30:41 GMT 2012, endTime=Sun Dec 16 19:30:42 GMT 2012, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=0, version=0, JobParameters=[{}], Job=[firstJob]] Second Job’ s console output : 16.12.2012 19:30:42 INFO (SimpleJobLauncher.java:118) - Job: [FlowJob: [name=secondJob]] launched with the following parameters: [{}] 16.12.2012 19:30:42 DEBUG (AbstractJob.java:278) - Job execution starting: JobExecution: id=1, version=0, startTime=null, endTime=null, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=STARTING, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=1, version=0, JobParameters=[{}], Job=[secondJob]] User List : [id : 181, name : FIRSTNAME_0, surname : FIRSTSURNAME_0, id : 182, name : FIRSTNAME_1, surname : FIRSTSURNAME_1, id : 183, name : FIRSTNAME_2, surname : FIRSTSURNAME_2, id : 184, name : SECONDNAME_0, surname : SECONDSURNAME_0, id : 185, name : SECONDNAME_1, surname : SECONDSURNAME_1, id : 186, name : SECONDNAME_2, surname : SECONDSURNAME_2, id : 187, name : THIRDNAME_0, surname : THIRDSURNAME_0, id : 188, name : THIRDNAME_1, surname : THIRDSURNAME_1, id : 189, name : THIRDNAME_2, surname : THIRDSURNAME_2, id : 190, name : THIRDNAME_3, surname : THIRDSURNAME_3, id : 191, name : FOURTHNAME_0, surname : FOURTHSURNAME_0, id : 192, name : FOURTHNAME_1, surname : FOURTHSURNAME_1, id : 193, name : FOURTHNAME_2, surname : FOURTHSURNAME_2, id : 194, name : FOURTHNAME_3, surname : FOURTHSURNAME_3] 16.12.2012 19:30:42 DEBUG (BatchProcessStarter.java:47) - JobExecution: id=1, version=2, startTime=Sun Dec 16 19:30:42 GMT 2012, endTime=Sun Dec 16 19:30:42 GMT 2012, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=1, version=0, JobParameters=[{}], Job=[secondJob]] Third Job’ s console output : 16.12.2012 19:30:42 INFO (SimpleJobLauncher.java:118) - Job: [FlowJob: [name=thirdJob]] launched with the following parameters: [{}] 16.12.2012 19:30:42 DEBUG (AbstractJob.java:278) - Job execution starting: JobExecution: id=2, version=0, startTime=null, endTime=null, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=STARTING, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=2, version=0, JobParameters=[{}], Job=[thirdJob]] 16.12.2012 19:30:42 DEBUG (TransactionTemplate.java:159) - Initiating transaction rollback on application exception org.springframework.batch.repeat.RepeatException: Exception in batch process; nested exception is java.lang.Exception: Unexpected Error! ... 16.12.2012 19:30:43 DEBUG (BatchProcessStarter.java:51) - JobExecution: id=2, version=2, startTime=Sun Dec 16 19:30:42 GMT 2012, endTime=Sun Dec 16 19:30:43 GMT 2012, lastUpdated=Sun Dec 16 19:30:43 GMT 2012, status=FAILED, exitStatus=exitCode=FAILED;exitDescription=, job=[JobInstance: id=2, version=0, JobParameters=[{}], Job=[thirdJob]] Step 22 : Download https://github.com/erenavsarogullari/OTV_SpringBatch_Chunk_Oriented_Processing REFERENCES : Chunk Oriented Processing in Spring Batch
January 3, 2013
by Eren Avsarogullari
· 153,068 Views · 7 Likes
article thumbnail
Auditing Login Attempts in MySQL
This is a recurring question made by our MySQL Support customers: How can I audit the login attempts in MySQL? Logging all the attempts or just the failed ones is a very important task on some scenarios. Unfortunately there are not too many audit capabilities in MySQL Community so the first option to audit MySQL’s authentication process is to get all the information we need from logs. General Query Log The first option is the General Query Log. Let’s see an example: Enable the log: general_log_file = /var/log/mysql/mysql.log general_log = 1 User correctly authenticated: 121227 8:31:49 38 Connect root@localhost on 38 Query select @@version_comment limit 1 User not correctly authenticated: 121227 8:32:18 39 Connect root@localhost on 39 Connect Access denied for user 'root'@'localhost' (using password: YES) The problem of the General Query Log is that it will log everything so it can cause performance degradation and you will have to deal with very large files on high loaded servers. general_log variable is dynamic so a solution could be enabling and disabling the log just when it’s needed. Error log If you only care about failed attempts to login then there is another different and less problematic approach. From 5.5 it’s possible to log access denied messages to the error log. We just need to enable log_warnings with a value greater than 1: log_warnings = 2 Then check the error log: 121227 8:44:21 [Warning] Access denied for user 'root'@'localhost' (using password: YES) User Statistics If you are using Percona Server then there is a third option to get information about our users, the User Statistics. As with the previous options we can get the number of connections and failed connections made by a particular user but not the date and time of those attempts. Besides that information we can get other statistics that can be very useful if MySQL is running on a multi-tenant environment or we need to control how resources are used. Let’s seen an example, first we enable User Statistics in my.cnf: 5.5 userstat = 1 5.1 userstat_running = 1 Then we get the information about a particular user: mysql> select * from user_statistics where user='root'\G *************************** 1. row *************************** USER: root TOTAL_CONNECTIONS: 25 CONCURRENT_CONNECTIONS: 0 CONNECTED_TIME: 464 BUSY_TIME: 96 CPU_TIME: 19 BYTES_RECEIVED: 62869617 BYTES_SENT: 14520 BINLOG_BYTES_WRITTEN: 0 ROWS_FETCHED: 783051 ROWS_UPDATED: 1017714 TABLE_ROWS_READ: 1484751 SELECT_COMMANDS: 14 UPDATE_COMMANDS: 103 OTHER_COMMANDS: 3556 COMMIT_TRANSACTIONS: 0 ROLLBACK_TRANSACTIONS: 0 DENIED_CONNECTIONS: 2 LOST_CONNECTIONS: 16 ACCESS_DENIED: 0 EMPTY_QUERIES: 0 TOTAL_SSL_CONNECTIONS: 0 Here we can see that root has done 25 total connections. Two denied connections (bad password) and 16 lost connections (not closed properly). Apart from that information we get the connection time, bytes received and sent, rows accessed, commands executed and so on. Very valuable information. It is important to mention that these tables are stored in INFORMATION_SCHEMA and that means that after a mysqld restart all the information will be lost. So if you really need that information you should copy it to another table or export to a csv for further analysis. Conclusion We don’t have too many audit capabilities in MySQL Community so logging all events and then filter them with custom-made scripts is the best solution we have nowadays. If you are using Percona Server you can get more detailed information about what a particular user is doing. All options can be combined to meet your needs.
January 2, 2013
by Peter Zaitsev
· 18,101 Views
article thumbnail
How ACID is MongoDB?
Find out how MongoDB ranks with atomicity, consistency, isolation and durability (ACID).
January 1, 2013
by Giorgio Sironi
· 81,913 Views · 5 Likes
article thumbnail
MongoDB and the Concept of Identity in NoSQL Databases
in this article i deal with a different nosql database called mongodb a mature nosql engine born outside the .net world to clarify the concept of id in a typical no sql database. installation of mongo is really simple, just download, uncompress, locate the bin folder, and type this from an administrator console prompt to install mongo as service 1: mongod --install --logpath c:xxxx --dbpath c:yyyy you can find plenty of installation guide on the internet, but with the above install command you create a windows service that will automatically start mongodb on your machine using specified datafolder. now you should download the c# driver to connect from .net code, but if you like using linq you can install fluent mongo directly with nuget. figure 1: install fluent mongo with nuget. fluent mongo is a library that gave little linq capability over standard drivers, but adding a nuget reference to fluentmongo you get automatically a reference to the official drivers. now you are ready to insert your first record in mongodb with the above code. mongoserver server = mongoserver.create(); mongodatabase databasetest = server.getdatabase("test"); var untyped = databasetest.getcollection("untyped"); untyped.save(new bsondocument { { "name", "untyped1" } }); bsondocument seconddocument = bsondocument.parse("{name: 'untyped2', blabla: 'bla bla value'}"); untyped.save(seconddocument); in line1 i create a connection to mongodb server passing no parameter to connect to local mongodb server, then i obtain a reference to a mongodatabase object called “test” with mongoserver::getdatabase() method and finally i get a reference to a collection named “untyped” with the mongngodatabase::getcollection () method. this is quite similar to a sql server or other sql database, you have a server, the server contains several databases, and each database is composed by tables; in the same way mongo is divided into server/database/collection where a collection contains document. mongodb stores data in json format and to insert data inside a collection you can simply create a bsondocument, an object defined by c# driver assembly that is capable to represent a document composed by a series of key-value pair. to initialize a bsondocument you can pass a icollection (line 4) or if you feel more confortable with string json representation, you can user bsondocument.parse() to specify the document directly with a json string. after you inserted the above documents you can use mongovue to see what is contained inside the database. figure 2: use mongovue to see what is inside the database the interesting aspect is that each document has an unique id, even if i did not specified any special property in the code. this is a standard behavior for nosql databases, if you did not specify any id property the database engine will create unique id on his own to idendify the docuemnt. the id is a key factor for mongo and other nosql storage, if you try to store a document directly inside the collection specifying json content you will get an error. untyped.save("{name: 'json', attribute:'attribute content'}"); the mongocollection object contains a save object that accepts a string, but the above call will fail with the error subclass must implement getdocumentid . previous code works because one of the specific functionality that a bsondocument implements is the ability to manage id generation, but plain json does not have this capability. if you need to know the id generated by the database, you can query the bsondocument for its unique id after it was saved in a mongo collection . (remember that the id is not available until you save the document). bsondocument seconddocument = bsondocument.parse("{name: 'untyped2', blabla: 'bla bla value'}"); object id; type idtype; iidgenerator generator; untyped.save(seconddocument); seconddocument.getdocumentid(out id, out idtype, out generator); basically you are asking to your bsondocument to return you the generated id, as well as the type of the id and the generator that mongo used to generate that specific id. the result is represented into this snippet figure 3: the three object that you got with a call to getdocumentid: id, idtype and the generator. as you can see the id is an instance of type mongodb.bson.objectid, based on bsonvalue base class and the generator is an instance of objectidgenerator. this type of id is specific to mongo, and the documentation states that an objectid is a bson objectid is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. note that the timestamp and counter fields must be stored big endian unlike the rest of bson. this is because they are compared byte-by-byte and we want to ensure a mostly increasing order. if you want to have a generator that creates integer id, like identity column in sql server , you will find that it is simply not available out of the box, because an int value is not guarantee to be unique if you use sharding. sharding is a technique that permits to partition data into different physical instances, so each instance should generates ids that are unique across all instances and this prevents the use of a simple int32 id. clearly in .net world a guid is guarantee to be unique and is more .net oriented, so mongo db has a guid id generator, that can be specified with the above snippet of code.. bsondocument thirddocument = bsondocument.parse("{name: 'untyped3', anotherproperty: 'xxxxxxxxxxxxxxxxxxxxxxx'}"); var id2 = mongodb.bson.serialization.idgenerators.guidgenerator.instance.generateid(untyped, thirddocument); thirddocument.setdocumentid(id2); the key is using the guidgenerator (in the mongodb.bson.serialization.idgenerators namespace) to generate a valid mongoid guid value, then call the setdocumentid method of bsondocument to manually set the id and not relay on automatic id generation. if you look at the db you will find that the document with guid id has really a different id type. figure 4: the document with guid id is represented in a different way in mongovue, but as you can verify there is no problem in having documents with different id types in the same collection. this demonstrates that a no sql database has a concept of document id that is similar to the concept of id of a standard sql server, you can use a native id generation of the engine that generates a valid id during insertion or you can assign your own id to the document, but basically the whole concept of id is more engine-related and has no business meaning, so i strongly discourage to use anything that has a business meaning as id of a document. noone prevents you to insert in the document a property called “myid” or something else that has a business meaning and can be used as logical id and let the engine handle the internal id by itself.
December 26, 2012
by Ricci Gian Maria
· 8,184 Views
article thumbnail
Getting Started with Quartz Scheduler on MySQL Database
Here are some simple steps to get you fully started with Quartz Scheduler on MySQL database using Groovy. The script below will allow you to quickly experiment different Quartz configuration settings using an external file. First step is to setup the database with tables. Assuming you already have installed MySQL and have access to create database and tables. bash> mysql -u root -p sql> create database quartz2; sql> create user 'quartz2'@'localhost' identified by 'quartz2123'; sql> grant all privileges on quartz2.* to 'quartz2'@'localhost'; sql> exit; bash> mysql -u root -p quartz2 < /path/to/quartz-dist/docs/dbTables/tables_mysql.sql The tables_mysql.sql can be found from Quartz distribution download, or directly from their source here. Once the database is up, you need to write some code to start up the Quartz Scheduler. Here is a simply Groovy script quartzServer.groovy that will run as a tiny scheduler server. // Run Quartz Scheduler as a server // Author: Author: Zemian Deng, Date: 2012-12-15_16:46:09 @GrabConfig(systemClassLoader=true) @Grab('mysql:mysql-connector-java:5.1.22') @Grab('org.slf4j:slf4j-simple:1.7.1') @Grab('org.quartz-scheduler:quartz:2.1.6') import org.quartz.* import org.quartz.impl.* import org.quartz.jobs.* config = args.length > 0 ? args[0] : "quartz.properties" scheduler = new StdSchedulerFactory(config).getScheduler() scheduler.start() // Register shutdown addShutdownHook { scheduler.shutdown() } // Quartz has its own thread, so now put this script thread to sleep until // user hit CTRL+C while (!scheduler.isShutdown()) { Thread.sleep(Long.MAX_VALUE) } And now you just need a config file quartz-mysql.properties that looks like this: # Main Quartz configuration org.quartz.scheduler.skipUpdateCheck = true org.quartz.scheduler.instanceName = DatabaseScheduler org.quartz.scheduler.instanceId = NON_CLUSTERED org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate org.quartz.jobStore.dataSource = quartzDataSource org.quartz.jobStore.tablePrefix = QRTZ_ org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool org.quartz.threadPool.threadCount = 5 # JobStore: JDBC jobStoreTX org.quartz.dataSource.quartzDataSource.driver = com.mysql.jdbc.Driver org.quartz.dataSource.quartzDataSource.URL = jdbc:mysql://localhost:3306/quartz2 org.quartz.dataSource.quartzDataSource.user = quartz2 org.quartz.dataSource.quartzDataSource.password = quartz2123 org.quartz.dataSource.quartzDataSource.maxConnections = 8 You can run the Groovy script as usual bash> groovy quartzServer.groovy quartz-mysql.properties Dec 15, 2012 6:20:26 PM com.mchange.v2.log.MLog INFO: MLog clients using java 1.4+ standard logging. Dec 15, 2012 6:20:27 PM com.mchange.v2.c3p0.C3P0Registry banner INFO: Initializing c3p0-0.9.1.1 [built 15-March-2007 01:32:31; debug? true; trace:10] [main] INFO org.quartz.impl.StdSchedulerFactory - Using default implementation for ThreadExecutor [main] INFO org.quartz.core.SchedulerSignalerImpl - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl [main] INFO org.quartz.core.QuartzScheduler - Quartz Scheduler v.2.1.6 created. [main] INFO org.quartz.core.QuartzScheduler - JobFactory set to: org.quartz.simpl.SimpleJobFactory@1a40247 [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Using thread monitor-based data access locking (synchronization). [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - JobStoreTX initialized. [main] INFO org.quartz.core.QuartzScheduler - Scheduler meta-data: Quartz Scheduler (v2.1.6) 'DatabaseScheduler' with instanceId 'NON_CLUSTERED' Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally. NOT STARTED. Currently in standby mode. Number of jobs executed: 0 Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 5 threads. Using job-store 'org.quartz.impl.jdbcjobstore.JobStoreTX' - which supports persistence. and is not clustered. [main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler 'DatabaseScheduler' initialized from the specified file : 'quartz-mysql.properties' from the class resource path. [main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler version: 2.1.6 Dec 15, 2012 6:20:27 PM com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource getPoolManager INFO: Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName -> 1hge16k8r18mveoq1iqtotg|1486306, debugUnreturnedConnectionStackTraces -> fals e, description -> null, driverClass -> com.mysql.jdbc.Driver, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, identityToken -> 1hge16k8r18mveoq1iqtotg|1486306, idleConnectionTestPeriod -> 0, initialPoolSize -> 3, jdbcUrl -> jdbc:mysql://localhost:3306/quartz2, lastAcquisitionFailureDefaultUser -> null, maxAdministrativeTaskTime -> 0 , maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 8, maxStatements -> 0, maxStatementsPerConnection -> 120, minPoolSize -> 1, numHelperThreads -> 3, numThreadsAwaitingCheckoutDefaultUser -> 0, pref erredTestQuery -> null, properties -> {user=******, password=******}, propertyCycle -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, usesTraditionalReflectiveProxies -> false ] [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Freed 0 triggers from 'acquired' / 'blocked' state.[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Recovering 0 jobs that were in-progress at the time of the last shut-down. [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Recovery complete. [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Removed 0 'complete' triggers. [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Removed 0 stale fired job entries. [main] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED started. ... CTRL+C [Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED shutting down. [Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED paused. [Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED shutdown complete. That's a full run of above setup. Go ahead and play with different config. Read http://quartz-scheduler.org/documentation/quartz-2.1.x/configuration for more details. Here I will post couple more easy config that will get you started in a commonly used config set. A MySQL cluster enabled configuration. With this, you can start one or more shell terminal and run different instance of quartzServer.groovy with the same config. All the quartz scheduler instances should cluster themselve and distribute your jobs evenly. # Main Quartz configuration org.quartz.scheduler.skipUpdateCheck = true org.quartz.scheduler.instanceName = DatabaseClusteredScheduler org.quartz.scheduler.instanceId = AUTO org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate org.quartz.jobStore.dataSource = quartzDataSource org.quartz.jobStore.tablePrefix = QRTZ_ org.quartz.jobStore.isClustered = true org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool org.quartz.threadPool.threadCount = 5 # JobStore: JDBC jobStoreTX org.quartz.dataSource.quartzDataSource.driver = com.mysql.jdbc.Driver org.quartz.dataSource.quartzDataSource.URL = jdbc:mysql://localhost:3306/quartz2 org.quartz.dataSource.quartzDataSource.user = quartz2 org.quartz.dataSource.quartzDataSource.password = quartz2123 org.quartz.dataSource.quartzDataSource.maxConnections = 8 Here is another config set for a simple in-memory scheduler. # Main Quartz configuration org.quartz.scheduler.skipUpdateCheck = true org.quartz.scheduler.instanceName = InMemoryScheduler org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool org.quartz.threadPool.threadCount = 5 Now, if you need more fancy UI management of Quartz, give MySchedule a try.
December 21, 2012
by Zemian Deng
· 49,957 Views · 2 Likes
article thumbnail
When you should and should NOT use ENUM data type
ENUM is a new enumerated data type introduced in CUBRID 9.0. Like in all programming languages, the ENUM type is a data type composed of a set of static, ordered values. Users can define numeric and string values for ENUM columns. Working with ENUM types Creating an ENUM column is done by specifying a static list of possible values: CREATE TABLE person( name VARCHAR(255), gender ENUM('Male', 'Female') ); CUBRID understands the ENUM type as an ordered set of constants which, in the above example, is a set of {NULL: NULL, 1: 'Male', 2: 'Female”}. To assign a value to the gender column, users may either use the index of the value ({NULL, 1, 2}) or the actual constant literal ({NULL}, {'Male'}, {'Female'}). CUBRID restricts the values that can be assigned to this column to only values from this set + NULL. Moreover, ENUM column is case-sensitive, i.e. it will raise an error if you try to enter 'female' in lower case. Also, an empty string is allowed if it is defined as one of the elements of the ENUM column. In our examples, it is not allowed. csql> INSERT INTO person(name, gender) VALUES('Eugene', 'Male'); 1 row affected. 1 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('Anne', 2); 1 row affected. 1 command(s) successfully processed. csql> SELECT * FROM person; === === name gender ============================================ 'Anne' 'Female' 'Eugene' 'Male' 2 rows selected. Any attempt to insert a value outside of the defined set will result in a coercion error. In the below case, trying to insert an empty string raises an error because it is not in the set of allowed values defined in the person table. csql> INSERT INTO person(name, gender) VALUES('John', 'N/A'); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce 'N/A' TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', 4); IN line 1, COLUMN 45, ERROR: before ' ); ' Cannot coerce 4 TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', ''); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce '' TO type enum. 0 command(s) successfully processed. Why you should use the ENUM type There are three important reasons for which you should consider using the ENUM type: Reduce storage space. Reduce join complexity. Create cheap values constraints. Storage Space CUBRID uses only 1 byte per tuple when 255 or less ENUM elements are defined or 2 bytes for 256~65535 elements. This is because, rather that storing the constant literal of the value, CUBRID stores the index in the ordered set of that value. For very large tables, this might prove to be a significant storage space save. Take, for example, a table with 1,000,000,000 records which has an ENUM column defined as ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'). If you use a VARCHAR type instead of the ENUM type to store these values, the column would require anywhere between 5GB and 9GB of storage space. Using the ENUM type, you can reduce the required space to 2 bytes per tuple, adding up to a total of 2GB. Reduce join complexity JOIN way The same effect of the ENUM type can be achieved by creating a one to many relationship on two or more tables. Considering the example above, you can store values for days of the week like this: CREATE TABLE days_of_week( id SHORT PRIMARY KEY, name VARCHAR(9) ); CREATE TABLE opening_hours( week_day SHORT, opening_time TIME, closing_time TIME, FOREIGN KEY fk_dow (week_day) REFERENCES days_of_week(id) ); Then, when you wish to display the name of the week day, you would execute a query like: SELECT d.name day_name, o.opening_time, o.closing_time FROM days_of_week d, opening_hours o WHERE d.id = o.week_day ORDER BY d.id; === === day_name opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ... ENUM way You can achieve the same effect using an ENUM column: CREATE TABLE opening_hours( week_day ENUM ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), opening_time TIME, closing_time TIME ); And there’s no JOIN required to select opening hours: SELECT week_day, opening_time, closing_time FROM opening_hours ORDER BY week_day; === === week_day opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ... This can prove to be very useful, especially if your queries join several tables. Value constraints ENUM columns behave like foreign key relationships in the sense that values from an ENUM column are restricted to the values specified in the column definition. For a short list of values, this is more efficient than creating foreign key relationships. While foreign key relationships use index scans to enforce the restriction, ENUM columns just have to go through a list of predefined values which is faster even for small indexes. Why/When you should NOT use the ENUM type Even though ENUM is a great feature, there are cases when you’d better not use it. For example: When ENUM type is not fixed When ENUM type has a long list of values When your application does not know the list of ENUM values ENUM type is not reusable Portability is a concern When ENUM type is not fixed If you’re not sure if the ENUM type holds all possible values for that column, you should consider using a one to many relationship instead. The only way in which an ENUM column can be changed to handle more values is by using an ALTER statement. This is a very expensive operation in any RDBMS and requires administrator rights. Also, ALTER statements are maintenance operations and should, as much as possible, be performed offline. When ENUM type has a long list of values ENUM types should not be used if you cannot limit a set of possible values to a few elements. When your application does not know the list of ENUM values There are only two ways of getting a list of values you have defined for an ENUM type: parsing the output of SHOW CREATE TABLE statement: csql> SHOW CREATE TABLE opening_hours; === === TABLE CREATE TABLE ============================================ 'opening_hours' 'CREATE TABLE [opening_hours] ([week_day] ENUM('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), [opening_time] TIME, [closing_time] TIME) selecting information from CUBRID system tables: csql> SELECT d.enumeration FROM _db_domain d, _db_attribute a WHERE a.attr_name = 'week_day' AND a.class_of.class_name = 'opening_hours' AND d IN a.domains; === === enumeration ====================== {'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'} Both might require complex coding and selecting from system tables requires administrator privileges. ENUM type is not reusable If you have several tables which require the names of week days, you will have to create an ENUM type for each of them. If you create a table to hold week days names, you can join this table with whichever other table that requires this information. Portability is a concern The ENUM type is only supported by a few RDBMSs and each one has its own idea as to how ENUM type is supposed to work. Below is a list of a few notable differences between CUBRID, MySQL and PostgreSQL: CUBRID PostgreSQL MySQL Inserting out of range value Throws error Throws error Inserts special value index 0 Comparing to char literals Compare as strings Compare as ENUM elements Compare as strings Comparing to values outside of the ENUM domain Compare as strings Throws error Compare as strings These subtle differences will most probably break your application in interesting and hard to understand ways. If you’re migrating from PostgreSQL to CUBRID for example, and you expect comparisons with char literals to be performed as ENUM comparisons, you’ll have a hard time understanding why your query returns weird results.
December 19, 2012
by Esen Sagynov
· 60,331 Views
article thumbnail
All about Two-Phase Locking and a little bit MVCC
In this blog I will describe the concurrency control methods implemented in database management systems, and the differences between them. I will also explain about what locking technique is used in CUBRID RDBMS, about locking modes and their compatibility, and finally, the deadlocks and the solution for them. Overview When multiple transactions, which change the data, are executed simultaneously, it is required to control the order of processing these transactions to satisfy the ACID (Atomicity, Consistency, Integrity, Durability) property of the database. Executing multiple transactions simultaneously should lead to the same result as executing each transaction independently, in other words, one transaction should not be affected by another transaction. If different data is changed for each transaction, no interference between transactions is made, so there is no issue. However, if the same data is simultaneously changed by multiple transactions, the order of processing each transaction should be controlled. Types of Concurrency Control For example, the T1 transaction changes the A record from 1 to 2 and then changes the B record, the T2 transaction can simultaneously change the A record, too. Let's assume that the T2 transaction changes the A record from 2 to 4 by adding +2. If two transactions are successfully terminated, there is no issue. But it is important that all transactions can be rolled back. If the T1 transaction is rolled back, the value of the A record should be returned to 1, i.e. the value before the T1 transaction was executed. This is to satisfy the ACID property of the database. However, the T2 transaction has already changed the A record value to 3. So, it is impossible to return the A record to 1 regardless of the situation. In this case, there can be two options. Two-phase locking (2PL) The first one is when the T2 transaction tries to change the A record, it knows that the T1 transaction has already changed the A record and waits until the T1 transaction is completed because the T2 transaction cannot know whether the T1 transaction will be committed or rolled back. This method is called Two-phase locking (2PL). Multi-version concurrency control (MVCC) The other one is to allow each of them, T1 and T2 transactions, to have their own changed versions. Even when the T1 transaction has changed the A record from 1 to 2, the T1 transaction leaves the original value 1 as it is and writes that the T1 transaction version of the A record is 2. Then, the following T2 transaction changes the A record from 1 to 3, not from 2 to 4, and writes that the T2 transaction version of the A record is 3. When the T1 transaction is rolled back, it does not matter if the 2, the T1 transaction version, is not applied to the A record. After that, if the T2 transaction is committed, the 3, the T2 transaction version, will be applied to the A record. If the T1 transaction is committed prior to the T2 transaction, the A record is changed to 2, and then to 3 at the time of committing the T2 transaction. The final database status is identical to the status of executing each transaction independently, without any impact on other transactions. Therefore, it satisfies the ACID property. This method is called Multi-version concurrency control (MVCC). CUBRID has implemented 2PL method as well as DB2 and SQL Server, while Oracle, InnoDB and PostgreSQL have implemented MVCC. Two-phase locking in CUBRID The 2PL adopted by CUBRID uses locks to ensure the consistency between transactions that change the identical data. As the "lock" literally means, the locking is executed through two phases: expanding phase (acquiring) shrinking phase (releasing) More accurately, all transactions should acquire lock for the data to be accessed and the acquired locks are released only when the transaction is terminated. After a transaction has acquired the lock for a certain data (regardless of the lock type, S_LOCK for read, stands for Shared Lock, or X_LOCK for write, stands for Exclusive Lock), when another transaction tries to acquire a new lock for the data, the new lock is allowed or pended depending on the lock compatibility rule. Therefore, success or failure of the prior transaction does not have impact on the following transactions, so the data consistency is maintained. Lock Manager in CUBRID Thus, the key point of 2PL, adopted by CUBRID, is that the lock must be processed through two phases: expanding phase and shrinking phase. Then, [Figure 1] release all locks, acquired while executing a transaction, only after the transaction ends (commit or rollback). Figure 1: Two-Phase Locking. 2PL concurrency control method naturally controls access to the identical data from transactions by making all transactions observe the 2PL protocol. The following Figure 2 below shows an example of three transactions using 2PL: Transaction 1 executes B=B+A operation, Transaction 2 executes C=A+B operation, and Transaction 3 executes Print C operation. Since all three transactions are accessing the data A, B and C, the concurrency control is required. In this case, each transaction is executed according to the 2PL protocol so that there is no data conflict. Figure 2: Concurrency Control by using 2PL. Lock modes To understand the concurrency control of multiple transactions more deeply, let's discuss about lock modes, lock conversion and transaction isolation level. In the above figure, you can see that S-lock, Shared Lock, for A was first acquired by Transaction 1, but it is also acquired by Transaction 2, too. On the contrary, the transaction which requested X-lock is blocked until S-lock is released. In this matter, a variety of lock modes are used to minimize conflicts by lockers. Major types of locks utilized in DBMSs are. Shared (S) Lock: Used for read operation. It is generally set on the target record when SELECT statement is executed. It blocks a transaction from changing data which was already read by other transactions. Exclusive (X) Lock: Used for write-operations such as INSERT, UPDATE, DELETE. It blocks one data from being changed by multiple transactions. Update (U) Lock: Used to define that the target resource will be changed. It is used to minimize deadlock which may occur when multiple transactions are executing both read and write. Intent Shared (IS) Lock: Set on the upper resource (e.g. tables) to set the S-lock on some lower resources (e.g. records or pages). It is to prevent other transactions from setting X-lock on the upper resource. Intent lock will soon be described. Intent Exclusive (IX) Lock: Set on the upper resource to set X-lock on some lower resources. Shared with Intent Exclusive (SIX) Lock: Set on the upper resource to set S-lock and X-lock on some lower resources. Lock mode compatibility Among the lock modes above, intent locks are used to improve the transaction concurrency and to prevent deadlock between the upper resources and the lower resources. For example, when Transaction A tries to read Record R on Table T, it sets IS_LOCK on Table T before setting S_LOCK on Record R. Then, Transaction B is prevented from setting X_LOCK on Table T to change the structure of Table T. If Transaction A has not set IS_LOCK on Table T, Transaction B would change the structure of Table T. Then, Transaction A would perform a wrong read operation. This way Transaction B has no need to check all records in Table T to check whether there is any lock set by other transactions for setting X_LOCK on Table T. The following lock mode compatibility table will clearly show the effect of intent locks: Table 1: The lock mode compatibility table of CUBRID. Current Lock Mode NULL IS NS S IX SIX U NX X Newly-requested Lock Mode NULL True True True True True True True True True IS True True N/A True True True N/A N/A False NS True N/A True True N/A N/A False True False S True True True True False False False False False IX True True N/A False True False N/A N/A False SIX True True N/A False False False N/A N/A False U True N/A True True N/A N/A False False False NX True N/A True False N/A N/A False False False X True False False False False False False False False From the lock mode compatibility table, you can see that X_LOCK cannot be set on a table if IS_LOCK is set on the table. And only IS_LOCK can be compatible with SIX_LOCK. This means that SIX_LOCK intends to set S_LOCK and X_LOCK on the record and it will not allow any lock but IS_LOCK for S_LOCK on other non-conflicting records. From the table, you can see that IX_LOCK and IX_LOCK can be compatible with each other. IX_LOCK intends to set X_LOCK for some records. So, the compatibility is available. If there are two transactions that try to change an identical record, IX_LOCK for the table is allowed. However, there is no problem in concurrency control since only the transaction that has acquired X_LOCK for the record first can change the record (X_LOCK and X_LOCK are not compatible). The lock mode compatibility table is expressed as a global variable lock_Comp[][] in the lock_table.c file in CUBRID source code. Among CUBRID sources, most codes related to lock modes are implemented in lock_manager.c file. To set lock on a data object, the lock_object() function is used which receives three parameters: the OID of an object where the lock mode will be set, the OID of the class where the object belongs, and the desired lock mode. In the source code of the function, you can see that the function is executed in several ways based on the target of the lock mode, the lock mode for an instance object or for a class object. Take note of this: in CUBRID, a class object is also an object. Keep it in mind that a class object has an OID and all class objects are the instances of a root class, so it uses ROOTOID, the OID of the root object, as its class OID. From the code, you can see that the required intent lock is set on a class object when a lock mode is required for an instance object. And there is a concept of lock waiting time in the lock mode request. To retrieve the lock timeout value set on the current transaction, the logtb_find_wait_secs() function is called. CUBRID supports the SET TRANSACTION LOCK TIMEOUT SQL command and the setLockTimeout() method in JDBC. The command is to specify the lock timeout of the current transaction. Lock waiting time means the time for a transaction, which has made a request for lock mode, to wait when a lock mode is set on an object by a transaction and the requested lock is not compatible with the already-set lock mode. As you have seen before, the 2PL concurrency control method does not allow lock from other transactions until the existing lock is released. For the following two reasons, lock timeout should be set by a transaction: When a user does not want to wait too long because of the lock mode. To lower the frequency of deadlock. Deadlocks A deadlock occurs when two or more transactions request resources locked by each of them, so all transactions cannot be progressed. Figure 8 below shows an example of a deadlock. Figure 2: Transaction Deadlock. First, Transaction 1 executes UPDATE participant SET gold=10 WHERE host_year=2004 AND nation_code=’KOR’ statement and sets X_LOCK on the ‘KOR’ record. Transaction 2 sets X_LOCK on the ‘JPN’ record. Transaction 3 sets X_LOCK on the ‘CHN’ record. After that, Transaction 1 requests X_LOCK on the ‘JPN’ record for executing UPDATE for that record. However, the ‘JPN’ record is already locked with X_LOCK by Transaction 2. So, Transaction 1 should wait until Transaction 2 ends. Based on the 2PL protocol, the X_LOCK is released when the transaction ends. Transaction 2 requests X_LOCK on the ‘CHN’record and waits for Transaction 3. Finally, Transaction 3 waits for Transaction 1 to acquire the 'KOR' record of Transaction 1 as it has X_LOCK on the ‘CHN’ record. As a result,Transaction 1 waits for Transaction 2 to end, Transaction 2 waits for Transaction 3 to end, and Transaction 3 waits for Transaction 1 to end. So, no transaction can be progressed. This is called a deadlock. Most DBMSs which use the 2PL method, including CUBRID, use the deadlock detection method to solve the deadlock problem. It periodically checks whether the cycle illustrated in the above figure occurs by drawing a Lock Wait Graph for the transactions being executed. In CUBRID, the thread for detecting deadlock checks the Lock Wait Graph every second. When a deadlock is detected, one transaction among the transactions is randomly selected and aborted by force. This is called unilateral abort. When a transaction is selected as a victim to be sacrificed to solve the deadlock and unilaterally aborted, the corresponding SQL statement returns an error code. The error message is "The transaction has timed out due to deadlock while waiting for X_LOCK for an object. It waited until User 2 ended.” When an error is returned and the application aborts the transaction, the locks of the transaction are released and other transactions can be continuously processed. To see how the deadlock is detected, see the lock_detect_local_deadlock() function in the source code. This function is called with the intervals (in seconds) specified by the PRM_LK_RUN_DEADLOCK_INTERVAL variable (the deadlock_detection_interval_in_secs parameter in cubrid.conf file) on the background thread which executes thread_deadlock_detect_thread(). Even if a deadlock does not occur, when the execution time of a transaction is too long, other transactions should wait for too long as well. For a certain application, it is wiser to give up rather than wait. In particular, when a web server has called DB tasks and the wait time is too long, all threads of the web server are used to process the DB, so they cannot be used to process external HTTP requests any more, causing service failures. Therefore, for a web application, the threads should be returned without waiting an unlimited amount of time for DB processing even if an error occurs. Two methods are used for that: One is lock timeout supported by CUBRID. The other is query cancel. JDBC is defined with an API which can cancel the SQL statement being executed. The key data structure of the lock manager is defined in the lock_manager.c file. typedef struct lk_entry LK_ENTRY; struct lk_entry { #if defined(SERVER_MODE) struct lk_res *res_head; /* back to resource entry */ THREAD_ENTRY *thrd_entry; /* thread entry pointer */ int tran_index; /* transaction table index */ LOCK granted_mode; /* granted lock mode */ LOCK blocked_mode; /* blocked lock mode */ int count; /* number of lock requests */ struct lk_entry *next; /* next entry */ struct lk_entry *tran_next; /* list of locks that trans. holds */ struct lk_entry *class_entry; /* ptr. to class lk_entry */ LK_ACQUISITION_HISTORY *history; /* lock acquisition history */ LK_ACQUISITION_HISTORY *recent; /* last node of history list */ int ngranules; /* number of finer granules */ int mlk_count; /* number of instant lock requests */ unsigned char scanid_bitset[1]; /* PRM_LK_MAX_SCANID_BIT/8]; */ #else /* not SERVER_MODE */ int dummy; #endif /* not SERVER_MODE */ }; typedef struct lk_res LK_RES; struct lk_res { MUTEX_T res_mutex; /* resource mutex */ LOCK_RESOURCE_TYPE type; /* type of resource: class,instance */ OID oid; OID class_oid; LOCK total_holders_mode; /* total mode of the holders */ LOCK total_waiters_mode; /* total mode of the waiters */ LK_ENTRY *holder; /* lock holder list */ LK_ENTRY *waiter; /* lock waiter list */ LK_ENTRY *non2pl; /* non2pl list */ LK_RES *hash_next; /* for hash chain */ }; From the file, the lk_Gl global variable of LK_GLOBAL_DATA type is the core. The LK_ENTRY structure stands for the lock itself. For example, when the Transaction T1 has requested a lock, one LK_ENTRY is created. LK_RES is a structure that shows to which resource the lock belongs. In CUBRID, all resources are objects (instance objects and class objects), so they are shaped as OIDs. In the LK_RES structure, you can see the list of holders with LK_ENTRY type and the list of waiters. The list of holders is a list of transactions that hold the lock for the resource now. For example, when Transaction T1 and Transaction T2 have acquired S_LOCK for the data record with OID1, LK_ENTRY that corresponds to the S_LOCK of T1 and T2 will be registered in the list of holders. When Transaction T3 requests X_LOCK on the OID1 record, T3 should wait because of the existing S_LOCK. So, the LK_ENTRY corresponding to X_LOCK of T3 will be registered to the list of waiters. Which lock is held by which transaction is maintained in the tran_lock_table variable which has the LK_TRAN_LOCK structure as a table. The Wait For Graph for detecting a deadlock is expressed as TWFG_node and TWFG_edge of the LK_WFG_NODE structure and the LK_WFG_EDGE structure. The lock_detect_local_deadlock() function creates a Wait For Graph and detects whether there is a cycle on the graph. When a cycle is detected, the lock_select_deadlock_victim() function selects a victim transaction to be sacrificed for solving the deadlock. For reference, transactions are continuously executed while a Wait For Graph is drawn up and checked, the information of the ended transaction is removed from the graph. The victim transaction is selected based on the following criteria: If a transaction is not a holder, it cannot be a victim. When a transaction is in the commit phase or the rollback phase, it cannot be selected as a victim. Select a transaction of which lock timeout is not set to -1 (unlimited waiting) first. Select the latest transaction rather than the older one. (The transaction ID is an incremental number. A transaction with smaller transaction number is the older one.) Conclusion This concludes the talk about Two-Phase Locking in CUBRID. I briefly covered the types of concurrency control, the difference between 2PL and MVCC, about what locking technique is used in CUBRID RDBMS, about locking modes and their compatibility, and finally, the deadlocks and the solution for them. In this article I have mentioned about OID (Object Identifiers) which are used to identify instance objects as well as class objects. In the next article I will continue this talk and explain what Object, Class, and OID are.
December 14, 2012
by Esen Sagynov
· 11,160 Views · 1 Like
article thumbnail
Checking DB Connection Using Groovy
Here is a simple Groovy script to verify Oracle database connection using JDBC. @GrabConfig(systemClassLoader=true) @Grab('com.oracle:ojdbc6:11g') url= "jdbc:oracle:thin:@localhost:1521:XE" username = "system" password = "mypassword123" driver = "oracle.jdbc.driver.OracleDriver" // Groovy Sql connection test import groovy.sql.* sql = Sql.newInstance(url, username, password, driver) try { sql.eachRow('select sysdate from dual'){ row -> println row } } finally { sql.close() } This script should let you test connection and perform any quick ad hoc queries programmatically. However, when you first run it, it would likely failed without finding the Maven dependency for JDBC driver jar. In this case, you would need to first install the Oracle JDBC jar into maven local repository. This is due to Oracle has not publish their JDBC jar into any public Maven repository. So we are left with manually steps by installing it. Here are the onetime setup steps: 1. Download Oracle JDBC jar from their site: http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html. 2. Unzip the file into C:/ojdbc directory. 3. Now you can install the jar file into Maven local repository using Cygwin. bash> cd /cygdrive/c/ojdbc bash> mvn install:install-file -DgroupId=com.oracle -DartifactId=ojdbc6 -Dversion=11g -Dpackaging=jar -Dfile=ojdbc6-11g.jar That should make your script run successfully. The Groovy way of using Sql has many sugarcoated methods that you let you quickly query and see data on screens. You can see more Groovy feature by studying their API doc. Note that you would need systemClassLoader=true to make Groovy load the JDBC jar into classpath and use it properly. Oh, BTW, if you are using Oracle DB production, you will likely using a RAC configuration. The JDBC url connection string for that should look something like this: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=MY_DB))) Update: 12/07/2012 It appears that the groovy.sql.Sql class has a static withInstance method. This let you run onetime DB work without writing try/finally block. See this example: @GrabConfig(systemClassLoader=true) @Grab('com.oracle:ojdbc6:11g') url= "jdbc:oracle:thin:@localhost:1521:XE" username = "system" password = "mypassword123" driver = "oracle.jdbc.driver.OracleDriver" import groovy.sql.* Sql.withInstance(url, username, password, driver) { sql -> sql.eachRow('select sysdate from dual'){ row -> println row } } It's much more compact. But be aware of performance if you run it multiple times, because you will open and close the a java.sql.Connection per each call! I have also collected couple other popular databases connection test examples. These should have their driver jars already in Maven central, so Groovy Grab should able to grab them just fine. // MySQL database test @GrabConfig(systemClassLoader=true) @Grab('mysql:mysql-connector-java:5.1.22') import groovy.sql.* Sql.withInstance("jdbc:mysql://localhost:3306/mysql", "root", "mypassword123", "com.mysql.jdbc.Driver"){ sql -> sql.eachRow('SELECT * FROM USER'){ row -> println row } } // H2Database @GrabConfig(systemClassLoader=true) @Grab('com.h2database:h2:1.3.170') import groovy.sql.* Sql.withInstance("jdbc:h2:~/test", "sa", "", "org.h2.Driver"){ sql -> sql.eachRow('SELECT * FROM INFORMATION_SCHEMA.TABLES'){ row -> println row } }
December 12, 2012
by Zemian Deng
· 29,371 Views
article thumbnail
Configuring IIS methods for ASP.NET Web API on Windows Azure Websites
That’s a pretty long title, I agree. When working on my implementation of RFC2324, also known as the HyperText Coffee Pot Control Protocol, I’ve been struggling with something that you will struggle with as well in your ASP.NET Web API’s: supporting additional HTTP methods like HEAD, PATCH or PROPFIND. ASP.NET Web API has no issue with those, but when hosting them on IIS you’ll find yourself in Yellow-screen-of-death heaven. The reason why IIS blocks these methods (or fails to route them to ASP.NET) is because it may happen that your IIS installation has some configuration leftovers from another API: WebDAV. WebDAV allows you to work with a virtual filesystem (and others) using a HTTP API. IIS of course supports this (because flagship product “SharePoint” uses it, probably) and gets in the way of your API. Bottom line of the story: if you need those methods or want to provide your own HTTP methods, here’s the bit of configuration to add to your Web.config file: Here’s what each part does: Under modules, the WebDAVModule is being removed. Just to make sure that it’s not going to get in our way ever again. The security/requestFiltering element I’ve added only applies if you want to define your own HTTP methods. So unless you need the XYZ method I’ve defined here, don’t add it to your config. Under handlers, I’m removing the default handlers that route into ASP.NET. Then, I’m adding them again. The important part? The "verb attribute. You can provide a list of comma-separated methods that you want to route into ASP.NET. Again, I’ve added my XYZ methodbut you probably don’t need it. This will work on any IIS server as well as on Windows Azure Websites. It will make your API… happy.
December 11, 2012
by Maarten Balliauw
· 20,491 Views
article thumbnail
Using YAML for Java Application Configuration
YAML is well-known format within Ruby community, quite widely used for a long time now. But we as Java developers mostly deal with property files and XMLs in case we need some configuration for our apps. How many times we needed to express complicated configuration by inventing our own XML schema or imposing property names convention? Though JSON is becoming a popular format for web applications, using JSON files to describe the configuration is a bit cumbersome and, in my opinion, is not as expressive as YAML. Let's see what YAML can do for us to make our life easier. For sure, let's start with the problem. In order for our application to function properly, we need to feed it following data somehow: version and release date database connection parameters list of supported protocols list of users with their passwords This list of parameters sounds a bit weird, but the purpose is to demonstrate different data types in work: strings, numbers, dates, lists and maps. The Java model consists of two simple classes: Connection package com.example.yaml; public final class Connection { private String url; private int poolSize; public String getUrl() { return url; } public void setUrl(String url) { this.url = url; } public int getPoolSize() { return poolSize; } public void setPoolSize(int poolSize) { this.poolSize = poolSize; } @Override public String toString() { return String.format( "'%s' with pool of %d", getUrl(), getPoolSize() ); } } and Configuration, both are typical Java POJOs, verbose because of property setters and getters (we get used to it, right?). package com.example.yaml; import static java.lang.String.format; import java.util.Date; import java.util.List; import java.util.Map; public final class Configuration { private Date released; private String version; private Connection connection; private List< String > protocols; private Map< String, String > users; public Date getReleased() { return released; } public String getVersion() { return version; } public void setReleased(Date released) { this.released = released; } public void setVersion(String version) { this.version = version; } public Connection getConnection() { return connection; } public void setConnection(Connection connection) { this.connection = connection; } public List< String > getProtocols() { return protocols; } public void setProtocols(List< String > protocols) { this.protocols = protocols; } public Map< String, String > getUsers() { return users; } public void setUsers(Map< String, String > users) { this.users = users; } @Override public String toString() { return new StringBuilder() .append( format( "Version: %s\n", version ) ) .append( format( "Released: %s\n", released ) ) .append( format( "Connecting to database: %s\n", connection ) ) .append( format( "Supported protocols: %s\n", protocols ) ) .append( format( "Users: %s\n", users ) ) .toString(); } } ow, as model is quite clear, let us try to express it as the human being normally does it. Looking back to our list of required configuration, let's try to write it down one by one. 1. version and release date version: 1.0 released: 2012-11-30 2. database connection parameters connection: url: jdbc:mysql://localhost:3306/db poolSize: 5 3. list of supported protocols protocols: - http - https 4. list of users with their passwords users: tom: passwd bob: passwd And this is it, our configuration expressed in YAML syntax is completed! The whole file sample.yml looks like this: version: 1.0 released: 2012-11-30 # Connection parameters connection: url: jdbc:mysql://localhost:3306/db poolSize: 5 # Protocols protocols: - http - https # Users users: tom: passwd bob: passwd To make it work in Java, we just need to use the awesome library called snakeyml, respectively the Maven POM file is quite simple: 4.0.0 com.example yaml 0.0.1-SNAPSHOT jar UTF-8 org.yaml snakeyaml 1.11 org.apache.maven.plugins maven-compiler-plugin 2.3.1 1.7 1.7 Please notice the usage of Java 1.7, the language extensions and additional libraries simplify a lot of regular tasks as we could see looking into YamlConfigRunner: package com.example.yaml; import java.io.IOException; import java.io.InputStream; import java.nio.file.Files; import java.nio.file.Paths; import org.yaml.snakeyaml.Yaml; public class YamlConfigRunner { public static void main(String[] args) throws IOException { if( args.length != 1 ) { System.out.println( "Usage: " ); return; } Yaml yaml = new Yaml(); try( InputStream in = Files.newInputStream( Paths.get( args[ 0 ] ) ) ) { Configuration config = yaml.loadAs( in, Configuration.class ); System.out.println( config.toString() ); } } } The code snippet here loads the configuration from file (args[ 0 ]), tries to parse it and fill up the Configuration class with meaningful data using JavaBeans conventions, converting to the declared types where possible. Running this class with sample.yml as an argument generates the following output: Version: 1.0 Released: Thu Nov 29 19:00:00 EST 2012 Connecting to database: 'jdbc:mysql://localhost:3306/db' with pool of 5 Supported protocols: [http, https] Users: {tom=passwd, bob=passwd} Totally identical to the values we have configured!
December 10, 2012
by Andriy Redko
· 240,314 Views · 6 Likes
  • Previous
  • ...
  • 508
  • 509
  • 510
  • 511
  • 512
  • 513
  • 514
  • 515
  • 516
  • 517
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×