Databases Resources

The Latest Databases Topics

This is a recurring question made by our MySQL Support customers: How can I audit the login attempts in MySQL? Logging all the attempts or just the failed ones is a very important task on some scenarios. Unfortunately there are not too many audit capabilities in MySQL Community so the first option to audit MySQL’s authentication process is to get all the information we need from logs. General Query Log The first option is the General Query Log. Let’s see an example: Enable the log: general_log_file = /var/log/mysql/mysql.log general_log = 1 User correctly authenticated: 121227 8:31:49 38 Connect root@localhost on 38 Query select @@version_comment limit 1 User not correctly authenticated: 121227 8:32:18 39 Connect root@localhost on 39 Connect Access denied for user 'root'@'localhost' (using password: YES) The problem of the General Query Log is that it will log everything so it can cause performance degradation and you will have to deal with very large files on high loaded servers. general_log variable is dynamic so a solution could be enabling and disabling the log just when it’s needed. Error log If you only care about failed attempts to login then there is another different and less problematic approach. From 5.5 it’s possible to log access denied messages to the error log. We just need to enable log_warnings with a value greater than 1: log_warnings = 2 Then check the error log: 121227 8:44:21 [Warning] Access denied for user 'root'@'localhost' (using password: YES) User Statistics If you are using Percona Server then there is a third option to get information about our users, the User Statistics. As with the previous options we can get the number of connections and failed connections made by a particular user but not the date and time of those attempts. Besides that information we can get other statistics that can be very useful if MySQL is running on a multi-tenant environment or we need to control how resources are used. Let’s seen an example, first we enable User Statistics in my.cnf: 5.5 userstat = 1 5.1 userstat_running = 1 Then we get the information about a particular user: mysql> select * from user_statistics where user='root'\G *************************** 1. row *************************** USER: root TOTAL_CONNECTIONS: 25 CONCURRENT_CONNECTIONS: 0 CONNECTED_TIME: 464 BUSY_TIME: 96 CPU_TIME: 19 BYTES_RECEIVED: 62869617 BYTES_SENT: 14520 BINLOG_BYTES_WRITTEN: 0 ROWS_FETCHED: 783051 ROWS_UPDATED: 1017714 TABLE_ROWS_READ: 1484751 SELECT_COMMANDS: 14 UPDATE_COMMANDS: 103 OTHER_COMMANDS: 3556 COMMIT_TRANSACTIONS: 0 ROLLBACK_TRANSACTIONS: 0 DENIED_CONNECTIONS: 2 LOST_CONNECTIONS: 16 ACCESS_DENIED: 0 EMPTY_QUERIES: 0 TOTAL_SSL_CONNECTIONS: 0 Here we can see that root has done 25 total connections. Two denied connections (bad password) and 16 lost connections (not closed properly). Apart from that information we get the connection time, bytes received and sent, rows accessed, commands executed and so on. Very valuable information. It is important to mention that these tables are stored in INFORMATION_SCHEMA and that means that after a mysqld restart all the information will be lost. So if you really need that information you should copy it to another table or export to a csv for further analysis. Conclusion We don’t have too many audit capabilities in MySQL Community so logging all events and then filter them with custom-made scripts is the best solution we have nowadays. If you are using Percona Server you can get more detailed information about what a particular user is doing. All options can be combined to meet your needs.

January 2, 2013

by Peter Zaitsev

· 18,167 Views

How ACID is MongoDB?

Find out how MongoDB ranks with atomicity, consistency, isolation and durability (ACID).

January 1, 2013

by Giorgio Sironi

· 82,038 Views · 5 Likes

MongoDB and the Concept of Identity in NoSQL Databases

in this article i deal with a different nosql database called mongodb a mature nosql engine born outside the .net world to clarify the concept of id in a typical no sql database. installation of mongo is really simple, just download, uncompress, locate the bin folder, and type this from an administrator console prompt to install mongo as service 1: mongod --install --logpath c:xxxx --dbpath c:yyyy you can find plenty of installation guide on the internet, but with the above install command you create a windows service that will automatically start mongodb on your machine using specified datafolder. now you should download the c# driver to connect from .net code, but if you like using linq you can install fluent mongo directly with nuget. figure 1: install fluent mongo with nuget. fluent mongo is a library that gave little linq capability over standard drivers, but adding a nuget reference to fluentmongo you get automatically a reference to the official drivers. now you are ready to insert your first record in mongodb with the above code. mongoserver server = mongoserver.create(); mongodatabase databasetest = server.getdatabase("test"); var untyped = databasetest.getcollection("untyped"); untyped.save(new bsondocument { { "name", "untyped1" } }); bsondocument seconddocument = bsondocument.parse("{name: 'untyped2', blabla: 'bla bla value'}"); untyped.save(seconddocument); in line1 i create a connection to mongodb server passing no parameter to connect to local mongodb server, then i obtain a reference to a mongodatabase object called “test” with mongoserver::getdatabase() method and finally i get a reference to a collection named “untyped” with the mongngodatabase::getcollection () method. this is quite similar to a sql server or other sql database, you have a server, the server contains several databases, and each database is composed by tables; in the same way mongo is divided into server/database/collection where a collection contains document. mongodb stores data in json format and to insert data inside a collection you can simply create a bsondocument, an object defined by c# driver assembly that is capable to represent a document composed by a series of key-value pair. to initialize a bsondocument you can pass a icollection (line 4) or if you feel more confortable with string json representation, you can user bsondocument.parse() to specify the document directly with a json string. after you inserted the above documents you can use mongovue to see what is contained inside the database. figure 2: use mongovue to see what is inside the database the interesting aspect is that each document has an unique id, even if i did not specified any special property in the code. this is a standard behavior for nosql databases, if you did not specify any id property the database engine will create unique id on his own to idendify the docuemnt. the id is a key factor for mongo and other nosql storage, if you try to store a document directly inside the collection specifying json content you will get an error. untyped.save("{name: 'json', attribute:'attribute content'}"); the mongocollection object contains a save object that accepts a string, but the above call will fail with the error subclass must implement getdocumentid . previous code works because one of the specific functionality that a bsondocument implements is the ability to manage id generation, but plain json does not have this capability. if you need to know the id generated by the database, you can query the bsondocument for its unique id after it was saved in a mongo collection . (remember that the id is not available until you save the document). bsondocument seconddocument = bsondocument.parse("{name: 'untyped2', blabla: 'bla bla value'}"); object id; type idtype; iidgenerator generator; untyped.save(seconddocument); seconddocument.getdocumentid(out id, out idtype, out generator); basically you are asking to your bsondocument to return you the generated id, as well as the type of the id and the generator that mongo used to generate that specific id. the result is represented into this snippet figure 3: the three object that you got with a call to getdocumentid: id, idtype and the generator. as you can see the id is an instance of type mongodb.bson.objectid, based on bsonvalue base class and the generator is an instance of objectidgenerator. this type of id is specific to mongo, and the documentation states that an objectid is a bson objectid is a 12-byte value consisting of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter. note that the timestamp and counter fields must be stored big endian unlike the rest of bson. this is because they are compared byte-by-byte and we want to ensure a mostly increasing order. if you want to have a generator that creates integer id, like identity column in sql server , you will find that it is simply not available out of the box, because an int value is not guarantee to be unique if you use sharding. sharding is a technique that permits to partition data into different physical instances, so each instance should generates ids that are unique across all instances and this prevents the use of a simple int32 id. clearly in .net world a guid is guarantee to be unique and is more .net oriented, so mongo db has a guid id generator, that can be specified with the above snippet of code.. bsondocument thirddocument = bsondocument.parse("{name: 'untyped3', anotherproperty: 'xxxxxxxxxxxxxxxxxxxxxxx'}"); var id2 = mongodb.bson.serialization.idgenerators.guidgenerator.instance.generateid(untyped, thirddocument); thirddocument.setdocumentid(id2); the key is using the guidgenerator (in the mongodb.bson.serialization.idgenerators namespace) to generate a valid mongoid guid value, then call the setdocumentid method of bsondocument to manually set the id and not relay on automatic id generation. if you look at the db you will find that the document with guid id has really a different id type. figure 4: the document with guid id is represented in a different way in mongovue, but as you can verify there is no problem in having documents with different id types in the same collection. this demonstrates that a no sql database has a concept of document id that is similar to the concept of id of a standard sql server, you can use a native id generation of the engine that generates a valid id during insertion or you can assign your own id to the document, but basically the whole concept of id is more engine-related and has no business meaning, so i strongly discourage to use anything that has a business meaning as id of a document. noone prevents you to insert in the document a property called “myid” or something else that has a business meaning and can be used as logical id and let the engine handle the internal id by itself.

December 26, 2012

by Ricci Gian Maria

· 8,238 Views

Getting Started with Quartz Scheduler on MySQL Database

Here are some simple steps to get you fully started with Quartz Scheduler on MySQL database using Groovy. The script below will allow you to quickly experiment different Quartz configuration settings using an external file. First step is to setup the database with tables. Assuming you already have installed MySQL and have access to create database and tables. bash> mysql -u root -p sql> create database quartz2; sql> create user 'quartz2'@'localhost' identified by 'quartz2123'; sql> grant all privileges on quartz2.* to 'quartz2'@'localhost'; sql> exit; bash> mysql -u root -p quartz2 < /path/to/quartz-dist/docs/dbTables/tables_mysql.sql The tables_mysql.sql can be found from Quartz distribution download, or directly from their source here. Once the database is up, you need to write some code to start up the Quartz Scheduler. Here is a simply Groovy script quartzServer.groovy that will run as a tiny scheduler server. // Run Quartz Scheduler as a server // Author: Author: Zemian Deng, Date: 2012-12-15_16:46:09 @GrabConfig(systemClassLoader=true) @Grab('mysql:mysql-connector-java:5.1.22') @Grab('org.slf4j:slf4j-simple:1.7.1') @Grab('org.quartz-scheduler:quartz:2.1.6') import org.quartz.* import org.quartz.impl.* import org.quartz.jobs.* config = args.length > 0 ? args[0] : "quartz.properties" scheduler = new StdSchedulerFactory(config).getScheduler() scheduler.start() // Register shutdown addShutdownHook { scheduler.shutdown() } // Quartz has its own thread, so now put this script thread to sleep until // user hit CTRL+C while (!scheduler.isShutdown()) { Thread.sleep(Long.MAX_VALUE) } And now you just need a config file quartz-mysql.properties that looks like this: # Main Quartz configuration org.quartz.scheduler.skipUpdateCheck = true org.quartz.scheduler.instanceName = DatabaseScheduler org.quartz.scheduler.instanceId = NON_CLUSTERED org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate org.quartz.jobStore.dataSource = quartzDataSource org.quartz.jobStore.tablePrefix = QRTZ_ org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool org.quartz.threadPool.threadCount = 5 # JobStore: JDBC jobStoreTX org.quartz.dataSource.quartzDataSource.driver = com.mysql.jdbc.Driver org.quartz.dataSource.quartzDataSource.URL = jdbc:mysql://localhost:3306/quartz2 org.quartz.dataSource.quartzDataSource.user = quartz2 org.quartz.dataSource.quartzDataSource.password = quartz2123 org.quartz.dataSource.quartzDataSource.maxConnections = 8 You can run the Groovy script as usual bash> groovy quartzServer.groovy quartz-mysql.properties Dec 15, 2012 6:20:26 PM com.mchange.v2.log.MLog INFO: MLog clients using java 1.4+ standard logging. Dec 15, 2012 6:20:27 PM com.mchange.v2.c3p0.C3P0Registry banner INFO: Initializing c3p0-0.9.1.1 [built 15-March-2007 01:32:31; debug? true; trace:10] [main] INFO org.quartz.impl.StdSchedulerFactory - Using default implementation for ThreadExecutor [main] INFO org.quartz.core.SchedulerSignalerImpl - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl [main] INFO org.quartz.core.QuartzScheduler - Quartz Scheduler v.2.1.6 created. [main] INFO org.quartz.core.QuartzScheduler - JobFactory set to: org.quartz.simpl.SimpleJobFactory@1a40247 [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Using thread monitor-based data access locking (synchronization). [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - JobStoreTX initialized. [main] INFO org.quartz.core.QuartzScheduler - Scheduler meta-data: Quartz Scheduler (v2.1.6) 'DatabaseScheduler' with instanceId 'NON_CLUSTERED' Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally. NOT STARTED. Currently in standby mode. Number of jobs executed: 0 Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 5 threads. Using job-store 'org.quartz.impl.jdbcjobstore.JobStoreTX' - which supports persistence. and is not clustered. [main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler 'DatabaseScheduler' initialized from the specified file : 'quartz-mysql.properties' from the class resource path. [main] INFO org.quartz.impl.StdSchedulerFactory - Quartz scheduler version: 2.1.6 Dec 15, 2012 6:20:27 PM com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource getPoolManager INFO: Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName -> 1hge16k8r18mveoq1iqtotg|1486306, debugUnreturnedConnectionStackTraces -> fals e, description -> null, driverClass -> com.mysql.jdbc.Driver, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, identityToken -> 1hge16k8r18mveoq1iqtotg|1486306, idleConnectionTestPeriod -> 0, initialPoolSize -> 3, jdbcUrl -> jdbc:mysql://localhost:3306/quartz2, lastAcquisitionFailureDefaultUser -> null, maxAdministrativeTaskTime -> 0 , maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 8, maxStatements -> 0, maxStatementsPerConnection -> 120, minPoolSize -> 1, numHelperThreads -> 3, numThreadsAwaitingCheckoutDefaultUser -> 0, pref erredTestQuery -> null, properties -> {user=******, password=******}, propertyCycle -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, usesTraditionalReflectiveProxies -> false ] [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Freed 0 triggers from 'acquired' / 'blocked' state.[main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Recovering 0 jobs that were in-progress at the time of the last shut-down. [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Recovery complete. [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Removed 0 'complete' triggers. [main] INFO org.quartz.impl.jdbcjobstore.JobStoreTX - Removed 0 stale fired job entries. [main] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED started. ... CTRL+C [Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED shutting down. [Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED paused. [Thread-6] INFO org.quartz.core.QuartzScheduler - Scheduler DatabaseScheduler_$_NON_CLUSTERED shutdown complete. That's a full run of above setup. Go ahead and play with different config. Read http://quartz-scheduler.org/documentation/quartz-2.1.x/configuration for more details. Here I will post couple more easy config that will get you started in a commonly used config set. A MySQL cluster enabled configuration. With this, you can start one or more shell terminal and run different instance of quartzServer.groovy with the same config. All the quartz scheduler instances should cluster themselve and distribute your jobs evenly. # Main Quartz configuration org.quartz.scheduler.skipUpdateCheck = true org.quartz.scheduler.instanceName = DatabaseClusteredScheduler org.quartz.scheduler.instanceId = AUTO org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate org.quartz.jobStore.dataSource = quartzDataSource org.quartz.jobStore.tablePrefix = QRTZ_ org.quartz.jobStore.isClustered = true org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool org.quartz.threadPool.threadCount = 5 # JobStore: JDBC jobStoreTX org.quartz.dataSource.quartzDataSource.driver = com.mysql.jdbc.Driver org.quartz.dataSource.quartzDataSource.URL = jdbc:mysql://localhost:3306/quartz2 org.quartz.dataSource.quartzDataSource.user = quartz2 org.quartz.dataSource.quartzDataSource.password = quartz2123 org.quartz.dataSource.quartzDataSource.maxConnections = 8 Here is another config set for a simple in-memory scheduler. # Main Quartz configuration org.quartz.scheduler.skipUpdateCheck = true org.quartz.scheduler.instanceName = InMemoryScheduler org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool org.quartz.threadPool.threadCount = 5 Now, if you need more fancy UI management of Quartz, give MySchedule a try.

December 21, 2012

by Zemian Deng

· 50,091 Views · 2 Likes

When you should and should NOT use ENUM data type

ENUM is a new enumerated data type introduced in CUBRID 9.0. Like in all programming languages, the ENUM type is a data type composed of a set of static, ordered values. Users can define numeric and string values for ENUM columns. Working with ENUM types Creating an ENUM column is done by specifying a static list of possible values: CREATE TABLE person( name VARCHAR(255), gender ENUM('Male', 'Female') ); CUBRID understands the ENUM type as an ordered set of constants which, in the above example, is a set of {NULL: NULL, 1: 'Male', 2: 'Female”}. To assign a value to the gender column, users may either use the index of the value ({NULL, 1, 2}) or the actual constant literal ({NULL}, {'Male'}, {'Female'}). CUBRID restricts the values that can be assigned to this column to only values from this set + NULL. Moreover, ENUM column is case-sensitive, i.e. it will raise an error if you try to enter 'female' in lower case. Also, an empty string is allowed if it is defined as one of the elements of the ENUM column. In our examples, it is not allowed. csql> INSERT INTO person(name, gender) VALUES('Eugene', 'Male'); 1 row affected. 1 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('Anne', 2); 1 row affected. 1 command(s) successfully processed. csql> SELECT * FROM person; === === name gender ============================================ 'Anne' 'Female' 'Eugene' 'Male' 2 rows selected. Any attempt to insert a value outside of the defined set will result in a coercion error. In the below case, trying to insert an empty string raises an error because it is not in the set of allowed values defined in the person table. csql> INSERT INTO person(name, gender) VALUES('John', 'N/A'); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce 'N/A' TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', 4); IN line 1, COLUMN 45, ERROR: before ' ); ' Cannot coerce 4 TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', ''); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce '' TO type enum. 0 command(s) successfully processed. Why you should use the ENUM type There are three important reasons for which you should consider using the ENUM type: Reduce storage space. Reduce join complexity. Create cheap values constraints. Storage Space CUBRID uses only 1 byte per tuple when 255 or less ENUM elements are defined or 2 bytes for 256~65535 elements. This is because, rather that storing the constant literal of the value, CUBRID stores the index in the ordered set of that value. For very large tables, this might prove to be a significant storage space save. Take, for example, a table with 1,000,000,000 records which has an ENUM column defined as ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'). If you use a VARCHAR type instead of the ENUM type to store these values, the column would require anywhere between 5GB and 9GB of storage space. Using the ENUM type, you can reduce the required space to 2 bytes per tuple, adding up to a total of 2GB. Reduce join complexity JOIN way The same effect of the ENUM type can be achieved by creating a one to many relationship on two or more tables. Considering the example above, you can store values for days of the week like this: CREATE TABLE days_of_week( id SHORT PRIMARY KEY, name VARCHAR(9) ); CREATE TABLE opening_hours( week_day SHORT, opening_time TIME, closing_time TIME, FOREIGN KEY fk_dow (week_day) REFERENCES days_of_week(id) ); Then, when you wish to display the name of the week day, you would execute a query like: SELECT d.name day_name, o.opening_time, o.closing_time FROM days_of_week d, opening_hours o WHERE d.id = o.week_day ORDER BY d.id; === === day_name opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ... ENUM way You can achieve the same effect using an ENUM column: CREATE TABLE opening_hours( week_day ENUM ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), opening_time TIME, closing_time TIME ); And there’s no JOIN required to select opening hours: SELECT week_day, opening_time, closing_time FROM opening_hours ORDER BY week_day; === === week_day opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ... This can prove to be very useful, especially if your queries join several tables. Value constraints ENUM columns behave like foreign key relationships in the sense that values from an ENUM column are restricted to the values specified in the column definition. For a short list of values, this is more efficient than creating foreign key relationships. While foreign key relationships use index scans to enforce the restriction, ENUM columns just have to go through a list of predefined values which is faster even for small indexes. Why/When you should NOT use the ENUM type Even though ENUM is a great feature, there are cases when you’d better not use it. For example: When ENUM type is not fixed When ENUM type has a long list of values When your application does not know the list of ENUM values ENUM type is not reusable Portability is a concern When ENUM type is not fixed If you’re not sure if the ENUM type holds all possible values for that column, you should consider using a one to many relationship instead. The only way in which an ENUM column can be changed to handle more values is by using an ALTER statement. This is a very expensive operation in any RDBMS and requires administrator rights. Also, ALTER statements are maintenance operations and should, as much as possible, be performed offline. When ENUM type has a long list of values ENUM types should not be used if you cannot limit a set of possible values to a few elements. When your application does not know the list of ENUM values There are only two ways of getting a list of values you have defined for an ENUM type: parsing the output of SHOW CREATE TABLE statement: csql> SHOW CREATE TABLE opening_hours; === === TABLE CREATE TABLE ============================================ 'opening_hours' 'CREATE TABLE [opening_hours] ([week_day] ENUM('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), [opening_time] TIME, [closing_time] TIME) selecting information from CUBRID system tables: csql> SELECT d.enumeration FROM _db_domain d, _db_attribute a WHERE a.attr_name = 'week_day' AND a.class_of.class_name = 'opening_hours' AND d IN a.domains; === === enumeration ====================== {'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'} Both might require complex coding and selecting from system tables requires administrator privileges. ENUM type is not reusable If you have several tables which require the names of week days, you will have to create an ENUM type for each of them. If you create a table to hold week days names, you can join this table with whichever other table that requires this information. Portability is a concern The ENUM type is only supported by a few RDBMSs and each one has its own idea as to how ENUM type is supposed to work. Below is a list of a few notable differences between CUBRID, MySQL and PostgreSQL: CUBRID PostgreSQL MySQL Inserting out of range value Throws error Throws error Inserts special value index 0 Comparing to char literals Compare as strings Compare as ENUM elements Compare as strings Comparing to values outside of the ENUM domain Compare as strings Throws error Compare as strings These subtle differences will most probably break your application in interesting and hard to understand ways. If you’re migrating from PostgreSQL to CUBRID for example, and you expect comparisons with char literals to be performed as ENUM comparisons, you’ll have a hard time understanding why your query returns weird results.

December 19, 2012

by Esen Sagynov

· 60,376 Views

All about Two-Phase Locking and a little bit MVCC

In this blog I will describe the concurrency control methods implemented in database management systems, and the differences between them. I will also explain about what locking technique is used in CUBRID RDBMS, about locking modes and their compatibility, and finally, the deadlocks and the solution for them. Overview When multiple transactions, which change the data, are executed simultaneously, it is required to control the order of processing these transactions to satisfy the ACID (Atomicity, Consistency, Integrity, Durability) property of the database. Executing multiple transactions simultaneously should lead to the same result as executing each transaction independently, in other words, one transaction should not be affected by another transaction. If different data is changed for each transaction, no interference between transactions is made, so there is no issue. However, if the same data is simultaneously changed by multiple transactions, the order of processing each transaction should be controlled. Types of Concurrency Control For example, the T1 transaction changes the A record from 1 to 2 and then changes the B record, the T2 transaction can simultaneously change the A record, too. Let's assume that the T2 transaction changes the A record from 2 to 4 by adding +2. If two transactions are successfully terminated, there is no issue. But it is important that all transactions can be rolled back. If the T1 transaction is rolled back, the value of the A record should be returned to 1, i.e. the value before the T1 transaction was executed. This is to satisfy the ACID property of the database. However, the T2 transaction has already changed the A record value to 3. So, it is impossible to return the A record to 1 regardless of the situation. In this case, there can be two options. Two-phase locking (2PL) The first one is when the T2 transaction tries to change the A record, it knows that the T1 transaction has already changed the A record and waits until the T1 transaction is completed because the T2 transaction cannot know whether the T1 transaction will be committed or rolled back. This method is called Two-phase locking (2PL). Multi-version concurrency control (MVCC) The other one is to allow each of them, T1 and T2 transactions, to have their own changed versions. Even when the T1 transaction has changed the A record from 1 to 2, the T1 transaction leaves the original value 1 as it is and writes that the T1 transaction version of the A record is 2. Then, the following T2 transaction changes the A record from 1 to 3, not from 2 to 4, and writes that the T2 transaction version of the A record is 3. When the T1 transaction is rolled back, it does not matter if the 2, the T1 transaction version, is not applied to the A record. After that, if the T2 transaction is committed, the 3, the T2 transaction version, will be applied to the A record. If the T1 transaction is committed prior to the T2 transaction, the A record is changed to 2, and then to 3 at the time of committing the T2 transaction. The final database status is identical to the status of executing each transaction independently, without any impact on other transactions. Therefore, it satisfies the ACID property. This method is called Multi-version concurrency control (MVCC). CUBRID has implemented 2PL method as well as DB2 and SQL Server, while Oracle, InnoDB and PostgreSQL have implemented MVCC. Two-phase locking in CUBRID The 2PL adopted by CUBRID uses locks to ensure the consistency between transactions that change the identical data. As the "lock" literally means, the locking is executed through two phases: expanding phase (acquiring) shrinking phase (releasing) More accurately, all transactions should acquire lock for the data to be accessed and the acquired locks are released only when the transaction is terminated. After a transaction has acquired the lock for a certain data (regardless of the lock type, S_LOCK for read, stands for Shared Lock, or X_LOCK for write, stands for Exclusive Lock), when another transaction tries to acquire a new lock for the data, the new lock is allowed or pended depending on the lock compatibility rule. Therefore, success or failure of the prior transaction does not have impact on the following transactions, so the data consistency is maintained. Lock Manager in CUBRID Thus, the key point of 2PL, adopted by CUBRID, is that the lock must be processed through two phases: expanding phase and shrinking phase. Then, [Figure 1] release all locks, acquired while executing a transaction, only after the transaction ends (commit or rollback). Figure 1: Two-Phase Locking. 2PL concurrency control method naturally controls access to the identical data from transactions by making all transactions observe the 2PL protocol. The following Figure 2 below shows an example of three transactions using 2PL: Transaction 1 executes B=B+A operation, Transaction 2 executes C=A+B operation, and Transaction 3 executes Print C operation. Since all three transactions are accessing the data A, B and C, the concurrency control is required. In this case, each transaction is executed according to the 2PL protocol so that there is no data conflict. Figure 2: Concurrency Control by using 2PL. Lock modes To understand the concurrency control of multiple transactions more deeply, let's discuss about lock modes, lock conversion and transaction isolation level. In the above figure, you can see that S-lock, Shared Lock, for A was first acquired by Transaction 1, but it is also acquired by Transaction 2, too. On the contrary, the transaction which requested X-lock is blocked until S-lock is released. In this matter, a variety of lock modes are used to minimize conflicts by lockers. Major types of locks utilized in DBMSs are. Shared (S) Lock: Used for read operation. It is generally set on the target record when SELECT statement is executed. It blocks a transaction from changing data which was already read by other transactions. Exclusive (X) Lock: Used for write-operations such as INSERT, UPDATE, DELETE. It blocks one data from being changed by multiple transactions. Update (U) Lock: Used to define that the target resource will be changed. It is used to minimize deadlock which may occur when multiple transactions are executing both read and write. Intent Shared (IS) Lock: Set on the upper resource (e.g. tables) to set the S-lock on some lower resources (e.g. records or pages). It is to prevent other transactions from setting X-lock on the upper resource. Intent lock will soon be described. Intent Exclusive (IX) Lock: Set on the upper resource to set X-lock on some lower resources. Shared with Intent Exclusive (SIX) Lock: Set on the upper resource to set S-lock and X-lock on some lower resources. Lock mode compatibility Among the lock modes above, intent locks are used to improve the transaction concurrency and to prevent deadlock between the upper resources and the lower resources. For example, when Transaction A tries to read Record R on Table T, it sets IS_LOCK on Table T before setting S_LOCK on Record R. Then, Transaction B is prevented from setting X_LOCK on Table T to change the structure of Table T. If Transaction A has not set IS_LOCK on Table T, Transaction B would change the structure of Table T. Then, Transaction A would perform a wrong read operation. This way Transaction B has no need to check all records in Table T to check whether there is any lock set by other transactions for setting X_LOCK on Table T. The following lock mode compatibility table will clearly show the effect of intent locks: Table 1: The lock mode compatibility table of CUBRID. Current Lock Mode NULL IS NS S IX SIX U NX X Newly-requested Lock Mode NULL True True True True True True True True True IS True True N/A True True True N/A N/A False NS True N/A True True N/A N/A False True False S True True True True False False False False False IX True True N/A False True False N/A N/A False SIX True True N/A False False False N/A N/A False U True N/A True True N/A N/A False False False NX True N/A True False N/A N/A False False False X True False False False False False False False False From the lock mode compatibility table, you can see that X_LOCK cannot be set on a table if IS_LOCK is set on the table. And only IS_LOCK can be compatible with SIX_LOCK. This means that SIX_LOCK intends to set S_LOCK and X_LOCK on the record and it will not allow any lock but IS_LOCK for S_LOCK on other non-conflicting records. From the table, you can see that IX_LOCK and IX_LOCK can be compatible with each other. IX_LOCK intends to set X_LOCK for some records. So, the compatibility is available. If there are two transactions that try to change an identical record, IX_LOCK for the table is allowed. However, there is no problem in concurrency control since only the transaction that has acquired X_LOCK for the record first can change the record (X_LOCK and X_LOCK are not compatible). The lock mode compatibility table is expressed as a global variable lock_Comp[][] in the lock_table.c file in CUBRID source code. Among CUBRID sources, most codes related to lock modes are implemented in lock_manager.c file. To set lock on a data object, the lock_object() function is used which receives three parameters: the OID of an object where the lock mode will be set, the OID of the class where the object belongs, and the desired lock mode. In the source code of the function, you can see that the function is executed in several ways based on the target of the lock mode, the lock mode for an instance object or for a class object. Take note of this: in CUBRID, a class object is also an object. Keep it in mind that a class object has an OID and all class objects are the instances of a root class, so it uses ROOTOID, the OID of the root object, as its class OID. From the code, you can see that the required intent lock is set on a class object when a lock mode is required for an instance object. And there is a concept of lock waiting time in the lock mode request. To retrieve the lock timeout value set on the current transaction, the logtb_find_wait_secs() function is called. CUBRID supports the SET TRANSACTION LOCK TIMEOUT SQL command and the setLockTimeout() method in JDBC. The command is to specify the lock timeout of the current transaction. Lock waiting time means the time for a transaction, which has made a request for lock mode, to wait when a lock mode is set on an object by a transaction and the requested lock is not compatible with the already-set lock mode. As you have seen before, the 2PL concurrency control method does not allow lock from other transactions until the existing lock is released. For the following two reasons, lock timeout should be set by a transaction: When a user does not want to wait too long because of the lock mode. To lower the frequency of deadlock. Deadlocks A deadlock occurs when two or more transactions request resources locked by each of them, so all transactions cannot be progressed. Figure 8 below shows an example of a deadlock. Figure 2: Transaction Deadlock. First, Transaction 1 executes UPDATE participant SET gold=10 WHERE host_year=2004 AND nation_code=’KOR’ statement and sets X_LOCK on the ‘KOR’ record. Transaction 2 sets X_LOCK on the ‘JPN’ record. Transaction 3 sets X_LOCK on the ‘CHN’ record. After that, Transaction 1 requests X_LOCK on the ‘JPN’ record for executing UPDATE for that record. However, the ‘JPN’ record is already locked with X_LOCK by Transaction 2. So, Transaction 1 should wait until Transaction 2 ends. Based on the 2PL protocol, the X_LOCK is released when the transaction ends. Transaction 2 requests X_LOCK on the ‘CHN’record and waits for Transaction 3. Finally, Transaction 3 waits for Transaction 1 to acquire the 'KOR' record of Transaction 1 as it has X_LOCK on the ‘CHN’ record. As a result,Transaction 1 waits for Transaction 2 to end, Transaction 2 waits for Transaction 3 to end, and Transaction 3 waits for Transaction 1 to end. So, no transaction can be progressed. This is called a deadlock. Most DBMSs which use the 2PL method, including CUBRID, use the deadlock detection method to solve the deadlock problem. It periodically checks whether the cycle illustrated in the above figure occurs by drawing a Lock Wait Graph for the transactions being executed. In CUBRID, the thread for detecting deadlock checks the Lock Wait Graph every second. When a deadlock is detected, one transaction among the transactions is randomly selected and aborted by force. This is called unilateral abort. When a transaction is selected as a victim to be sacrificed to solve the deadlock and unilaterally aborted, the corresponding SQL statement returns an error code. The error message is "The transaction has timed out due to deadlock while waiting for X_LOCK for an object. It waited until User 2 ended.” When an error is returned and the application aborts the transaction, the locks of the transaction are released and other transactions can be continuously processed. To see how the deadlock is detected, see the lock_detect_local_deadlock() function in the source code. This function is called with the intervals (in seconds) specified by the PRM_LK_RUN_DEADLOCK_INTERVAL variable (the deadlock_detection_interval_in_secs parameter in cubrid.conf file) on the background thread which executes thread_deadlock_detect_thread(). Even if a deadlock does not occur, when the execution time of a transaction is too long, other transactions should wait for too long as well. For a certain application, it is wiser to give up rather than wait. In particular, when a web server has called DB tasks and the wait time is too long, all threads of the web server are used to process the DB, so they cannot be used to process external HTTP requests any more, causing service failures. Therefore, for a web application, the threads should be returned without waiting an unlimited amount of time for DB processing even if an error occurs. Two methods are used for that: One is lock timeout supported by CUBRID. The other is query cancel. JDBC is defined with an API which can cancel the SQL statement being executed. The key data structure of the lock manager is defined in the lock_manager.c file. typedef struct lk_entry LK_ENTRY; struct lk_entry { #if defined(SERVER_MODE) struct lk_res *res_head; /* back to resource entry */ THREAD_ENTRY *thrd_entry; /* thread entry pointer */ int tran_index; /* transaction table index */ LOCK granted_mode; /* granted lock mode */ LOCK blocked_mode; /* blocked lock mode */ int count; /* number of lock requests */ struct lk_entry *next; /* next entry */ struct lk_entry *tran_next; /* list of locks that trans. holds */ struct lk_entry *class_entry; /* ptr. to class lk_entry */ LK_ACQUISITION_HISTORY *history; /* lock acquisition history */ LK_ACQUISITION_HISTORY *recent; /* last node of history list */ int ngranules; /* number of finer granules */ int mlk_count; /* number of instant lock requests */ unsigned char scanid_bitset[1]; /* PRM_LK_MAX_SCANID_BIT/8]; */ #else /* not SERVER_MODE */ int dummy; #endif /* not SERVER_MODE */ }; typedef struct lk_res LK_RES; struct lk_res { MUTEX_T res_mutex; /* resource mutex */ LOCK_RESOURCE_TYPE type; /* type of resource: class,instance */ OID oid; OID class_oid; LOCK total_holders_mode; /* total mode of the holders */ LOCK total_waiters_mode; /* total mode of the waiters */ LK_ENTRY *holder; /* lock holder list */ LK_ENTRY *waiter; /* lock waiter list */ LK_ENTRY *non2pl; /* non2pl list */ LK_RES *hash_next; /* for hash chain */ }; From the file, the lk_Gl global variable of LK_GLOBAL_DATA type is the core. The LK_ENTRY structure stands for the lock itself. For example, when the Transaction T1 has requested a lock, one LK_ENTRY is created. LK_RES is a structure that shows to which resource the lock belongs. In CUBRID, all resources are objects (instance objects and class objects), so they are shaped as OIDs. In the LK_RES structure, you can see the list of holders with LK_ENTRY type and the list of waiters. The list of holders is a list of transactions that hold the lock for the resource now. For example, when Transaction T1 and Transaction T2 have acquired S_LOCK for the data record with OID1, LK_ENTRY that corresponds to the S_LOCK of T1 and T2 will be registered in the list of holders. When Transaction T3 requests X_LOCK on the OID1 record, T3 should wait because of the existing S_LOCK. So, the LK_ENTRY corresponding to X_LOCK of T3 will be registered to the list of waiters. Which lock is held by which transaction is maintained in the tran_lock_table variable which has the LK_TRAN_LOCK structure as a table. The Wait For Graph for detecting a deadlock is expressed as TWFG_node and TWFG_edge of the LK_WFG_NODE structure and the LK_WFG_EDGE structure. The lock_detect_local_deadlock() function creates a Wait For Graph and detects whether there is a cycle on the graph. When a cycle is detected, the lock_select_deadlock_victim() function selects a victim transaction to be sacrificed for solving the deadlock. For reference, transactions are continuously executed while a Wait For Graph is drawn up and checked, the information of the ended transaction is removed from the graph. The victim transaction is selected based on the following criteria: If a transaction is not a holder, it cannot be a victim. When a transaction is in the commit phase or the rollback phase, it cannot be selected as a victim. Select a transaction of which lock timeout is not set to -1 (unlimited waiting) first. Select the latest transaction rather than the older one. (The transaction ID is an incremental number. A transaction with smaller transaction number is the older one.) Conclusion This concludes the talk about Two-Phase Locking in CUBRID. I briefly covered the types of concurrency control, the difference between 2PL and MVCC, about what locking technique is used in CUBRID RDBMS, about locking modes and their compatibility, and finally, the deadlocks and the solution for them. In this article I have mentioned about OID (Object Identifiers) which are used to identify instance objects as well as class objects. In the next article I will continue this talk and explain what Object, Class, and OID are.

December 14, 2012

by Esen Sagynov

· 11,258 Views · 1 Like

Checking DB Connection Using Groovy

Here is a simple Groovy script to verify Oracle database connection using JDBC. @GrabConfig(systemClassLoader=true) @Grab('com.oracle:ojdbc6:11g') url= "jdbc:oracle:thin:@localhost:1521:XE" username = "system" password = "mypassword123" driver = "oracle.jdbc.driver.OracleDriver" // Groovy Sql connection test import groovy.sql.* sql = Sql.newInstance(url, username, password, driver) try { sql.eachRow('select sysdate from dual'){ row -> println row } } finally { sql.close() } This script should let you test connection and perform any quick ad hoc queries programmatically. However, when you first run it, it would likely failed without finding the Maven dependency for JDBC driver jar. In this case, you would need to first install the Oracle JDBC jar into maven local repository. This is due to Oracle has not publish their JDBC jar into any public Maven repository. So we are left with manually steps by installing it. Here are the onetime setup steps: 1. Download Oracle JDBC jar from their site: http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html. 2. Unzip the file into C:/ojdbc directory. 3. Now you can install the jar file into Maven local repository using Cygwin. bash> cd /cygdrive/c/ojdbc bash> mvn install:install-file -DgroupId=com.oracle -DartifactId=ojdbc6 -Dversion=11g -Dpackaging=jar -Dfile=ojdbc6-11g.jar That should make your script run successfully. The Groovy way of using Sql has many sugarcoated methods that you let you quickly query and see data on screens. You can see more Groovy feature by studying their API doc. Note that you would need systemClassLoader=true to make Groovy load the JDBC jar into classpath and use it properly. Oh, BTW, if you are using Oracle DB production, you will likely using a RAC configuration. The JDBC url connection string for that should look something like this: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=MY_DB))) Update: 12/07/2012 It appears that the groovy.sql.Sql class has a static withInstance method. This let you run onetime DB work without writing try/finally block. See this example: @GrabConfig(systemClassLoader=true) @Grab('com.oracle:ojdbc6:11g') url= "jdbc:oracle:thin:@localhost:1521:XE" username = "system" password = "mypassword123" driver = "oracle.jdbc.driver.OracleDriver" import groovy.sql.* Sql.withInstance(url, username, password, driver) { sql -> sql.eachRow('select sysdate from dual'){ row -> println row } } It's much more compact. But be aware of performance if you run it multiple times, because you will open and close the a java.sql.Connection per each call! I have also collected couple other popular databases connection test examples. These should have their driver jars already in Maven central, so Groovy Grab should able to grab them just fine. // MySQL database test @GrabConfig(systemClassLoader=true) @Grab('mysql:mysql-connector-java:5.1.22') import groovy.sql.* Sql.withInstance("jdbc:mysql://localhost:3306/mysql", "root", "mypassword123", "com.mysql.jdbc.Driver"){ sql -> sql.eachRow('SELECT * FROM USER'){ row -> println row } } // H2Database @GrabConfig(systemClassLoader=true) @Grab('com.h2database:h2:1.3.170') import groovy.sql.* Sql.withInstance("jdbc:h2:~/test", "sa", "", "org.h2.Driver"){ sql -> sql.eachRow('SELECT * FROM INFORMATION_SCHEMA.TABLES'){ row -> println row } }

December 12, 2012

by Zemian Deng

· 29,423 Views

Configuring IIS methods for ASP.NET Web API on Windows Azure Websites

That’s a pretty long title, I agree. When working on my implementation of RFC2324, also known as the HyperText Coffee Pot Control Protocol, I’ve been struggling with something that you will struggle with as well in your ASP.NET Web API’s: supporting additional HTTP methods like HEAD, PATCH or PROPFIND. ASP.NET Web API has no issue with those, but when hosting them on IIS you’ll find yourself in Yellow-screen-of-death heaven. The reason why IIS blocks these methods (or fails to route them to ASP.NET) is because it may happen that your IIS installation has some configuration leftovers from another API: WebDAV. WebDAV allows you to work with a virtual filesystem (and others) using a HTTP API. IIS of course supports this (because flagship product “SharePoint” uses it, probably) and gets in the way of your API. Bottom line of the story: if you need those methods or want to provide your own HTTP methods, here’s the bit of configuration to add to your Web.config file: Here’s what each part does: Under modules, the WebDAVModule is being removed. Just to make sure that it’s not going to get in our way ever again. The security/requestFiltering element I’ve added only applies if you want to define your own HTTP methods. So unless you need the XYZ method I’ve defined here, don’t add it to your config. Under handlers, I’m removing the default handlers that route into ASP.NET. Then, I’m adding them again. The important part? The "verb attribute. You can provide a list of comma-separated methods that you want to route into ASP.NET. Again, I’ve added my XYZ methodbut you probably don’t need it. This will work on any IIS server as well as on Windows Azure Websites. It will make your API… happy.

December 11, 2012

by Maarten Balliauw

· 20,562 Views

Using YAML for Java Application Configuration

YAML is well-known format within Ruby community, quite widely used for a long time now. But we as Java developers mostly deal with property files and XMLs in case we need some configuration for our apps. How many times we needed to express complicated configuration by inventing our own XML schema or imposing property names convention? Though JSON is becoming a popular format for web applications, using JSON files to describe the configuration is a bit cumbersome and, in my opinion, is not as expressive as YAML. Let's see what YAML can do for us to make our life easier. For sure, let's start with the problem. In order for our application to function properly, we need to feed it following data somehow: version and release date database connection parameters list of supported protocols list of users with their passwords This list of parameters sounds a bit weird, but the purpose is to demonstrate different data types in work: strings, numbers, dates, lists and maps. The Java model consists of two simple classes: Connection package com.example.yaml; public final class Connection { private String url; private int poolSize; public String getUrl() { return url; } public void setUrl(String url) { this.url = url; } public int getPoolSize() { return poolSize; } public void setPoolSize(int poolSize) { this.poolSize = poolSize; } @Override public String toString() { return String.format( "'%s' with pool of %d", getUrl(), getPoolSize() ); } } and Configuration, both are typical Java POJOs, verbose because of property setters and getters (we get used to it, right?). package com.example.yaml; import static java.lang.String.format; import java.util.Date; import java.util.List; import java.util.Map; public final class Configuration { private Date released; private String version; private Connection connection; private List< String > protocols; private Map< String, String > users; public Date getReleased() { return released; } public String getVersion() { return version; } public void setReleased(Date released) { this.released = released; } public void setVersion(String version) { this.version = version; } public Connection getConnection() { return connection; } public void setConnection(Connection connection) { this.connection = connection; } public List< String > getProtocols() { return protocols; } public void setProtocols(List< String > protocols) { this.protocols = protocols; } public Map< String, String > getUsers() { return users; } public void setUsers(Map< String, String > users) { this.users = users; } @Override public String toString() { return new StringBuilder() .append( format( "Version: %s\n", version ) ) .append( format( "Released: %s\n", released ) ) .append( format( "Connecting to database: %s\n", connection ) ) .append( format( "Supported protocols: %s\n", protocols ) ) .append( format( "Users: %s\n", users ) ) .toString(); } } ow, as model is quite clear, let us try to express it as the human being normally does it. Looking back to our list of required configuration, let's try to write it down one by one. 1. version and release date version: 1.0 released: 2012-11-30 2. database connection parameters connection: url: jdbc:mysql://localhost:3306/db poolSize: 5 3. list of supported protocols protocols: - http - https 4. list of users with their passwords users: tom: passwd bob: passwd And this is it, our configuration expressed in YAML syntax is completed! The whole file sample.yml looks like this: version: 1.0 released: 2012-11-30 # Connection parameters connection: url: jdbc:mysql://localhost:3306/db poolSize: 5 # Protocols protocols: - http - https # Users users: tom: passwd bob: passwd To make it work in Java, we just need to use the awesome library called snakeyml, respectively the Maven POM file is quite simple: 4.0.0 com.example yaml 0.0.1-SNAPSHOT jar UTF-8 org.yaml snakeyaml 1.11 org.apache.maven.plugins maven-compiler-plugin 2.3.1 1.7 1.7 Please notice the usage of Java 1.7, the language extensions and additional libraries simplify a lot of regular tasks as we could see looking into YamlConfigRunner: package com.example.yaml; import java.io.IOException; import java.io.InputStream; import java.nio.file.Files; import java.nio.file.Paths; import org.yaml.snakeyaml.Yaml; public class YamlConfigRunner { public static void main(String[] args) throws IOException { if( args.length != 1 ) { System.out.println( "Usage: " ); return; } Yaml yaml = new Yaml(); try( InputStream in = Files.newInputStream( Paths.get( args[ 0 ] ) ) ) { Configuration config = yaml.loadAs( in, Configuration.class ); System.out.println( config.toString() ); } } } The code snippet here loads the configuration from file (args[ 0 ]), tries to parse it and fill up the Configuration class with meaningful data using JavaBeans conventions, converting to the declared types where possible. Running this class with sample.yml as an argument generates the following output: Version: 1.0 Released: Thu Nov 29 19:00:00 EST 2012 Connecting to database: 'jdbc:mysql://localhost:3306/db' with pool of 5 Supported protocols: [http, https] Users: {tom=passwd, bob=passwd} Totally identical to the values we have configured!

December 10, 2012

by Andriy Redko

· 240,404 Views · 6 Likes

Writing Acceptance Tests for Openshift and MongoDb Applications

Acceptance testing is used to determine if the requirements of a specification are met. It should be run in an environment as similar as possible of the production one. So if your application is deployed into Openshift, you will require a parallel account to the one used in production for running the tests. In this post we are going to write an acceptance test for an application deployed into Openshift that uses MongoDb as database backend. The application deployed is a simple library which returns all the books available for lending. This application uses MongoDb for storing all information related to books. So let's start describing the goal, feature, user story, and acceptance criteria for previous applications. Goal: Expanding a lecture to the most people. Feature: Display available books. User Story: Browse Catalog -> In order to find books I would like to borrow, As a User, I want to be able to browse through all books. Acceptance Criteria: Should see all available books. Scenario: Given I want to borrow a book When I am at catalog page Then I should see information about available books: The Lord Of The Jars - 1299 - LOTRCoverUrl , The Hobbit - 293 - HobbitCoverUrl Notice that this is a very simple application, so the acceptance criteria is simple too. For this example, we need two test frameworks, the first one for writing and running acceptance tests, and the other one for managing the NoSQL backend. In this post we are going to use Thucydides for ATDD and NoSQLUnit for dealing with MongoDb. The application is already deployed in Openshift, and you can take a look at https://books-lordofthejars.rhcloud.com/GetAllBooks Thucydides is a tool designed to make writing automated acceptance and regression tests easier. Thucydides uses WebDriver API to access HTML page elements. But it also helps you to organise your tests and user stories by using a concrete programming model, create reports of executed tests, and finally it also measures functional cover. To write acceptance tests with Thucydides next steps should be followed. First of all, choose a user story of one of your features. Then implement the PageObject class. PageObject is a pattern which models web application's user interface elements as objects, so tests can interact with them programmatically. Note that in this case we are coding "how" we are accessing to html page. Next step is implementing steps library. This class will contain all steps that are required to execute an action. For example creating a new book requires to open addnewbook page, insert new data, and click to submit button. In this case we are coding "what" we need to implement the acceptance criteria. And finally coding the chosen user story following defined Acceptance Criteria and using previous step classes. NoSQLUnit is a JUnit extension that aims us to manage lifecycle of required NoSQL engine, help us to maintain database into known state and standarize the way we write tests for NoSQL applications. NoSQLUnit is composed by two groups of JUnit rules, and two annotations. In current case, we don't need to manage lifecycle of NoSQL engine, because it is managed by external entity (Openshift). So let's getting down on work: First thing we are going to do is create a feature class which contains no test code; it is used as a way of representing the structure of requirements. public class Application { @Feature public class Books { public class ListAllBooks {} } } Note that each implemented feature should be contained within a class annotated with @Feature annotation. Every method of featured class represents a user story. Next step is creating the PageObject class. Remember that PageObject pattern models web application's user interface as object. So let's see the html file to inspect what elements must be mapped. List of Available BooksTitleNumber Of PagesCover ..... The most important thing here is that table tag has an id named listBooks which will be used in PageObject class to get a reference to its parameters and data. Let's write the page object: @DefaultUrl("http://books-lordofthejars.rhcloud.com/GetAllBooks") public class FindAllBooksPage extends PageObject { @FindBy(id = "listBooks") private WebElement tableBooks; public FindAllBooksPage(WebDriver driver) { super(driver); } public TableWebElement getBooksTable() { Map> tableValues = new HashMap>(); tableValues.put("titles", titles()); tableValues.put("numberOfPages", numberOfPages()); tableValues.put("covers", coversUrl()); return new TableWebElement(tableValues); } private List titles() { List namesWebElement = tableBooks.findElements(By.className("title")); return with(namesWebElement).convert(toStringValue()); } private List numberOfPages() { List numberOfPagesWebElement = tableBooks.findElements(By.className("numberOfPages")); return with(numberOfPagesWebElement).convert(toStringValue()); } private List coversUrl() { List coverUrlWebElement = tableBooks.findElements(By.className("cover")); return with(coverUrlWebElement).convert(toImageUrl()); } private Converter toImageUrl() { return new Converter() { @Override public String convert(WebElement from) { WebElement imgTag = from.findElement(By.tagName("img")); return imgTag.getAttribute("src"); } }; } private Converter toStringValue() { return new Converter() { @Override public String convert(WebElement from) { return from.getText(); } }; } } Using @DefaultUrl we are setting which URL is being mapped, with @FindBy we map the web element with id listBooks, and finally getBooksTable() method which returns the content of generated html table. The next thing to do is implementing the steps class; in this simple case we only need two steps, the first one that opens the GetAllBooks page, and the other one which asserts that table contains the expected elements. public class EndUserSteps extends ScenarioSteps { public EndUserSteps(Pages pages) { super(pages); } private static final long serialVersionUID = 1L; @Step public void should_obtain_all_inserted_books() { TableWebElement booksTable = onFindAllBooksPage().getBooksTable(); List titles = booksTable.getColumn("titles"); assertThat(titles, hasItems("The Lord Of The Rings", "The Hobbit")); List numberOfPages = booksTable.getColumn("numberOfPages"); assertThat(numberOfPages, hasItems("1299", "293")); List covers = booksTable.getColumn("covers"); assertThat(covers, hasItems("http://upload.wikimedia.org/wikipedia/en/6/62/Jrrt_lotr_cover_design.jpg", "http://upload.wikimedia.org/wikipedia/en/4/4a/TheHobbit_FirstEdition.jpg")); } @Step public void open_find_all_page() { onFindAllBooksPage().open(); } private FindAllBooksPage onFindAllBooksPage() { return getPages().currentPageAt(FindAllBooksPage.class); } } And finally class for validating the acceptance criteria: @Story(Application.Books.ListAllBooks.class) @RunWith(ThucydidesRunner.class) public class FindBooksStory { private final MongoDbConfiguration mongoDbConfiguration = mongoDb() .host("127.0.0.1").databaseName("books") .username(MongoDbConstants.USERNAME) .password(MongoDbConstants.PASSWORD).build(); @Rule public final MongoDbRule mongoDbRule = newMongoDbRule().configure( mongoDbConfiguration).build(); @Managed(uniqueSession = true) public WebDriver webdriver; @ManagedPages(defaultUrl = "http://books-lordofthejars.rhcloud.com") public Pages pages; @Steps public EndUserSteps endUserSteps; @Test @UsingDataSet(locations = "books.json", loadStrategy = LoadStrategyEnum.CLEAN_INSERT) public void finding_all_books_should_return_all_available_books() { endUserSteps.open_find_all_page(); endUserSteps.should_obtain_all_inserted_books(); } } There are some things that should be considered in previous class: @Story should receive a class defined with @Feature annotation, so Thucydides can create correctly the report. We use MongoDbRule to establish a connection to remote MongoDb instance. Note that we can use localhost address because of port forwarding Openshift capability so although localhost is used, we are really managing remote MongoDb instance. Using @Steps Thucydides will create an instance of previous step library. And finally @UsingDataSet annotation to populate data into MongoDb database before running the test. { "book":[ { "title": "The Lord Of The Rings", "numberOfPages": "1299", "cover": "http:\/\/upload.wikimedia.org\/wikipedia\/en\/6\/62\/Jrrt_lotr_cover_design.jpg" }, { "title": "The Hobbit", "numberOfPages": "293", "cover": "http:\/\/upload.wikimedia.org\/wikipedia\/en\/4\/4a\/TheHobbit_FirstEdition.jpg" } ] } Note that NoSQLUnit maintains the database into known state by cleaning database before each test execution and populating it with known data defined into a json file. Also keep in mind that this example is very simple so only and small subset of capabilities of Thucydides and NoSQLUnit has been shown. Keep watching both sites: http://thucydides.info and https://github.com/lordofthejars/nosql-unit We keep learning, Alex. Love Is A Burning Thing, And It Makes A Fiery Ring, Bound By Wild Desire, I Fell Into A Ring Of Fire (Ring of Fire - Johnny Cash)

December 9, 2012

by Alex Soto

· 5,988 Views

A C# .NET Client Proxy for RabbitMQ Management API

RabbitMQ comes with a very nice Management UI and a HTTP JSON API, that allows you to configure and monitor your RabbitMQ broker. From the website: “The rabbitmq-management plugin provides an HTTP-based API for management and monitoring of your RabbitMQ server, along with a browser-based UI and a command line tool, rabbitmqadmin. Features include: Declare, list and delete exchanges, queues, bindings, users, virtual hosts and permissions. Monitor queue length, message rates globally and per channel, data rates per connection, etc. Send and receive messages. Monitor Erlang processes, file descriptors, memory use. Export / import object definitions to JSON. Force close connections, purge queues.” Wouldn’t it be cool if you could do all these management tasks from your .NET code? Well now you can. I’ve just added a new project to EasyNetQ called EasyNetQ.Management.Client. This is a .NET client-side proxy for the HTTP-based API. It’s on NuGet, so to install it, you simply run: PM> Install-Package EasyNetQ.Management.Client To give an overview of the sort of things you can do with EasyNetQ.Client.Management, have a look at this code. It first creates a new Virtual Host and a User, and gives the User permissions on the Virtual Host. Then it re-connects as the new user, creates an exchange and a queue, binds them, and publishes a message to the exchange. Finally it gets the first message from the queue and outputs it to the console. var initial = new ManagementClient("http://localhost", "guest", "guest"); // first create a new virtual host var vhost = initial.CreateVirtualHost("my_virtual_host"); // next create a user for that virutal host var user = initial.CreateUser(new UserInfo("mike", "topSecret")); // give the new user all permissions on the virtual host initial.CreatePermission(new PermissionInfo(user, vhost)); // now log in again as the new user var management = new ManagementClient("http://localhost", user.name, "topSecret"); // test that everything's OK management.IsAlive(vhost); // create an exchange var exchange = management.CreateExchange(new ExchangeInfo("my_exchagne", "direct"), vhost); // create a queue var queue = management.CreateQueue(new QueueInfo("my_queue"), vhost); // bind the exchange to the queue management.CreateBinding(exchange, queue, new BindingInfo("my_routing_key")); // publish a test message management.Publish(exchange, new PublishInfo("my_routing_key", "Hello World!")); // get any messages on the queue var messages = management.GetMessagesFromQueue(queue, new GetMessagesCriteria(1, false)); foreach (var message in messages) { Console.Out.WriteLine("message.payload = {0}", message.payload); } This library is also ideal for monitoring queue levels, channels and connections on your RabbitMQ broker. For example, this code prints out details of all the current connections to the RabbitMQ broker: var connections = managementClient.GetConnections(); foreach (var connection in connections) { Console.Out.WriteLine("connection.name = {0}", connection.name); Console.WriteLine("user:\t{0}", connection.client_properties.user); Console.WriteLine("application:\t{0}", connection.client_properties.application); Console.WriteLine("client_api:\t{0}", connection.client_properties.client_api); Console.WriteLine("application_location:\t{0}", connection.client_properties.application_location); Console.WriteLine("connected:\t{0}", connection.client_properties.connected); Console.WriteLine("easynetq_version:\t{0}", connection.client_properties.easynetq_version); Console.WriteLine("machine_name:\t{0}", connection.client_properties.machine_name); } On my machine, with one consumer running it outputs this: connection.name = [::1]:64754 -> [::1]:5672 user: guest application: EasyNetQ.Tests.Performance.Consumer.exe client_api: EasyNetQ application_location: D:\Source\EasyNetQ\Source\EasyNetQ.Tests.Performance.Consumer\bin\Debug connected: 14/11/2012 15:06:19 easynetq_version: 0.9.0.0 machine_name: THOMAS You can see the name of the application that’s making the connection, the machine it’s running on and even its location on disk. That’s rather nice. From this information it wouldn’t be too hard to auto-generate a complete system diagram of your distributed messaging application. Now there’s an idea :)

December 7, 2012

by Mike Hadlow

· 8,093 Views

Easy Integration Testing with Spring+Hibernate

I am guilty of not writing integration testing (At least for database related transactions) up until now. So in order to eradicate the guilt i read up on how one can achieve this with minimal effort during the weekend. Came up with a small example depicting how to achieve this with ease using Spring and Hibernate. With integration testing, you can test your DAO(Data access object) layer without ever having to deploy the application. For me this is a huge plus since now i can even test my criteria's, named queries and the sort without having to run the application. There is a property in hibernate that allows you to specify an sql script to run when the Session factory is initialized. With this, i can now populate tables with data that required by my DAO layer. The property is as follows; import.sql According to the hibernate documentation, you can have many comma separated sql scripts.One gotcha here is that you cannot create tables using the script. Because the schema needs to be created first in order for the script to run. Even if you issue a create table statement within the script, this is ignored when executing the script as i saw it. Let me first show you the DAO class i am going to test; package com.unittest.session.example1.dao; import org.springframework.transaction.annotation.Propagation; import org.springframework.transaction.annotation.Transactional; import com.unittest.session.example1.domain.Employee; @Transactional(propagation = Propagation.REQUIRED) public interface EmployeeDAO { public Long createEmployee(Employee emp); public Employee getEmployeeById(Long id); } package com.unittest.session.example1.dao.hibernate; import org.springframework.orm.hibernate3.support.HibernateDaoSupport; import com.unittest.session.example1.dao.EmployeeDAO; import com.unittest.session.example1.domain.Employee; public class EmployeeHibernateDAOImpl extends HibernateDaoSupport implements EmployeeDAO { @Override public Long createEmployee(Employee emp) { getHibernateTemplate().persist(emp); return emp.getEmpId(); } public Employee getEmployeeById(Long id) { return getHibernateTemplate().get(Employee.class, id); } } Nothing major, just a simple DAO with two methods where one is to persist and one is to retrieve. For me to test the retrieval method i need to populate the Employee table with some data. This is where the import sql script which was explained before comes into play. The import.sql file is as follows; insert into Employee (empId,emp_name) values (1,'Emp test'); This is just a basic script in which i am inserting one record to the employee table. Note again here that the employee table should be created through the hibernate auto create DDL option in order for the sql script to run. More info can be found here. Also the import.sql script in my instance is within the classpath. This is required in order for it to be picked up to be executed when the Session factory is created. Next up let us see how easy it is to run integration tests with Spring. package com.unittest.session.example1.dao.hibernate; import static org.junit.Assert.*; import org.junit.Test; import org.junit.runner.RunWith; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import org.springframework.test.context.transaction.TransactionConfiguration; import com.unittest.session.example1.dao.EmployeeDAO; import com.unittest.session.example1.domain.Employee; @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations="classpath:spring-context.xml") @TransactionConfiguration(defaultRollback=true,transactionManager="transactionManager") public class EmployeeHibernateDAOImplTest { @Autowired private EmployeeDAO employeeDAO; @Test public void testGetEmployeeById() { Employee emp = employeeDAO.getEmployeeById(1L); assertNotNull(emp); } @Test public void testCreateEmployee() { Employee emp = new Employee(); emp.setName("Emp123"); Long key = employeeDAO.createEmployee(emp); assertEquals(2L, key.longValue()); } } A few things to note here is that you need to instruct to run the test within a Spring context. We use the SpringJUnit4ClassRunner for this. Also the transction attribute is set to defaultRollback=true. Note that with MySQL, for this to work, your tables must have the InnoDB engine set as the MyISAM engine does not support transactions. And finally i present the spring configuration which wires everything up; com.unittest.session.example1.**.* org.hibernate.dialect.MySQLDialect com.mysql.jdbc.Driver jdbc:mysql://localhost:3306/hbmex1 root password true org.hibernate.dialect.MySQLDialect create import.sql That is about it. Personally i would much rather use a more light weight in-memory database such as hsqldb in order to run my integration tests. Here is the eclipse project for anyone who would like to run the program and try it out.

November 27, 2012

by Dinuka Arseculeratne

· 56,301 Views · 2 Likes

IndexedDBViewer: Take a Look Inside your IndexedDB

Some days ago I released a new version of the IndexedDBViewer 1.1.0. The IndexedDBViewer is intended for web developers who want to sneak into their indexedDB database. It allows you to inspect the structure of your database as well as the data stored inside this structure. The difference with the previous version is that it no longer needs the jQueryUI library. This way I eliminated at least one reference. The following references are still necessary: jQuery (version 1.8.2 or higher) linq2indexedDB (version 1.0.19 or higher) indexedDBViewer (1.1.0 – JavaScript + css file) If you are using nuget, you can get all the resource by searching for the indexedDBViewer. The second major change is that the viewer can easily be added to an existing page. The only thing you need to do is add a div with “indexedDBViewer” as id and data-dbName attribute to pass the database you want to inspect. The rest will be handled by the script in the viewer. Once this is done and you navigate to the page with the viewer, you will get the following result In the bottom you will see the view appear. On the left pane you get an overview of the database structure. This a list with on top the name of the database. Under that you will find child nodes that represent the object stores present in the database. If we descend an other level we can see the indexes present on the object store. If you click on the “+” or “-“ next to the names, you can expand or hide the structure beneath. If you click on the database name in the navigation pane, you will get information about the database and it’s structure. In the general block you will see the name of the database and the version it is currently in The object stores block gives you an overview of all the object stores present and how they are configured. The indexes block shows all the present indexes and how they are configured. When you click on one of the object store names in the navigation pane, you will get all the data present in the object store. Because the data is saved as a key/value pair, you will see the key with his corresponding value. If the value is an object or contains objects, then you can inspect them by clicking on the “+” to expand and “-” to hide the details. If you click on one of the index names in the navigation pane, you will get – similar as for object stores – all the data present in the object store. But in this case you will see a little more. Besides the key of the index and the value you will see the key the value has in the object store. This can be found under the “primary” key column. As last there are some little extra features: If you click on the top border of the viewer and drag it up or down, then you can change the height of the viewer. if you click on the “-” in the right top of the viewer, you can hide the viewer. If you want it to appear again, then you have to click on the “+” on the right bottom of the page. Conclusion With this Chrome like indexedDBViewer you can inspect the structure of your database inclusive all data stored within it. This with the advantage that it runs inside the browser, so you can use it cross-browser.

November 26, 2012

by Kristof Degrave

· 5,521 Views

Migrating from BIND DNS servers to Amazon Route 53 by using cli53

Amazon Route 53 offers DNS as a hosted service. They maintain robust DNS service for you with easy to user web interface and API. Name servers are part of any serious hosting infrastructure. They have been traditionally run on BIND or similar UNIX daemons. The usual hosting tasks including mapping domain names and subdomains (A, CNAME, et. al records) to servers and delivering email (MX and SPF records). Usually a basic hosting plan is only good for setting A records for few subdomains like www, but on the instant you need something more advanced like routing MX records to Google Apps email you need to start considering a real DNS service. GoDaddy and other big name registrants provide tools for this, but they are usually specific only for top level domains (TLDs) the registrant supports. If you come from a cold country like .fi you cannot centralize your DNS management to big popular domain registrants. The bad thing with running your own name servers is that when the service goes down the whole your infrastructure becomes virtually inaccessible – web sites, email, anything. Thus you usually run at least two BIND daemons on two physically separated servers just in case one blows up the other can still keep DNS records running. Also DNS, being core part of internet infrastructure, is subject for many kind of attacks and keeping your skills and knowledge up-to-date with BIND may be time consuming. Like with email servers, hosting name servers is a thing you really don’t want to do. Enter Amazon Route 53. (tha name Route 53 probably comes from the famous U.S. road Route 67 and DNS port 53?). Route 53 is ridiculous easy to use. Just go to the site and register one for yourself. However if you have legacy BIND servers running around there exist ways to import data automatically instead of manually reading through zone files and typing them in to Route 53 by hand. 1. Setting up cli53 cli53 is a Python based command line interface for Route 53. It uses buildout for installation (of Python package dependencies). These instructions have been tested with Ubuntu 8.04 server. Prerequisitements for using this stuff is knowing basic Ubuntu server management from the command line. Run everything as root. Install wget and unzip apt-get install wget unzip Download ZIP copy of cli53 from Github (we don’t do git clone here – no need to install git on the server): wget --no-check-certificate https://github.com/barnybug/cli53/zipball/master Extract unzip master Install as user local under the extracted folder (note that the folder varies across GitHub exports) cd barnybug-cli53-3b468b7 python bootstrap.py bin/buildout This will generate the local bin/cli53 command. Now you need to set up AWS credentials. Route 53, and other AWS service, are operated using access keys which you can get from Security Credentials page in AWS web console. Set the access keys for your shell/SSH session using enviroment variables: export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXX export AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXX Side note: as this point you need to check that your server clock is in correct time your AWS remote operations will fail (300 seconds thresold).. Use date command to see the server clock time and correct it with the same date command using this obscure date command syntax. Now you can run a test by listing your Amazon Route 53 Hosted Zones (Hosted Zone = zonefile = one per top level domain usually) using list command: bin/cli53 list The proper output for one domain which has been created throught-the-web in Route 53 control panel is something like: HostedZones: - CallerReference: 9265CCC3-9C41-98CE-8820-E26C0356C478 Config: Comment: test Id: /hostedzone/Z2L3NT0WCS8OWA Name: xxx.fi. 2. Importing zone files Now when cli53 is running and working it is time to rock’n'roll. On our server configuration BIND zonefiles lived in /srv/dns/zones though this is not the default location for Ubuntu BIND. There is one file per each zone and the filename is the domain name. [root@foobar][23:02][/srv/dns/zones]# ls -l total 176 -rw-r--r-- 1 root root 425 Dec 30 2008 abc.com -rw-r--r-- 1 root root 1713 Dec 30 2008 abcabc.com -rw-r--r-- 1 root root 1719 Dec 30 2008 abcabcabc.com We can simple create an one-liner shell script which will loop through all the files and import them to Amazon Route 53. However – there is a trick. You’ll get unknown origin exception if your zone files lack a certain line: Traceback (most recent call last): File "/root/barnybug-cli53-3b468b7/bin/cli53", line 22, in cli53.cli53.main() File "/root/barnybug-cli53-3b468b7/src/cli53/cli53.py", line 495, in main args.func(args) File "/root/barnybug-cli53-3b468b7/src/cli53/cli53.py", line 268, in cmd_import raise Exception, 'Could not find origin' Exception: Could not find origin This is because the zone file doesn’t have $ORIGIN directive. it’s optional with BIND as BIND uses the filename as $ORIGIN if it’s not defined, but the directive is mandatory for cli53 (cli53 doesn’t yet synthetize $ORIGIN). You can simple add $ORIGIN to the beginning of each file and copy files to a working directory with the following shell one-liner (it’s always safe to make copies of processed files than try to fix them in place):./transfer/$i ; done mkdir ../transfer for i in * ; do echo "\$ORIGIN $i." > ../transfer/$i ; cat $i >> ../transfer/$i ; done Then do a test run for one import: ~/barnybug-cli53-3b468b7/bin/cli53 create xxx.fi ~/barnybug-cli53-3b468b7/bin/cli53 import xxx.fi --file xxx.fi --replace --wait Note: deleting zones in Route 53 control panel is painful difficult, so make sure you import zones you only really need. If everything looks good we are ok to upload everything to Route 53. Again, a shell one-liner does the trick for us: for i in * ; do ~/barnybug-cli53-3b468b7/bin/cli53 create $i ; ~/barnybug-cli53-3b468b7/bin/cli53 import $i --file $i --wait ; done This will take abour 15-20 seconds per domain. Note: if you need to re-import add –replace flag. 3. The ugly part Each zone has Source of Authority record with authoritative nameserves (NS records). These will be change from your BIND server IP addresses to Amazon ones. Route 53 will re-assign its own name servers for each imported Hosted Zone. However, you cannot know beforehand what name servers the dice has chosen for your Hosted Zone, so automatizing this process is little bit difficult. NS records servers are overriden when the zone file is uploaded to Route 53. You can see new SOA and NS records when you choose the domain in Route 53 control panel and press Go to Record Sets. To make things easier later on here is a script which will dump all the name servers of all domains you have in Route 53 and collect are info to domain-info.txt file: for i in * ; do ~/barnybug-cli53-3b468b7/bin/cli53 info $i >> /tmp/domain-info.txt ; done 4. Testing Route 53 DNS Browser through imported domain records in Route 53 control panel and see if they look ok. You can also test the actual DNS functionality. When you use ping command, your web browser or any other “normal” mean which queries DNS records they use the Source of Authority name servers as set by in your domain registrant settings. When in the middle of migration to Route 53, the authorative name servers are still pointing to the old name servers in this point. dig command can be used to query a specific name server for DNS records. In Route 53 control panel pick any domain and any of its name servers Use dig command to query the records of this domain Example: dig -t ANY @ns-1340.awsdns-39.org yourdomainname.com Output should be something like this: ; <<>> DiG 9.4.2-P1 <<>> -t ANY @ns-1340.awsdns-39.org yourdomain.com ; (1 server found) ;; global options: printcmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9352 ;; flags: qr aa rd; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; QUESTION SECTION: ;yourdomain.com. IN ANY ;; ANSWER SECTION: yourdomain.com. 172800 IN NS ns-470.awsdns-58.com. yourdomain.com. 172800 IN NS ns-553.awsdns-05.net. yourdomain.com. 172800 IN NS ns-1340.awsdns-39.org. yourdomain.com. 172800 IN NS ns-1706.awsdns-21.co.uk. yourdomain.com. 900 IN SOA ns-1340.awsdns-39.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400 yourdomain.com. 0 IN MX 10 aspmx.l.google.com. yourdomain.com. 0 IN MX 20 alt1.aspmx.l.google.com. yourdomain.com. 0 IN MX 20 alt2.aspmx.l.google.com. yourdomain.com. 0 IN TXT "v=spf1 a mx a:smtp.something.fi a:auth-smtp.something.fi include:aspmx.googlemail.com ~all" yourdomain.com. 0 IN A 84.20.128.49 ;; Query time: 32 msec ;; SERVER: 205.251.197.60#53(205.251.197.60) ;; WHEN: Wed Nov 23 00:04:53 2011 ;; MSG SIZE rcvd: 413 (subdomains are not listed) 5. Finalizing it Go to your domain name registrant and update the domain name servers point to your Route 53 servers. For each domain in domain-info.txt Go to the corresponding registrant from whom you obtained the domain (e.g. domain.ficora.fi) Update the authoritative name servers to be as stated in the file Wait 24-48 hours or whatever time-to-live time you have for your name servers – usually this is longish period See that your services still run Take down your old name servers 6. More info http://hathology.com/2011/02/how-to-import-bind-zone-files-into-amazon-route-53/ http://docs.amazonwebservices.com/Route53/latest/DeveloperGuide/index.html?MigratingDNS.html#Step_UpdateRegistrar

November 23, 2012

by Mikko Ohtamaa

· 13,688 Views

API Server Design - Making De-Normalization the Norm

In database design classes in Computer Science, we learn that normalization is a good thing. And it certainly is a good thing, for databases. In the case of APIs, it is a different story. If a client must do multiple GETs to obtain the data it needs, or multiple PUTs or POSTs to send up data, just because your database happens to be normalized, then something is wrong. One of the functions of an API Server is to de-normalize your data so that clients are spared from making extra REST API calls, with all of the overhead which goes with that. Mugunth Kumar explains this very well in this excellent presentation, using Twitter as an example. When you do a GET on a tweet, it not only returns you the Tweet itself, but also other information (e.g. description of the Twitter user who sent the tweet). This saves the API client (often a mobile app) from making another request for that data. Effectively, the API Server has gathered up that data, which may come from different database tables, and de-normalized it for the response. You can try it out yourself here, by looking at the JSON which comes back from this Twitter API GET the most recent Tweet from my timeline. Many Vordel customers are using the API Server to gather together the data which is returned to the API clients, often taking this data from multiple sources (not only databases, but also message queues and even from other APIs). This data is then amalgamated into single JSON or XML structures. It often then cached at the API Server, in this structure. In this way, clients are spared from doing multiple calls, and instead (like the Twitter API example above) get the data they need in one request, or can PUT or POST up data in one action, rather than piecemeal. De-normalization is key to this process, and is one of the great benefits of an API Server.

November 21, 2012

by Oren Eini

· 9,876 Views

Spock and testing RESTful API services

Spock is a BBD testing framework that allows for easy BDD tests to be written. The framework is an extension upon JUnit which allows for easy IDE integration and using existing JUnit functionality. Spock tests are written in Groovy and can be used for writing a wide range of tests from small unit tests to full application integration tests. Without going into too much detail on how to write Spock based tests (see below for a few excellent links), lets go through how we can use the framework to build integration tests for testing a RESTful API. Our first RESTful API Test package com.wolfware.integration import groovyx.net.http.RESTClient import spock.lang.* import spock.lang.Specification import com.movideo.spock.extension.APIVersion import com.movideo.spock.extension.EnvironmentEndPoint @APIVersion(minimimApiVersion="1.0.0.0") class GetAuthenticationToken extends Specification { @EnvironmentEndPoint protected def environmentHost def "Get authentication token XML from API for valid account"() { given: "a valid account" def authenticationTokenRequestParams = ['key':"AAABBBCCC123", 'user':"[email protected]"] and: "a client to get the authentication token XML" def client = new RESTClient(environmentHost) when: "we attempt to retrieve authentication token XML" def resp = client.get(path : "/authenticate", query : authenticationTokenRequestParams) then: "we should get a valid authentication token XML response" assert resp.data.token.isEmpty() == false // lots more asserts } } As you can see, apart from the @APIVersion and @EnvironmentEndPoint annotations (these are Spock extensions as explained later), the spec is a fairly simple Spock test. This specification has a feature that, as the name suggests, gets a authentication token in XML format and validates it. Lets look at each step: Given The url parameters required to get a authentication token from the RESTful service When using the Groovy RestClient to call the RESTful service for the authentication token details Then We can assert all the details of the response. The thing I really like about Spock is the readability of the tests. From the name being a descriptive sentence rather than some short hand with _ throughout to make a valid method name to being able to easily see where setup of the test is done and then the expectations and assertions. Trying to test any environment RESTful service I've found that when trying to write integration tests, there has either been: Hard coded environment details and the code branched for each environment making it near impossible to keep code in sync as merge hell becomes the norm. Config files that define the environment are used to define environment details, again checked into each branch for each environment. Trying to follow the principles of continuous delivery, it would be great to be able to use the same code base to test against any environment. This is where Spock Extensions come into play to help us out. Spock Extensions In short Spock allows us to extend it to perform other functionality during the test life-cycle (a great post on extensions can be read on this excellent blog post). I've developed two extensions which help to make the idea of running the same test suite across different environments easier. The @EnvironmentEndPoint Extension The aim of this Spock extension is to have a placeholder variable in code that at run-time, can be defined with the environment host of the RESTful services that we want to test. package com.movideo.runtime.extension.custom import org.apache.commons.logging.Log import org.apache.commons.logging.LogFactory import org.spockframework.runtime.extension.AbstractAnnotationDrivenExtension import org.spockframework.runtime.extension.AbstractMethodInterceptor import org.spockframework.runtime.extension.IMethodInvocation import org.spockframework.runtime.model.FieldInfo import org.spockframework.runtime.model.SpecInfo /** * Spock Environment Annotation Extension */ class EnvironmentEndPointExtension extends AbstractAnnotationDrivenExtension { private static final Log LOG = LogFactory.getLog(getClass()); private static def config = new ConfigSlurper().parse(new File('src/test/resources/SpockConfig.groovy').toURL()) /** * env environment variable * * Defaults to {@code LOCAL_END_POINT} */ private static final String envString = System.getProperties().getProperty("env", config.envHost); static { LOG.info("Environment End Point [" + envString + "]") } /** * {@inheritDoc} */ @Override void visitFieldAnnotation(EnvironmentEndPoint annotation, FieldInfo field) { def interceptor = new EnvironmentInterceptor(field, envString) interceptor.install(field.parent.getTopSpec()) } } /** * * Environment Intercepter * */ class EnvironmentInterceptor extends AbstractMethodInterceptor { private final FieldInfo field private final String envString EnvironmentInterceptor(FieldInfo field, String envString) { this.field = field this.envString = envString } private void injectEnvironmentHost(target) { field.writeValue(target, envString) } @Override void interceptSetupMethod(IMethodInvocation invocation) { injectEnvironmentHost(invocation.target) invocation.proceed() } @Override void install(SpecInfo spec) { spec.setupMethod.addInterceptor this } } The EnvironmentEndPointExtension class defines the following: config: is a ConfigSlurper that parses a config file 'SpockConfig.groovy' that is used to define the default environment host (envHost) envString: gets the value of 'env' from all System Properties (these include run-time properties) and defaults to config.envHost With the environment host able to be accessed by Spock, now we need to inject this into the placeholder variable for Spock tests to access. An interceptor is created which is used to inject(field.writeValue method) the value of the environment host into the placeholder variable. This placeholder is the one that the @EnvironmentEndPoint is annotating. When the test is run, the interceptor sets the placeholder variable and the test can then use this value as the host for the RestClient object. When running the Spock tests either the default value from the config file will be used or the JVM argument -Denv=? can be used. This makes running the same test code base against any environment so much easier. A note on Gradle builds. By default, Gradle will not pass through JVM arguments through to forked processes such as running tests. The code snippet below shows how to achieve this: /* * Required to pass all system properties to Test tasks. * Not default for Gradle to pass system properties through to forked processes. */ tasks.withType(Test) { def config = new ConfigSlurper().parse(new File('src/test/resources/SpockConfig.groovy').toURL()) systemProperty 'env', System.getProperty('env', config.envHost) } This allows all tasks that are a type of 'Test' to have some custom code run. In this case, we are defining the 'SpockConfig.groovy' config file and then setting 'systemPropery' within Gradle Test tasks to 'env' and either getting the value from the passed in JVM argument or from the config file. With this code in the build.gradle, we're able to run all tests via a Gradle test build, which will produce lovely test reports (in Gradle HTML and JUnit XML). The @APIVersion Extension Another integration testing problem I've found is that if we try and develop our tests first (or at least during the process of developing a feature or bug fix) that running the same tests against an environment that doesn't yet have the new code base (but we are using the same test code base everywhere), we'll have failing tests that aren't really failures as the new code isn't there yet. To help solve this problem, I've developed the @APIVersion extension to help with this issue. As newly developed code should be deployed with a new version, we can use this version to compare to a minimum version that a test can be run against. package com.movideo.runtime.extension.custom import groovyx.net.http.RESTClient import java.lang.annotation.Annotation import java.util.regex.Pattern import org.apache.commons.logging.Log import org.apache.commons.logging.LogFactory import org.spockframework.runtime.extension.AbstractAnnotationDrivenExtension import org.spockframework.runtime.model.FeatureInfo import org.spockframework.runtime.model.SpecInfo /** * API Version Extension * */ class APIVersionExtension extends AbstractAnnotationDrivenExtension { /** * Logger */ private static final Log LOG = LogFactory.getLog(getClass()); /** * */ private static def config = new ConfigSlurper().parse(new File('src/test/resources/SpockConfig.groovy').toURL()) /** * env environment variable * * Defaults to {@code LOCAL_END_POINT} */ private static final String envString = System.getProperties().getProperty("env", config.envHost); /** * Version REGX pattern */ private static final def VERSION_PATTERN = Pattern.compile(".", Pattern.LITERAL); /** * Max version length */ private static final def MAX_VERSION_LENGTH = 4; /** * Current API Version */ private static final def CURRENT_API_VERSION = getDeployedAPIVersion(); /** * {@inheritDoc} */ @Override void visitFeatureAnnotation(APIVersion annotation, FeatureInfo feature) { if(!isApiVersionGreaterThanMinApiVersion(annotation, feature.name)) { feature.setSkipped(true) } } /** * {@inheritDoc} */ @Override public void visitSpecAnnotation(APIVersion annotation, SpecInfo spec) { if(!isApiVersionGreaterThanMinApiVersion(annotation, spec.name)) { spec.setSkipped(true) } } /** * Get the current deployed API version * * Performs a HTTP request to the current deployed API version. Parses the returned data and get the {@code version} node data. * @return current deployed API version */ private static String getDeployedAPIVersion() { def apiVersion = null try { def client = new RESTClient(envString) def resp = client.get(path : config.versionServiceUri) apiVersion = resp.data.version LOG.info("Current deployed API version [" + apiVersion + "]"); } catch (ex) { APIVersionError apiVersionError = new APIVersionError("Error occurred attempting to get current deployed API version from %s", envString + config.versionServiceUri); apiVersionError.setStackTrace(ex.stackTrace); throw apiVersionError; } return apiVersion } * @param annotation * @param infoName * @return */ private boolean isApiVersionGreaterThanMinApiVersion(APIVersion annotation, String infoName) { def isApiVersionGreaterThanMinApiVersion = true def minApiVersionRequired = annotation.minimimApiVersion(); // normalise both version id's def apiVersionNormalised = normaliseVersion(CURRENT_API_VERSION); def minApiVersionRequiredNormalised = normaliseVersion(minApiVersionRequired); // compare version id's int cmp = apiVersionNormalised.compareTo(minApiVersionRequiredNormalised); // if the comparison is less than 0, min API version is greater than the deployed API version if(cmp < 0) { LOG.info("min api version [" + minApiVersionRequired + "] greater than api version [" + CURRENT_API_VERSION + "], skipping [" + infoName + "]") isApiVersionGreaterThanMinApiVersion = false } return isApiVersionGreaterThanMinApiVersion } * @param version * @return */ private String normaliseVersion(String version) { String[] split = VERSION_PATTERN.split(version); StringBuilder sb = new StringBuilder(); for (String s : split) { sb.append(String.format("%" + MAX_VERSION_LENGTH + 's', s)); } return sb.toString(); } } The @APIVersion extension defines the same environment config as the @EnvironmentEndPoint extension does so that the environment can be injected and used purely for accessing the API version endpoint without the need for @EnvironmentEndPoint. The RESTful API version endpoint is required to be setup and publicly available. The @APIVersion extension will call this service to get details about the version of RESTful API. The version response data should be as follows: Media API 1.51.1 The @APIVersion extension will look for the version data to define what the current deployed version of the RESTful API is. Once the version of the RESTful API is known, the extension then checks the minimum API version required. Example @APIVersion(minimimApiVersion="1.0.0.0") The extension then uses this value to compare against the response data version and if the required version is greater than that of the deployed RESTful API services, then the test is skipped. This extension annotation can be placed on Specification's or Feature's allowing whole Specs to have a minimum version and / or Features to have their own minimum version. This extension has made writing integration tests with Spock even more portable and allows for a 'build once' set of tests that can be run against any environment, with some small changes to allow getting the API version. The SpockConfig.groovy file Here is an example of the SpockConfig.groovy config file used to configure defaults for both @EnvironmentEndPoint and @APIVersion extensions. versionServiceUri="/public/serviceInformation" envHost="http://api.preview.movideo.com" The 'versionServiceUri' is required for @APIVersion extension as the URI for the RESTful API version The 'envHost' is required for both @APIVersion and @EnvironmentEndPoint extensions as the host of the RESTful API Go and start testing Hopefully these Spock extensions might help your Spock integration tests. The framework is really easy and fun to use to build essential tests for the whole test stack. Checkout my GitHub projects for the code for both extensions. Hope this post has been helpful and hopefully I'll post something sooner for my next post. References and really helpful links Spock Homepage Annotation Driven Extensions With Spock

November 14, 2012

by Christian Strzadala

· 39,920 Views · 1 Like

Composite Keys in Cassandra

Introduction A composite key consists of one or more primary key fields. Each field must be of data type supported by underlying data-store. In JPA (Java Persistence API), there are two ways of specifying composite keys: 1. Composite Primary Key: @Entity @IdClass(TimelineId.class) public class Timeline { @Id int userId; @Id long tweetId; //Other non-primary key fields } Class TimelineId { int userId; long tweetId; } 2. Embedded Primary Key: @Entity public class Timeline { @EmbeddedId TimelineId id; //Other non-primary key fields } @Embeddable Class TimelineId { int userId; long tweetId; } Above Timeline entity is inspired from famous twissandra example. Starting 1.1 release, Cassandra supports composite keys. Cassandra Composite Keys in Action Visit this page in order to understand Cassandra Schema in general. In this section I will give you a feel of how composite keys are stored in Cassandra. Let's start Cassandra 1.1.x server and run following commands from Cassandra/bin directory: CQL: ./cqlsh -3 localhost 9160 CREATE KEYSPACE twissandra with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1; use twissandra; CREATE TABLE timeline( user_id varchar, tweet_id varchar, tweet_device varchar, author varchar, body varchar, PRIMARY KEY(user_id,tweet_id,tweet_device)); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't1', 'web', 'Amresh', 'Here is my first tweet'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('xamry', 't2', 'sms', 'Saurabh', 'Howz life Xamry'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't1', 'iPad', 'Kuldeep', 'You der?'); INSERT INTO timeline (user_id, tweet_id, tweet_device, author, body) VALUES ('mevivs', 't2', 'mobile', 'Vivek', 'Yep, I suppose'); cqlsh:twissandra> select * from timeline; user_id | tweet_id | author | body ---------+----------+---------+------------------------ xamry | t1 | Amresh | Here is my first tweet xamry | t2 | Saurabh | Howz life Xamry mevivs | t1 | Kuldeep | You der? mevivs | t2 | Vivek | Yep, I suppose cqlsh:twissandra> SELECT * FROM timeline WHERE user_id='xamry'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+---------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet xamry | t2 | sms | Saurabh | Howz life Xamry cqlsh:twissandra> select * from timeline where tweet_id = 't1'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+---------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet mevivs | t1 | iPad | Kuldeep | You der? cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id='t1'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+--------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet cqlsh:twissandra> select * from timeline where user_id = 'xamry' and author='Amresh'; Bad Request: No indexed columns present in by-columns clause with Equal operator cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_device='web'; Bad Request: PRIMARY KEY part tweet_device cannot be restricted (preceding part tweet_id is either not restricted or by a non-EQ relation) cqlsh:twissandra> select * from timeline where user_id = 'xamry' and tweet_id = 't1' and tweet_device='web'; user_id | tweet_id | tweet_device | author | body ---------+----------+--------------+--------+------------------------ xamry | t1 | web | Amresh | Here is my first tweet Cassandra-cli: impadmin@impetus-ubuntu:/usr/local/apache-cassandra-1.1.2/bin$ ./cassandra-cli -h localhost -p 9160 Connected to: "Test Cluster" on localhost/9160 Welcome to Cassandra CLI version 1.1.2 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] use twissandra; Authenticated to keyspace: twissandra [default@twissandra] list timeline; Using default limit of 100 Using default column limit of 100 ------------------- RowKey: xamry => (column=t1:web:author, value=Amresh, timestamp=1343729388951000) => (column=t1:web:body, value=Here is my first tweet, timestamp=1343729388951001) => (column=t2:sms:author, value=Saurabh, timestamp=1343729388973000) => (column=t2:sms:body, value=Howz life Xamry, timestamp=1343729388973001) ------------------- RowKey: mevivs => (column=t1:iPad:author, value=Kuldeep, timestamp=1343729388991000) => (column=t1:iPad:body, value=You der?, timestamp=1343729388991001) => (column=t2:mobile:author, value=Vivek, timestamp=1343729389941000) => (column=t2:mobile:body, value=Yep, I suppose, timestamp=1343729389941001) Observations First part of composite key (user_id) is called "Partition Key", rest (tweet_id, tweet_device) are remaining keys. Cassandra stores columns differently when composite keys are used. Partition key becomes row key. Remaining keys are concatenated with each column name (":" as separator) to form column names. Column values remain unchanged. Remaining keys (other than partition keys) are ordered, and it's not allowed to search on any random column, you have to start with the first one and then you can move to the second one and so on. This is evident from "Bad Request" error.

November 14, 2012

by Amresh Singh

· 20,389 Views · 1 Like

Integration Testing with MongoDB & Spring Data

Integration Testing is an often overlooked area in enterprise development. This is primarily due to the associated complexities in setting up the necessary infrastructure for an integration test. For applications backed by databases, it’s fairly complicated and time-consuming to setup databases for integration tests, and also to clean those up once test is complete (ex. data files, schemas etc.), to ensure repeatability of tests. While there have been many tools (ex. DBUnit) and mechanisms (ex. rollback after test) to assist in this, the inherent complexity and issues have been there always. But if you are working with MongoDB, there’s a cool and easy way to do your unit tests, with almost the simplicity of writing a unit test with mocks. With ‘EmbedMongo’, we can easily setup an embedded MongoDB instance for testing, with in-built clean up support once tests are complete. In this article, we will walkthrough an example where EmbedMongo is used with JUnit for integration testing a Repository Implementation. Here’s the technology stack that we will be using. MongoDB 2.2.0 EmbedMongo 1.26 Spring Data – Mongo 1.0.3 Spring Framework 3.1 The Maven POM for the above setup looks like this. 4.0.0 com.yohanliyanage.blog.mongoit mongo-it 1.0 org.springframework.data spring-data-mongodb 1.0.3.RELEASE compile junit junit 4.10 test org.springframework spring-context 3.1.3.RELEASE compile de.flapdoodle.embed de.flapdoodle.embed.mongo 1.26 test Or if you prefer Gradle (by the way, Gradle is an awesome build tool which you should check out if you haven’t done so already). apply plugin: 'java' apply plugin: 'eclipse' sourceCompatibility = 1.6 group = "com.yohanliyanage.blog.mongoit" version = '1.0' ext.springVersion = '3.1.3.RELEASE' ext.junitVersion = '4.10' ext.springMongoVersion = '1.0.3.RELEASE' ext.embedMongoVersion = '1.26' repositories { mavenCentral() maven { url 'http://repo.springsource.org/release' } } dependencies { compile "org.springframework:spring-context:${springVersion}" compile "org.springframework.data:spring-data-mongodb:${springMongoVersion}" testCompile "junit:junit:${junitVersion}" testCompile "de.flapdoodle.embed:de.flapdoodle.embed.mongo:${embedMongoVersion}" } To begin with, here’s the document that we will be storing in Mongo. package com.yohanliyanage.blog.mongoit.model; import org.springframework.data.mongodb.core.index.Indexed; import org.springframework.data.mongodb.core.mapping.Document; /** * A Sample Document. * * @author Yohan Liyanage * */ @Document public class Sample { @Indexed private String key; private String value; public Sample(String key, String value) { super(); this.key = key; this.value = value; } public String getKey() { return key; } public void setKey(String key) { this.key = key; } public String getValue() { return value; } public void setValue(String value) { this.value = value; } } To assist with storing and managing this document, let’s write up a simple Repository implementation. The Repository Interface is as follows. package com.yohanliyanage.blog.mongoit.repository; import java.util.List; import com.yohanliyanage.blog.mongoit.model.Sample; /** * Sample Repository API. * * @author Yohan Liyanage * */ public interface SampleRepository { /** * Persists the given Sample. * @param sample */ void save(Sample sample); /** * Returns the list of samples with given key. * @param sample * @return */ List findByKey(String key); } And the implementation… package com.yohanliyanage.blog.mongoit.repository; import java.util.List; import static org.springframework.data.mongodb.core.query.Query.query; import static org.springframework.data.mongodb.core.query.Criteria.*; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.data.mongodb.core.MongoOperations; import org.springframework.stereotype.Repository; import com.yohanliyanage.blog.mongoit.model.Sample; /** * Sample Repository MongoDB Implementation. * * @author Yohan Liyanage * */ @Repository public class SampleRepositoryMongoImpl implements SampleRepository { @Autowired private MongoOperations mongoOps; /** * {@inheritDoc} */ public void save(Sample sample) { mongoOps.save(sample); } /** * {@inheritDoc} */ public List findByKey(String key) { return mongoOps.find(query(where("key").is(key)), Sample.class); } /** * Sets the MongoOps implementation. * * @param mongoOps the mongoOps to set */ public void setMongoOps(MongoOperations mongoOps) { this.mongoOps = mongoOps; } } To wire this up, we need a Spring Bean Configuration. Note that we do not need this for testing. But for the sake of completion, I have included this. The XML configuration is as follows. And now we are ready to write the Integration Test for our Repository Implementation using Embed Mongo. Ideally, the integration tests should be placed in a separate source directory, just like we place our unit tests (ex. src/test/java => src/integration-test/java). However, neither Maven nor Gradle supports this out of the box (yet – v1.2. For Gradle, there’s an on going discussion for this facility). Nevertheless, both Maven and Gradle are flexible, so you can configure the POM / build.gradle to handle this. However, to keep this discussion simple and focused, I will be placing the Integration Tests in the ‘src/test/java’, but I do not recommend this for a real application. Let’s start writing up the Integration Test. First, let’s begin with a simple JUnit based Test for the methods. package com.yohanliyanage.blog.mongoit.repository; import static org.junit.Assert.fail; import org.junit.After; import org.junit.Before; import org.junit.Test; /** * Integration Test for {@link SampleRepositoryMongoImpl}. * * @author Yohan Liyanage */ public class SampleRepositoryMongoImplIntegrationTest { private SampleRepositoryMongoImpl repoImpl; @Before public void setUp() throws Exception { repoImpl = new SampleRepositoryMongoImpl(); } @After public void tearDown() throws Exception { } @Test public void testSave() { fail("Not yet implemented"); } @Test public void testFindByKey() { fail("Not yet implemented"); } } When this JUnit Test Case initializes, we need to fire up EmbedMongo to start an embedded Mongo server. Also, when the Test Case ends, we need to cleanup the DB. The below code snippet does this. package com.yohanliyanage.blog.mongoit.repository; import static org.junit.Assert.fail; import java.io.IOException; import org.junit.*; import org.springframework.data.mongodb.core.MongoTemplate; import com.mongodb.Mongo; import com.yohanliyanage.blog.mongoit.model.Sample; import de.flapdoodle.embed.mongo.MongodExecutable; import de.flapdoodle.embed.mongo.MongodProcess; import de.flapdoodle.embed.mongo.MongodStarter; import de.flapdoodle.embed.mongo.config.MongodConfig; import de.flapdoodle.embed.mongo.config.RuntimeConfig; import de.flapdoodle.embed.mongo.distribution.Version; import de.flapdoodle.embed.process.extract.UserTempNaming; /** * Integration Test for {@link SampleRepositoryMongoImpl}. * * @author Yohan Liyanage */ public class SampleRepositoryMongoImplIntegrationTest { private static final String LOCALHOST = "127.0.0.1"; private static final String DB_NAME = "itest"; private static final int MONGO_TEST_PORT = 27028; private SampleRepositoryMongoImpl repoImpl; private static MongodProcess mongoProcess; private static Mongo mongo; private MongoTemplate template; @BeforeClass public static void initializeDB() throws IOException { RuntimeConfig config = new RuntimeConfig(); config.setExecutableNaming(new UserTempNaming()); MongodStarter starter = MongodStarter.getInstance(config); MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false)); mongoProcess = mongoExecutable.start(); mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT); mongo.getDB(DB_NAME); } @AfterClass public static void shutdownDB() throws InterruptedException { mongo.close(); mongoProcess.stop(); } @Before public void setUp() throws Exception { repoImpl = new SampleRepositoryMongoImpl(); template = new MongoTemplate(mongo, DB_NAME); repoImpl.setMongoOps(template); } @After public void tearDown() throws Exception { template.dropCollection(Sample.class); } @Test public void testSave() { fail("Not yet implemented"); } @Test public void testFindByKey() { fail("Not yet implemented"); } } The initializeDB() method is annotated with @BeforeClass to start this before test case beings. This method fires up an embedded MongoDB instance which is bound to the given port, and exposes a Mongo object which is set to use the given database. Internally, EmbedMongo creates the necessary data files in temporary directories. When this method executes for the first time, EmbedMongo will download the necessary Mongo implementation (denoted by Version.V2_2_0 in above code) if it does not exist already. This is a nice facility specially when it comes to Continuous Integration servers. You don’t have to manually setup Mongo in each of the CI servers. That’s one less external dependency for the tests. In the shutdownDB() method, which is annotated with @AfterClass, we stop the EmbedMongo process. This triggers the necessary cleanups in EmbedMongo to remove the temporary data files, restoring the state to where it was before Test Case was executed. We have now updated setUp() method to build a Spring MongoTemplate object which is backed by the Mongo instance exposed by EmbedMongo, and to setup our RepoImpl with that template. The tearDown() method is updated to drop the ‘Sample’ collection to ensure that each of our test methods start with a clean state. Now it’s just a matter of writing the actual test methods. Let’s start with the save method test. @Test public void testSave() { Sample sample = new Sample("TEST", "2"); repoImpl.save(sample); int samplesInCollection = template.findAll(Sample.class).size(); assertEquals("Only 1 Sample should exist collection, but there are " + samplesInCollection, 1, samplesInCollection); } We create a Sample object, pass it to repoImpl.save(), and assert to make sure that there’s only one Sample in the Sample collection. Simple, straight-forward stuff. And here’s the test method for findByKey method. @Test public void testFindByKey() { // Setup Test Data List samples = Arrays.asList( new Sample("TEST", "1"), new Sample("TEST", "25"), new Sample("TEST2", "66"), new Sample("TEST2", "99")); for (Sample sample : samples) { template.save(sample); } // Execute Test List matches = repoImpl.findByKey("TEST"); // Note: Since our test data (populateDummies) have only 2 // records with key "TEST", this should be 2 assertEquals("Expected only two samples with key TEST, but there are " + matches.size(), 2, matches.size()); } Initially, we setup the data by adding a set of Sample objects into the data store. It’s important that we directly use template.save() here, because repoImpl.save() is a method under-test. We are not testing that here, so we use the underlying “trusted” template.save() during data setup. This is a basic concept in Unit / Integration testing. Then we execute the method under test ‘findByKey’, and assert to ensure that only two Samples matched our query. Likewise, we can continue to write more tests for each of the repository methods, including negative tests. And here’s the final Integration Test file. package com.yohanliyanage.blog.mongoit.repository; import static org.junit.Assert.*; import java.io.IOException; import java.util.Arrays; import java.util.List; import org.junit.*; import org.springframework.data.mongodb.core.MongoTemplate; import com.mongodb.Mongo; import com.yohanliyanage.blog.mongoit.model.Sample; import de.flapdoodle.embed.mongo.MongodExecutable; import de.flapdoodle.embed.mongo.MongodProcess; import de.flapdoodle.embed.mongo.MongodStarter; import de.flapdoodle.embed.mongo.config.MongodConfig; import de.flapdoodle.embed.mongo.config.RuntimeConfig; import de.flapdoodle.embed.mongo.distribution.Version; import de.flapdoodle.embed.process.extract.UserTempNaming; /** * Integration Test for {@link SampleRepositoryMongoImpl}. * * @author Yohan Liyanage */ public class SampleRepositoryMongoImplIntegrationTest { private static final String LOCALHOST = "127.0.0.1"; private static final String DB_NAME = "itest"; private static final int MONGO_TEST_PORT = 27028; private SampleRepositoryMongoImpl repoImpl; private static MongodProcess mongoProcess; private static Mongo mongo; private MongoTemplate template; @BeforeClass public static void initializeDB() throws IOException { RuntimeConfig config = new RuntimeConfig(); config.setExecutableNaming(new UserTempNaming()); MongodStarter starter = MongodStarter.getInstance(config); MongodExecutable mongoExecutable = starter.prepare(new MongodConfig(Version.V2_2_0, MONGO_TEST_PORT, false)); mongoProcess = mongoExecutable.start(); mongo = new Mongo(LOCALHOST, MONGO_TEST_PORT); mongo.getDB(DB_NAME); } @AfterClass public static void shutdownDB() throws InterruptedException { mongo.close(); mongoProcess.stop(); } @Before public void setUp() throws Exception { repoImpl = new SampleRepositoryMongoImpl(); template = new MongoTemplate(mongo, DB_NAME); repoImpl.setMongoOps(template); } @After public void tearDown() throws Exception { template.dropCollection(Sample.class); } @Test public void testSave() { Sample sample = new Sample("TEST", "2"); repoImpl.save(sample); int samplesInCollection = template.findAll(Sample.class).size(); assertEquals("Only 1 Sample should exist in collection, but there are " + samplesInCollection, 1, samplesInCollection); } @Test public void testFindByKey() { // Setup Test Data List samples = Arrays.asList( new Sample("TEST", "1"), new Sample("TEST", "25"), new Sample("TEST2", "66"), new Sample("TEST2", "99")); for (Sample sample : samples) { template.save(sample); } // Execute Test List matches = repoImpl.findByKey("TEST"); // Note: Since our test data (populateDummies) have only 2 // records with key "TEST", this should be 2 assertEquals("Expected only two samples with key TEST, but there are " + matches.size(), 2, matches.size()); } } On a side note, one of the key concerns with Integration Tests is the execution time. We all want to keep our test execution times as low as possible, ideally a couple of seconds to make sure that we can run all the tests during CI, with minimal build and verification times. However, since Integration Tests rely on underlying infrastructure, usually Integration Tests take time to run. But with EmbedMongo, this is not the case. In my machine, above test suite runs in 1.8 seconds, and each test method takes only .166 seconds max. See the screenshot below. I have uploaded the code for above project into GitHub. You can download / clone it from here: https://github.com/yohanliyanage/blog-mongo-integration-tests. For more information regarding EmbedMongo, refer to their site at GitHub https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de.

November 11, 2012

by Yohan Liyanage

· 26,525 Views

Applying a Namespace During JAXB Unmarshal

For some an XML schema is a strict set of rules for how the XML document must be structured. But for others it is a general guideline to indicate what the XML should look like. This means that sometimes people want to accept input that doesn't conform to the XML schema for some reason. In this example I will demonstrate how this can be done by leveraging a SAX XMLFilter. Java Model Below is the Java model that will be used for this example. Customer package blog.namespace.sax; import javax.xml.bind.annotation.XmlRootElement; @XmlRootElement public class Customer { private String name; public String getName() { return name; } public void setName(String name) { this.name = name; } } package-info We will use the package level @XmlSchema annotation to specify the namespace qualification for our model. @XmlSchema( namespace="http://www.example.com/customer", elementFormDefault=XmlNsForm.QUALIFIED) package blog.namespace.sax; import javax.xml.bind.annotation.*; XML Input (input.xml) Even though our metadata specified that all the elements should be qualified with a namespace (http://www.example.com/customer) our input document is not namespace qualified. An XMLFilter will be used to add the namespace during the unmarshal operation. Jane Doe XMLFilter (NamespaceFilter) The easiest way to create an XMLFilter is to extend XMLFilterImpl. For our use case we will override the startElement and endElement methods. In each of these methods we will call the corresponding super method passing in the default namespace as the URI parameter. package blog.namespace.sax; import org.xml.sax.*; import org.xml.sax.helpers.XMLFilterImpl; public class NamespaceFilter extends XMLFilterImpl { private static final String NAMESPACE = "http://www.example.com/customer"; @Override public void endElement(String uri, String localName, String qName) throws SAXException { super.endElement(NAMESPACE, localName, qName); } @Override public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException { super.startElement(NAMESPACE, localName, qName, atts); } } Demo In the demo code below we will do a SAX parse of the XML document. The XMLReader will be wrapped in our XMLFilter. We will leverage JAXB's UnmarshallerHandler as the ContentHandler. Once the parse has been done we can ask the UnmarshallerHandler for the resulting Customer object. package blog.namespace.sax; import javax.xml.bind.*; import javax.xml.parsers.*; import org.xml.sax.*; public class Demo { public static void main(String[] args) throws Exception { // Create the JAXBContext JAXBContext jc = JAXBContext.newInstance(Customer.class); // Create the XMLFilter XMLFilter filter = new NamespaceFilter(); // Set the parent XMLReader on the XMLFilter SAXParserFactory spf = SAXParserFactory.newInstance(); SAXParser sp = spf.newSAXParser(); XMLReader xr = sp.getXMLReader(); filter.setParent(xr); // Set UnmarshallerHandler as ContentHandler on XMLFilter Unmarshaller unmarshaller = jc.createUnmarshaller(); UnmarshallerHandler unmarshallerHandler = unmarshaller .getUnmarshallerHandler(); filter.setContentHandler(unmarshallerHandler); // Parse the XML InputSource xml = new InputSource("src/blog/namespace/sax/input.xml"); filter.parse(xml); Customer customer = (Customer) unmarshallerHandler.getResult(); // Marshal the Customer object back to XML Marshaller marshaller = jc.createMarshaller(); marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); marshaller.marshal(customer, System.out); } } Output Below is the output from running the demo code. Note how the output contains the namespace qualification based on the metadata. Jane Doe Further Reading If you enjoyed this post then you may also be interested in: JAXB & Namespaces Preventing Entity Expansion Attacks in JAXB

November 10, 2012

by Blaise Doughan

· 64,713 Views · 1 Like

20 Database Design Best Practices

Use well defined and consistent names for tables and columns (e.g. School, StudentCourse, CourseID ...). Use singular for table names (i.e. use StudentCourse instead of StudentCourses). Table represents a collection of entities, there is no need for plural names. Don’t use spaces for table names. Otherwise you will have to use ‘{‘, ‘[‘, ‘“’ etc. characters to define tables (i.e. for accesing table Student Course you'll write “Student Course”. StudentCourse is much better). Don’t use unnecessary prefixes or suffixes for table names (i.e. use School instead of TblSchool, SchoolTable etc.). Keep passwords as encrypted for security. Decrypt them in application when required. Use integer id fields for all tables. If id is not required for the time being, it may be required in the future (for association tables, indexing ...). Choose columns with the integer data type (or its variants) for indexing. varchar column indexing will cause performance problems. Use bit fields for boolean values. Using integer or varchar is unnecessarily storage consuming. Also start those column names with “Is”. Provide authentication for database access. Don’t give admin role to each user. Avoid “select *” queries until it is really needed. Use "select [required_columns_list]" for better performance. Use an ORM (object relational mapping) framework (i.e. hibernate, iBatis ...) if application code is big enough. Performance issues of ORM frameworks can be handled by detailed configuration parameters. Partition big and unused/rarely used tables/table parts to different physical storages for better query performance. For big, sensitive and mission critic database systems, use disaster recovery and security services like failover clustering, auto backups, replication etc. Use constraints (foreign key, check, not null ...) for data integrity. Don’t give whole control to application code. Lack of database documentation is evil. Document your database design with ER schemas and instructions. Also write comment lines for your triggers, stored procedures and other scripts. Use indexes for frequently used queries on big tables. Analyser tools can be used to determine where indexes will be defined. For queries retrieving a range of rows, clustered indexes are usually better. For point queries, non-clustered indexes are usually better. Database server and the web server must be placed in different machines. This will provide more security (attackers can’t access data directly) and server CPU and memory performance will be better because of reduced request number and process usage. Image and blob data columns must not be defined in frequently queried tables because of performance issues. These data must be placed in separate tables and their pointer can be used in queried tables. Normalization must be used as required, to optimize the performance. Under-normalization will cause excessive repetition of data, over-normalization will cause excessive joins across too many tables. Both of them will get worse performance. Spend time for database modeling and design as much as required. Otherwise saved(!) design time will cause (saved(!) design time) * 10/100/1000 maintenance and re-design time.

November 4, 2012

by Cagdas Basaraner

· 257,291 Views · 13 Likes