Databases Resources

The Latest Databases Topics

A Puppet Automation + MySQL Tutorial: Wordpress Install in 7 Short Steps

[This article was written by Koby Nachmany.] If you are familiar with configuration management (aka CM) and automation, you probably know a thing or two about Puppet, and the amazing and rich collection of modules it offers. Puppet Forge contains a wealth of third party modules that enable us to do some pretty nifty stuff with almost no effort. Puppet helps deal with the messy parts of CM, like installing binaries and running installation scripts that are tedious to do manually. Tools such as Puppet were originally created for IT operations people, that are for the most part infrastructure-centric, and are best suited for setup and maintenance of hosts in a physical data center. Dealing with applications and certainly managing applications on an elastic virtualized or even cloudified environment, brings a set of new challenges despite the agility and other benefits it provides. Now imagine we can have this goodness coupled with an intelligent orchestration framework for an entire deployment? In this blog post I'd like to demonstrate how a cloud application orchestrator can complement already existing automation processes powered by configuration management tools, in this case we will demonstrate with Puppet. I will use the nodecellar application and the popular WordPress content management framework as examples. This will hopefully provide a good introduction to Cloudify blueprints. Overview So we've seen how Cloudify 3 allows us to easily orchestrate the "nodecellar" application Read about it Cloudify blueprints here. With the "nodecellar" example, Cloudify deploys a complex application using workflows that map deployment lifecycle events to bash scripts using Cloudify's bash runner plugin. Cloudify's Puppet integration now makes this pretty easy. Cloudify 3.0 - Taking Puppet to the Next Level of Orchestration. Check it out. Go The synergy between Cloudify and Puppet not only allows you to enjoy the benefits of your Puppet environment, but it also amplifies its usability by introducing unique advantages that will answer the following common challenges involved with configuration management tools: Agent Installation: Provision your service VMs, install a Puppet agent (if you like) and wires them up with the Puppet Master. Or, if you choose to run standalone, you can install the agent with the appropriate manifests needed for that service, as well. Order of Dependencies: Define the dependencies between application stacks, services and infrastructure resources. Which will then be launched based on that order. Remote Execution and Updates: Other than the basic install/uninstall, Cloudify enables customized application workflows that allow you to execute tools like remote shell scripts on a group of instances that belong to a particular service, or to a specific instance in a group. This feature is useful to run maintenance operations, such as snapshots in the case of a database, or code pushes in a continuous deployment model. In addition, you can run puppet apply whenever you feel it's right for your service. Post Deployment: Once your application is up, Cloudify will be able to glue your monitoring tool of choice, or you can choose to use the built-in one. A robust policy engine, enables auto-healing and even auto-scaling according to your service's required SLA. I'm now going to take a deep dive on my experience with a WordPress example that I feel is a very good representation of how Puppet and Cloudify work in sync. Let's say we want to deploy the popular WordPress application stack on two VMs . Something as follows: The flow is quite simple: -server 3.5.1 with the basic following modules installed: |-- hunner-wordpress (v0.6.0) |-- puppetlabs-apache (v1.0.1) - with php mods enabled |-- puppetlabs-mysql (v2.1.0) Your site.pp file should resemble something like this: node /^apache_web.*/ { include apache class { 'wordpress': create_db => false, create_db_user => false, } } node /^mysql.*/ { class { '::mysql::server': root_password => 'password', override_options => { 'mysqld' => { 'bind_address' => '0.0.0.0' } } } include mysql::client include wordpress } As we can see, we have an Apache PHP application that will likely require a database connection string (IP, port, user and password). This is where Cloudify facilitates the "gluing" of all the pieces together, by allowing us to inject dynamic/static custom facts to the dependent node (Apache server). Cloudify supports both standalone agents and PuppetMaster environments. Step 2: Tweaking the Original WordPress Module. Some minor adaptations to the wordpress init class of the WordPress module will allow us to embed these facts during Puppet agent invocation. Below is a code snippet (With defaults truncated): class wordpress ( $db_host_ip = $cloudify_related_host_ip, $db_user, = $cloudify_properties_db_user, $db_password = $cloudify_properties_db_pw, . . ) And some tweaking to the templates/wp-config.php.erb: /** MySQL hostname */ define('DB_HOST', ''); Let's add some tags for finer control of manifest execution: The MySQL node will not require the application part to run on it, so I've excluded it using a Puppet "tag" (read more about Puppet tags). Cloudify, of course, supports this and will provide the appropriate tags during agent invocation. -> class { 'wordpress::app': tag => ['postconfigure'], install_dir => $install_dir, install_url => $install_url, version => $version, db_name => $db_name, . .} Step 3: Creating the Blueprint In a similar way to the "nodecellar" blueprint, first lets create a folder with the name of "wp_puppet" and create a blueprint.yaml file within it. This file will then serve as the blueprint file. Now let's declare the name of this blueprint. blueprint: name: wp_puppet nodes: Now we can start creating the topology. Step 4: Creating VM Nodes Since, in this case I use the OpenStack provider to create the nodes, let's import the "OpenStack types" plugin. imports: - http://www.getcloudify.org/spec/openstack-plugin/1.0/plugin.yaml Since the VMs are the same, I declared a generic template for a VM host: vm_host: derived_from: cloudify.openstack.server properties: - install_agent: true - worker_config: user: ubuntu port: 22 # example for ssh key file (see `key_name` below) # this file matches the agent key configured during the bootstrap key: ~/.ssh/agent.key # Uncomment and update `management_network_name` when working a n neutron enabled openstack - management_network_name: cfy-mng-network - server: image: 8c096c29-a666-4b82-99c4-c77dc70cfb40 flavor: 102 key_name: cfy-agnt-kp security_groups: ['cfy-agent-default', 'wp_security_group'] # This is how we inject the puppet server's ip userdata: | #!/bin/bash -ex grep -q puppet /etc/hosts || echo "x.x.x.x puppet" | sudo -A tee -a /etc/hosts Create the MySQL and Apache VMs: - name: mysql_db_vm type: vm_host instances: deploy: 1 - name: apache_web_vm type: vm_host instances: deploy: 1 Step 5: Declaring Apache and MySQL Servers Since we are using the Puppet plugin to create those servers, first we have to import it: plugins: puppet_plugin: derived_from: cloudify.plugins.agent_plugin properties: url: https://github.com/cloudify-cosmo/cloudify-puppet-plugin/archive/nightly.zip The plugin defines server types as follows: middleware_server, app_server, db_server, web_server, message_bus_server, app_module. They are virtually the same, but serve the purpose of enabling better readability for the user and GUI visualization A Puppet server type is derived_from: cloudify.types.server type, but includes some puppet-specific properties and lifecycle events. For documentation see: Puppet Types So we now will go ahead and declare the server types: cloudify.types.puppet.web_server: derived_from: cloudify.types.web_server properties: # All Puppet related configuration goes inside # the "puppet_config" property. - puppet_config interfaces: cloudify.interfaces.lifecycle: # Specifically "start" operation. Otherwise tags must be # provided. - start: puppet_plugin.operations.operation cloudify.types.puppet.app_module: derived_from: cloudify.types.app_module properties: - puppet_config interfaces: cloudify.interfaces.lifecycle: - configure: puppet_plugin.operations.operation cloudify.types.puppet.db_server: derived_from: cloudify.types.db_server properties: - puppet_config interfaces: cloudify.interfaces.lifecycle: - start: puppet_plugin.operations.operation Step 6: Instantiating the Apache and MySQL nodes: Here we provide the Puppet configuration and tags and define the relationships between the nodes. Cloudify's agent will use those relationships in order to decide the appropriate facts to inject. - name: apache_web_server type: cloudify.types.puppet.web_server properties: port: 8080 puppet_config: server: puppet environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: apache_web_vm - name: wordpress_app type: cloudify.types.puppet.app_module properties: db_user: wordpress db_pass: passwd puppet_config: server: puppet tags: ['postconfigure'] environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: apache_web_server - type: wp_connected_to_mysql target: mysql_db_server - name: mysql_db_server type: cloudify.types.puppet.db_server properties: db_user: wordpress db_pass: passwd puppet_config: server: puppet environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: mysql_db_vm Step 7: Upload the Blueprint and Create the Deployment (via CLI or GUI) Then execute your deployment (via CLI or GUI). ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy blueprints upload -b wp9 wordpress/blueprint.yaml ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy deployments create -b wp9 -d WordPress_Deployment_1 Step 8: Take a Quick Coffee Break. Step 9: Enjoy your Orchestrated WordPress Stack!

August 21, 2014

by Sharone Zitzman

· 9,174 Views

Lambda Architecture Principles

"Lambda Architecture" (introduced by Nathan Marz) has gained a lot of traction recently. Fundamentally, it is a set of design patterns of dealing with Batch and Real time data processing workflow that fuel many organization's business operations. Although I don't realize any novice ideas has been introduced, it is the first time these principles are being outlined in such a clear and unambiguous manner. In this post, I'd like to summarize the key principles of the Lambda architecture, focus more in the underlying design principles and less in the choice of implementation technologies, which I may have a different favors from Nathan. One important distinction of Lambda architecture is that it has a clear separation between the batch processing pipeline (ie: Batch Layer) and the real-time processing pipeline (ie: Real-time Layer). Such separation provides a means to localize and isolate complexity for handling data update. To handle real-time query, Lambda architecture provide a mechanism (ie: Serving Layer) to merge/combine data from the Batch Layer and Real-time Layer and return the latest information to the user. Data Source Entry At the very beginning, data flows in Lambda architecture as follows ... Transaction data starts streaming in from OLTP system during business operations. Transaction data ingestion can be materialized in the form of records in OLTP systems, or text lines in App log files, or incoming API calls, or an event queue (e.g. Kafka) This transaction data stream is replicated and fed into both the Batch Layer and Realtime Layer Here is an overall architecture diagram for Lambda. Batch Layer For storing the ground truth, "Master dataset" is the most fundamental DB that captures all basic event happens. It stores data in the most "raw" form (and hence the finest granularity) that can be used to compute any perspective at any given point in time. As long as we can maintain the correctness of master dataset, every perspective of data view derived from it will be automatically correct. Given maintaining the correctness of master dataset is crucial, to avoid the complexity of maintenance, master dataset is "immutable". Specifically data can only be appended while update and delete are disallowed. By disallowing changes of existing data, it avoids the complexity of handling the conflicting concurrent update completely. Here is a conceptual schema of how the master dataset can be structured. The center green table represents the old, traditional-way of storing data in RDBMS. The surrounding blue tables illustrates the schema of how the master dataset can be structured, with some key highlights Data are partitioned by columns and stored in different tables. Columns that are closely related can be stored in the same table NULL values are not stored Each data record is associated with a time stamp since then the record is valid Notice that every piece of data is tagged with a time stamp at which the data is changed (or more precisely, a change record that represents the data modification is created). The latest state of an object can be retrieved by extracting the version of the object with the largest time stamp. Although master dataset stores data in the finest granularity and therefore can be used to compute result of any query, it usually take a long time to perform such computation if the processing starts with such raw form. To speed up the query processing, various data at intermediate form (called Batch View) that aligns closer to the query will be generated in a periodic manner. These batch views (instead of the original master dataset) will be used to serve the real-time query processing. To generate these batch views, the "Batch Layer" use a massively parallel, brute force approach to process the original master dataset. Notice that since data in master data set is timestamped, the data candidate can be identified simply from those that has the time stamp later than the last round of batch processing. Although less efficient, Lambda architecture advocates that at each round of batch view generation, the previous batch view should just be simply discarded and the new batch view is computed from master dataset. This simple-mind, compute-from-scratch approach has some good properties in stopping error propagation (since error cannot be accumulated), but the processing may not be optimized and may take a longer time to finish. This can increase the "staleness" of the batch view. Real time Layer As discussed above, generating the batch view requires scanning a large volume of master dataset that takes few hours. The batch view will therefore be stale for at least the processing time duration (ie: between the start and end of the Batch processing). But the maximum staleness can be up to the time period between the end of this Batch processing and the end of next Batch processing (ie: the batch cycle). The following diagram illustrate this staleness. Even the batch view is stale period, business operates as usual and transaction data will be streamed in continuously. To answer user's query with the latest, up-to-date information. The business transaction records need to be captured and merged into the real-time view. This is the responsibility of the Real-time Layer. To reduce the latency of latest information availability close to zero, the merge mechanism has to be done in an incremental manner such that no batching delaying the processing will be introduced. This requires the real time view update to be very different from the batch view update, which can tolerate a high latency. The end goal is that the latest information that is not captured in the Batch view will be made available in the Realtime view. The logic of doing the incremental merge on Realtime view is application specific. As a common use case, lets say we want to compute a set of summary statistics (e.g. mean, count, max, min, sum, standard deviation, percentile) of the transaction data since the last batch view update. To compute the sum, we can simply add the new transaction data to the existing sum and then write the new sum back to the real-time view. To compute the mean, we can multiply the existing count with existing mean, adding the transaction sum and then divide by the existing count plus one. To implement this logic, we need to READ data from the Realtime view, perform the merge and WRITE the data back to the Realtime view. This requires the Realtime serving DB (which host the Realtime view) to support both random READ and WRITE. Fortunately, since the realtime view only need to store the stale data up to one batch cycle, its scale is limited to some degree. Once the batch view update is completed, the real-time layer will discard the data from the real time serving DB that has time stamp earlier than the batch processing. This not only limit the data volume of Realtime serving DB, but also allows any data inconsistency (of the realtime view) to be clean up eventually. This drastically reduce the requirement of sophisticated multi-user, large scale DB. Many DB system support multiple user random read/write and can be used for this purpose. Serving Layer The serving layer is responsible to host the batch view (in the batch serving database) as well as hosting the real-time view (in the real-time serving database). Due to very different accessing pattern, the batch serving DB has a quite different characteristic from the real-time serving DB. As mentioned in above, while required to support efficient random read at large scale data volume, the batch serving DB doesn't need to support random write because data will only be bulk-loaded into the batch serving DB. On the other hand, the real-time serving DB will be incrementally (and continuously) updated by the real-time layer, and therefore need to support both random read and random write. To maintain the batch serving DB updated, the serving layer need to periodically check the batch layer progression to determine whether a later round of batch view generation is finished. If so, bulk load the batch view into the batch serving DB. After completing the bulk load, the batch serving DB has contained the latest version of batch view and some data in the real-time view is expired and therefore can be deleted. The serving layer will orchestrate these processes. This purge action is especially important to keep the size of the real-time serving DB small and hence can limit the complexity for handling real-time, concurrent read/write. To process a real-time query, the serving layer disseminates the incoming query into 2 different sub-queries and forward them to both the Batch serving DB and Realtime serving DB, apply application-specific logic to combine/merge their corresponding result and form a single response to the query. Since the data in the real-time view and batch view are different from a timestamp perspective, the combine/merge is typically done by concatenate the results together. In case of any conflict (same time stamp), the one from Batch view will overwrite the one from Realtime view. Final Thoughts By separating different responsibility into different layers, the Lambda architecture can leverage different optimization techniques specifically designed for different constraints. For example, the Batch Layer focuses in large scale data processing using simple, start-from-scratch approach and not worrying about the processing latency. On the other hand, the Real-time Layer covers where the Batch Layer left off and focus in low-latency merging of the latest information and no need to worry about large scale. Finally the Serving Layer is responsible to stitch together the Batch View and Realtime View to provide the final complete picture. The clear demarcation of responsibility also enable different technology stacks to be utilized at each layer and hence can tailor more closely to the organization's specific business need. Nevertheless, using a very different mechanism to update the Batch view (ie: start-from-scratch) and Realtime view (ie: incremental merge) requires two different algorithm implementation and code base to handle the same type of data. This can increase the code maintenance effort and can be considered to be the price to pay for bridging the fundamental gap between the "scalability" and "low latency" need. Nathan's Lambda architecture also introduce a set of candidate technologies which he has developed and used in his past projects (e.g. Hadoop for storing Master dataset, Hadoop for generating Batch view, ElephantDB for batch serving DB, Cassandra for realtime serving DB, STORM for generating Realtime view). The beauty of Lambda architecture is that the choice of technologies is completely decoupled so I intentionally do not describe any of their details in this post. On the other hand, I have my own favorite which is different and that will be covered in my future posts.

August 20, 2014

by Ricky Ho

· 12,228 Views

JPA Tutorial: Setting Up JPA in a Java SE Environment

There are many reasons to learn an ORM tool like JPA, but it's not a magic bullet that will solve all your problems.

August 18, 2014

by MD Sayem Ahmed

· 145,983 Views · 4 Likes

The Dark Side of Hibernate Auto Flush

Introduction Now that I described the the basics of JPA and Hibernate flush strategies, I can continue unraveling the surprising behavior of Hibernate’s AUTO flush mode. Not all queries trigger a Session flush Many would assume that Hibernate always flushes the Session before any executing query. While this might have been a more intuitive approach, and probably closer to the JPA’s AUTO FlushModeType, Hibernate tries to optimize that. If the current executed query is not going to hit the pending SQL INSERT/UPDATE/DELETE statements then the flush is not strictly required. As stated in the reference documentation, the AUTO flush strategy may sometimessynchronize the current persistence context prior to a query execution. It would have been more intuitive if the framework authors had chosen to name it FlushMode.SOMETIMES. JPQL/HQL and SQL Like many other ORM solutions, Hibernate offers a limited Entity querying language (JPQL/HQL) that’s very much based on SQL-92 syntax. The entity query language is translated to SQL by the current database dialect and so it must offer the same functionality across different database products. Since most database systems are SQL-92 complaint, the Entity Query Language is an abstraction of the most common database querying syntax. While you can use the Entity Query Language in many use cases (selecting Entities and even projections), there are times when its limited capabilities are no match for an advanced querying request. Whenever we want to make use of some specific querying techniques, such as: Window functions Pivot table Common Table Expressions we have no other option, but to run native SQL queries. Hibernate is a persistence framework. Hibernate was never meant to replace SQL. If some query is better expressed in a native query, then it’s not worth sacrificing application performance on the altar of database portability. AUTO flush and HQL/JPQL First we are going to test how the AUTO flush mode behaves when an HQL query is about to be executed. For this we define the following unrelated entities: The test will execute the following actions: A Person is going to be persisted. Selecting User(s) should not trigger a the flush. Querying for Person, the AUTO flush should trigger the entity state transition synchronization (A person INSERT should be executed prior to executing the select query). 1 2 3 4 Product product = newProduct(); session.persist(product); assertEquals(0L, session.createQuery("select count(id) from User").uniqueResult()); assertEquals(product.getId(), session.createQuery("select p.id from Product p").uniqueResult()); Giving the following SQL output: 1 2 3 4 [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: f76f61e2-f3e3-4ea4-8f44-82e9804ceed0, using strategy: org.hibernate.id.UUIDGenerator Query:{[selectcount(user0_.id) as col_0_0_ from user user0_][]} Query:{[insert into product (color, id) values (?, ?)][12,f76f61e2-f3e3-4ea4-8f44-82e9804ceed0]} Query:{[selectproduct0_.idas col_0_0_ from product product0_][]} As you can see, the User select hasn’t triggered the Session flush. This is because Hibernate inspects the current query space against the pending table statements. If the current executing query doesn’t overlap with the unflushed table statements, the a flush can be safely ignored. HQL can detect the Product flush even for: Sub-selects 1 2 3 4 5 session.persist(product); assertEquals(0L, session.createQuery( "select count(*) "+ "from User u "+ "where u.favoriteColor in (select distinct(p.color) from Product p)").uniqueResult()); Resulting in a proper flush call: 1 2 Query:{[insert into product (color, id) values (?, ?)][Blue,2d9d1b4f-eaee-45f1-a480-120eb66da9e8]} Query:{[selectcount(*) as col_0_0_ from user user0_ where user0_.favoriteColor in(selectdistinct product1_.color from product product1_)][]} Or theta-style joins 1 2 3 4 5 session.persist(product); assertEquals(0L, session.createQuery( "select count(*) "+ "from User u, Product p "+ "where u.favoriteColor = p.color").uniqueResult()); Triggering the expected flush : 1 2 Query:{[insert into product (color, id) values (?, ?)][Blue,4af0b843-da3f-4b38-aa42-1e590db186a9]} Query:{[selectcount(*) as col_0_0_ from user user0_ cross joinproduct product1_ where user0_.favoriteColor=product1_.color][]} The reason why it works is because Entity Queries are parsed and translated to SQL queries. Hibernate cannot reference a non existing table, therefore it always knows the database tables an HQL/JPQL query will hit. So Hibernate is only aware of those tables we explicitly referenced in our HQL query. If the current pending DML statements imply database triggers or database level cascading, Hibernate won’t be aware of those. So even for HQL, the AUTO flush mode can cause consistency issues. If you enjoy reading this article, you might want to subscribe to my newsletter and get a discount for my book as well. AUTO flush and native SQL queries When it comes to native SQL queries, things are getting much more complicated. Hibernate cannot parse SQL queries, because it only supports a limited database query syntax. Many database systems offer proprietary features that are beyond Hibernate Entity Query capabilities. Querying the Person table, with a native SQL query is not going to trigger the flush, causing an inconsistency issue: 1 2 3 Product product = newProduct(); session.persist(product); assertNull(session.createSQLQuery("select id from product").uniqueResult()); 1 2 3 DEBUG [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: 718b84d8-9270-48f3-86ff-0b8da7f9af7c, using strategy: org.hibernate.id.UUIDGenerator Query:{[selectidfrom product][]} Query:{[insert into product (color, id) values (?, ?)][12,718b84d8-9270-48f3-86ff-0b8da7f9af7c]} The newly persisted Product was only inserted during transaction commit, because the native SQL query didn’t triggered the flush. This is major consistency problem, one that’s hard to debug or even foreseen by many developers. That’s one more reason for always inspecting auto-generated SQL statements. The same behaviour is observed even for named native queries: 1 2 3 4 @NamedNativeQueries( @NamedNativeQuery(name = "product_ids", query = "select id from product") ) assertNull(session.getNamedQuery("product_ids").uniqueResult()); So even if the SQL query is pre-loaded, Hibernate won’t extract the associated query space for matching it against the pending DML statements. Overruling the current flush strategy Even if the current Session defines a default flush strategy, you can always override it on a query basis. Query flush mode The ALWAYS mode is going to flush the persistence context before any query execution (HQL or SQL). This time, Hibernate applies no optimization and all pending entity state transitions are going to be synchronized with the current database transaction. 1 assertEquals(product.getId(), session.createSQLQuery("select id from product").setFlushMode(FlushMode.ALWAYS).uniqueResult()); Instructing Hibernate which tables should be syncronized You could also add a synchronization rule on your current executing SQL query. Hibernate will then know what database tables need to be syncronzied prior to executing the query. This is also useful for second level caching as well. 1 assertEquals(product.getId(), session.createSQLQuery("select id from product").addSynchronizedEntityClass(Product.class).uniqueResult()); If you enjoyed this article, I bet you are going to love my book as well. Conclusion The AUTO flush mode is tricky and fixing consistency issues on a query basis is a maintainer’s nightmare. If you decide to add a database trigger, you’ll have to check all Hibernate queries to make sure they won’t end up running against stale data. My suggestion is to use the ALWAYS flush mode, even if Hibernate authors warned us that: this strategy is almost always unnecessary and inefficient. Inconsistency is much more of an issue that some occasional premature flushes. While mixing DML operations and queries may cause unnecessary flushing this situation is not that difficult to mitigate. During a session transaction, it’s best to execute queries at the beginning (when no pending entity state transitions are to be synchronized) and towards the end of the transaction (when the current persistence context is going to be flushed anyway). The entity state transition operations should be pushed towards the end of the transaction, trying to avoid interleaving them with query operations (therefore preventing a premature flush trigger).

August 15, 2014

by Vlad Mihalcea

· 36,103 Views · 3 Likes

DB Queries in spring's application-context.xml.

While solving one of my assignment, I realize the age long practice of writing db queries in java class file is really a big pain. They are difficult to read and understand. Further if you want to modify it again a lot of task. Thus, to get rid of the pain, I put the db queries in spring’s context xml. Later I realize using such injections using xml, its not only easier to maintain and understand also we can have module wise segregation of db queries in different or in different "xyzDAO-queries.xml". Further its easier to add more customers to the existing application. Please see the code snippet area for the context.xml called "userDAO-queries.xml". I've created Map with an id. In the Map I put the key as and values as .db queries goes here.... , which contains the query. The java classes are DBQueriesImpl.java and UserDAOImpl.java for fetching the exact query from the "xyzDAO-queries.xml". Please see the comments in the java class. Context XML (userDAO-queries.xml) part : SELECT user_id, user_name, user_password, user_fname, user_mname, user_lname, user_create_time FROM ty_users SELECT user_id, user_name, user_password FROM ty_users WHERE user_name = ? and user_password = ? SELECT user_id, user_name, user_password FROM ty_users WHERE user_id=? INSERT INTO ty_users (user_name, user_password) VALUES (?, ?) Java Code Part: UserDAOImpl.java public class UserDAOImpl implements UserDAO{ private static final Logger logger = LoggerFactory.getLogger(UserDAOImpl.class);// logger DBQueries dbQuery = new DBQueriesImpl("userDAO-queries.xml");// DBConnection dbConn = ConnectionFactory.getInstance().getConnectionMySQL(); @Override public List authenitcUserDetails(String userName, String userPassword) { Map queryMap = null; String query ; List> mappedUserList = null; List listUser = null; try{ listUser = new ArrayList(); queryMap = dbQuery.getQueryBeanFromFactory("VALIDATE_USER_CREDENTIAL"); //passing the query-name or Map Id. query = dbQuery.getQuery(queryMap);//passing the query-map to get the query. //rest of the code is common fetching code. mappedUserList = dbConn.dbQueryRead(query, new Object[]{userName,userPassword}); if(mappedUserList.size()>0){ listUser = new QueryResultSetMapperImpl().mapRersultSetToObject(mappedUserList, User.class); } }catch(Exception ex){ logger.error("fetchAllUser Error:: ", ex); } return listUser; } } DBQueriesImpl.java public class DBQueriesImpl implements DBQueries { private static final Logger logger = LoggerFactory.getLogger(DBQueriesImpl.class); private ApplicationContext queriesCtx ; /** * @Desc: Loading the queryContext xml * @param queryContext */ public DBQueriesImpl(String queryContext) { queriesCtx = new ClassPathXmlApplicationContext(queryContext); } /** * @Desc: Reading from the loaded application context and getting the query-map, . */ @Override public Map getQueryBeanFromFactory(String beanId){ Map queryMap = null; if (queriesCtx != null && beanId != null) { queryMap = (Map)queriesCtx.getBean(beanId); } return queryMap; } /** * @Desc: Getting the exact query from the query-map, . */ @Override public String getQuery(Map queryMap) { String query=null; try{ if(queryMap.containsKey(QueryConstants.QUERY_NODE)){ query = (String) queryMap.get(QueryConstants.QUERY_NODE); queryMap.remove(QueryConstants.QUERY_NODE); }else{ throw new NoSuchFieldError(); } }catch(Exception excp){ excp.printStackTrace(); } return query; } } With this kind of setting I can also solve few other problems. In one of my assignment I used this trick. In the application, various clients have different set of data requirements thus, different set of sql-queries. Now instead of writing java classes to met client requirement we just write the set of spring-context.xmls having sql and used those spring-context at runtime and fetch data.

August 13, 2014

by Shantanu Sikdar

· 5,460 Views · 1 Like

6 Rules of Thumb for MongoDB Schema Design: Part 3

By William Zola, Lead Technical Support Engineer at MongoDB This is our final stop in this tour of modeling One-to-N relationships in MongoDB. In the first post, I covered the three basic ways to model a One-to-N relationship. Last time, I covered some extensions to those basics: two-way referencing and denormalization. Denormalization allows you to avoid some application-level joins, at the expense of having more complex and expensive updates. Denormalizing one or more fields makes sense if those fields are read much more often than they are updated. Read part two if you’ve missed it. Whoa! Look at All These Choices! So, to recap: You can embed, reference from the “one” side, or reference from the “N” side, or combine a pair of these techniques You can denormalize as many fields as you like into the “one” side or the “N” side Denormalization, in particular, gives you a lot of choices: if there are 8 candidates for denormalization in a relationship, there are 2 8 (1024) different ways to denormalize (including not denormalizing at all). Multiply that by the three different ways to do referencing, and you have over 3,000 different ways to model the relationship. Guess what? You now are stuck in the “paradox of choice” — because you have so many potential ways to model a “one-to-N” relationship, your choice on how to model it just got harder. Lots harder. Rules of Thumb: Your Guide Through the Rainbow Here are some “rules of thumb” to guide you through these indenumberable (but not infinite) choices One: favor embedding unless there is a compelling reason not to Two: needing to access an object on its own is a compelling reason not to embed it Three: Arrays should not grow without bound. If there are more than a couple of hundred documents on the “many” side, don’t embed them; if there are more than a few thousand documents on the “many” side, don’t use an array of ObjectID references. High-cardinality arrays are a compelling reason not to embed. Four: Don’t be afraid of application-level joins: if you index correctly and use the projection specifier (as shown in part 2) then application-level joins are barely more expensive than server-side joins in a relational database. Five: Consider the write/read ratio when denormalizing. A field that will mostly be read and only seldom updated is a good candidate for denormalization: if you denormalize a field that is updated frequently then the extra work of finding and updating all the instances is likely to overwhelm the savings that you get from denormalizing. Six: As always with MongoDB, how you model your data depends — entirely — on your particular application’s data access patterns. You want to structure your data to match the ways that your application queries and updates it. Your Guide To The Rainbow When modeling “One-to-N” relationships in MongoDB, you have a variety of choices, so you have to carefully think through the structure of your data. The main criteria you need to consider are: What is the cardinality of the relationship: is it “one-to-few”, “one-to-many”, or “one-to-squillions”? Do you need to access the object on the “N” side separately, or only in the context of the parent object? What is the ratio of updates to reads for a particular field? Your main choices for structuring the data are: For “one-to-few”, you can use an array of embedded documents For “one-to-many”, or on occasions when the “N” side must stand alone, you should use an array of references. You can also use a “parent-reference” on the “N” side if it optimizes your data access pattern. For “one-to-squillions”, you should use a “parent-reference” in the document storing the “N” side. Once you’ve decided on the overall structure of the data, then you can, if you choose, denormalize data across multiple documents, by either denormalizing data from the “One” side into the “N” side, or from the “N” side into the “One” side. You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic. Productivity and Flexibility The upshot of all of this is that MongoDB gives you the ability to design your database schema to match the needs of your application. You can structure your data in MongoDB so that it adapts easily to change, and supports the queries and updates that you need to get the most out of your application.

August 13, 2014

by Francesca Krihely

· 8,761 Views

6 Rules of Thumb for MongoDB Schema Design: Part 2

By William Zola, Lead Technical Support Engineer at MongoDB This is the second stop on our tour of modeling One-to-N relationships in MongoDB. Last time I covered the three basic schema designs: embedding, child-referencing, and parent-referencing. I also covered the two factors to consider when picking one of these designs: Will the entities on the “N” side of the One-to-N ever need to stand alone? What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions? With these basic techniques under our belt, I can move on to covering more sophisticated schema designs, involving two-way referencing and denormalization. Intermediate: Two-Way Referencing If you want to get a little bit fancier, you can combine two techniques and include both styles of reference in your schema, having both references from the “one” side to the “many” side and references from the “many” side to the “one” side. For an example, let’s go back to that task-tracking system. There’s a “people” collection holding Person documents, a “tasks” collection holding Task documents, and a One-to-N relationship from Person -> Task. The application will need to track all of the Tasks owned by a Person, so we will need to reference Person -> Task. With the array of references to Task documents, a single Person document might look like this: db.person.findOne() { _id: ObjectID("AAF1"), name: "Kate Monster", tasks [ // array of references to Task documents ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73") // etc ] } On the other hand, in some other contexts this application will display a list of Tasks (for example, all of the Tasks in a multi-person Project) and it will need to quickly find which Person is responsible for each Task. You can optimize this by putting an additional reference to the Person in the Task document. db.tasks.findOne() { _id: ObjectID("ADF9"), description: "Write lesson plan", due_date: ISODate("2014-04-01"), owner: ObjectID("AAF1") // Reference to Person document } This design has all of the advantages and disadvantages of the “One-to-Many” schema, but with some additions. Putting in the extra ‘owner’ reference into the Task document means that its quick and easy to find the Task’s owner, but it also means that if you need to reassign the task to another person, you need to perform two updates instead of just one. Specifically, you’ll have to update both the reference from the Person to the Task document, and the reference from the Task to the Person. (And to the relational gurus who are reading this — you’re right: using this schema design means that it is no longer possible to reassign a Task to a new Person with a single atomic update. This is OK for our task-tracking system: you need to consider if this works with your particular use case.) Intermediate: Denormalizing With “One-To-Many” Relationships Beyond just modeling the various flavors of relationships, you can also add denormalization into your schema. This can eliminate the need to perform the application-level join for certain cases, at the price of some additional complexity when performing updates. An example will help make this clear. Denormalizing from Many -> One For the parts example, you could denormalize the name of the part into the ‘parts[]’ array. For reference, here’s the version of the Product document without denormalization. > db.products.findOne() { name : 'left-handed smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ // array of references to Part documents ObjectID('AAAA'), // reference to the #4 grommet above ObjectID('F17C'), // reference to a different Part ObjectID('D2AA'), // etc ] } Denormalizing would mean that you don’t have to perform the application-level join when displaying all of the part names for the product, but you would have to perform that join if you needed any other information about a part. > db.products.findOne() { name : 'left-handed smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ { id : ObjectID('AAAA'), name : '#4 grommet' }, // Part name is denormalized { id: ObjectID('F17C'), name : 'fan blade assembly' }, { id: ObjectID('D2AA'), name : 'power switch' }, // etc ] } While making it easier to get the part names, this would add just a bit of client-side work to the application-level join: // Fetch the product document > product = db.products.findOne({catalog_number: 1234}); // Create an array of ObjectID()s containing *just* the part numbers > part_ids = product.parts.map( function(doc) { return doc.id } ); // Fetch all the Parts that are linked to this Product > product_parts = db.parts.find({_id: { $in : part_ids } } ).toArray() ; Denormalizing saves you a lookup of the denormalized data at the cost of a more expensive update: if you’ve denormalized the Part name into the Product document, then when you update the Part name you must also update every place it occurs in the ‘products’ collection. Denormalizing only makes sense when there’s an high ratio of reads to updates. If you’ll be reading the denormalized data frequently, but updating it only rarely, it often makes sense to pay the price of slower updates — and more complex updates — in order to get more efficient queries. As updates become more frequent relative to queries, the savings from denormalization decrease. For example: assume the part name changes infrequently, but the quantity on hand changes frequently. This means that while it makes sense to denormalize the part name into the Product document, it does not make sense to denormalize the quantity on hand. Also note that if you denormalize a field, you lose the ability to perform atomic and isolated updates on that field. Just like with the two-way referencing example above, if you update the part name in the Part document, and then in the Product document, there will be a sub-second interval where the denormalized ‘name’ in the Product document will not reflect the new, updated value in the Part document. Denormalizing from One -> Many You can also denormalize fields from the “One” side into the “Many” side: > db.parts.findOne() { _id : ObjectID('AAAA'), partno : '123-aff-456', name : '#4 grommet', product_name : 'left-handed smoke shifter', // Denormalized from the ‘Product’ document product_catalog_number: 1234, // Ditto qty: 94, cost: 0.94, price: 3.99 } However, if you’ve denormalized the Product name into the Part document, then when you update the Product name you must also update every place it occurs in the ‘parts’ collection. This is likely to be a more expensive update, since you’re updating multiple Parts instead of a single Product. As such, it’s significantly more important to consider the read-to-write ratio when denormalizing in this way. Intermediate: Denormalizing With “One-To-Squillions” Relationships You can also denormalize the “one-to-squillions” example. This works in one of two ways: you can either put information about the “one” side (from the ‘hosts’ document) into the “squillions” side (the log entries), or you can put summary information from the “squillions” side into the “one” side. Here’s an example of denormalizing into the “squillions” side. I’m going to add the IP address of the host (from the ‘one’ side) into the individual log message: > db.logmsg.findOne() { time : ISODate("2014-03-28T09:42:41.382Z"), message : 'cpu is on fire!', ipaddr : '127.66.66.66', host: ObjectID('AAAB') } Your query for the most recent messages from a particular IP address just got easier: it’s now just one query instead of two. > last_5k_msg = db.logmsg.find({ipaddr : '127.66.66.66'}).sort({time : -1}).limit(5000).toArray() In fact, if there’s only a limited amount of information you want to store at the “one” side, you can denormalize it ALL into the “squillions” side and get rid of the “one” collection altogether: > db.logmsg.findOne() { time : ISODate("2014-03-28T09:42:41.382Z"), message : 'cpu is on fire!', ipaddr : '127.66.66.66', hostname : 'goofy.example.com', } On the other hand, you can also denormalize into the “one” side. Lets say you want to keep the last 1000 messages from a host in the ‘hosts’ document. You could use the $each / $slice functionality introduced in MongoDB 2.4 to keep that list sorted, and only retain the last 1000 messages: The log messages get saved in the ‘logmsg’ collection as well as in the denormalized list in the ‘hosts’ document: that way the message isn’t lost when it ages out of the ‘hosts.logmsgs’ array. // Get log message from monitoring system logmsg = get_log_msg(); log_message_here = logmsg.msg; log_ip = logmsg.ipaddr; // Get current timestamp now = new Date() // Find the _id for the host I’m updating host_doc = db.hosts.findOne({ipaddr : log_ip },{_id:1}); // Don’t return the whole document host_id = host_doc._id; // Insert the log message, the parent reference, and the denormalized data into the ‘many’ side db.logmsg.save({time : now, message : log_message_here, ipaddr : log_ip, host : host_id ) }); // Push the denormalized log message onto the ‘one’ side db.hosts.update( {_id: host_id }, {$push : {logmsgs : { $each: [ { time : now, message : log_message_here } ], $sort: { time : 1 }, // Only keep the latest ones $slice: -1000 } // Only keep the latest 1000 } ); Note the use of the projection specification ( {_id:1} ) to prevent MongoDB from having to ship the entire ‘hosts’ document over the network. By telling MongoDB to only return the _id field, I reduce the network overhead down to just the few bytes that it takes to store that field (plus just a little bit more for the wire protocol overhead). Just as with denormalizing in the “One-to-Many” case, you’ll want to consider the ratio of reads to updates. Denormalizing the log messages into the Host document makes sense only if log messages are infrequent relative to the number of times the application needs to look at all of the messages for a single host. This particular denormalization is a bad idea if you want to look at the data less frequently than you update it. Recap In this post, I’ve covered the additional choices that you have past the basics of embed, child-reference, or parent-reference. You can use bi-directional referencing if it optimizes your schema, and if you are willing to pay the price of not having atomic updates If you are referencing, you can denormalize data either from the “One” side into the “N” side, or from the “N” side into the “One” side When deciding whether or not to denormalize, consider the following factors: You cannot perform an atomic update on denormalized data Denormalization only makes sense when you have a high read to write ratio Next time, I’ll give you some guidelines to pick and choose among all of these options. To learn more Schema Design tips, see our recent Back to Basics webinar on “Thinking in Documents” Sign up for the MongoDB Newsletter to get MongoDB updates right to your inbox

August 11, 2014

by Francesca Krihely

· 12,416 Views

Introducing BIRT iHub F-Type

Actuate recently released a new, free BIRT server called the BIRT iHub F-Type. It incorporates all the functionality of BIRT iHub and is limited only by the capacity of output it can deliver on a daily basis. It is ideal for departmental and smaller scale applications. When BIRT F-Type reaches its maximum output capacity, additional capacity can be purchased on a subscription based model. Some of the key features of BIRT iHub F-Type that will help improve your BIRT content applications are: Interactivity – Allow end-users to modify and personalize reports, and answer questions themselves. Scheduling – Automate report generation based on rules and calendar, and then notify users. Sharing – Secure document management and distribution that allows users to only access content/data they are entitled to. Excel Emitter – Export as native Excel (not CSV) with formulas/pivot tables/worksheets/charts. Integration – JavaScript API to embed dynamic reports and visualizations in your web app. Downloading BIRT iHub F-Type Before we get started with the installation process, we need to download BIRT iHub F-Type. There are three downloads available: Windows, Linux, and a VMware image. This blog will cover the Windows installation. If you’re installing either of the other types, you’ll find links to guides for them at the bottom of this blog post. Once you click on your chosen download, you’ll be asked to register. If you’ve already registered, click the “Click to Login” button. If not, fill out the short registration form to get started. Next, read and accept the license agreement. Once you’ve done that, click the checkbox, and a link for the download will appear. Click that to start your download. At this point, you should also receive an email with an activation code. Be sure to check your spam folder if you don’t see it in your inbox. Installing BIRT iHub F-Type After the download is complete, launch the executable file named ActuateBIRTiHubFType.exe. A welcome message will appear. Press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation. The default is C:\Actuate\BIRTiHub. If you have existing BIRT designs that depend on a JDBC database driver, you can optionally specify the folder where these drivers are located. Press Next to continue. Once the installation has finished, press Finish to launch the BIRT iHub F-Type. A desktop shortcut is also created that points to the iHub F-Type URL at http://localhost:8700/iportal. The first time you launch the BIRT iHub F-Type, you will need to activate it. Enter the activation code that you should have received in an e-mail. After entering a valid activation code, you should receive a message that the code was accepted and the BIRT iHub F-Type should start initializing services. Once that has completed, you will be presented with the login screen. The default user name is “administrator” and the password is blank for your first log in. You’ll be able to change this after you have logged in. Press “Log In” to continue. The first time you launch the BIRT iHub F-Type, you will be in tutorial mode which will help you get started loading your BIRT content and required resources. You can bypass the tutorial mode at any time by pressing the “Exit Tutorial” button at the top right. Select a BIRT design (*.rptdesign) file and press the Upload button. If you don’t have a BIRT design, you can download a sample from the link on the same page. The BIRT design file is automatically inspected and if there are any dependent files needed, like images, data files, BIRT report libraries, CSS styles, or other linked BIRT designs, you will be asked to upload those files as well. Once your BIRT design and dependent files are uploaded, your BIRT report will be displayed in the BIRT iHub F-Type and is now ready to explore. Thanks for reading. Now, it’s time to unleash the full power of BIRT into your application. If you have any questions or comments, please feel free to use the comments section below or visit the BIRT iHub F-Type forum. -Virgil For more blogs in the “Introducing BIRT iHub F-Type” series, see the list below: Installing iHub F-Type: Linux | VMWare Image - See more at: http://blogs.actuate.com/introducing-birt-ihub-f-type-installing-on-windows/#sthash.QPJhv2gw.dpuf

August 6, 2014

by Michael Singer

· 1,982 Views

Distributed Big Balls of Mud

if you want evidence that the software development industry is susceptible to fashion, just go and take a look at all of the hype around microservices. it's everywhere! for some people microservices is "the next big thing", whereas for others it's simply a lightweight evolution of the big soap service-oriented architectures that we saw 10 years ago "done right". i do like a lot of what the current microservice architectures are doing, but it's by no means a silver bullet. okay, i know that sounds obvious, but i think many people are jumping on them for the wrong reason. i often show this slide in my conference talks, and i've blogged about this before , but basically there are different ways to build software systems. on the one side we have traditional monolithic systems, where everything is bundled up inside a single deployable unit. this is probably where most of the industry is. caveats apply, but monoliths can be built quickly and are easy to deploy, but they provide limited agility because even tiny changes require a full redeployment. we also know that monoliths often end up looking like a big ball of mud because of the way that software often evolves over time. for example, many monolithic systems are built using a layered architecture, and it's relatively easy for layered architectures to be abused (e.g. skipping "around" a service to call the repository/data access layer directly). on the other side we have service-based architectures, where a software system is made up of many separately deployable services. again, caveats apply but, if done well, service-based architectures buy you a lot of flexibility and agility because each service can be developed, tested, deployed, scaled, upgraded and rewritten separately, especially if the services are decoupled via asynchronous messaging. the downside is increased complexity because your software system now has many more moving parts than a monolith. as robert says, the complexity is still there, you're just moving it somewhere else . there is, of course, a mid-ground here. we can build monolithic systems that are made up of in-process components, each of which has an explicit well-defined interface and set of responsibilities. this is old-school component-based design that talks about high cohesion and low coupling, but i usually sense some hesitation when i talk about it. and this seems odd to me. before i explain why, let me quote something from a blog post that i read earlier this morning about the rationale behind a team adopting a microservices approach. when we started building karma, we decided to split the project into two main parts: the backend api, and the frontend application. the backend is responsible for handling orders from the store, usage accounting, user management, device management and so forth, while the frontend offers a dashboard for users which accesses this api. along the way we noticed that if the whole backend api is monolithic it doesn't work very well because everything gets entangled. the blog post also mentions scaling, versioning and multiple languages/frameworks as other reasons to choose microservices. again, there are no silver bullets here, everything is a trade-off. anyway, "everything getting entangled" is not a reason to switch from monoliths to microservices. if you're building a monolithic system and it's turning into a big ball of mud, perhaps you should consider whether you're taking enough care of your software architecture. do you really understand what the core structural abstractions are in your software? are their interfaces and responsibilities clear too? if not, why do you think moving to a microservices architecture will help? sure, the physical separation of services will force you to not take some shortcuts, but you can achieve the same separation between components in a monolith. a little design thinking and an architecturally-evident coding style will help to achieve this without the baggage of going distributed. many of the teams i've spoken to are building monolithic systems and don't want to look at component-based design. the mid-ground seems to be a hard-sell. i ran a software architecture sketching workshop with a team earlier this year where we diagrammed one of their software systems. the diagram started as a strictly layered architecture (presentation, business services, data access) with all arrows pointing downwards and each layer only ever calling the layer directly beneath it. the code told a different story though and the eventual diagram didn't look so neat anymore. we discussed how adopting a package by component approach could fix some of these problems, but the response was, "meh, we like building software using layers". it seems as if teams are jumping on microservices because they're sexy, but the design thinking and decomposition strategy required to create a good microservices architecture are the same as those needed to create a well structured monolith. if teams find it hard to create a well structured monolith, i don't rate their chances of creating a well structured microservices architecture. as michael feathers recently said, " there's a bit of overhead involved in implementing each microservice. if they ever become as easy to create as classes, people will have a freer hand to create trouble - hulking monoliths at a different scale. ". i agree. a world of distributed big balls of mud worries me.

August 4, 2014

by Simon Brown

· 9,314 Views

ACID Compliance: What It Means and Why You Should Care

Originally written by Dave Anselmi The presence of four components — atomicity, consistency, isolation and durability — can ensure that a database transaction is completed in a timely manner. When databases possess these components, they are said to be ACID-compliant. So just what is ACID compliance, and why should you care? Let’s take a look: Atomicity: Database transactions, like atoms, can be broken down into smaller parts. When it comes to your database, atomicity refers to the integrity of the entire database transaction, not just a component of it. In other words, if one part of a transaction doesn’t work like it’s supposed to, the other will fail as a result—and vice versa. For example, if you’re shopping on an e-commerce site, you must have an item in your cart in order to pay for it. What you can’t do is pay for something that’s not in your cart. (You can add something into your cart and not pay for it, but that database transaction won’t be complete, and thus not ‘atomic’, until you pay for it.) Consistency: For any database to operate as it’s intended to operate, it must follow the appropriate data validation rules. Thus, consistency means that only data which follows those rules is permitted to be written to the database. If a transaction occurs and results in data that does not follow the rules of the database, it will be ‘rolled back’ to a previous iteration of itself (or ‘state’) which complies with the rules. On the other hand, following a successful transaction, new data will be added to the database and the resulting state will be consistent with existing rules. Isolation: It’s safe to say that at any given time on Amazon, there is far more than one transaction occurring on the platform… In fact, an incredibly huge amount of database transactions are occurring simultaneously! For a database, isolation refers to the ability to concurrently process multiple transactions in a way that one does not affect another. So, imagine you and your neighbor are both trying to buy something from the same e-commerce platform at the same time. There are 10 items for sale: your neighbor wants five and you want six. Isolation means that one of those transactions would be completed ahead of the other one. In other words, if your neighbor clicked first, they will get five items, and only five items will be remaining in stock. So you will only get to buy five items. If you clicked first, you will get the six items you want, and they will only get four. Thus,. isolation ensures that eleven items aren’t sold when only ten exist. Durability: All technology fails from time to time… the goal is to make those failures invisible to the end-user. In databases that possess durability, data is saved once a transaction is completed, even if a power outage or system failure occurs. Imagine you’re buying in-demand concert tickets on a site similar to Ticketmaster.com. Right when tickets go on sale, you’re ready to make a purchase. After being stuck in the digital waiting room for some time, you’re finally able to add those tickets to your cart.. You then make the purchase and get your confirmation. However if that database lacks durability, even after your ticket purchase was confirmed, if the database suffers a failure incident your transaction would still be lost! As you might expect, this is a really bad thing to happen for an online e-commerce site, so transaction durability is a must-have. ClustrixDB, a NewSQL cloud database, comes with the added benefit of guaranteed ACID transactions, critical for e-commerce success! Click here to learn more.

July 30, 2014

by Lisa Schultz

· 11,393 Views

CRUD Operation with ASP.NET MVC and Fluent Nhibernate

Before some time I have written a post about Getting Started with Nhibernate and ASP.NET MVC –CRUD operations. It’s one of the most popular post blog post on my blog. I get lots of questions via email and other medium why you are not writing a post about Fluent Nhibernate and ASP.NET MVC. So I thought it will be a good idea to write a blog post about it. What is Fluent Nhibernate: Convection over configuration that is mantra of Fluent Hibernate If you have see the my blog post about Nhibernate then you might have found that we need to create xml mapping to map table. Fluent Nhibernate uses POCO mapping instead of XML mapping and I firmly believe in Convection over configuration that is why I like Fluent Nhibernate a lot. But it’s a matter of Personal Test and there is no strong argument why you should use Fluent Nhibernate instead of Nhibernate. Fluent Nhibernate is team Comprises of James Gregory, Paul Batum, Andrew Stewart and Hudson Akridge. There are lots of committers and it’s a open source project. You can find all more information about at following site. http://www.fluentnhibernate.org/ On this site you can find definition of Fluent Nhibernate like below. Fluent, XML-less, compile safe, automated, convention-based mappings for NHibernate. Get your fluent on. They have excellent getting started guide on following url. You can easily walk through it and learned it. https://github.com/jagregory/fluent-nhibernate/wiki/Getting-started ASP.NET MVC and Fluent Nhibernate: So all set it’s time to write a sample application. So from visual studio 2013 go to file – New Project and add a new web application project with ASP.NET MVC. Once you are done with creating a web project with ASP.NET MVC It’s time to add Fluent Nhibernate to Application. You can add Fluent Nhibernate to application via following NuGet Package. And you can install it with package manager console like following. Now we are done with adding a Fluent Nhibernate to it’s time to add a new database. Now I’m going to create a simple table called “Employee” with four columns Id, FirstName, LastName and Designation. Following is a script for that. CREATE TABLE [dbo].[Employee] ( [Id] INT NOT NULL PRIMARY KEY, [FirstName] NVARCHAR(50) NULL, [LastName] NVARCHAR(50) NULL, [Designation] NVARCHAR(50) NULL ) Now table is ready it’s time to create a Model class for Employee. So following is a code for that. namespace FluentNhibernateMVC.Models { public class Employee { public virtual int EmployeeId { get; set; } public virtual string FirstName { get; set; } public virtual string LastName { get; set; } public virtual string Designation{ get; set; } } } Now Once we are done with our model classes and other stuff now it’s time to create a Map class which map our model class to our database table as we know that Fluent Nhibernate is using POCO classes to map instead of xml mappings. Following is a map class for that. using FluentNHibernate.Mapping; namespace FluentNhibernateMVC.Models { public class EmployeeMap : ClassMap { public EmployeeMap() { Id(x => x.EmployeeId); Map(x => x.FirstName); Map(x => x.LastName); Map(x => x.Designation); Table("Employee"); } } } If you see Employee Map class carefully you will see that I have mapped Id column with EmployeeId in table and another things you can see I have written a table with “Employee” which will map Employee class to employee class. Now once we are done with our mapping class now it’s time to add Nhibernate Helper which will connect to SQL Server database. using FluentNHibernate.Cfg; using FluentNHibernate.Cfg.Db; using NHibernate; using NHibernate.Tool.hbm2ddl; namespace FluentNhibernateMVC.Models { public class NHibernateHelper { public static ISession OpenSession() { ISessionFactory sessionFactory = Fluently.Configure() .Database(MsSqlConfiguration.MsSql2008 .ConnectionString(@"Data Source=(LocalDB)\v11.0;AttachDbFilename=C:\data\Blog\Samples\FluentNhibernateMVC\FluentNhibernateMVC\FluentNhibernateMVC\App_Data\FNhibernateDemo.mdf;Integrated Security=True") .ShowSql() ) .Mappings(m => m.FluentMappings .AddFromAssemblyOf()) .ExposeConfiguration(cfg => new SchemaExport(cfg) .Create(false, false)) .BuildSessionFactory(); return sessionFactory.OpenSession(); } } } Now we are done with classes It’s time to scaffold our Controller for the application. Right click Controller folder and click on add new controller. It will ask for scaffolding items like following. I have selected MVC5 controller with read/write action. Once you click Add It will ask for controller name. Now our controller is ready. It’s time to write code for our actions. Listing: Following is a code for our listing action. public ActionResult Index() { using (ISession session = NHibernateHelper.OpenSession()) { var employees = session.Query().ToList(); return View(employees); } } You add view for listing via right click on View and Add View. That’s is you are done with listing of your database. I have added few records to database table to see whether its working or not. Following is a output as expected. Create: Now it’s time to create code and views for creating a Employee. Following is a code for action results. public ActionResult Create() { return View(); } // POST: Employee/Create [HttpPost] public ActionResult Create(Employee employee) { try { using (ISession session = NHibernateHelper.OpenSession()) { using (ITransaction transaction = session.BeginTransaction()) { session.Save(employee); transaction.Commit(); } } return RedirectToAction("Index"); } catch (Exception exception) { return View(); } } Here you can see one if for blank action and another for posting data and saving to SQL Server with HttpPost Attribute. You can add view for create via right click on view on action result like following. Now when you run this you will get output as expected. Once you click on create it will return back to listing screen like this. Edit: Now we are done with listing/adding employee it’s time to write code for editing/updating employee. Following is a code Action Methods. public ActionResult Edit(int id) { using (ISession session = NHibernateHelper.OpenSession()) { var employee = session.Get(id); return View(employee); } } // POST: Employee/Edit/5 [HttpPost] public ActionResult Edit(int id, Employee employee) { try { using (ISession session = NHibernateHelper.OpenSession()) { var employeetoUpdate = session.Get(id); employeetoUpdate.Designation = employee.Designation; employeetoUpdate.FirstName = employee.FirstName; employeetoUpdate.LastName = employee.LastName; using (ITransaction transaction = session.BeginTransaction()) { session.Save(employeetoUpdate); transaction.Commit(); } } return RedirectToAction("Index"); } catch { return View(); } } You can create edit view via following via right click on view. Now when you run application it will work as expected. Details: You can write details code like following. public ActionResult Details(int id) { using (ISession session = NHibernateHelper.OpenSession()) { var employee = session.Get(id); return View(employee); } } You can view via right clicking on view as following. Now when you run application following is a output as expected. Delete: Following is a code for deleting Employee one action method for confirmation of delete and another one for deleting data from employee table. public ActionResult Delete(int id) { using (ISession session = NHibernateHelper.OpenSession()) { var employee = session.Get(id); return View(employee); } } [HttpPost] public ActionResult Delete(int id, Employee employee) { try { using (ISession session = NHibernateHelper.OpenSession()) { using (ITransaction transaction = session.BeginTransaction()) { session.Delete(employee); transaction.Commit(); } } return RedirectToAction("Index"); } catch (Exception exception) { return View(); } } You can add view for delete via right click on view like following. When you run it’s running as expected. Once you click on delete it will delete employee. That’s it. You can see it’s very easy. Fluent Nhibernate supports POCO mapping without writing any complex xml mappings. You can find whole source code at GitHub at following location. https://github.com/dotnetjalps/FluentNhiberNateMVC Hope you like it. Stay tuned for more!!

July 28, 2014

by Jalpesh Vadgama

· 24,358 Views · 1 Like

From JPA to Hibernate's Legacy and Enhanced Identifier Generators

Read about enhanced identifier generators, like JPA and Hibernate.

July 16, 2014

by Vlad Mihalcea

· 19,480 Views

Hibernate Identity, Sequence and Table (Sequence) Generator

Learn about Identity, Sequence, and Table in Hibernate.

July 9, 2014

by Vlad Mihalcea

· 178,064 Views · 2 Likes

SpringBoot: Introducing SpringBoot

SpringBoot...there is a lot of buzz about SpringBoot nowadays. So what is SpringBoot? SpringBoot is a new spring portfolio project which takes opinionated view of building production-ready Spring applications by drastically reducing the amount of configuration required. Spring Boot is taking the convention over configuration style to the next level by registering the default configurations automatically based on the classpath libraries available at runtime. Well.. you might have already read this kind of introduction to SpringBoot on many blogs. So let me elaborate on what SpringBoot is and how it helps developing Spring applications more quickly. Spring framework was created by Rod Johnson when many of the Java developers are struggling with EJB 1.x/2.x for building enterprise applications. Spring framework makes developing the business components easy by using Dependency Injection and Aspect Oriented Programming concepts. Spring became very popular and many more Spring modules like SpringSecurity, Spring Batch, Spring Data etc become part of Spring portfolio. As more and more features added to Spring, configuring all the spring modules and their dependencies become a tedious task. Adding to that Spring provides atleast 3 ways of doing anything :-). Some people see it as flexibility and some others see it as confusing. Slowly, configuring all the Spring modules to work together became a big challenge. Spring team came up with many approaches to reduce the amount of configuration needed by introducing Spring XML DSLs, Annotations and JavaConfig. In the very beginning I remember configuring a big pile of jar version declarations in section and lot of declarations. Then I learned creating maven archetypes with basic structure and minimum required configurations. This reduced lot of repetitive work, but not eliminated completely. Whether you write the configuration by hand or generate by some automated ways, if there is code that you can see then you have to maintain it. So whether you use XML or Annotations or JavaConfig, you still need to configure(copy-paste) the same infrastructure setup one more time. On the other hand, J2EE (which is dead long time ago) emerged as JavaEE and since JavaEE6 it became easy (compared to J2EE and JavaEE5) to develop enterprise applications using JavaEE platform. Also JavaEE7 released with all the cool CDI, WebSockets, Batch, JSON support etc things became even more simple and powerful as well. With JavaEE you don't need so much XML configuration and your war file size will be in KBs (really??? for non-helloworld/non-stageshow apps also :-)). Naturally this "convention over configuration" and "you no need to glue APIs together appServer already did it" arguments became the main selling points for JavaEE over Spring. Then Spring team addresses this problem with SpringBoot :-). Now its time to JavaEE to show whats the SpringBoot's counterpart in JavaEE land :-) JBoss Forge?? I love this Spring vs JavaEE thing which leads to the birth of powerful tools which ultimately simplify the developers life :-). Many times we need similar kind of infrastructure setup using same libraries. For example, take a web application where you map DispatcherServlet url-pattern to "/", implement RESTFul webservices using Jackson JSON library with Spring Data JPA backend. Similarly there could be batch or spring integration applications which needs similar infrastructure configuration. SpringBoot to the rescue. SpringBoot look at the jar files available to the runtime classpath and register the beans for you with sensible defaults which can be overridden with explicit settings. Also SpringBoot configure those beans only when the jars files available and you haven't define any such type of bean. Altogether SpringBoot provides common infrastructure without requiring any explicit configuration but lets the developer overrides if needed. To make things more simpler, SpringBoot team provides many starter projects which are pre-configured with commonly used dependencies. For example Spring Data JPA starter project comes with JPA 2.x with Hibernate implementation along with Spring Data JPA infrastructure setup. Spring Web starter comes with Spring WebMVC, Embedded Tomcat, Jackson JSON, Logback setup. Aaah..enough theory..lets jump onto coding. I am using latest STS-3.5.1 IDE which provides many more starter project options like Facebbok, Twitter, Solr etc than its earlier version. Create a SpringBoot starter project by going to File -> New -> Spring Starter Project -> select Web and Actuator and provide the other required details and Finish. This will create a Spring Starter Web project with the following pom.xml and Application.java 4.0.0 com.sivalabs hello-springboot 1.0-SNAPSHOT jar hello-springboot Spring Boot Hello World org.springframework.boot spring-boot-starter-parent 1.1.3.RELEASE org.springframework.boot spring-boot-starter-actuator org.springframework.boot spring-boot-starter-web org.springframework.boot spring-boot-starter-test test UTF-8 com.sivalabs.springboot.Application 1.7 org.springframework.boot spring-boot-maven-plugin package com.sivalabs.springboot; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.EnableAutoConfiguration; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; @Configuration @ComponentScan @EnableAutoConfiguration public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } } Go ahead and run this class as a standalone Java class. It will start the embedded Tomcat server on 8080 port. But we haven't added any endpoints to access, lets go ahead and add a simple REST endpoint. @Configuration @ComponentScan @EnableAutoConfiguration @Controller public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } @RequestMapping(value="/") @ResponseBody public String bootup() { return "SpringBoot is up and running"; } } Now point your browser to http://localhost:8080/ and you should see the response "SpringBoot is up and running". Remember while creating project we have added Actuator starter module also. With Actuator you can obtain many interesting facts about your application. Try accessing the following URLs and you can see lot of runtime environment configurations that are provided by SpringBoot. http://localhost:8080/beans http://localhost:8080/metrics http://localhost:8080/trace http://localhost:8080/env http://localhost:8080/mappings http://localhost:8080/autoconfig http://localhost:8080/dump SpringBoot actuator deserves a dedicated blog post to cover its vast number of features, I will cover it in my upcoming posts. I hope this article provides some basic introduction to SpringBoot and how it simplifies the Spring application development. More on SpringBoot in upcoming articles. - See more at: http://www.sivalabs.in/2014/07/springboot-introducing-springboot.html#sthash.7syCIt8V.dpuf

July 4, 2014

by Siva Prasad Reddy Katamreddy

· 12,448 Views · 5 Likes

Reporting Back from MongoDB World 2014, NYC, Planet JSON

Closely approaching the one year mark of when I first joined MongoLab (and the MongoDB community), I had the pleasure of attending the inaugural MongoDB World conference put together by the incredible MongoDB team. Second only to the excitement around major MongoDB feature announcements was the collective disbelief that this was MongoDB’s first multi-day conference ever. A big congratulations to all those that worked hard to put on such a massive (did you see the Intrepid!?) event. All this planning would have been for naught if MongoDB leaders and engineers failed to deliver announcements and features that would meet and exceed expectations. From major public cloud announcements to the reveal of document-level locking in version 2.8, developers and conference goers had plenty to be excited about. There was a lot to digest from the conference… we’ll cover the major highlights in case you missed them. Big announcements in public cloud Our time at the MongoLab booth yielded many high-quality conversations, predominantly those about offloading previously internal processes and workloads to the public cloud. It was remarkable to see and hear so many enterprise teams with the exact same message: the public cloud is the future, and the future is now. It’s no surprise then that MongoDB, Inc. released not one, but two press releases around MongoDB solutions for the public cloud. Fully-managed MongoDB on the Microsoft Azure Store Nearly one year ago, MongoDB, Inc. chose to partner with the MongoLab team to build a production-ready MongoDB solution for developers on Microsoft Azure. On the first day of World, MongoDB, Inc. announced the product of our collaboration – a fully-managed highly available MongoDB-as-a-Service Add-On offering on the Microsoft Azure Store. This new service runs MongoDB Enterprise and offers replication, monitoring and support from MongoDB, Inc. It’s also backed by MongoDB Management Service (MMS), allowing for point-in-time recovery of MongoDB deployments. Now, teams without the expertise or resources to manage their MongoDB deployment(s) can outsource all the database operations (monitoring and alerting, backups, performance tuning, etc.) to both MongoLab and MongoDB’s expert support teams. You can check out the MongoDB add-on in the Azure Store: https://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/ MongoDB solutions on Google Cloud Platform MongoDB, Inc. also announced the arrival of new resources to help Google Cloud Platform customers deploy MongoDB on Google Compute Engine. These resources include a “Click to Deploy” feature and a MongoDB on Google Compute Engine Solutions paper covering MongoDB best practices. If you are looking for a fully-managed solution, with automated provisioning, backups, integrated monitoring and alerting, along with expert support, MongoLab recently announced the arrival of production-ready replica sets on Google. Product Roadmap – MongoDB version 2.8 On the second day of MongoDB World, Eliot Horowitz, MongoDB, Inc. CTO & Co-founder, took center stage and announced two huge changes to the MongoDB core project: document-level locking and pluggable storage engines. These features not only reflect improvements to the core project, but also signal to the community that the MongoDB team is listening to its users and is capable of delivering the software needed to power the workloads of tomorrow. Document-level locking The slides above from Eliot’s keynote point to a current obstacle (database-level locking) in MongoDB that limits overall scalability. With database-level locking, any write operation to the database holds the write lock and prevents subsequent writes from executing on the database until the original operation holding the write lock completes. Eliot’s announcement of document-level locking moves the write lock contention from the database level to the document (MongoDB equivalent to SQL “records”) level. This change will allow users to achieve much higher write throughput (we saw a 10x performance improvement in the live demo) across their MongoDB deployments, improving write scalability. If you’d like to try out document-level locking, the MongoDB team has already pushed the feature to the master branch on GitHub. This should only be used for experimentation, not to be run in production. Pluggable storage engine As MongoDB matures, feature releases like document level locking will continue to allow developers to build robust systems on top of MongoDB. But as the number of use cases grows, different tooling tailored to specific use cases may prove to be extremely beneficial. For example, if Company X decides that they want to use MongoDB to warehouse some of their data, they would likely want to optimize their database for slow-moving data and storage efficiency (compression). With the introduction of pluggable storage engines, many new possibilities are open to the community. Teams can now write their own storage engine for a particular use case, configure replica set nodes with different storage engines for specific situations, or collaborate with the open-source community to architect innovative solutions. This feature not only allows for more granular control of the database, but also encourages the MongoDB community to work together. Takeaways: A maturing and thriving ecosystem Roughly a year ago, MongoLab CTO Todd Dampier recapped MongoSF 2013 and spoke to the health of the MongoDB ecosystem. How far we’ve come! After attending the inaugural MongoDB World and chatting with MongoDB Masters, interns, hackathon winners, power users and those new to the community, the enthusiasm is still surging and as positive as ever. This enthusiasm is well placed. Developers and hackers use MongoDB because so much rich data on the web is shared as JSON (think Facebook, Twitter, Google, etc.). As a result, MongoDB is the de-facto database for hackathons and bootstrapped projects. Just learn the API for the site you want to mine, throw the JSON in MongoDB and query your data with the rich query language- it’s that easy. The MongoDB ecosystem is maturing as well. Take a look at the Customer Success Stories and you’ll get a feel for the extent in which enterprises leverage the solution and use it in production. To further drive enterprise adoption, MongoDB, Inc.’s public cloud solutions and product roadmap features aim to help teams run MongoDB in production and give teams the confidence that MongoDB will continue to improve scalability and meet their growing project requirements. Congratulations again to the MongoDB team on their big announcements and for creating such a fantastic forum at which to learn and meet fellow MongoDB users. Our team at MongoLab had a great time making new friends and talking shop; we look forward to meeting more MongoDB users soon (at a MongoDB Days near you)! -Chris@MongoLab

July 2, 2014

by Chris Chang

· 6,501 Views

Why %util Number from Iostat is Meaningless for MySQL Capacity Planning

Earlier this month I wrote about vmstat iowait cpu numbers, and some of the comments I got were advertising the use of util% as reported by the iostat tool instead. I find this number even more useless for MySQL performance tuning and capacity planning. Now let me start by saying this is a really tricky and deceptive number. Many DBAs who report instances of their systems having a very busy IO subsystem said the util% in vmstat was above 99% and therefore they believe this number is a good indicator of an overloaded IO subsystem. Indeed – when your IO subsystem is busy, up to its full capacity, the utilization should be very close to 100%. However, it is perfectly possible for the IO subsystem and MySQL with it to have plenty more capacity than when utilization is showing 100% – as I will show in an example. Before that though lets see what the iostat manual page has to say on this topic – from this main page we can read: %util Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits. Which says right here that the number is useless for pretty much any production database server that is likely to be running RAID, Flash/SSD, SAN or cloud storage (such as EBS) capable of handling multiple requests in parallel. Let’s look at the following illustration. I will run sysbench on a system with a rather slow storage data size larger than buffer pool and uniform distribution to put pressure on the IO subsystem. I will use a read-only benchmark here as it keeps things more simple… sysbench –num-threads=1 –max-requests=0 –max-time=6000000 –report-interval=10 –test=oltp –oltp-read-only=on –db-driver=mysql –oltp-table-size=100000000 –oltp-dist-type=uniform –init-rng=on –mysql-user=root –mysql-password= run I’m seeing some 9 transactions per second, while disk utilization from iostat is at nearly 96%: [ 80s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 171.82ms (95%) [ 90s] threads: 1, tps: 9.20, reads/s: 128.80, writes/s: 0.00 response time: 157.72ms (95%) [ 100s] threads: 1, tps: 9.00, reads/s: 126.00, writes/s: 0.00 response time: 215.38ms (95%) [ 110s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 141.39ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 127.90 0.70 4070.40 28.00 31.87 1.01 7.83 7.52 96.68 This makes a lot of sense – with read single thread read workload the drive should be only used getting data needed by the query, which will not be 100% as there is some extra time needed to process the query on the MySQL side as well as passing the result set back to sysbench. So 96% utilization; 9 transactions per second, this is a close to full-system capacity with less than 5% of device time to spare, right? Let’s run a benchmark with more concurrency – 4 threads at the time; we’ll see… [ 110s] threads: 4, tps: 21.10, reads/s: 295.40, writes/s: 0.00 response time: 312.09ms (95%) [ 120s] threads: 4, tps: 22.00, reads/s: 308.00, writes/s: 0.00 response time: 297.05ms (95%) [ 130s] threads: 4, tps: 22.40, reads/s: 313.60, writes/s: 0.00 response time: 335.34ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 295.40 0.90 9372.80 35.20 31.75 4.06 13.69 3.38 100.01 So we’re seeing 100% utilization now, but what is interesting – we’re able to reclaim much more than less than 5% which was left if we look at utilization – throughput of the system increased about 2.5x Finally let’s do the test with 64 threads – this is more concurrency than exists at storage level which is conventional hard drives in RAID on this system… [ 70s] threads: 64, tps: 42.90, reads/s: 600.60, writes/s: 0.00 response time: 2516.43ms (95%) [ 80s] threads: 64, tps: 42.40, reads/s: 593.60, writes/s: 0.00 response time: 2613.15ms (95%) [ 90s] threads: 64, tps: 44.80, reads/s: 627.20, writes/s: 0.00 response time: 2404.49ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 601.20 0.80 19065.60 33.60 31.73 65.98 108.72 1.66 100.00 In this case we’re getting 4.5x of throughput compared to single thread and 100% utilization. We’re also getting almost double throughput of the run with 4 thread where 100% utilization was reported. This makes sense – there are 4 drives which can work in parallel and with many outstanding requests they are able to optimize their seeks better hence giving a bit more than 4x. So what have we so ? The system which was 96% capacity but which could have driven still to provide 4.5x throughput – so it had plenty of extra IO capacity. More powerful storage might have significantly more ability to run requests in parallel so it is quite possible to see 10x or more room after utilization% starts to be reported close to 100% So if utilization% is not very helpful what can we use to understand our database IO capacity better ? First lets look at the performance reported from those sysbench runs. If we look at 95% response time you can see 1 thread and 4 threads had relatively close 95% time growing just from 150ms to 250-300ms. This is the number I really like to look at- if system is able to respond to the queries with response time not significantly higher than it has with concurrency of 1 it is not overloaded. I like using 3x as multiplier – ie when 95% spikes to be more than 3x of the single concurrency the system might be getting to the overload. With 64 threads the 95% response time is 15-20x of the one we see with single thread so it is surely overloaded. Do we have anything reported by iostat which we can use in a similar way? It turns out we do! Check out the “await” column which tells us how much the requester had to wait for the IO request to be serviced. With single concurrency it is 7.8ms which is what this drives can do for random IO and is as good as it gets. With 4 threads it is 13.7ms – less than double of best possible, so also good enough… with concurrency of 64 it is however 108ms which is over 10x of what this device could produce with no waiting and which is surely sign of overload. A couple words of caution. First, do not look at svctm which is not designed with parallel processing in mind. You can see in our case it actually gets better with high concurrency while really database had to wait a lot longer for requests submitted. Second, iostat mixes together reads and writes in single statistics which specifically for databases and especially on RAID with BBU might be like mixing apples and oranges together – writes should go to writeback cache and be acknowledged essentially instantly while reads only complete when actual data can be delivered. The tool pt-diskstats from Percona Tookit breaks them apart and so can be much more for storage tuning for database workload. Some of the recent operating systems also ship with sysstat/iostat which breaks out await to r_await and w_await which is much more useful. Final note – I used a read-only workload on purpose – when it comes to writes things can be even more complicated – MySQL buffer pool can be more efficient with more intensive writes plus group commit might be able to commit a lot of transactions with single disk write. Still, the same base logic will apply. Summary: The take away is simple – util% only shows if a device has at least one operation to deal with or is completely busy, which does not reflect actual utilization for a majority of modern IO subsystems. So you may have a lot of storage IO capacity left even when utilization is close to 100%.

July 1, 2014

by Peter Zaitsev

· 5,882 Views

The Dangers of SQL JOINs & Aggregate Functions: How to Avoid Fanouts

Every SQL developer uses JOINs and aggregate functions, but some may not be aware that the two interacting can return incorrect results if not handled carefully. Brett Sauve at Looker has covered the issue in this recent post - specifically, he focuses on the issue of "fanouts" - and it's an easy problem to overlook. A fanout occurs when the primary table in a query contains fewer rows than the combined table resulting from a JOIN. Because the new table contains more rows, aggregate functions like COUNT or SUM may include duplicates, throwing off the results. According to Sauve, the answer is careful ordering of JOIN tables: Herein lies a key point: to help avoid fanouts, begin your joins with the most granular table. Sauve also provides some tips to help SQL developers understand how to avoid fanouts and ensure the validity of their data. He points to three types of JOINs: 1-to-1 Many-to-1 1-to-Many Knowing which JOIN you are using can rule out the possibility of a fanout, for example, because it is not possible in a 1-to-1 or Many-to-1 JOIN. If you are using a 1-to-Many JOIN, though, Sauve has some tips there, too. For example, a messy JOIN can be cleaned up by grouping data to create a 1-to-1 relationship. Check out the full article for the rest of Sauve's tips to make sure your aggregate functions are working correctly, and if you're looking for a few more tips to keep your SQL skills sharp, you might try one of these: Yet Another 10 Common Mistakes Java Developers Make When Writing SQL 6 Simple Performance Tips for SQL SELECT Statements

June 30, 2014

by Alec Noller

· 16,637 Views

Option.fold() Considered Unreadable

We had a lengthy discussion recently during code review whether scala.Option.fold() is idiomatic and clever or maybe unreadable and tricky? Let's first describe what the problem is. Option.fold does two things: maps a function f overOption's value (if any) or returns an alternative alt if it's absent. Using simple pattern matching we can implement it as follows: val option: Option[T] = //... def alt: R = //... def f(in: T): R = //... val x: R = option match { case Some(v) => f(v) case None => alt } If you prefer one-liner, fold is actually a combination of map and getOrElse val x: R = option map f getOrElse alt Or, if you are a C programmer that still wants to write in C, but using Scala compiler: val x: R = if (option.isDefined) f(option.get) else alt Interestingly this is similar to how fold() is actually implemented, but that's an implementation detail. OK, all of the above can be replaced with single Option.fold(): val x: R = option.fold(alt)(f) Technically you can even use /: and \: operators (alt /: option) - but that would be simply masochistic. I have three problems with option.fold() idiom. First of all - it's anything but readable. We are folding (reducing) over Option - which doesn't really make much sense. Secondly it reverses the ordinary positive-then-negative-case flow by starting with failure (absence, alt) condition followed by presence block (f function; see also: Refactoring map-getOrElse to fold). Interestingly this method would work great for me if it was named mapOrElse: ** * Hypothetical in Option */ def mapOrElse[B](f: A => B, alt: => B): B = this map f getOrElse alt Actually there is already such method in Scalaz, called OptionW.cata. cata. Here is whatMartin Odersky has to say about it: "I personally find methods like cata that take two closures as arguments are often overdoing it. Do you really gain in readability over map + getOrElse? Think of a newcomer to your code[...]"While cata has some theoretical background, Option.fold just sounds like a random name collision that doesn't bring anything to the table, apart from confusion. I know what you'll say, that TraversableOnce has fold and we are sort-of doing the same thing. Why it's a random collision rather than extending the contract described inTraversableOnce? fold() method in Scala collections typically just delegates to one offoldLeft()/foldRight() (the one that works better for given data structure), thus it doesn't guarantee order and folding function has to be associative. But inOption.fold() the contract is different: folding function takes just one parameter rather than two. If you read my previous article about folds you know that reducing function always takes two parameters: current element and accumulated value (initial value during first iteration). But Option.fold() takes just one parameter: current Option value! This breaks the consistency, especially when realizing Option.foldLeft() andOption.foldRight() have correct contract (but it doesn't mean they are more readable). The only way to understand folding over option is to imagine Option as a sequence with0 or 1 elements. Then it sort of makes sense, right? No. def double(x: Int) = x * 2 Some(21).fold(-1)(double) //OK: 42 None.fold(-1)(double) //OK: -1 but: Some(21).toList.fold(-1)(double) : error: type mismatch; found : Int => Int required: (Int, Int) => Int Some(21).toList.fold(-1)(double) ^ If we treat Option[T] as a List[T], awkward Option.fold() breaks because it has different type than TraversableOnce.fold(). This is my biggest concern. I can't understand why folding wasn't defined in terms of the type system (trait?) and implemented strictly. As an example take a look at: Data.Foldable in Haskell (advanced) Data.Foldable typeclass describes various flavours of folding in Haskell. There are familiar foldl/foldr/foldl1/foldr1, in Scala namedfoldLeft/foldRight/reduceLeft/reduceRight accordingly. They have the same type as Scala and behave unsurprisingly with all types that you can fold over, including Maybe, lists, arrays, etc. There is also a function named fold, but it has a completely different meaning: class Foldable t where fold :: Monoid m => t m -> m While other folds are quite complex, this one barely takes a foldable container of ms (which have to be Monoids) and returns the same Monoid type. A quick recap: a type can be aMonoid if there exists a neutral value of that type and an operation that takes two values and produces just one. Applying that function with one of the arguments being neutral value yields the other argument. String ([Char]) is a good example with empty string being neutral value (mempty) and string concatenation being such operation (mappend). Notice that there are two different ways you can construct monoids for numbers: under addition with neutral value being 0 (x + 0 == 0 + x == x for any x) and under multiplication with neutral 1 (x * 1 == 1 * x == x for any x). Let's stick to strings. If I fold empty list of strings, I'll get an empty string. But when a list contains many elements, they are being concatenated: > fold ([] :: [String]) "" > fold [] :: String "" > fold ["foo", "bar"] "foobar" In the first example we have to explicitly say what is the type of empty list []. Otherwise Haskell compiler can't figure out what is the type of elements in a list, thus which monoid instance to choose. In second example we declare that whatever is returned from fold [], it should be a String. From that the compiler infers that [] actually must have a type of [String]. Last fold is the simplest: the program folds over elements in list and concatenates them because concatenation is the operation defined in Monoid Stringtypeclass instance. Back to options (or more precisely Maybe). Folding over Maybe monad having type parameter being Monoid (I can't believe I just said it) has an interesting interpretation: it either returns value inside Maybe or a default Monoid value: > fold (Just "abc") "abc" > fold Nothing :: String "" Just "abc" is same as Some("abc") in Scala. You can see here that if Maybe Stringis Nothing, neutral String monoid value is returned, that is an empty string. Summary Haskell shows that folding (also over Maybe) can be at least consistent. In ScalaOption.fold is unrelated to List.fold, confusing and unreadable. I advise avoiding it and staying with slightly more verbose map/getOrElse transformations or pattern matching. PS: Did I mention there is also Either.fold() (with even different contract) but noTry.fold()?

June 26, 2014

by Tomasz Nurkiewicz

· 9,622 Views

How to Install Mono on a Raspberry Pi

This post exists to help with an MSDN Magazine article that I am authoring It provides some of the low-level details for the article How to install Mono and root certificates on a raspberry pi How to create an Azure mobile service How to create a Custom API inside Azure mobile services that the raspberry pi can call into How to create an Azure storage account MONO - HOW TO INSTALL ON A RASPBERRY PI Why Mono? How to install Mono on a raspberry pi Installing trusted root certificates on to the raspberry pi http://www.mono-project.com/Main_Page An open source, cross-platform, implementation of C# and the CLR that is binary compatible with Microsoft.NET Mono is a free and open source project led by Xamarin (formerly by Novell) that provides a .NET Framework-compatible set of tools including, among others, a C# compiler and a Common Language Runtime WHY MONO? Because it lets us write .net code compiled on Windows We can simply copy the binary files from Windows to Linux and run it as is From a raspberry pi device, it is possible to use a .net application to take a photo and upload it to Windows Azure storage HOW TO INSTALL ON A RASPBERRY PI RUNNING LINUX You will issue the following commands: pi@raspberrypi ~ $ sudo apt-get update pi@raspberrypi ~ $ sudo apt-get install mono-complete The first command makes sure all the local package index are up to date with the changes made in repositories. Second command installs the complete Mono tooling and runtime. MAKING SURE THAT YOUR MONO APPLICATIONS CAN MAKE A HTTPS REST-BASED CALLS This command downloads the trusted root certificates from the Mozilla LXR web site into the Mono certificate store. Once complete, the Raspberry PI will be capable of making web requests using HTTPS requests within Mono. pi@raspberrypi ~ $ mozroots --import --ask-remove --machine CREATING A NEW AZURE MOBILE SERVICES ACCOUNT The mobile services account is needed to host a Node.js application that provides shared access signatures to raspberry pi devices The shared access signature is needed by the raspberry pi, so that it can directly and securely upload photos to Azure storage STEPS TO CREATE AN AZURE MOBILE SERVICE The steps below will create an Azure mobile service The service will be used to host a Node.js application interacting with a raspberry pi devices We will provision a SQL database, although it will not be used initially FOLLOW THESE STEPS TO CREATE THE MOBILE SERVICE Login into the Azure Portal Select MOBILE SERVICES from the left menu pane at the Azure Portal. In the lower left corner select "+NEW" to create a new Azure Mobile Service. Make sure you've selected, "COMPUTE / MOBILE SERVICE / CREATE." You will now enter a url. We will call this service raspberrymobileservice. For the DATABASE, we will choose "Create a new SQL database instance." The REGION we chose is "West US." The BACKEND is "JavaScript." Click the "->" arrow to proceed to the next screen. In this screen you will "Specify database settings." The NAME of your database will based on the URL you entered previously. In this case, the database is called "raspberrymobileservice_db." You will need to choose a SERVER. We will choose "New SQL database server" from the drop-down list. You will need to provide a SERVER LOGIN NAME and a SERVER LOGIN PASSWORD. Take note of the login you provided as it will be needed later CREATING A CUSTOM API Azure mobile services allows you to create a custom API written in JavaScript that can be called from a raspberry pi device using REST This custom API is really just a Node.js application running in the server CREATING THE API TO RESPOND TO THE DEVICE TRYING TO UPLOAD PHOTOS Now that the service is established, we will turn our attention to creating an API that the device can call into to upload a photo. Login into the Azure Portal Your mobile service will take a few minutes to complete, and you should see the "Ready" flag as the "Status" for your service. Once it is ready you can drill into your service to customize its behavior. Just to the right of the service name, click the right arrow key "->" to drill into the service details. The top menu bar will offer many options, but we are interested in the one titled "API." The API allows you to create a series of node.JS API calls that a device can call into using rest-based approaches. Click on "API." from there, select "CREATE A CUSTOM API." You will be asked to provide an API name. Type in "photos" for the API name. Below you will see a series of drop-down combo boxes that relate to permission. We will keep the default value of "Anybody with the application key." This might not be the best option for all scenarios. You can read more about this here. http://msdn.microsoft.com/en-us/library/azure/jj193161.aspx. Click the checkmark to complete the process. The name of the AP you just created, "Photos," should be visible on the portal interface. To drill into the photos API click on the right arrow key "->". The right arrow key will be just to the right of the name of the API "Photos". At this point you should see a basic script that has been provided by default. We will overwrite this default script with our own script as described in the MSDN Magazine article. CREATING A STORAGE ACCOUNT TO STORE THE PHOTOS Navigate to the portal and create a storage account Create a container for the photos Obtain the: Storage Account Name (you will provide a name) Storage Account Access key (generated for you) Container Name (you will create) CREATING A STORAGE ACCOUNT We will need a storage account so that we can upload photos to it. The steps are well documented here: http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/ In our case we call the storage account raspberrystorage. This means that the URL that the device will use to upload photos is https://raspberrystorage.blob.core.windows.net/. As you complete these steps make sure that you choose the storage account location to be the same location as was used for your mobile services account. This avoids any unnecessary latency or bandwidth costs between data centers. Once the storage account is created, we will need to create a container within it. Photos or any blob for that matter, are always stored within a container. To create a container drill into your newly created storage account and select CONTAINERS from the top menu. From there, select CREATE A CONTAINER. The new container dialog box will ask for a name for your container. Take note of the name you provide. We are calling our container ?photocontainer.? When the raspberry pi device uploads photos to the storage account, it will target a specific container, such as the one we just created. You will next be asked to indicate ACCESS rights. To keep things simple we will select access rights of Public Blob. ENTERING APP SETTINGS Rather than hard-code storage account information inside your JavaScript/Node.js applications, you should consider using apps settings inside of the Azure mobile services portal This post also discusses it well: http://blogs.msdn.com/b/carlosfigueira/archive/2013/12/09/application-settings-in-azure-mobile-services.aspx ?The idea of application settings is a set of key-value pairs which can be set for the mobile service (either via the portal or via the command-line interface), and those values could be then read in the service runtime.? NAVIGATING TO APP SETTINGS Navigate to the Azure Mobile Services section of the portal. Drill into the specific service by hitting the arrow below Select from the Configure Menu at the top Scroll down to the very bottom to see app settings Note that we need to enter: - We need to get this from Azure Storage - PhotoContainerName - AccountName - AccountKey We get this information from the Azure Storage Section of the Portal. Note that you need to have provisioned a Storage Account to have this information. How to get the AccountKey with Azure Storage Services Now you can get the access keys HOW NODE.JS WILL ACCESS THE APP SETTINGS You will create a Node.js application inside of Azure Mobile Services See previous steps THE NODE.JS APPLICATION READING APP SETTINGS You will starting by going back to Azure Mobile Services and drill down into your newly minted service We called ours raspberrymobileservice Once you click API, you should see: Notice the app settings are being read on lines 12 to 14.

June 19, 2014

by Bruno Terkaly

· 16,799 Views

Synchronizing Data Across Mule Applications with MongoDB

One of the many things that Mule does extremely well is transparently store state between messages. Take the idempotent-message-filter for example; The idempotent-message-filter keeps track of what messages have been processed previously to ensure no duplicate messages are processed. This is all achieved with one single xml element. Or take OAuth enabled cloud connectors that need to keep track of access tokens so you're not redirected to some authorize page each time. All this is handled behind the scenes without you having to worry about it. Mule’s ability to store this state is implemented via object-stores. Object-stores provide a simple generic way to store data in Mule. Many message processors such as the idempotent-meesage-filter, OAuth message processors, until-successful routers and many more all transparently use object-stores behind the scenes to store state between messages. By default, Mule uses in-memory object-stores behind the scenes. Things get more interesting however, when your Mule application is distributed across multiple Mule nodes. At some point you are going to need to synchronise object-stores across multiple applications. If you are fortunate to be using the Enterprise Edition of Mule you can take advantage of its out of the box clustering support for synchronizing state across an entire cluster. But even then as mentioned in this blog post by John Demic: Synchronizing Mule Applications Across Data Centers with Apache Cassandra, this may not be ideal if your Mule nodes are geographically distributed across multiple data centres. In the aforementioned post, John gives a great example of how to use Apache Cassandra as a shared object-store. But many other traditonal and NoSql data stores can be used as well. In this blog post I'm going to show a simple example of how you can achieve the same using MongoDB. If it's good enough for the Hipster Hacker then it's good enough for me! OMG! Does this officially make @MuleSoft Mule part of the Hipster Stack? pic.twitter.com/OwM4dzhGvi — Ryan Carter (@rydcarter) October 9, 2013 Configuring the MongoDB object-store MongoDB is an open-source document database, and the leading NoSQL database. The MongoDB module can be used to read/write/query and perform a whole load of other operation a MongoDB database as part of your Mule flows. It also exposes an additional submodule for using your MondogDB database as an object-store implementation in Mule. Here is a sample configuration from one of the tests: As you can see, it's classed as it's own module with it's own namespace and is configured simply using the mos:config name="mos" database="mongo-connector-test" element. However, this doesn't work for the current version of the connector. Due to the following bug. You can simply apply the fix in the aforementioned pull request, or you can instantiate the object store yourself using Spring - similar to that in the Apache Cassandra configuration. The object-store class can be found here: MongoObjectStore.java. As you can see, it has various annotations for default values and initialising the class. However, as we are not using this as a fully fledged Mule Devkit module, those annotations have no effect. So we have to pass in the default values ourselves and call the initialise method when instantiating our object-store. This is quite simple with Spring. Here is a full working example using the Spring instantiated object-store as the store for the idempotent-message-filter: If you run this configuration, Mule will poll every 10 seconds with the exact same message: "1". The first time it polls you should see "Passed Filter" in the log output. Subsequent polls should not log anything as the idempotent message filter will filter them out. This is usually performed with the default in-memory object-store. To prove this is persisted you should be able to check the value in the Mongo database collection: db.getCollection('mule.objectstore._default').find(). Also you can stop your Mule app and restart it to prove that it works between restarts and failovers etc.

June 17, 2014

by Ryan Carter

· 10,083 Views