DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
How to configure Swagger to generate Restful API Doc for your Spring Boot Web Application
Learn How to Enable Swagger in your Spring Boot Web Application
August 26, 2014
by Saurabh Chhajed
· 128,616 Views · 3 Likes
article thumbnail
JPA Hibernate Alternatives: When JPA & Hibernate Aren't Right for Your Project
Hello, how are you? Today we will talk about situations in which the use of the JPA/Hibernate is not recommended. Which alternatives do we have outside the JPA world? What we will talk about: JPA/Hibernate problems Solutions to some of the JPA/Hibernate problems Criteria for choosing the frameworks described here Spring JDBC Template MyBatis Sormula sql2o Take a look at: jOOQ and Avaje Is a raw JDBC approach worth it? How can I choose the right framework? Final thoughts I have created 4 CRUDs in my github using the frameworks mentioned in this post, you will find the URL at the beginning of each page. I am not a radical that thinks that JPA is worthless, but I do believe that we need to choose the right framework for the each situation. If you do not know I wrote a JPA book (in Portuguese only) and I do not think that JPA is the silver bullet that will solve all the problems. I hope you like the post. [= JPA/Hibernate Problems There are times that JPA can do more harm than good. Below you will see the JPA/Hibernate problems and in the next page you will see some solutions to these problems: Composite Key: This, in my opinion, is the biggest headache of the JPA developers. When we map a composite key we are adding a huge complexity to the project when we need to persist or find a object in the database. When you use composite key several problems will might happen, and some of these problems could be implementation bugs. Legacy Database: A project that has a lot of business rules in the database can be a problem when wee need to invoke StoredProcedures or Functions. Artifact size: The artifact size will increase a lot if you are using the Hibernate implementation. The Hibernate uses a lot of dependencies that will increase the size of the generated jar/war/ear. The artifact size can be a problem if the developer needs to do a deploy in several remote servers with a low Internet band (or a slow upload). Imagine a project that in each new release it is necessary to update 10 customers servers across the country. Problems with slow upload, corrupted file and loss of Internet can happen making the dev/ops team to lose more time. Generated SQL: One of the JPA advantages is the database portability, but to use this portability advantage you need to use the JPQL/HQL language. This advantage can became a disadvantage when the generated query has a poor performance and it does not use the table index that was created to optimize the queries. Complex Query: That are projects that has several queries with a high level of complexity using database resources like: SUM, MAX, MIN, COUNT, HAVING, etc. If you combine those resources the JPA performance might drop and not use the table indexes, or you will not be able to use a specific database resource that could solve this problem. Framework complexity: To create a CRUD with JPA is very simples, but problems will appear when we start to use entities relationships, inheritance, cache, PersistenceUnit manipulation, PersistenceContext with several entities, etc. A development team without a developer with a good JPA experience will lose a lot of time with JPA ‘rules‘. Slow processing and a lot of RAM memory occupied: There are moments that JPA will lose performance at report processing, inserting a lot of entities or problems with a transaction that is opened for a long time. After reading all the problems above you might be thinking: “Is JPA good in doing anything?”. JPA has a lot of advantages that will not be detailed here because this is not the post theme, JPA is a tool that is indicated for a lot of situations. Some of the JPA advantages are: database portability, save a lot of the development time, make easier to create queries, cache optimization, a huge community support, etc. In the next page we will see some solutions for the problems detailed above, the solutions could help you to avoid a huge persistence framework refactoring. We will see some tips to fix or to workaround the problems described here. Solutions to Some of the JPA/Hibernate problems We need to be careful if we are thinking about removing the JPA of our projects. I am not of the developer type that thinks that we should remove a entire framework before trying to find a solution to the problems. Some times it is better to choose a less intrusive approach. Composite Key Unfortunately there is not a good solution to this problem. If possible, avoid the creation of tables with composite key if it is not required by the business rules. I have seen developers using composite keys when a simple key could be applied, the composite key complexity was added to the project unnecessarily. Legacy Databases The newest JPA version (2.1) has support to StoredProcedures and Functions, with this new resource will be easier to communicate with the database. If a JPA version upgrade is not possible I think that JPA is not the best solution to you. You could use some of the vendor resources, e.g. Hibernate, but you will lose database and implementations portability. Artifact Size An easy solution to this problem would be to change the JPA implementation. Instead of using the Hibernate implementation you could use the Eclipsellink, OpenJPA or the Batoo. A problem might appear if the project is using Hibernate annotation/resources; the implementation change will require some code refactoring. Generated SQL and Complexes Query The solution to these problems would be a resource named NativeQuery. With this resource you could have a simplified query or optimized SQL, but you will sacrifice the database portability. You could put your queries in a file, something like SEARCH_STUDENTS_ORACLE or SEARCH_STUDENTS_MYSQL, and in production environment the correct file would be accessed. The problem of this approach is that the same query must be written for every database. If we need to edit the SEARCH_STUDENTS query, it would be required to edit the oracle and mysql files. If your project is has only one database vendor the NativeQuery resource will not be a problem. The advantage of this hybrid approach (JPQL and NativeQuery in the same project) is the possibility of using the others JPA advantages. Slow Processing and Huge Memory Size This problem can be solved with optimized queries (with NativeQuery), query pagination and small transactions. Avoid using EJB with PersistenceContext Extended, this kind of context will consume more memory and processing of the server. There is also the possibility of getting an entity from database as a “read only” entity, e.g.: entity that will only be used in a report. To recover an entity in a “read only” state is not needed to open a transaction, take a look at the code below: String query = "select uai from Student uai"; EntityManager entityManager = entityManagerFactory.createEntityManager(); TypedQuery typedQuery = entityManager.createQuery(query, Student.class); List resultList = typedQuery.getResultList(); Notice that in the code above there is no opened transaction, all the returned entities will be detached (non monitored by the JPA). If you are using EJB mark your transaction as NOT_SUPPORTED or you could use @Transactional(readOnly=true). Complexity I would say that there is only one solution to this problem: to study. It will be necessary to read books, blogs, magazines or any other trustful source of JPA material. More study is equals to less doubts in JPA. I am not a developer that believes that JPA it is the only and the best solution to every problem, but there are moments that JPA is not the best to tool to use. You must be careful when deciding about a persistence framework change, usually a lot of classes are affected and a huge refactoring is needed. Several bugs may be caused by this refactoring. It is needed to talk with the project mangers about this refactoring and list all the positive and negative effects. In the next four pages we will see 4 persistence frameworks that can be used in our projects, but before we see the frameworks I will show how that I choose each framework. Criteria for Choosing the frameworks Described Here Maybe you will think: “why the framework X is not here?”. Below I will list the criteria applied for choosing the framework displayed here: Found in more than one source of research: we can find in forums people talking about a framework, but it is harder to find the same framework appearing in more than one forum. The most quoted frameworks were chosen. Quoted by different sources: Some frameworks that we found in the forums are indicated only by its committers. Some forums does not allow “self merchandise”, but some frameworks owners still doing it. Last update 01/05/2013: I have searched for frameworks that have been updated in this past year. Quick Hello World: Some frameworks I could not do a Hello World with less than 15~20min, and with some errors. To the tutorials found in this post I have worked 7 minutes in each framework: starting counting in its download until the first database insert. The frameworks that will be displayed in here has good methods and are easy to use. To make a real CRUD scenario we have a persistence model like below: A attribute with a name different of the column name: socialSecurityNumber —-> social_security_number A date attribute a ENUM attribute With this characteristics in a class we will see some problems and how the framework solve it. Spring JDBC Template One of the most famous frameworks that we can find to access the database data is the Spring JDBC Template. The code of this project can be found in here: https://github.com/uaihebert/SpringJdbcTemplateCrud The Sprint JDBC Template uses natives queries like below: As it is possible to see in the image above the query has a database syntax (I will be using MySQL). When we use a native SQL query it is possible to use all the database resources in an easy way. We need an instance of the object JDBC Template (used to execute the queries), and to create the JDBC Template object we need to set up a datasource: We can get the datasource now (thanks to the Spring injection) and create our JDBCTemplate: PS.: All the XML code above and the JDBCTemplate instantiation could be replace by Spring injection and with a code bootstrap, just do a little research about the Spring features. One thing that I did not liked is the INSERT statement with ID recover, it is very verbose: With the KeyHolder class we can recover the generated ID in the database, unfortunately we need a huge code to do it. The other CRUD functions are easier to use, like below: Notice that to execute a SQL query it is very simple and results in a populated object, thanks to the RowMapper. The RowMapper is the engine that the JDBC Template uses to make easier to populate a class with data from the database. Take a look at the RowMapper code below: The best news about the RowMapper is that it can be used in any query of the project. The developer that is responsible to write the logic that will populate the class data. To finish this page, take a look below in the database DELETE and the database UPDATE statement: About the Spring JDBC Template we can say: Has a good support: Any search in the Internet will result in several pages with tips and bug fixes. A lot of companies use it: several projects across the world use it Be careful with different databases for the same project: The native SQL can became a problem with your project run with different databases. Several queries will need to be rewritten to adapt all the project databases. Framework Knowledge: It is good to know the Spring basics, how it can be configured and used. To those that does not know the Spring has several modules and in your project it is possible to use only the JDBC Template module. You could keep all the other modules/frameworks of your project and add only the necessary to run the JDBC Template. MyBatis MyBatis (created with the name iBatis) is a very good framework that is used by a lot of developers. Has a lot of functionalities, but we will only see a few in this post. The code of this page can be found in here: https://github.com/uaihebert/MyBatisCrud To run your project with MyBatis you will need to instantiate a Session Factory. It is very easy and the documentation says that this factory can be static: When you run a project with MyBatis you just need to instantiate the Factory one time, that is why it is in a static code. The configuration XML (mybatis.xml) it is very simple and its code can be found below: The Mapper (an attribute inside the XML above) will hold information about the project queries and how to translate the database result into Java objects. It is possible to create a Mapper in XML or Interface. Let us see below the Mapper found in the file crud_query.xml: Notice that the file is easy to understand. The first configuration found is a ResultMap that indicates the query result type, and a result class was configured “uai.model.Customer”. In the class we have a attribute with a different name of the database table column, so we need to add a configuration to the ResultMap. All queries need a ID that will be used by MyBatis session. In the beginning of the file it is possible to see a namespace declared that works as a Java package, this package will wrap all the queries and the ResultMaps found in the XML file. We could also use a Interface+Annotation instead of the XML. The Mapper found in the crud_query.xml file could be translated in to a Interface like: Only the Read methods were written in the Interface to make the code smaller, but all the CRUD methods could be written in the Interface. Let us see first how to execute a query found in the XML file: The parsing of the object is automatically and the method is easy to read. To run the query all that is needed is to use the combination “namespace + query id” that we saw in the crud_query.xml code above. If the developer wants to use the Interface approach he could do like below: With the interface query mode we have a clean code and the developer will not need to instantiate the Interface, the session class of the MyBatis will do the work. If you want to update, delete or insert a record in the database the code is very easy: About MyBatis we could say: Excellent Documentation: Every time that I had a doubt I could answer it just by reading its site documentation Flexibility: Allowing XML or Interfaces+Annotations the framework gives a huge flexibility to the developer. Notice that if you choose the Interface approach the database portability will be harder, it is easier to choose which XML to send with the deploy artifact rather than an interface Integration: Has integration with Guice and Spring Dynamic Query: Allows to create queries in Runtime, like the JPA criteria. It is possible to add “IFs” to a query to decide which attribute will be used in the query Transaction: If your project is not using Guice of Spring you will need to manually control the transaction Sormula Sormula is a ORM OpenSource framework, very similar to the JPA/Hibernate. The code of the project in this page can be found in here: https://github.com/uaihebert/SormulaCrud Sormula has a class named Database that works like the JPA EntityManagerFactory, the Database class will be like a bridge between the database and your model classes. To execute the SQL actions we will use the Table class that works like the JPA EntityManager, but the Table class is typed. To run Sormula in a code you will need to create a Database instance: To create a Database instance all that we need is a Java Connection. To read data from the database is very easy, like below: You only need to create a Database instance and a Table instance to execute all kind of SQL actions. How can we map a class attribute name different from the database table column name? Take a look below: We can use annotations to do the database mapping in our classes, very close to the JPA style. To update, delete or create data in the database you can do like below: About Sormula we can say that: Has a good documentation Easy to set up It is not found in the maven repository, it will make harder to attach the source code if needed Has a lot of checked exceptions, you will need to do a try/catch for the invoked actions sql2o This framework works with native SQL and makes easier to transform database data into Java objects. The code of the project in this page can be found in here: https://github.com/uaihebert/sql2oCrud sql2o has a Connection class that is very easy to create: Notice that we have a static Sql2o object that will work like a Connection factory. To read the database data we would do something like: Notice that we have a Native SQL written, but we have named parameters. We are not using positional parameters like ‘?1′ but we gave a name to the parameter like ‘:id’. We can say that named parameters has the advantage that we will not get lost in a query with several parameters; when we forget to pass some parameter the error message will tell us the parameter name that is missing. We can inform in the query the name of the column with a different name, there is no need to create a Mapper/RowMapper. With the return type defined in the query we will not need to instantiate manually the object, sql2o will do it for us. If you want to update, delete or insert data in the database you can do like below: It is a “very easy to use” framework. About the sql2o we can say that: Easy to handle scalar query: the returned values of SUM, COUNT functions are easy to handle Named parameters in query: Will make easy to handle SQL with a lot of parameters Binding functions: bind is a function that will automatically populate the database query parameters through a given object, unfortunately it did not work in this project for a problem with the enum. I did not investigate the problem, but I think that it is something easy to handle Take a look at: jOOQ and Avaje jOOQ jOOQ it is a framework indicated by a lot of people, the users of this frameworks praise it in a lot of sites/forums. Unfortunately the jOOQ did not work in my PC because my database was too old, and I could not download other database when writing this post (I was in an airplane). I noticed that to use the jOOQ you will need to generated several jOOQ classes based in your model. jOOQ has a good documentation in the site and it details how to generate those classes. jOOQ is free to those that uses a free database like: MySQL, Postgre, etc. The paid jOOQ version is needed to those that uses paid databases like: Oracle, SQL Server, etc. www.jooq.org/ Avaje Is a framework quoted in several blogs/forums. It works with the ORM concept and it is easy to execute database CRUD actions. Problems that I found: Not well detailed documentation: its Hello World is not very detailed Configurations: it has a required properties configuration file with a lot of configurations, really boring to those that just want to do a Hello World A Enhancer is needed: enhancement is a method do optimize the class bytecode, but is hard to setup in the beginning and is mandatory to do before the Hello World www.avaje.org Is a Raw JDBC Approach Worth It? The advantages of JDBC are: Best performance: We will not have any framework between the persistence layer and the database. We can get the best performance with a raw JDBC Control over the SQL: The written SQL is the SQL that will be executed in the database, no framework will edit/update/generate the query SQL Native Resource: We could access all natives database resources without a problem, e.g.: functions, stored procedures, hints, etc The disadvantages are: Verbose Code: After receiving the database query result we need to instantiate and populate the object manually, invoking all the required “set” methods. This code will get worse if we have classes relationships like one-to-many. It will be very easy to find a while inside another while. Fragile Code: If a database table column changes its name it will be necessary to edit all the project queries that uses this column. Some project uses constants with the column name to help with this task, e.g. Customer.NAME_COLUMN, with this approach the table column name update would be easier. If a column is removed from the database all the project queries would be updated, even if you have a column constants. Complex Portability: If your project uses more than one database it would be necessary to have almost all queries written for each vendor. For any update in any query it would be necessary to update every vendor query, this could take a lot the time from the developers. I can see only one factor that would make me choose a raw JDBC approach almost instantly: Performance: If your project need to process thousands of transactions per minutes, need to be scalable and with a low memory usage this is the best choice. Usually median/huge projects has all this high performance requirements. It is also possible to have a hybrid solution to the projects; most of the project repository (DAO) will use a framework, and just a small part of it will use JDBC I do like JDBC a lot, I have worked and I still working with it. I just ask you to not think that JDBC is the silver bullet for every problem. If you know any other advantage/disadvantage that is not listed here, just tell me and I will add here with the credits going to you. [= How Can I Choose the Right Framework? We must be careful if you want to change JPA for other project or if you are just looking for other persistence framework. If the solutions in page 3 are not solving your problems the best solution is to change the persistence framework. What should you considerate before changing the persistence framework? Documentation: is the framework well documented? Is easy to understand how it works and can it answer most of your doubts? Community: has the framework an active community of users? Has a forum? Maintenance/Fix Bugs: Is the framework receiving commits to fix bugs or receiving new features? There are fix releases being created? With which frequency? How hard is to find a developer that knows about this framework? I believe that this is the most important issue to be considered. You could add to your project the best framework in the world but without developers that know how to operate it the framework will be useless. If you need to hire a senior developer how hard would be to find one? If you urgently need to hire someone that knows that unknown framework maybe this could be very difficult. Final Thoughts I will say it again: I do not think that JPA could/should be applied to every situation in every project in the world; I do no think that that JPA is useless just because it has disadvantages just like any other framework. I do not want you to be offended if your framework was not listed here, maybe the research words that I used to find persistence frameworks did not lead me to your framework. I hope that this post might help you. If your have any double/question just post it. [= See you soon! \o_
August 25, 2014
by Hebert Coelho De Oliveira
· 152,476 Views · 10 Likes
article thumbnail
Neo4j: LOAD CSV - Handling Empty Columns
A common problem that people encounter when trying to import CSV files into Neo4j using Cypher’s LOAD CSV command is how to handle empty or ‘null’ entries in said files. For example let’s try and import the following file which has 3 columns, 1 populated, 2 empty: $ cat /tmp/foo.csv a,b,c mark,, load csv with headers from "file:/tmp/foo.csv" as row MERGE (p:Person {a: row.a}) SET p.b = row.b, p.c = row.c RETURN p When we execute that query we’ll see that our Person node has properties ‘b’ and ‘c’ with no value: ==> +-----------------------------+ ==> | p | ==> +-----------------------------+ ==> | Node[5]{a:"mark",b:"",c:""} | ==> +-----------------------------+ ==> 1 row ==> Nodes created: 1 ==> Properties set: 3 ==> Labels added: 1 ==> 26 ms That isn’t what we want – we don’t want those properties to be set unless they have a value. TO achieve this we need to introduce a conditional when setting the ‘b’ and ‘c’ properties. We’ll assume that ‘a’ is always present as that’s the key for our Person nodes. The following query will do what we want: load csv with headers from "file:/tmp/foo.csv" as row MERGE (p:Person {a: row.a}) FOREACH(ignoreMe IN CASE WHEN trim(row.b) <> "" THEN [1] ELSE [] END | SET p.b = row.b) FOREACH(ignoreMe IN CASE WHEN trim(row.c) <> "" THEN [1] ELSE [] END | SET p.c = row.c) RETURN p Since there’s no if or else statements in cypher we create our own conditional statement by using FOREACH. If there’s a value in the CSV column then we’ll loop once and set the property and if not we won’t loop at all and therefore no property will be set. ==> +-------------------+ ==> | p | ==> +-------------------+ ==> | Node[4]{a:"mark"} | ==> +-------------------+ ==> 1 row ==> Nodes created: 1 ==> Properties set: 1 ==> Labels added: 1
August 25, 2014
by Mark Needham
· 10,131 Views · 1 Like
article thumbnail
Getting Started with MongoDB and Java
We’ve been missing an introduction to using MongoDB from Java for a little while now - there’s plenty of information in the documentation, but we were lacking a step-by-step guide to getting started as a Java developer. I sought to rectify this with a couple of blog posts for the MongoDB official blog: the first, an introduction to using MongoDB from Java, including a non-comprehensive list of some of the libraries you can use; the second, an introductory guide to simple CRUD operations using the Java driver: Getting Started with MongoDB and Java, Part 1 Getting Started with MongoDB and Java, Part 2 This is very much aimed at Java/JVM developers who are new to MongoDB, and want to get a feel for how you use it. These guides are for the current (2.x) driver. When we release 3.x, we’ll release updated guides as well.
August 23, 2014
by Trisha Gee
· 9,404 Views
article thumbnail
A Puppet Automation + MySQL Tutorial: Wordpress Install in 7 Short Steps
[This article was written by Koby Nachmany.] If you are familiar with configuration management (aka CM) and automation, you probably know a thing or two about Puppet, and the amazing and rich collection of modules it offers. Puppet Forge contains a wealth of third party modules that enable us to do some pretty nifty stuff with almost no effort. Puppet helps deal with the messy parts of CM, like installing binaries and running installation scripts that are tedious to do manually. Tools such as Puppet were originally created for IT operations people, that are for the most part infrastructure-centric, and are best suited for setup and maintenance of hosts in a physical data center. Dealing with applications and certainly managing applications on an elastic virtualized or even cloudified environment, brings a set of new challenges despite the agility and other benefits it provides. Now imagine we can have this goodness coupled with an intelligent orchestration framework for an entire deployment? In this blog post I'd like to demonstrate how a cloud application orchestrator can complement already existing automation processes powered by configuration management tools, in this case we will demonstrate with Puppet. I will use the nodecellar application and the popular WordPress content management framework as examples. This will hopefully provide a good introduction to Cloudify blueprints. Overview So we've seen how Cloudify 3 allows us to easily orchestrate the "nodecellar" application Read about it Cloudify blueprints here. With the "nodecellar" example, Cloudify deploys a complex application using workflows that map deployment lifecycle events to bash scripts using Cloudify's bash runner plugin. Cloudify's Puppet integration now makes this pretty easy. Cloudify 3.0 - Taking Puppet to the Next Level of Orchestration. Check it out. Go The synergy between Cloudify and Puppet not only allows you to enjoy the benefits of your Puppet environment, but it also amplifies its usability by introducing unique advantages that will answer the following common challenges involved with configuration management tools: Agent Installation: Provision your service VMs, install a Puppet agent (if you like) and wires them up with the Puppet Master. Or, if you choose to run standalone, you can install the agent with the appropriate manifests needed for that service, as well. Order of Dependencies: Define the dependencies between application stacks, services and infrastructure resources. Which will then be launched based on that order. Remote Execution and Updates: Other than the basic install/uninstall, Cloudify enables customized application workflows that allow you to execute tools like remote shell scripts on a group of instances that belong to a particular service, or to a specific instance in a group. This feature is useful to run maintenance operations, such as snapshots in the case of a database, or code pushes in a continuous deployment model. In addition, you can run puppet apply whenever you feel it's right for your service. Post Deployment: Once your application is up, Cloudify will be able to glue your monitoring tool of choice, or you can choose to use the built-in one. A robust policy engine, enables auto-healing and even auto-scaling according to your service's required SLA. I'm now going to take a deep dive on my experience with a WordPress example that I feel is a very good representation of how Puppet and Cloudify work in sync. Let's say we want to deploy the popular WordPress application stack on two VMs . Something as follows: The flow is quite simple: -server 3.5.1 with the basic following modules installed: |-- hunner-wordpress (v0.6.0) |-- puppetlabs-apache (v1.0.1) - with php mods enabled |-- puppetlabs-mysql (v2.1.0) Your site.pp file should resemble something like this: node /^apache_web.*/ { include apache class { 'wordpress': create_db => false, create_db_user => false, } } node /^mysql.*/ { class { '::mysql::server': root_password => 'password', override_options => { 'mysqld' => { 'bind_address' => '0.0.0.0' } } } include mysql::client include wordpress } As we can see, we have an Apache PHP application that will likely require a database connection string (IP, port, user and password). This is where Cloudify facilitates the "gluing" of all the pieces together, by allowing us to inject dynamic/static custom facts to the dependent node (Apache server). Cloudify supports both standalone agents and PuppetMaster environments. Step 2: Tweaking the Original WordPress Module. Some minor adaptations to the wordpress init class of the WordPress module will allow us to embed these facts during Puppet agent invocation. Below is a code snippet (With defaults truncated): class wordpress ( $db_host_ip = $cloudify_related_host_ip, $db_user, = $cloudify_properties_db_user, $db_password = $cloudify_properties_db_pw, . . ) And some tweaking to the templates/wp-config.php.erb: /** MySQL hostname */ define('DB_HOST', ''); Let's add some tags for finer control of manifest execution: The MySQL node will not require the application part to run on it, so I've excluded it using a Puppet "tag" (read more about Puppet tags). Cloudify, of course, supports this and will provide the appropriate tags during agent invocation. -> class { 'wordpress::app': tag => ['postconfigure'], install_dir => $install_dir, install_url => $install_url, version => $version, db_name => $db_name, . .} Step 3: Creating the Blueprint In a similar way to the "nodecellar" blueprint, first lets create a folder with the name of "wp_puppet" and create a blueprint.yaml file within it. This file will then serve as the blueprint file. Now let's declare the name of this blueprint. blueprint: name: wp_puppet nodes: Now we can start creating the topology. Step 4: Creating VM Nodes Since, in this case I use the OpenStack provider to create the nodes, let's import the "OpenStack types" plugin. imports: - http://www.getcloudify.org/spec/openstack-plugin/1.0/plugin.yaml Since the VMs are the same, I declared a generic template for a VM host: vm_host: derived_from: cloudify.openstack.server properties: - install_agent: true - worker_config: user: ubuntu port: 22 # example for ssh key file (see `key_name` below) # this file matches the agent key configured during the bootstrap key: ~/.ssh/agent.key # Uncomment and update `management_network_name` when working a n neutron enabled openstack - management_network_name: cfy-mng-network - server: image: 8c096c29-a666-4b82-99c4-c77dc70cfb40 flavor: 102 key_name: cfy-agnt-kp security_groups: ['cfy-agent-default', 'wp_security_group'] # This is how we inject the puppet server's ip userdata: | #!/bin/bash -ex grep -q puppet /etc/hosts || echo "x.x.x.x puppet" | sudo -A tee -a /etc/hosts Create the MySQL and Apache VMs: - name: mysql_db_vm type: vm_host instances: deploy: 1 - name: apache_web_vm type: vm_host instances: deploy: 1 Step 5: Declaring Apache and MySQL Servers Since we are using the Puppet plugin to create those servers, first we have to import it: plugins: puppet_plugin: derived_from: cloudify.plugins.agent_plugin properties: url: https://github.com/cloudify-cosmo/cloudify-puppet-plugin/archive/nightly.zip The plugin defines server types as follows: middleware_server, app_server, db_server, web_server, message_bus_server, app_module. They are virtually the same, but serve the purpose of enabling better readability for the user and GUI visualization A Puppet server type is derived_from: cloudify.types.server type, but includes some puppet-specific properties and lifecycle events. For documentation see: Puppet Types So we now will go ahead and declare the server types: cloudify.types.puppet.web_server: derived_from: cloudify.types.web_server properties: # All Puppet related configuration goes inside # the "puppet_config" property. - puppet_config interfaces: cloudify.interfaces.lifecycle: # Specifically "start" operation. Otherwise tags must be # provided. - start: puppet_plugin.operations.operation cloudify.types.puppet.app_module: derived_from: cloudify.types.app_module properties: - puppet_config interfaces: cloudify.interfaces.lifecycle: - configure: puppet_plugin.operations.operation cloudify.types.puppet.db_server: derived_from: cloudify.types.db_server properties: - puppet_config interfaces: cloudify.interfaces.lifecycle: - start: puppet_plugin.operations.operation Step 6: Instantiating the Apache and MySQL nodes: Here we provide the Puppet configuration and tags and define the relationships between the nodes. Cloudify's agent will use those relationships in order to decide the appropriate facts to inject. - name: apache_web_server type: cloudify.types.puppet.web_server properties: port: 8080 puppet_config: server: puppet environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: apache_web_vm - name: wordpress_app type: cloudify.types.puppet.app_module properties: db_user: wordpress db_pass: passwd puppet_config: server: puppet tags: ['postconfigure'] environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: apache_web_server - type: wp_connected_to_mysql target: mysql_db_server - name: mysql_db_server type: cloudify.types.puppet.db_server properties: db_user: wordpress db_pass: passwd puppet_config: server: puppet environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: mysql_db_vm Step 7: Upload the Blueprint and Create the Deployment (via CLI or GUI) Then execute your deployment (via CLI or GUI). ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy blueprints upload -b wp9 wordpress/blueprint.yaml ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy deployments create -b wp9 -d WordPress_Deployment_1 Step 8: Take a Quick Coffee Break. Step 9: Enjoy your Orchestrated WordPress Stack!
August 21, 2014
by Sharone Zitzman
· 9,151 Views
article thumbnail
Lambda Architecture Principles
"Lambda Architecture" (introduced by Nathan Marz) has gained a lot of traction recently. Fundamentally, it is a set of design patterns of dealing with Batch and Real time data processing workflow that fuel many organization's business operations. Although I don't realize any novice ideas has been introduced, it is the first time these principles are being outlined in such a clear and unambiguous manner. In this post, I'd like to summarize the key principles of the Lambda architecture, focus more in the underlying design principles and less in the choice of implementation technologies, which I may have a different favors from Nathan. One important distinction of Lambda architecture is that it has a clear separation between the batch processing pipeline (ie: Batch Layer) and the real-time processing pipeline (ie: Real-time Layer). Such separation provides a means to localize and isolate complexity for handling data update. To handle real-time query, Lambda architecture provide a mechanism (ie: Serving Layer) to merge/combine data from the Batch Layer and Real-time Layer and return the latest information to the user. Data Source Entry At the very beginning, data flows in Lambda architecture as follows ... Transaction data starts streaming in from OLTP system during business operations. Transaction data ingestion can be materialized in the form of records in OLTP systems, or text lines in App log files, or incoming API calls, or an event queue (e.g. Kafka) This transaction data stream is replicated and fed into both the Batch Layer and Realtime Layer Here is an overall architecture diagram for Lambda. Batch Layer For storing the ground truth, "Master dataset" is the most fundamental DB that captures all basic event happens. It stores data in the most "raw" form (and hence the finest granularity) that can be used to compute any perspective at any given point in time. As long as we can maintain the correctness of master dataset, every perspective of data view derived from it will be automatically correct. Given maintaining the correctness of master dataset is crucial, to avoid the complexity of maintenance, master dataset is "immutable". Specifically data can only be appended while update and delete are disallowed. By disallowing changes of existing data, it avoids the complexity of handling the conflicting concurrent update completely. Here is a conceptual schema of how the master dataset can be structured. The center green table represents the old, traditional-way of storing data in RDBMS. The surrounding blue tables illustrates the schema of how the master dataset can be structured, with some key highlights Data are partitioned by columns and stored in different tables. Columns that are closely related can be stored in the same table NULL values are not stored Each data record is associated with a time stamp since then the record is valid Notice that every piece of data is tagged with a time stamp at which the data is changed (or more precisely, a change record that represents the data modification is created). The latest state of an object can be retrieved by extracting the version of the object with the largest time stamp. Although master dataset stores data in the finest granularity and therefore can be used to compute result of any query, it usually take a long time to perform such computation if the processing starts with such raw form. To speed up the query processing, various data at intermediate form (called Batch View) that aligns closer to the query will be generated in a periodic manner. These batch views (instead of the original master dataset) will be used to serve the real-time query processing. To generate these batch views, the "Batch Layer" use a massively parallel, brute force approach to process the original master dataset. Notice that since data in master data set is timestamped, the data candidate can be identified simply from those that has the time stamp later than the last round of batch processing. Although less efficient, Lambda architecture advocates that at each round of batch view generation, the previous batch view should just be simply discarded and the new batch view is computed from master dataset. This simple-mind, compute-from-scratch approach has some good properties in stopping error propagation (since error cannot be accumulated), but the processing may not be optimized and may take a longer time to finish. This can increase the "staleness" of the batch view. Real time Layer As discussed above, generating the batch view requires scanning a large volume of master dataset that takes few hours. The batch view will therefore be stale for at least the processing time duration (ie: between the start and end of the Batch processing). But the maximum staleness can be up to the time period between the end of this Batch processing and the end of next Batch processing (ie: the batch cycle). The following diagram illustrate this staleness. Even the batch view is stale period, business operates as usual and transaction data will be streamed in continuously. To answer user's query with the latest, up-to-date information. The business transaction records need to be captured and merged into the real-time view. This is the responsibility of the Real-time Layer. To reduce the latency of latest information availability close to zero, the merge mechanism has to be done in an incremental manner such that no batching delaying the processing will be introduced. This requires the real time view update to be very different from the batch view update, which can tolerate a high latency. The end goal is that the latest information that is not captured in the Batch view will be made available in the Realtime view. The logic of doing the incremental merge on Realtime view is application specific. As a common use case, lets say we want to compute a set of summary statistics (e.g. mean, count, max, min, sum, standard deviation, percentile) of the transaction data since the last batch view update. To compute the sum, we can simply add the new transaction data to the existing sum and then write the new sum back to the real-time view. To compute the mean, we can multiply the existing count with existing mean, adding the transaction sum and then divide by the existing count plus one. To implement this logic, we need to READ data from the Realtime view, perform the merge and WRITE the data back to the Realtime view. This requires the Realtime serving DB (which host the Realtime view) to support both random READ and WRITE. Fortunately, since the realtime view only need to store the stale data up to one batch cycle, its scale is limited to some degree. Once the batch view update is completed, the real-time layer will discard the data from the real time serving DB that has time stamp earlier than the batch processing. This not only limit the data volume of Realtime serving DB, but also allows any data inconsistency (of the realtime view) to be clean up eventually. This drastically reduce the requirement of sophisticated multi-user, large scale DB. Many DB system support multiple user random read/write and can be used for this purpose. Serving Layer The serving layer is responsible to host the batch view (in the batch serving database) as well as hosting the real-time view (in the real-time serving database). Due to very different accessing pattern, the batch serving DB has a quite different characteristic from the real-time serving DB. As mentioned in above, while required to support efficient random read at large scale data volume, the batch serving DB doesn't need to support random write because data will only be bulk-loaded into the batch serving DB. On the other hand, the real-time serving DB will be incrementally (and continuously) updated by the real-time layer, and therefore need to support both random read and random write. To maintain the batch serving DB updated, the serving layer need to periodically check the batch layer progression to determine whether a later round of batch view generation is finished. If so, bulk load the batch view into the batch serving DB. After completing the bulk load, the batch serving DB has contained the latest version of batch view and some data in the real-time view is expired and therefore can be deleted. The serving layer will orchestrate these processes. This purge action is especially important to keep the size of the real-time serving DB small and hence can limit the complexity for handling real-time, concurrent read/write. To process a real-time query, the serving layer disseminates the incoming query into 2 different sub-queries and forward them to both the Batch serving DB and Realtime serving DB, apply application-specific logic to combine/merge their corresponding result and form a single response to the query. Since the data in the real-time view and batch view are different from a timestamp perspective, the combine/merge is typically done by concatenate the results together. In case of any conflict (same time stamp), the one from Batch view will overwrite the one from Realtime view. Final Thoughts By separating different responsibility into different layers, the Lambda architecture can leverage different optimization techniques specifically designed for different constraints. For example, the Batch Layer focuses in large scale data processing using simple, start-from-scratch approach and not worrying about the processing latency. On the other hand, the Real-time Layer covers where the Batch Layer left off and focus in low-latency merging of the latest information and no need to worry about large scale. Finally the Serving Layer is responsible to stitch together the Batch View and Realtime View to provide the final complete picture. The clear demarcation of responsibility also enable different technology stacks to be utilized at each layer and hence can tailor more closely to the organization's specific business need. Nevertheless, using a very different mechanism to update the Batch view (ie: start-from-scratch) and Realtime view (ie: incremental merge) requires two different algorithm implementation and code base to handle the same type of data. This can increase the code maintenance effort and can be considered to be the price to pay for bridging the fundamental gap between the "scalability" and "low latency" need. Nathan's Lambda architecture also introduce a set of candidate technologies which he has developed and used in his past projects (e.g. Hadoop for storing Master dataset, Hadoop for generating Batch view, ElephantDB for batch serving DB, Cassandra for realtime serving DB, STORM for generating Realtime view). The beauty of Lambda architecture is that the choice of technologies is completely decoupled so I intentionally do not describe any of their details in this post. On the other hand, I have my own favorite which is different and that will be covered in my future posts.
August 20, 2014
by Ricky Ho
· 12,171 Views
article thumbnail
JPA Tutorial: Setting Up JPA in a Java SE Environment
There are many reasons to learn an ORM tool like JPA, but it's not a magic bullet that will solve all your problems.
August 18, 2014
by MD Sayem Ahmed
· 145,922 Views · 4 Likes
article thumbnail
The Dark Side of Hibernate Auto Flush
Introduction Now that I described the the basics of JPA and Hibernate flush strategies, I can continue unraveling the surprising behavior of Hibernate’s AUTO flush mode. Not all queries trigger a Session flush Many would assume that Hibernate always flushes the Session before any executing query. While this might have been a more intuitive approach, and probably closer to the JPA’s AUTO FlushModeType, Hibernate tries to optimize that. If the current executed query is not going to hit the pending SQL INSERT/UPDATE/DELETE statements then the flush is not strictly required. As stated in the reference documentation, the AUTO flush strategy may sometimessynchronize the current persistence context prior to a query execution. It would have been more intuitive if the framework authors had chosen to name it FlushMode.SOMETIMES. JPQL/HQL and SQL Like many other ORM solutions, Hibernate offers a limited Entity querying language (JPQL/HQL) that’s very much based on SQL-92 syntax. The entity query language is translated to SQL by the current database dialect and so it must offer the same functionality across different database products. Since most database systems are SQL-92 complaint, the Entity Query Language is an abstraction of the most common database querying syntax. While you can use the Entity Query Language in many use cases (selecting Entities and even projections), there are times when its limited capabilities are no match for an advanced querying request. Whenever we want to make use of some specific querying techniques, such as: Window functions Pivot table Common Table Expressions we have no other option, but to run native SQL queries. Hibernate is a persistence framework. Hibernate was never meant to replace SQL. If some query is better expressed in a native query, then it’s not worth sacrificing application performance on the altar of database portability. AUTO flush and HQL/JPQL First we are going to test how the AUTO flush mode behaves when an HQL query is about to be executed. For this we define the following unrelated entities: The test will execute the following actions: A Person is going to be persisted. Selecting User(s) should not trigger a the flush. Querying for Person, the AUTO flush should trigger the entity state transition synchronization (A person INSERT should be executed prior to executing the select query). 1 2 3 4 Product product = newProduct(); session.persist(product); assertEquals(0L, session.createQuery("select count(id) from User").uniqueResult()); assertEquals(product.getId(), session.createQuery("select p.id from Product p").uniqueResult()); Giving the following SQL output: 1 2 3 4 [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: f76f61e2-f3e3-4ea4-8f44-82e9804ceed0, using strategy: org.hibernate.id.UUIDGenerator Query:{[selectcount(user0_.id) as col_0_0_ from user user0_][]} Query:{[insert into product (color, id) values (?, ?)][12,f76f61e2-f3e3-4ea4-8f44-82e9804ceed0]} Query:{[selectproduct0_.idas col_0_0_ from product product0_][]} As you can see, the User select hasn’t triggered the Session flush. This is because Hibernate inspects the current query space against the pending table statements. If the current executing query doesn’t overlap with the unflushed table statements, the a flush can be safely ignored. HQL can detect the Product flush even for: Sub-selects 1 2 3 4 5 session.persist(product); assertEquals(0L, session.createQuery( "select count(*) "+ "from User u "+ "where u.favoriteColor in (select distinct(p.color) from Product p)").uniqueResult()); Resulting in a proper flush call: 1 2 Query:{[insert into product (color, id) values (?, ?)][Blue,2d9d1b4f-eaee-45f1-a480-120eb66da9e8]} Query:{[selectcount(*) as col_0_0_ from user user0_ where user0_.favoriteColor in(selectdistinct product1_.color from product product1_)][]} Or theta-style joins 1 2 3 4 5 session.persist(product); assertEquals(0L, session.createQuery( "select count(*) "+ "from User u, Product p "+ "where u.favoriteColor = p.color").uniqueResult()); Triggering the expected flush : 1 2 Query:{[insert into product (color, id) values (?, ?)][Blue,4af0b843-da3f-4b38-aa42-1e590db186a9]} Query:{[selectcount(*) as col_0_0_ from user user0_ cross joinproduct product1_ where user0_.favoriteColor=product1_.color][]} The reason why it works is because Entity Queries are parsed and translated to SQL queries. Hibernate cannot reference a non existing table, therefore it always knows the database tables an HQL/JPQL query will hit. So Hibernate is only aware of those tables we explicitly referenced in our HQL query. If the current pending DML statements imply database triggers or database level cascading, Hibernate won’t be aware of those. So even for HQL, the AUTO flush mode can cause consistency issues. If you enjoy reading this article, you might want to subscribe to my newsletter and get a discount for my book as well. AUTO flush and native SQL queries When it comes to native SQL queries, things are getting much more complicated. Hibernate cannot parse SQL queries, because it only supports a limited database query syntax. Many database systems offer proprietary features that are beyond Hibernate Entity Query capabilities. Querying the Person table, with a native SQL query is not going to trigger the flush, causing an inconsistency issue: 1 2 3 Product product = newProduct(); session.persist(product); assertNull(session.createSQLQuery("select id from product").uniqueResult()); 1 2 3 DEBUG [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: 718b84d8-9270-48f3-86ff-0b8da7f9af7c, using strategy: org.hibernate.id.UUIDGenerator Query:{[selectidfrom product][]} Query:{[insert into product (color, id) values (?, ?)][12,718b84d8-9270-48f3-86ff-0b8da7f9af7c]} The newly persisted Product was only inserted during transaction commit, because the native SQL query didn’t triggered the flush. This is major consistency problem, one that’s hard to debug or even foreseen by many developers. That’s one more reason for always inspecting auto-generated SQL statements. The same behaviour is observed even for named native queries: 1 2 3 4 @NamedNativeQueries( @NamedNativeQuery(name = "product_ids", query = "select id from product") ) assertNull(session.getNamedQuery("product_ids").uniqueResult()); So even if the SQL query is pre-loaded, Hibernate won’t extract the associated query space for matching it against the pending DML statements. Overruling the current flush strategy Even if the current Session defines a default flush strategy, you can always override it on a query basis. Query flush mode The ALWAYS mode is going to flush the persistence context before any query execution (HQL or SQL). This time, Hibernate applies no optimization and all pending entity state transitions are going to be synchronized with the current database transaction. 1 assertEquals(product.getId(), session.createSQLQuery("select id from product").setFlushMode(FlushMode.ALWAYS).uniqueResult()); Instructing Hibernate which tables should be syncronized You could also add a synchronization rule on your current executing SQL query. Hibernate will then know what database tables need to be syncronzied prior to executing the query. This is also useful for second level caching as well. 1 assertEquals(product.getId(), session.createSQLQuery("select id from product").addSynchronizedEntityClass(Product.class).uniqueResult()); If you enjoyed this article, I bet you are going to love my book as well. Conclusion The AUTO flush mode is tricky and fixing consistency issues on a query basis is a maintainer’s nightmare. If you decide to add a database trigger, you’ll have to check all Hibernate queries to make sure they won’t end up running against stale data. My suggestion is to use the ALWAYS flush mode, even if Hibernate authors warned us that: this strategy is almost always unnecessary and inefficient. Inconsistency is much more of an issue that some occasional premature flushes. While mixing DML operations and queries may cause unnecessary flushing this situation is not that difficult to mitigate. During a session transaction, it’s best to execute queries at the beginning (when no pending entity state transitions are to be synchronized) and towards the end of the transaction (when the current persistence context is going to be flushed anyway). The entity state transition operations should be pushed towards the end of the transaction, trying to avoid interleaving them with query operations (therefore preventing a premature flush trigger).
August 15, 2014
by Vlad Mihalcea
· 36,043 Views · 3 Likes
article thumbnail
DB Queries in spring's application-context.xml.
While solving one of my assignment, I realize the age long practice of writing db queries in java class file is really a big pain. They are difficult to read and understand. Further if you want to modify it again a lot of task. Thus, to get rid of the pain, I put the db queries in spring’s context xml. Later I realize using such injections using xml, its not only easier to maintain and understand also we can have module wise segregation of db queries in different or in different "xyzDAO-queries.xml". Further its easier to add more customers to the existing application. Please see the code snippet area for the context.xml called "userDAO-queries.xml". I've created Map with an id. In the Map I put the key as and values as .db queries goes here.... , which contains the query. The java classes are DBQueriesImpl.java and UserDAOImpl.java for fetching the exact query from the "xyzDAO-queries.xml". Please see the comments in the java class. Context XML (userDAO-queries.xml) part : SELECT user_id, user_name, user_password, user_fname, user_mname, user_lname, user_create_time FROM ty_users SELECT user_id, user_name, user_password FROM ty_users WHERE user_name = ? and user_password = ? SELECT user_id, user_name, user_password FROM ty_users WHERE user_id=? INSERT INTO ty_users (user_name, user_password) VALUES (?, ?) Java Code Part: UserDAOImpl.java public class UserDAOImpl implements UserDAO{ private static final Logger logger = LoggerFactory.getLogger(UserDAOImpl.class);// logger DBQueries dbQuery = new DBQueriesImpl("userDAO-queries.xml");// DBConnection dbConn = ConnectionFactory.getInstance().getConnectionMySQL(); @Override public List authenitcUserDetails(String userName, String userPassword) { Map queryMap = null; String query ; List> mappedUserList = null; List listUser = null; try{ listUser = new ArrayList(); queryMap = dbQuery.getQueryBeanFromFactory("VALIDATE_USER_CREDENTIAL"); //passing the query-name or Map Id. query = dbQuery.getQuery(queryMap);//passing the query-map to get the query. //rest of the code is common fetching code. mappedUserList = dbConn.dbQueryRead(query, new Object[]{userName,userPassword}); if(mappedUserList.size()>0){ listUser = new QueryResultSetMapperImpl().mapRersultSetToObject(mappedUserList, User.class); } }catch(Exception ex){ logger.error("fetchAllUser Error:: ", ex); } return listUser; } } DBQueriesImpl.java public class DBQueriesImpl implements DBQueries { private static final Logger logger = LoggerFactory.getLogger(DBQueriesImpl.class); private ApplicationContext queriesCtx ; /** * @Desc: Loading the queryContext xml * @param queryContext */ public DBQueriesImpl(String queryContext) { queriesCtx = new ClassPathXmlApplicationContext(queryContext); } /** * @Desc: Reading from the loaded application context and getting the query-map, . */ @Override public Map getQueryBeanFromFactory(String beanId){ Map queryMap = null; if (queriesCtx != null && beanId != null) { queryMap = (Map)queriesCtx.getBean(beanId); } return queryMap; } /** * @Desc: Getting the exact query from the query-map, . */ @Override public String getQuery(Map queryMap) { String query=null; try{ if(queryMap.containsKey(QueryConstants.QUERY_NODE)){ query = (String) queryMap.get(QueryConstants.QUERY_NODE); queryMap.remove(QueryConstants.QUERY_NODE); }else{ throw new NoSuchFieldError(); } }catch(Exception excp){ excp.printStackTrace(); } return query; } } With this kind of setting I can also solve few other problems. In one of my assignment I used this trick. In the application, various clients have different set of data requirements thus, different set of sql-queries. Now instead of writing java classes to met client requirement we just write the set of spring-context.xmls having sql and used those spring-context at runtime and fetch data.
August 13, 2014
by Shantanu Sikdar
· 5,447 Views · 1 Like
article thumbnail
6 Rules of Thumb for MongoDB Schema Design: Part 3
By William Zola, Lead Technical Support Engineer at MongoDB This is our final stop in this tour of modeling One-to-N relationships in MongoDB. In the first post, I covered the three basic ways to model a One-to-N relationship. Last time, I covered some extensions to those basics: two-way referencing and denormalization. Denormalization allows you to avoid some application-level joins, at the expense of having more complex and expensive updates. Denormalizing one or more fields makes sense if those fields are read much more often than they are updated. Read part two if you’ve missed it. Whoa! Look at All These Choices! So, to recap: You can embed, reference from the “one” side, or reference from the “N” side, or combine a pair of these techniques You can denormalize as many fields as you like into the “one” side or the “N” side Denormalization, in particular, gives you a lot of choices: if there are 8 candidates for denormalization in a relationship, there are 2 8 (1024) different ways to denormalize (including not denormalizing at all). Multiply that by the three different ways to do referencing, and you have over 3,000 different ways to model the relationship. Guess what? You now are stuck in the “paradox of choice” — because you have so many potential ways to model a “one-to-N” relationship, your choice on how to model it just got harder. Lots harder. Rules of Thumb: Your Guide Through the Rainbow Here are some “rules of thumb” to guide you through these indenumberable (but not infinite) choices One: favor embedding unless there is a compelling reason not to Two: needing to access an object on its own is a compelling reason not to embed it Three: Arrays should not grow without bound. If there are more than a couple of hundred documents on the “many” side, don’t embed them; if there are more than a few thousand documents on the “many” side, don’t use an array of ObjectID references. High-cardinality arrays are a compelling reason not to embed. Four: Don’t be afraid of application-level joins: if you index correctly and use the projection specifier (as shown in part 2) then application-level joins are barely more expensive than server-side joins in a relational database. Five: Consider the write/read ratio when denormalizing. A field that will mostly be read and only seldom updated is a good candidate for denormalization: if you denormalize a field that is updated frequently then the extra work of finding and updating all the instances is likely to overwhelm the savings that you get from denormalizing. Six: As always with MongoDB, how you model your data depends — entirely — on your particular application’s data access patterns. You want to structure your data to match the ways that your application queries and updates it. Your Guide To The Rainbow When modeling “One-to-N” relationships in MongoDB, you have a variety of choices, so you have to carefully think through the structure of your data. The main criteria you need to consider are: What is the cardinality of the relationship: is it “one-to-few”, “one-to-many”, or “one-to-squillions”? Do you need to access the object on the “N” side separately, or only in the context of the parent object? What is the ratio of updates to reads for a particular field? Your main choices for structuring the data are: For “one-to-few”, you can use an array of embedded documents For “one-to-many”, or on occasions when the “N” side must stand alone, you should use an array of references. You can also use a “parent-reference” on the “N” side if it optimizes your data access pattern. For “one-to-squillions”, you should use a “parent-reference” in the document storing the “N” side. Once you’ve decided on the overall structure of the data, then you can, if you choose, denormalize data across multiple documents, by either denormalizing data from the “One” side into the “N” side, or from the “N” side into the “One” side. You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic. Productivity and Flexibility The upshot of all of this is that MongoDB gives you the ability to design your database schema to match the needs of your application. You can structure your data in MongoDB so that it adapts easily to change, and supports the queries and updates that you need to get the most out of your application.
August 13, 2014
by Francesca Krihely
· 8,741 Views
article thumbnail
6 Rules of Thumb for MongoDB Schema Design: Part 2
By William Zola, Lead Technical Support Engineer at MongoDB This is the second stop on our tour of modeling One-to-N relationships in MongoDB. Last time I covered the three basic schema designs: embedding, child-referencing, and parent-referencing. I also covered the two factors to consider when picking one of these designs: Will the entities on the “N” side of the One-to-N ever need to stand alone? What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions? With these basic techniques under our belt, I can move on to covering more sophisticated schema designs, involving two-way referencing and denormalization. Intermediate: Two-Way Referencing If you want to get a little bit fancier, you can combine two techniques and include both styles of reference in your schema, having both references from the “one” side to the “many” side and references from the “many” side to the “one” side. For an example, let’s go back to that task-tracking system. There’s a “people” collection holding Person documents, a “tasks” collection holding Task documents, and a One-to-N relationship from Person -> Task. The application will need to track all of the Tasks owned by a Person, so we will need to reference Person -> Task. With the array of references to Task documents, a single Person document might look like this: db.person.findOne() { _id: ObjectID("AAF1"), name: "Kate Monster", tasks [ // array of references to Task documents ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73") // etc ] } On the other hand, in some other contexts this application will display a list of Tasks (for example, all of the Tasks in a multi-person Project) and it will need to quickly find which Person is responsible for each Task. You can optimize this by putting an additional reference to the Person in the Task document. db.tasks.findOne() { _id: ObjectID("ADF9"), description: "Write lesson plan", due_date: ISODate("2014-04-01"), owner: ObjectID("AAF1") // Reference to Person document } This design has all of the advantages and disadvantages of the “One-to-Many” schema, but with some additions. Putting in the extra ‘owner’ reference into the Task document means that its quick and easy to find the Task’s owner, but it also means that if you need to reassign the task to another person, you need to perform two updates instead of just one. Specifically, you’ll have to update both the reference from the Person to the Task document, and the reference from the Task to the Person. (And to the relational gurus who are reading this — you’re right: using this schema design means that it is no longer possible to reassign a Task to a new Person with a single atomic update. This is OK for our task-tracking system: you need to consider if this works with your particular use case.) Intermediate: Denormalizing With “One-To-Many” Relationships Beyond just modeling the various flavors of relationships, you can also add denormalization into your schema. This can eliminate the need to perform the application-level join for certain cases, at the price of some additional complexity when performing updates. An example will help make this clear. Denormalizing from Many -> One For the parts example, you could denormalize the name of the part into the ‘parts[]’ array. For reference, here’s the version of the Product document without denormalization. > db.products.findOne() { name : 'left-handed smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ // array of references to Part documents ObjectID('AAAA'), // reference to the #4 grommet above ObjectID('F17C'), // reference to a different Part ObjectID('D2AA'), // etc ] } Denormalizing would mean that you don’t have to perform the application-level join when displaying all of the part names for the product, but you would have to perform that join if you needed any other information about a part. > db.products.findOne() { name : 'left-handed smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ { id : ObjectID('AAAA'), name : '#4 grommet' }, // Part name is denormalized { id: ObjectID('F17C'), name : 'fan blade assembly' }, { id: ObjectID('D2AA'), name : 'power switch' }, // etc ] } While making it easier to get the part names, this would add just a bit of client-side work to the application-level join: // Fetch the product document > product = db.products.findOne({catalog_number: 1234}); // Create an array of ObjectID()s containing *just* the part numbers > part_ids = product.parts.map( function(doc) { return doc.id } ); // Fetch all the Parts that are linked to this Product > product_parts = db.parts.find({_id: { $in : part_ids } } ).toArray() ; Denormalizing saves you a lookup of the denormalized data at the cost of a more expensive update: if you’ve denormalized the Part name into the Product document, then when you update the Part name you must also update every place it occurs in the ‘products’ collection. Denormalizing only makes sense when there’s an high ratio of reads to updates. If you’ll be reading the denormalized data frequently, but updating it only rarely, it often makes sense to pay the price of slower updates — and more complex updates — in order to get more efficient queries. As updates become more frequent relative to queries, the savings from denormalization decrease. For example: assume the part name changes infrequently, but the quantity on hand changes frequently. This means that while it makes sense to denormalize the part name into the Product document, it does not make sense to denormalize the quantity on hand. Also note that if you denormalize a field, you lose the ability to perform atomic and isolated updates on that field. Just like with the two-way referencing example above, if you update the part name in the Part document, and then in the Product document, there will be a sub-second interval where the denormalized ‘name’ in the Product document will not reflect the new, updated value in the Part document. Denormalizing from One -> Many You can also denormalize fields from the “One” side into the “Many” side: > db.parts.findOne() { _id : ObjectID('AAAA'), partno : '123-aff-456', name : '#4 grommet', product_name : 'left-handed smoke shifter', // Denormalized from the ‘Product’ document product_catalog_number: 1234, // Ditto qty: 94, cost: 0.94, price: 3.99 } However, if you’ve denormalized the Product name into the Part document, then when you update the Product name you must also update every place it occurs in the ‘parts’ collection. This is likely to be a more expensive update, since you’re updating multiple Parts instead of a single Product. As such, it’s significantly more important to consider the read-to-write ratio when denormalizing in this way. Intermediate: Denormalizing With “One-To-Squillions” Relationships You can also denormalize the “one-to-squillions” example. This works in one of two ways: you can either put information about the “one” side (from the ‘hosts’ document) into the “squillions” side (the log entries), or you can put summary information from the “squillions” side into the “one” side. Here’s an example of denormalizing into the “squillions” side. I’m going to add the IP address of the host (from the ‘one’ side) into the individual log message: > db.logmsg.findOne() { time : ISODate("2014-03-28T09:42:41.382Z"), message : 'cpu is on fire!', ipaddr : '127.66.66.66', host: ObjectID('AAAB') } Your query for the most recent messages from a particular IP address just got easier: it’s now just one query instead of two. > last_5k_msg = db.logmsg.find({ipaddr : '127.66.66.66'}).sort({time : -1}).limit(5000).toArray() In fact, if there’s only a limited amount of information you want to store at the “one” side, you can denormalize it ALL into the “squillions” side and get rid of the “one” collection altogether: > db.logmsg.findOne() { time : ISODate("2014-03-28T09:42:41.382Z"), message : 'cpu is on fire!', ipaddr : '127.66.66.66', hostname : 'goofy.example.com', } On the other hand, you can also denormalize into the “one” side. Lets say you want to keep the last 1000 messages from a host in the ‘hosts’ document. You could use the $each / $slice functionality introduced in MongoDB 2.4 to keep that list sorted, and only retain the last 1000 messages: The log messages get saved in the ‘logmsg’ collection as well as in the denormalized list in the ‘hosts’ document: that way the message isn’t lost when it ages out of the ‘hosts.logmsgs’ array. // Get log message from monitoring system logmsg = get_log_msg(); log_message_here = logmsg.msg; log_ip = logmsg.ipaddr; // Get current timestamp now = new Date() // Find the _id for the host I’m updating host_doc = db.hosts.findOne({ipaddr : log_ip },{_id:1}); // Don’t return the whole document host_id = host_doc._id; // Insert the log message, the parent reference, and the denormalized data into the ‘many’ side db.logmsg.save({time : now, message : log_message_here, ipaddr : log_ip, host : host_id ) }); // Push the denormalized log message onto the ‘one’ side db.hosts.update( {_id: host_id }, {$push : {logmsgs : { $each: [ { time : now, message : log_message_here } ], $sort: { time : 1 }, // Only keep the latest ones $slice: -1000 } // Only keep the latest 1000 } ); Note the use of the projection specification ( {_id:1} ) to prevent MongoDB from having to ship the entire ‘hosts’ document over the network. By telling MongoDB to only return the _id field, I reduce the network overhead down to just the few bytes that it takes to store that field (plus just a little bit more for the wire protocol overhead). Just as with denormalizing in the “One-to-Many” case, you’ll want to consider the ratio of reads to updates. Denormalizing the log messages into the Host document makes sense only if log messages are infrequent relative to the number of times the application needs to look at all of the messages for a single host. This particular denormalization is a bad idea if you want to look at the data less frequently than you update it. Recap In this post, I’ve covered the additional choices that you have past the basics of embed, child-reference, or parent-reference. You can use bi-directional referencing if it optimizes your schema, and if you are willing to pay the price of not having atomic updates If you are referencing, you can denormalize data either from the “One” side into the “N” side, or from the “N” side into the “One” side When deciding whether or not to denormalize, consider the following factors: You cannot perform an atomic update on denormalized data Denormalization only makes sense when you have a high read to write ratio Next time, I’ll give you some guidelines to pick and choose among all of these options. To learn more Schema Design tips, see our recent Back to Basics webinar on “Thinking in Documents” Sign up for the MongoDB Newsletter to get MongoDB updates right to your inbox
August 11, 2014
by Francesca Krihely
· 12,400 Views
article thumbnail
Introducing BIRT iHub F-Type
Actuate recently released a new, free BIRT server called the BIRT iHub F-Type. It incorporates all the functionality of BIRT iHub and is limited only by the capacity of output it can deliver on a daily basis. It is ideal for departmental and smaller scale applications. When BIRT F-Type reaches its maximum output capacity, additional capacity can be purchased on a subscription based model. Some of the key features of BIRT iHub F-Type that will help improve your BIRT content applications are: Interactivity – Allow end-users to modify and personalize reports, and answer questions themselves. Scheduling – Automate report generation based on rules and calendar, and then notify users. Sharing – Secure document management and distribution that allows users to only access content/data they are entitled to. Excel Emitter – Export as native Excel (not CSV) with formulas/pivot tables/worksheets/charts. Integration – JavaScript API to embed dynamic reports and visualizations in your web app. Downloading BIRT iHub F-Type Before we get started with the installation process, we need to download BIRT iHub F-Type. There are three downloads available: Windows, Linux, and a VMware image. This blog will cover the Windows installation. If you’re installing either of the other types, you’ll find links to guides for them at the bottom of this blog post. Once you click on your chosen download, you’ll be asked to register. If you’ve already registered, click the “Click to Login” button. If not, fill out the short registration form to get started. Next, read and accept the license agreement. Once you’ve done that, click the checkbox, and a link for the download will appear. Click that to start your download. At this point, you should also receive an email with an activation code. Be sure to check your spam folder if you don’t see it in your inbox. Installing BIRT iHub F-Type After the download is complete, launch the executable file named ActuateBIRTiHubFType.exe. A welcome message will appear. Press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation. The default is C:\Actuate\BIRTiHub. If you have existing BIRT designs that depend on a JDBC database driver, you can optionally specify the folder where these drivers are located. Press Next to continue. Once the installation has finished, press Finish to launch the BIRT iHub F-Type. A desktop shortcut is also created that points to the iHub F-Type URL at http://localhost:8700/iportal. The first time you launch the BIRT iHub F-Type, you will need to activate it. Enter the activation code that you should have received in an e-mail. After entering a valid activation code, you should receive a message that the code was accepted and the BIRT iHub F-Type should start initializing services. Once that has completed, you will be presented with the login screen. The default user name is “administrator” and the password is blank for your first log in. You’ll be able to change this after you have logged in. Press “Log In” to continue. The first time you launch the BIRT iHub F-Type, you will be in tutorial mode which will help you get started loading your BIRT content and required resources. You can bypass the tutorial mode at any time by pressing the “Exit Tutorial” button at the top right. Select a BIRT design (*.rptdesign) file and press the Upload button. If you don’t have a BIRT design, you can download a sample from the link on the same page. The BIRT design file is automatically inspected and if there are any dependent files needed, like images, data files, BIRT report libraries, CSS styles, or other linked BIRT designs, you will be asked to upload those files as well. Once your BIRT design and dependent files are uploaded, your BIRT report will be displayed in the BIRT iHub F-Type and is now ready to explore. Thanks for reading. Now, it’s time to unleash the full power of BIRT into your application. If you have any questions or comments, please feel free to use the comments section below or visit the BIRT iHub F-Type forum. -Virgil For more blogs in the “Introducing BIRT iHub F-Type” series, see the list below: Installing iHub F-Type: Linux | VMWare Image - See more at: http://blogs.actuate.com/introducing-birt-ihub-f-type-installing-on-windows/#sthash.QPJhv2gw.dpuf
August 6, 2014
by Michael Singer
· 1,937 Views
article thumbnail
Distributed Big Balls of Mud
if you want evidence that the software development industry is susceptible to fashion, just go and take a look at all of the hype around microservices. it's everywhere! for some people microservices is "the next big thing", whereas for others it's simply a lightweight evolution of the big soap service-oriented architectures that we saw 10 years ago "done right". i do like a lot of what the current microservice architectures are doing, but it's by no means a silver bullet. okay, i know that sounds obvious, but i think many people are jumping on them for the wrong reason. i often show this slide in my conference talks, and i've blogged about this before , but basically there are different ways to build software systems. on the one side we have traditional monolithic systems, where everything is bundled up inside a single deployable unit. this is probably where most of the industry is. caveats apply, but monoliths can be built quickly and are easy to deploy, but they provide limited agility because even tiny changes require a full redeployment. we also know that monoliths often end up looking like a big ball of mud because of the way that software often evolves over time. for example, many monolithic systems are built using a layered architecture, and it's relatively easy for layered architectures to be abused (e.g. skipping "around" a service to call the repository/data access layer directly). on the other side we have service-based architectures, where a software system is made up of many separately deployable services. again, caveats apply but, if done well, service-based architectures buy you a lot of flexibility and agility because each service can be developed, tested, deployed, scaled, upgraded and rewritten separately, especially if the services are decoupled via asynchronous messaging. the downside is increased complexity because your software system now has many more moving parts than a monolith. as robert says, the complexity is still there, you're just moving it somewhere else . there is, of course, a mid-ground here. we can build monolithic systems that are made up of in-process components, each of which has an explicit well-defined interface and set of responsibilities. this is old-school component-based design that talks about high cohesion and low coupling, but i usually sense some hesitation when i talk about it. and this seems odd to me. before i explain why, let me quote something from a blog post that i read earlier this morning about the rationale behind a team adopting a microservices approach. when we started building karma, we decided to split the project into two main parts: the backend api, and the frontend application. the backend is responsible for handling orders from the store, usage accounting, user management, device management and so forth, while the frontend offers a dashboard for users which accesses this api. along the way we noticed that if the whole backend api is monolithic it doesn't work very well because everything gets entangled. the blog post also mentions scaling, versioning and multiple languages/frameworks as other reasons to choose microservices. again, there are no silver bullets here, everything is a trade-off. anyway, "everything getting entangled" is not a reason to switch from monoliths to microservices. if you're building a monolithic system and it's turning into a big ball of mud, perhaps you should consider whether you're taking enough care of your software architecture. do you really understand what the core structural abstractions are in your software? are their interfaces and responsibilities clear too? if not, why do you think moving to a microservices architecture will help? sure, the physical separation of services will force you to not take some shortcuts, but you can achieve the same separation between components in a monolith. a little design thinking and an architecturally-evident coding style will help to achieve this without the baggage of going distributed. many of the teams i've spoken to are building monolithic systems and don't want to look at component-based design. the mid-ground seems to be a hard-sell. i ran a software architecture sketching workshop with a team earlier this year where we diagrammed one of their software systems. the diagram started as a strictly layered architecture (presentation, business services, data access) with all arrows pointing downwards and each layer only ever calling the layer directly beneath it. the code told a different story though and the eventual diagram didn't look so neat anymore. we discussed how adopting a package by component approach could fix some of these problems, but the response was, "meh, we like building software using layers". it seems as if teams are jumping on microservices because they're sexy, but the design thinking and decomposition strategy required to create a good microservices architecture are the same as those needed to create a well structured monolith. if teams find it hard to create a well structured monolith, i don't rate their chances of creating a well structured microservices architecture. as michael feathers recently said, " there's a bit of overhead involved in implementing each microservice. if they ever become as easy to create as classes, people will have a freer hand to create trouble - hulking monoliths at a different scale. ". i agree. a world of distributed big balls of mud worries me.
August 4, 2014
by Simon Brown
· 9,263 Views
article thumbnail
ACID Compliance: What It Means and Why You Should Care
Originally written by Dave Anselmi The presence of four components — atomicity, consistency, isolation and durability — can ensure that a database transaction is completed in a timely manner. When databases possess these components, they are said to be ACID-compliant. So just what is ACID compliance, and why should you care? Let’s take a look: Atomicity: Database transactions, like atoms, can be broken down into smaller parts. When it comes to your database, atomicity refers to the integrity of the entire database transaction, not just a component of it. In other words, if one part of a transaction doesn’t work like it’s supposed to, the other will fail as a result—and vice versa. For example, if you’re shopping on an e-commerce site, you must have an item in your cart in order to pay for it. What you can’t do is pay for something that’s not in your cart. (You can add something into your cart and not pay for it, but that database transaction won’t be complete, and thus not ‘atomic’, until you pay for it.) Consistency: For any database to operate as it’s intended to operate, it must follow the appropriate data validation rules. Thus, consistency means that only data which follows those rules is permitted to be written to the database. If a transaction occurs and results in data that does not follow the rules of the database, it will be ‘rolled back’ to a previous iteration of itself (or ‘state’) which complies with the rules. On the other hand, following a successful transaction, new data will be added to the database and the resulting state will be consistent with existing rules. Isolation: It’s safe to say that at any given time on Amazon, there is far more than one transaction occurring on the platform… In fact, an incredibly huge amount of database transactions are occurring simultaneously! For a database, isolation refers to the ability to concurrently process multiple transactions in a way that one does not affect another. So, imagine you and your neighbor are both trying to buy something from the same e-commerce platform at the same time. There are 10 items for sale: your neighbor wants five and you want six. Isolation means that one of those transactions would be completed ahead of the other one. In other words, if your neighbor clicked first, they will get five items, and only five items will be remaining in stock. So you will only get to buy five items. If you clicked first, you will get the six items you want, and they will only get four. Thus,. isolation ensures that eleven items aren’t sold when only ten exist. Durability: All technology fails from time to time… the goal is to make those failures invisible to the end-user. In databases that possess durability, data is saved once a transaction is completed, even if a power outage or system failure occurs. Imagine you’re buying in-demand concert tickets on a site similar to Ticketmaster.com. Right when tickets go on sale, you’re ready to make a purchase. After being stuck in the digital waiting room for some time, you’re finally able to add those tickets to your cart.. You then make the purchase and get your confirmation. However if that database lacks durability, even after your ticket purchase was confirmed, if the database suffers a failure incident your transaction would still be lost! As you might expect, this is a really bad thing to happen for an online e-commerce site, so transaction durability is a must-have. ClustrixDB, a NewSQL cloud database, comes with the added benefit of guaranteed ACID transactions, critical for e-commerce success! Click here to learn more.
July 30, 2014
by Lisa Schultz
· 11,345 Views
article thumbnail
CRUD Operation with ASP.NET MVC and Fluent Nhibernate
Before some time I have written a post about Getting Started with Nhibernate and ASP.NET MVC –CRUD operations. It’s one of the most popular post blog post on my blog. I get lots of questions via email and other medium why you are not writing a post about Fluent Nhibernate and ASP.NET MVC. So I thought it will be a good idea to write a blog post about it. What is Fluent Nhibernate: Convection over configuration that is mantra of Fluent Hibernate If you have see the my blog post about Nhibernate then you might have found that we need to create xml mapping to map table. Fluent Nhibernate uses POCO mapping instead of XML mapping and I firmly believe in Convection over configuration that is why I like Fluent Nhibernate a lot. But it’s a matter of Personal Test and there is no strong argument why you should use Fluent Nhibernate instead of Nhibernate. Fluent Nhibernate is team Comprises of James Gregory, Paul Batum, Andrew Stewart and Hudson Akridge. There are lots of committers and it’s a open source project. You can find all more information about at following site. http://www.fluentnhibernate.org/ On this site you can find definition of Fluent Nhibernate like below. Fluent, XML-less, compile safe, automated, convention-based mappings for NHibernate. Get your fluent on. They have excellent getting started guide on following url. You can easily walk through it and learned it. https://github.com/jagregory/fluent-nhibernate/wiki/Getting-started ASP.NET MVC and Fluent Nhibernate: So all set it’s time to write a sample application. So from visual studio 2013 go to file – New Project and add a new web application project with ASP.NET MVC. Once you are done with creating a web project with ASP.NET MVC It’s time to add Fluent Nhibernate to Application. You can add Fluent Nhibernate to application via following NuGet Package. And you can install it with package manager console like following. Now we are done with adding a Fluent Nhibernate to it’s time to add a new database. Now I’m going to create a simple table called “Employee” with four columns Id, FirstName, LastName and Designation. Following is a script for that. CREATE TABLE [dbo].[Employee] ( [Id] INT NOT NULL PRIMARY KEY, [FirstName] NVARCHAR(50) NULL, [LastName] NVARCHAR(50) NULL, [Designation] NVARCHAR(50) NULL ) Now table is ready it’s time to create a Model class for Employee. So following is a code for that. namespace FluentNhibernateMVC.Models { public class Employee { public virtual int EmployeeId { get; set; } public virtual string FirstName { get; set; } public virtual string LastName { get; set; } public virtual string Designation{ get; set; } } } Now Once we are done with our model classes and other stuff now it’s time to create a Map class which map our model class to our database table as we know that Fluent Nhibernate is using POCO classes to map instead of xml mappings. Following is a map class for that. using FluentNHibernate.Mapping; namespace FluentNhibernateMVC.Models { public class EmployeeMap : ClassMap { public EmployeeMap() { Id(x => x.EmployeeId); Map(x => x.FirstName); Map(x => x.LastName); Map(x => x.Designation); Table("Employee"); } } } If you see Employee Map class carefully you will see that I have mapped Id column with EmployeeId in table and another things you can see I have written a table with “Employee” which will map Employee class to employee class. Now once we are done with our mapping class now it’s time to add Nhibernate Helper which will connect to SQL Server database. using FluentNHibernate.Cfg; using FluentNHibernate.Cfg.Db; using NHibernate; using NHibernate.Tool.hbm2ddl; namespace FluentNhibernateMVC.Models { public class NHibernateHelper { public static ISession OpenSession() { ISessionFactory sessionFactory = Fluently.Configure() .Database(MsSqlConfiguration.MsSql2008 .ConnectionString(@"Data Source=(LocalDB)\v11.0;AttachDbFilename=C:\data\Blog\Samples\FluentNhibernateMVC\FluentNhibernateMVC\FluentNhibernateMVC\App_Data\FNhibernateDemo.mdf;Integrated Security=True") .ShowSql() ) .Mappings(m => m.FluentMappings .AddFromAssemblyOf()) .ExposeConfiguration(cfg => new SchemaExport(cfg) .Create(false, false)) .BuildSessionFactory(); return sessionFactory.OpenSession(); } } } Now we are done with classes It’s time to scaffold our Controller for the application. Right click Controller folder and click on add new controller. It will ask for scaffolding items like following. I have selected MVC5 controller with read/write action. Once you click Add It will ask for controller name. Now our controller is ready. It’s time to write code for our actions. Listing: Following is a code for our listing action. public ActionResult Index() { using (ISession session = NHibernateHelper.OpenSession()) { var employees = session.Query().ToList(); return View(employees); } } You add view for listing via right click on View and Add View. That’s is you are done with listing of your database. I have added few records to database table to see whether its working or not. Following is a output as expected. Create: Now it’s time to create code and views for creating a Employee. Following is a code for action results. public ActionResult Create() { return View(); } // POST: Employee/Create [HttpPost] public ActionResult Create(Employee employee) { try { using (ISession session = NHibernateHelper.OpenSession()) { using (ITransaction transaction = session.BeginTransaction()) { session.Save(employee); transaction.Commit(); } } return RedirectToAction("Index"); } catch (Exception exception) { return View(); } } Here you can see one if for blank action and another for posting data and saving to SQL Server with HttpPost Attribute. You can add view for create via right click on view on action result like following. Now when you run this you will get output as expected. Once you click on create it will return back to listing screen like this. Edit: Now we are done with listing/adding employee it’s time to write code for editing/updating employee. Following is a code Action Methods. public ActionResult Edit(int id) { using (ISession session = NHibernateHelper.OpenSession()) { var employee = session.Get(id); return View(employee); } } // POST: Employee/Edit/5 [HttpPost] public ActionResult Edit(int id, Employee employee) { try { using (ISession session = NHibernateHelper.OpenSession()) { var employeetoUpdate = session.Get(id); employeetoUpdate.Designation = employee.Designation; employeetoUpdate.FirstName = employee.FirstName; employeetoUpdate.LastName = employee.LastName; using (ITransaction transaction = session.BeginTransaction()) { session.Save(employeetoUpdate); transaction.Commit(); } } return RedirectToAction("Index"); } catch { return View(); } } You can create edit view via following via right click on view. Now when you run application it will work as expected. Details: You can write details code like following. public ActionResult Details(int id) { using (ISession session = NHibernateHelper.OpenSession()) { var employee = session.Get(id); return View(employee); } } You can view via right clicking on view as following. Now when you run application following is a output as expected. Delete: Following is a code for deleting Employee one action method for confirmation of delete and another one for deleting data from employee table. public ActionResult Delete(int id) { using (ISession session = NHibernateHelper.OpenSession()) { var employee = session.Get(id); return View(employee); } } [HttpPost] public ActionResult Delete(int id, Employee employee) { try { using (ISession session = NHibernateHelper.OpenSession()) { using (ITransaction transaction = session.BeginTransaction()) { session.Delete(employee); transaction.Commit(); } } return RedirectToAction("Index"); } catch (Exception exception) { return View(); } } You can add view for delete via right click on view like following. When you run it’s running as expected. Once you click on delete it will delete employee. That’s it. You can see it’s very easy. Fluent Nhibernate supports POCO mapping without writing any complex xml mappings. You can find whole source code at GitHub at following location. https://github.com/dotnetjalps/FluentNhiberNateMVC Hope you like it. Stay tuned for more!!
July 28, 2014
by Jalpesh Vadgama
· 24,326 Views · 1 Like
article thumbnail
From JPA to Hibernate's Legacy and Enhanced Identifier Generators
Read about enhanced identifier generators, like JPA and Hibernate.
July 16, 2014
by Vlad Mihalcea
· 19,453 Views
article thumbnail
Hibernate Identity, Sequence and Table (Sequence) Generator
Learn about Identity, Sequence, and Table in Hibernate.
July 9, 2014
by Vlad Mihalcea
· 178,030 Views · 2 Likes
article thumbnail
SpringBoot: Introducing SpringBoot
SpringBoot...there is a lot of buzz about SpringBoot nowadays. So what is SpringBoot? SpringBoot is a new spring portfolio project which takes opinionated view of building production-ready Spring applications by drastically reducing the amount of configuration required. Spring Boot is taking the convention over configuration style to the next level by registering the default configurations automatically based on the classpath libraries available at runtime. Well.. you might have already read this kind of introduction to SpringBoot on many blogs. So let me elaborate on what SpringBoot is and how it helps developing Spring applications more quickly. Spring framework was created by Rod Johnson when many of the Java developers are struggling with EJB 1.x/2.x for building enterprise applications. Spring framework makes developing the business components easy by using Dependency Injection and Aspect Oriented Programming concepts. Spring became very popular and many more Spring modules like SpringSecurity, Spring Batch, Spring Data etc become part of Spring portfolio. As more and more features added to Spring, configuring all the spring modules and their dependencies become a tedious task. Adding to that Spring provides atleast 3 ways of doing anything :-). Some people see it as flexibility and some others see it as confusing. Slowly, configuring all the Spring modules to work together became a big challenge. Spring team came up with many approaches to reduce the amount of configuration needed by introducing Spring XML DSLs, Annotations and JavaConfig. In the very beginning I remember configuring a big pile of jar version declarations in section and lot of declarations. Then I learned creating maven archetypes with basic structure and minimum required configurations. This reduced lot of repetitive work, but not eliminated completely. Whether you write the configuration by hand or generate by some automated ways, if there is code that you can see then you have to maintain it. So whether you use XML or Annotations or JavaConfig, you still need to configure(copy-paste) the same infrastructure setup one more time. On the other hand, J2EE (which is dead long time ago) emerged as JavaEE and since JavaEE6 it became easy (compared to J2EE and JavaEE5) to develop enterprise applications using JavaEE platform. Also JavaEE7 released with all the cool CDI, WebSockets, Batch, JSON support etc things became even more simple and powerful as well. With JavaEE you don't need so much XML configuration and your war file size will be in KBs (really??? for non-helloworld/non-stageshow apps also :-)). Naturally this "convention over configuration" and "you no need to glue APIs together appServer already did it" arguments became the main selling points for JavaEE over Spring. Then Spring team addresses this problem with SpringBoot :-). Now its time to JavaEE to show whats the SpringBoot's counterpart in JavaEE land :-) JBoss Forge?? I love this Spring vs JavaEE thing which leads to the birth of powerful tools which ultimately simplify the developers life :-). Many times we need similar kind of infrastructure setup using same libraries. For example, take a web application where you map DispatcherServlet url-pattern to "/", implement RESTFul webservices using Jackson JSON library with Spring Data JPA backend. Similarly there could be batch or spring integration applications which needs similar infrastructure configuration. SpringBoot to the rescue. SpringBoot look at the jar files available to the runtime classpath and register the beans for you with sensible defaults which can be overridden with explicit settings. Also SpringBoot configure those beans only when the jars files available and you haven't define any such type of bean. Altogether SpringBoot provides common infrastructure without requiring any explicit configuration but lets the developer overrides if needed. To make things more simpler, SpringBoot team provides many starter projects which are pre-configured with commonly used dependencies. For example Spring Data JPA starter project comes with JPA 2.x with Hibernate implementation along with Spring Data JPA infrastructure setup. Spring Web starter comes with Spring WebMVC, Embedded Tomcat, Jackson JSON, Logback setup. Aaah..enough theory..lets jump onto coding. I am using latest STS-3.5.1 IDE which provides many more starter project options like Facebbok, Twitter, Solr etc than its earlier version. Create a SpringBoot starter project by going to File -> New -> Spring Starter Project -> select Web and Actuator and provide the other required details and Finish. This will create a Spring Starter Web project with the following pom.xml and Application.java 4.0.0 com.sivalabs hello-springboot 1.0-SNAPSHOT jar hello-springboot Spring Boot Hello World org.springframework.boot spring-boot-starter-parent 1.1.3.RELEASE org.springframework.boot spring-boot-starter-actuator org.springframework.boot spring-boot-starter-web org.springframework.boot spring-boot-starter-test test UTF-8 com.sivalabs.springboot.Application 1.7 org.springframework.boot spring-boot-maven-plugin package com.sivalabs.springboot; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.EnableAutoConfiguration; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; @Configuration @ComponentScan @EnableAutoConfiguration public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } } Go ahead and run this class as a standalone Java class. It will start the embedded Tomcat server on 8080 port. But we haven't added any endpoints to access, lets go ahead and add a simple REST endpoint. @Configuration @ComponentScan @EnableAutoConfiguration @Controller public class Application { public static void main(String[] args) { SpringApplication.run(Application.class, args); } @RequestMapping(value="/") @ResponseBody public String bootup() { return "SpringBoot is up and running"; } } Now point your browser to http://localhost:8080/ and you should see the response "SpringBoot is up and running". Remember while creating project we have added Actuator starter module also. With Actuator you can obtain many interesting facts about your application. Try accessing the following URLs and you can see lot of runtime environment configurations that are provided by SpringBoot. http://localhost:8080/beans http://localhost:8080/metrics http://localhost:8080/trace http://localhost:8080/env http://localhost:8080/mappings http://localhost:8080/autoconfig http://localhost:8080/dump SpringBoot actuator deserves a dedicated blog post to cover its vast number of features, I will cover it in my upcoming posts. I hope this article provides some basic introduction to SpringBoot and how it simplifies the Spring application development. More on SpringBoot in upcoming articles. - See more at: http://www.sivalabs.in/2014/07/springboot-introducing-springboot.html#sthash.7syCIt8V.dpuf
July 4, 2014
by Siva Prasad Reddy Katamreddy
· 12,423 Views · 5 Likes
article thumbnail
Reporting Back from MongoDB World 2014, NYC, Planet JSON
Closely approaching the one year mark of when I first joined MongoLab (and the MongoDB community), I had the pleasure of attending the inaugural MongoDB World conference put together by the incredible MongoDB team. Second only to the excitement around major MongoDB feature announcements was the collective disbelief that this was MongoDB’s first multi-day conference ever. A big congratulations to all those that worked hard to put on such a massive (did you see the Intrepid!?) event. All this planning would have been for naught if MongoDB leaders and engineers failed to deliver announcements and features that would meet and exceed expectations. From major public cloud announcements to the reveal of document-level locking in version 2.8, developers and conference goers had plenty to be excited about. There was a lot to digest from the conference… we’ll cover the major highlights in case you missed them. Big announcements in public cloud Our time at the MongoLab booth yielded many high-quality conversations, predominantly those about offloading previously internal processes and workloads to the public cloud. It was remarkable to see and hear so many enterprise teams with the exact same message: the public cloud is the future, and the future is now. It’s no surprise then that MongoDB, Inc. released not one, but two press releases around MongoDB solutions for the public cloud. Fully-managed MongoDB on the Microsoft Azure Store Nearly one year ago, MongoDB, Inc. chose to partner with the MongoLab team to build a production-ready MongoDB solution for developers on Microsoft Azure. On the first day of World, MongoDB, Inc. announced the product of our collaboration – a fully-managed highly available MongoDB-as-a-Service Add-On offering on the Microsoft Azure Store. This new service runs MongoDB Enterprise and offers replication, monitoring and support from MongoDB, Inc. It’s also backed by MongoDB Management Service (MMS), allowing for point-in-time recovery of MongoDB deployments. Now, teams without the expertise or resources to manage their MongoDB deployment(s) can outsource all the database operations (monitoring and alerting, backups, performance tuning, etc.) to both MongoLab and MongoDB’s expert support teams. You can check out the MongoDB add-on in the Azure Store: https://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/ MongoDB solutions on Google Cloud Platform MongoDB, Inc. also announced the arrival of new resources to help Google Cloud Platform customers deploy MongoDB on Google Compute Engine. These resources include a “Click to Deploy” feature and a MongoDB on Google Compute Engine Solutions paper covering MongoDB best practices. If you are looking for a fully-managed solution, with automated provisioning, backups, integrated monitoring and alerting, along with expert support, MongoLab recently announced the arrival of production-ready replica sets on Google. Product Roadmap – MongoDB version 2.8 On the second day of MongoDB World, Eliot Horowitz, MongoDB, Inc. CTO & Co-founder, took center stage and announced two huge changes to the MongoDB core project: document-level locking and pluggable storage engines. These features not only reflect improvements to the core project, but also signal to the community that the MongoDB team is listening to its users and is capable of delivering the software needed to power the workloads of tomorrow. Document-level locking The slides above from Eliot’s keynote point to a current obstacle (database-level locking) in MongoDB that limits overall scalability. With database-level locking, any write operation to the database holds the write lock and prevents subsequent writes from executing on the database until the original operation holding the write lock completes. Eliot’s announcement of document-level locking moves the write lock contention from the database level to the document (MongoDB equivalent to SQL “records”) level. This change will allow users to achieve much higher write throughput (we saw a 10x performance improvement in the live demo) across their MongoDB deployments, improving write scalability. If you’d like to try out document-level locking, the MongoDB team has already pushed the feature to the master branch on GitHub. This should only be used for experimentation, not to be run in production. Pluggable storage engine As MongoDB matures, feature releases like document level locking will continue to allow developers to build robust systems on top of MongoDB. But as the number of use cases grows, different tooling tailored to specific use cases may prove to be extremely beneficial. For example, if Company X decides that they want to use MongoDB to warehouse some of their data, they would likely want to optimize their database for slow-moving data and storage efficiency (compression). With the introduction of pluggable storage engines, many new possibilities are open to the community. Teams can now write their own storage engine for a particular use case, configure replica set nodes with different storage engines for specific situations, or collaborate with the open-source community to architect innovative solutions. This feature not only allows for more granular control of the database, but also encourages the MongoDB community to work together. Takeaways: A maturing and thriving ecosystem Roughly a year ago, MongoLab CTO Todd Dampier recapped MongoSF 2013 and spoke to the health of the MongoDB ecosystem. How far we’ve come! After attending the inaugural MongoDB World and chatting with MongoDB Masters, interns, hackathon winners, power users and those new to the community, the enthusiasm is still surging and as positive as ever. This enthusiasm is well placed. Developers and hackers use MongoDB because so much rich data on the web is shared as JSON (think Facebook, Twitter, Google, etc.). As a result, MongoDB is the de-facto database for hackathons and bootstrapped projects. Just learn the API for the site you want to mine, throw the JSON in MongoDB and query your data with the rich query language- it’s that easy. The MongoDB ecosystem is maturing as well. Take a look at the Customer Success Stories and you’ll get a feel for the extent in which enterprises leverage the solution and use it in production. To further drive enterprise adoption, MongoDB, Inc.’s public cloud solutions and product roadmap features aim to help teams run MongoDB in production and give teams the confidence that MongoDB will continue to improve scalability and meet their growing project requirements. Congratulations again to the MongoDB team on their big announcements and for creating such a fantastic forum at which to learn and meet fellow MongoDB users. Our team at MongoLab had a great time making new friends and talking shop; we look forward to meeting more MongoDB users soon (at a MongoDB Days near you)! -Chris@MongoLab
July 2, 2014
by Chris Chang
· 6,474 Views
article thumbnail
Why %util Number from Iostat is Meaningless for MySQL Capacity Planning
Earlier this month I wrote about vmstat iowait cpu numbers, and some of the comments I got were advertising the use of util% as reported by the iostat tool instead. I find this number even more useless for MySQL performance tuning and capacity planning. Now let me start by saying this is a really tricky and deceptive number. Many DBAs who report instances of their systems having a very busy IO subsystem said the util% in vmstat was above 99% and therefore they believe this number is a good indicator of an overloaded IO subsystem. Indeed – when your IO subsystem is busy, up to its full capacity, the utilization should be very close to 100%. However, it is perfectly possible for the IO subsystem and MySQL with it to have plenty more capacity than when utilization is showing 100% – as I will show in an example. Before that though lets see what the iostat manual page has to say on this topic – from this main page we can read: %util Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits. Which says right here that the number is useless for pretty much any production database server that is likely to be running RAID, Flash/SSD, SAN or cloud storage (such as EBS) capable of handling multiple requests in parallel. Let’s look at the following illustration. I will run sysbench on a system with a rather slow storage data size larger than buffer pool and uniform distribution to put pressure on the IO subsystem. I will use a read-only benchmark here as it keeps things more simple… sysbench –num-threads=1 –max-requests=0 –max-time=6000000 –report-interval=10 –test=oltp –oltp-read-only=on –db-driver=mysql –oltp-table-size=100000000 –oltp-dist-type=uniform –init-rng=on –mysql-user=root –mysql-password= run I’m seeing some 9 transactions per second, while disk utilization from iostat is at nearly 96%: [ 80s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 171.82ms (95%) [ 90s] threads: 1, tps: 9.20, reads/s: 128.80, writes/s: 0.00 response time: 157.72ms (95%) [ 100s] threads: 1, tps: 9.00, reads/s: 126.00, writes/s: 0.00 response time: 215.38ms (95%) [ 110s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 141.39ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 127.90 0.70 4070.40 28.00 31.87 1.01 7.83 7.52 96.68 This makes a lot of sense – with read single thread read workload the drive should be only used getting data needed by the query, which will not be 100% as there is some extra time needed to process the query on the MySQL side as well as passing the result set back to sysbench. So 96% utilization; 9 transactions per second, this is a close to full-system capacity with less than 5% of device time to spare, right? Let’s run a benchmark with more concurrency – 4 threads at the time; we’ll see… [ 110s] threads: 4, tps: 21.10, reads/s: 295.40, writes/s: 0.00 response time: 312.09ms (95%) [ 120s] threads: 4, tps: 22.00, reads/s: 308.00, writes/s: 0.00 response time: 297.05ms (95%) [ 130s] threads: 4, tps: 22.40, reads/s: 313.60, writes/s: 0.00 response time: 335.34ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 295.40 0.90 9372.80 35.20 31.75 4.06 13.69 3.38 100.01 So we’re seeing 100% utilization now, but what is interesting – we’re able to reclaim much more than less than 5% which was left if we look at utilization – throughput of the system increased about 2.5x Finally let’s do the test with 64 threads – this is more concurrency than exists at storage level which is conventional hard drives in RAID on this system… [ 70s] threads: 64, tps: 42.90, reads/s: 600.60, writes/s: 0.00 response time: 2516.43ms (95%) [ 80s] threads: 64, tps: 42.40, reads/s: 593.60, writes/s: 0.00 response time: 2613.15ms (95%) [ 90s] threads: 64, tps: 44.80, reads/s: 627.20, writes/s: 0.00 response time: 2404.49ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 601.20 0.80 19065.60 33.60 31.73 65.98 108.72 1.66 100.00 In this case we’re getting 4.5x of throughput compared to single thread and 100% utilization. We’re also getting almost double throughput of the run with 4 thread where 100% utilization was reported. This makes sense – there are 4 drives which can work in parallel and with many outstanding requests they are able to optimize their seeks better hence giving a bit more than 4x. So what have we so ? The system which was 96% capacity but which could have driven still to provide 4.5x throughput – so it had plenty of extra IO capacity. More powerful storage might have significantly more ability to run requests in parallel so it is quite possible to see 10x or more room after utilization% starts to be reported close to 100% So if utilization% is not very helpful what can we use to understand our database IO capacity better ? First lets look at the performance reported from those sysbench runs. If we look at 95% response time you can see 1 thread and 4 threads had relatively close 95% time growing just from 150ms to 250-300ms. This is the number I really like to look at- if system is able to respond to the queries with response time not significantly higher than it has with concurrency of 1 it is not overloaded. I like using 3x as multiplier – ie when 95% spikes to be more than 3x of the single concurrency the system might be getting to the overload. With 64 threads the 95% response time is 15-20x of the one we see with single thread so it is surely overloaded. Do we have anything reported by iostat which we can use in a similar way? It turns out we do! Check out the “await” column which tells us how much the requester had to wait for the IO request to be serviced. With single concurrency it is 7.8ms which is what this drives can do for random IO and is as good as it gets. With 4 threads it is 13.7ms – less than double of best possible, so also good enough… with concurrency of 64 it is however 108ms which is over 10x of what this device could produce with no waiting and which is surely sign of overload. A couple words of caution. First, do not look at svctm which is not designed with parallel processing in mind. You can see in our case it actually gets better with high concurrency while really database had to wait a lot longer for requests submitted. Second, iostat mixes together reads and writes in single statistics which specifically for databases and especially on RAID with BBU might be like mixing apples and oranges together – writes should go to writeback cache and be acknowledged essentially instantly while reads only complete when actual data can be delivered. The tool pt-diskstats from Percona Tookit breaks them apart and so can be much more for storage tuning for database workload. Some of the recent operating systems also ship with sysstat/iostat which breaks out await to r_await and w_await which is much more useful. Final note – I used a read-only workload on purpose – when it comes to writes things can be even more complicated – MySQL buffer pool can be more efficient with more intensive writes plus group commit might be able to commit a lot of transactions with single disk write. Still, the same base logic will apply. Summary: The take away is simple – util% only shows if a device has at least one operation to deal with or is completely busy, which does not reflect actual utilization for a majority of modern IO subsystems. So you may have a lot of storage IO capacity left even when utilization is close to 100%.
July 1, 2014
by Peter Zaitsev
· 5,869 Views
  • Previous
  • ...
  • 497
  • 498
  • 499
  • 500
  • 501
  • 502
  • 503
  • 504
  • 505
  • 506
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×