DZone
The Latest Languages Topics

A Puppet Automation + MySQL Tutorial: Wordpress Install in 7 Short Steps
[This article was written by Koby Nachmany.]

If you are familiar with configuration management (CM) and automation, you probably know a thing or two about Puppet and the rich collection of modules it offers. Puppet Forge contains a wealth of third-party modules that let us do some pretty nifty stuff with almost no effort. Puppet helps deal with the messy parts of CM, like installing binaries and running installation scripts, which are tedious to do manually.

Tools such as Puppet were originally created for IT operations people, who are for the most part infrastructure-centric, and they are best suited to the setup and maintenance of hosts in a physical data center. Managing applications, especially in an elastic, virtualized or cloud environment, brings a new set of challenges despite the agility and other benefits such an environment provides. Now imagine coupling this goodness with an intelligent orchestration framework for an entire deployment.

In this blog post I'd like to demonstrate how a cloud application orchestrator can complement existing automation processes powered by configuration management tools, in this case Puppet. I will use the nodecellar application and the popular WordPress content management framework as examples. This will hopefully provide a good introduction to Cloudify blueprints.

Overview

We've seen how Cloudify 3 allows us to easily orchestrate the "nodecellar" application (read about Cloudify blueprints here). With the "nodecellar" example, Cloudify deploys a complex application using workflows that map deployment lifecycle events to bash scripts using Cloudify's bash runner plugin. Cloudify's Puppet integration now makes this pretty easy.

Cloudify 3.0 - Taking Puppet to the Next Level of Orchestration. Check it out.

The synergy between Cloudify and Puppet not only allows you to enjoy the benefits of your Puppet environment, but also amplifies its usability by introducing unique advantages that answer the following common challenges involved with configuration management tools:

  • Agent Installation: Cloudify provisions your service VMs, installs a Puppet agent (if you like) and wires them up with the Puppet Master. Or, if you choose to run standalone, it can install the agent with the appropriate manifests needed for that service as well.
  • Order of Dependencies: Define the dependencies between application stacks, services and infrastructure resources, which will then be launched in that order.
  • Remote Execution and Updates: Other than the basic install/uninstall, Cloudify enables customized application workflows that allow you to execute tools like remote shell scripts on a group of instances that belong to a particular service, or on a specific instance in a group. This feature is useful for running maintenance operations, such as snapshots in the case of a database, or code pushes in a continuous deployment model. In addition, you can run puppet apply whenever you feel it's right for your service.
  • Post Deployment: Once your application is up, Cloudify can glue in your monitoring tool of choice, or you can choose to use the built-in one. A robust policy engine enables auto-healing and even auto-scaling according to your service's required SLA.

I'm now going to take a deep dive into my experience with a WordPress example that I feel is a very good representation of how Puppet and Cloudify work in sync. Let's say we want to deploy the popular WordPress application stack on two VMs.
Something as follows:

The flow is quite simple:

-server 3.5.1 with the following basic modules installed:

    |-- hunner-wordpress (v0.6.0)
    |-- puppetlabs-apache (v1.0.1) - with php mods enabled
    |-- puppetlabs-mysql (v2.1.0)

Your site.pp file should resemble something like this:

    node /^apache_web.*/ {
      include apache
      class { 'wordpress':
        create_db      => false,
        create_db_user => false,
      }
    }

    node /^mysql.*/ {
      class { '::mysql::server':
        root_password    => 'password',
        override_options => {
          'mysqld' => { 'bind_address' => '0.0.0.0' }
        }
      }
      include mysql::client
      include wordpress
    }

As we can see, we have an Apache PHP application that will likely require a database connection string (IP, port, user and password). This is where Cloudify facilitates the "gluing" of all the pieces together, by allowing us to inject dynamic/static custom facts into the dependent node (the Apache server). Cloudify supports both standalone agents and PuppetMaster environments.

Step 2: Tweaking the Original WordPress Module

Some minor adaptations to the init class of the WordPress module will allow us to embed these facts during Puppet agent invocation. Below is a code snippet (with defaults truncated):

    class wordpress (
      $db_host_ip  = $cloudify_related_host_ip,
      $db_user     = $cloudify_properties_db_user,
      $db_password = $cloudify_properties_db_pw,
      . .
    )

And some tweaking to templates/wp-config.php.erb:

    /** MySQL hostname */
    define('DB_HOST', '');

Let's add some tags for finer control of manifest execution. The MySQL node does not require the application part to run on it, so I've excluded it using a Puppet "tag" (read more about Puppet tags). Cloudify, of course, supports this and will provide the appropriate tags during agent invocation.

    -> class { 'wordpress::app':
      tag         => ['postconfigure'],
      install_dir => $install_dir,
      install_url => $install_url,
      version     => $version,
      db_name     => $db_name,
      . .
    }

Step 3: Creating the Blueprint

In a similar way to the "nodecellar" blueprint, first let's create a folder named "wp_puppet" and create a blueprint.yaml file within it. This file will serve as the blueprint file. Now let's declare the name of this blueprint.

    blueprint:
      name: wp_puppet
      nodes:

Now we can start creating the topology.

Step 4: Creating VM Nodes

Since in this case I use the OpenStack provider to create the nodes, let's import the "OpenStack types" plugin.

    imports:
      - http://www.getcloudify.org/spec/openstack-plugin/1.0/plugin.yaml

Since the VMs are the same, I declared a generic template for a VM host:

    vm_host:
      derived_from: cloudify.openstack.server
      properties:
        - install_agent: true
        - worker_config:
            user: ubuntu
            port: 22
            # example for ssh key file (see `key_name` below)
            # this file matches the agent key configured during the bootstrap
            key: ~/.ssh/agent.key
        # Uncomment and update `management_network_name` when working with a neutron-enabled OpenStack
        - management_network_name: cfy-mng-network
        - server:
            image: 8c096c29-a666-4b82-99c4-c77dc70cfb40
            flavor: 102
            key_name: cfy-agnt-kp
            security_groups: ['cfy-agent-default', 'wp_security_group']
            # This is how we inject the puppet server's ip
            userdata: |
              #!/bin/bash -ex
              grep -q puppet /etc/hosts || echo "x.x.x.x puppet" | sudo -A tee -a /etc/hosts

Create the MySQL and Apache VMs:

    - name: mysql_db_vm
      type: vm_host
      instances:
        deploy: 1
    - name: apache_web_vm
      type: vm_host
      instances:
        deploy: 1

Step 5: Declaring Apache and MySQL Servers

Since we are using the Puppet plugin to create those servers, first we have to import it:

    plugins:
      puppet_plugin:
        derived_from: cloudify.plugins.agent_plugin
        properties:
          url: https://github.com/cloudify-cosmo/cloudify-puppet-plugin/archive/nightly.zip

The plugin defines the following server types: middleware_server, app_server, db_server, web_server, message_bus_server, app_module. They are virtually the same, but they improve readability for the user and visualization in the GUI. A Puppet server type is derived from the cloudify.types.server type, but includes some Puppet-specific properties and lifecycle events. For documentation see: Puppet Types.

So now we will go ahead and declare the server types:

    cloudify.types.puppet.web_server:
      derived_from: cloudify.types.web_server
      properties:
        # All Puppet related configuration goes inside
        # the "puppet_config" property.
        - puppet_config
      interfaces:
        cloudify.interfaces.lifecycle:
          # Specifically "start" operation. Otherwise tags must be
          # provided.
          - start: puppet_plugin.operations.operation

    cloudify.types.puppet.app_module:
      derived_from: cloudify.types.app_module
      properties:
        - puppet_config
      interfaces:
        cloudify.interfaces.lifecycle:
          - configure: puppet_plugin.operations.operation

    cloudify.types.puppet.db_server:
      derived_from: cloudify.types.db_server
      properties:
        - puppet_config
      interfaces:
        cloudify.interfaces.lifecycle:
          - start: puppet_plugin.operations.operation

Step 6: Instantiating the Apache and MySQL Nodes

Here we provide the Puppet configuration and tags and define the relationships between the nodes. Cloudify's agent will use those relationships to decide which facts to inject.

    - name: apache_web_server
      type: cloudify.types.puppet.web_server
      properties:
        port: 8080
        puppet_config:
          server: puppet
          environment: wordpress_env
      relationships:
        - type: cloudify.relationships.contained_in
          target: apache_web_vm

    - name: wordpress_app
      type: cloudify.types.puppet.app_module
      properties:
        db_user: wordpress
        db_pass: passwd
        puppet_config:
          server: puppet
          tags: ['postconfigure']
          environment: wordpress_env
      relationships:
        - type: cloudify.relationships.contained_in
          target: apache_web_server
        - type: wp_connected_to_mysql
          target: mysql_db_server

    - name: mysql_db_server
      type: cloudify.types.puppet.db_server
      properties:
        db_user: wordpress
        db_pass: passwd
        puppet_config:
          server: puppet
          environment: wordpress_env
      relationships:
        - type: cloudify.relationships.contained_in
          target: mysql_db_vm

Step 7: Upload the Blueprint and Create the Deployment (via CLI or GUI)

Then execute your deployment (via CLI or GUI).

    ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy blueprints upload -b wp9 wordpress/blueprint.yaml
    ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy deployments create -b wp9 -d WordPress_Deployment_1

Step 8: Take a Quick Coffee Break.

Step 9: Enjoy your Orchestrated WordPress Stack!
August 21, 2014
by Sharone Zitzman
· 9,139 Views
JPA Tutorial: Setting Up JPA in a Java SE Environment
There are many reasons to learn an ORM tool like JPA, but it's not a magic bullet that will solve all your problems.
August 18, 2014
by MD Sayem Ahmed
· 145,910 Views · 4 Likes
How to Add Tomcat 8 to Eclipse Kepler
This article presents the steps required to configure Tomcat 8 with Eclipse Kepler.

  • Download Tomcat 8 and place it in any local folder.
  • Download Eclipse Java EE Kepler.

As of this writing, Tomcat 8 is not supported in Eclipse Java EE Kepler. However, you can add Tomcat 8 support by doing the following:

  • Go to the WTP downloads page, select the latest version (currently 3.6), and download the zip. Here's the current link.
  • Copy all of the files in the features and plugins directories of the downloaded WTP into the corresponding directories in your Eclipse folder (overwriting the existing files).
  • Start Eclipse and click on the "Servers" tab in the workbench. Go ahead and try adding a new server. You will find an option for Tomcat 8 available for selection.
  • After clicking Finish, you will see a new server added with the name "Tomcat v8.0 Server at localhost".
  • Start the server and check http://localhost:8080 (provided you installed Tomcat 8 and set the HTTP port to 8080). Interestingly, you will not see the welcome page, but a 404 error page.
  • To get rid of that, double-click on "Tomcat v8.0 Server at localhost". In the window that opens, select "Use Tomcat installation" and change the deploy path from wtpwebapps to webapps.
  • Restart the server and access http://localhost:8080.

You are all set.
August 8, 2014
by Ajitesh Kumar
· 84,172 Views
You say Constructor Chaining, Swift says Initializer Delegation
One of the things you must get used to with a new language is dealing with new conventions. Coming from a mostly Java and Ruby background, I am used to the notion of constructors and constructor chaining, which is basically when one constructor calls another one. When it comes to Swift, there are a few rules around the methods used to create objects.

Firstly, the language guide refers to constructors as initializers. So any time you want to use the term constructor, use the term initializer when you are in Swift land. More specifically, constructor chaining is known as initializer delegation.

Initializers fall into two categories, Designated and Convenience. Designated initializers are the primary initializers and are responsible for initializing all properties of a class. So in this case, the only initializer here is a Designated one.

    class Person {
        let name: String
        let age: Int

        init(name: String, age: Int) {
            self.name = name
            self.age = age
        }
    }

Now if you want to add other initializers that delegate to the Designated one, they will be known as Convenience initializers. So let's add one that defaults the age. Convenience initializers have the keyword convenience before them.

    class Person {
        let name: String
        let age: Int

        init(name: String, age: Int) {
            self.name = name
            self.age = age
        }

        convenience init(name: String) {
            self.init(name: name, age: 100)
        }
    }

Convenience initializers need to delegate to another Convenience initializer or a Designated initializer. Here's an example of a Convenience initializer calling another Convenience one.

    class Person {
        let name: String
        let age: Int

        init(name: String, age: Int) {
            self.name = name
            self.age = age
        }

        convenience init(name: String) {
            self.init(name: name, age: 30)
        }

        convenience init() {
            self.init(name: "Homer")
        }
    }

If you wanted to, you could create another Designated initializer. Now we have two Designated initializers and two Convenience initializers.

    class Person {
        let name: String
        let age: Int

        init(name: String, age: Int) {
            self.name = name
            self.age = age
        }

        init(age: Int, name: String) {
            self.age = age
            self.name = name
        }

        convenience init(name: String) {
            self.init(name: name, age: 30)
        }

        convenience init() {
            self.init(name: "Homer")
        }
    }

What about subclasses?

Once again, there are certain rules that need to be adhered to when subclassing. Designated initializers must call a Designated initializer of their immediate parent class. So let's take a look at an example. We are adding a Student class as a subclass of the Person class. It adds a student number property.

    class Student : Person {
        let number: String

        init(name: String, age: Int, number: String) {
            self.number = number
            super.init(name: name, age: age)
        }
    }

Here we are calling the Designated initializer in the Person class. If we tried to call one of the Convenience initializers in the Person class, a compile error would occur. Also note that we need to assign the number property before calling the initializer in the parent class, because all properties introduced by a child class must be initialized before its parent initializer is called.

Summary

In summary, as explained in the language guide, Designated initializers delegate up and Convenience initializers delegate across. I hope this article helped in some way to improve your understanding of initializers in Swift.

References

Swift Language Guide
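For readers who, like the author, come from a Java background, the same Person/Student chain can be sketched in Java for comparison. This is only an illustrative analogue, not part of the original article: the class `InitializerDemo` and the sample values ("Lisa", 8, "S-001") are made up, and Java has no Designated/Convenience distinction, so the comments only indicate which Swift role each constructor roughly plays.

```java
// Java analogue of the Swift Person/Student example (illustrative only).
class Person {
    final String name;
    final int age;

    // Plays the role of the Swift designated initializer: it sets every field.
    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Plays the role of a convenience initializer: delegates "across" via this(...).
    Person(String name) {
        this(name, 30);
    }

    // Delegates to the other chained constructor, which delegates again.
    Person() {
        this("Homer");
    }
}

class Student extends Person {
    final String number;

    // Delegates "up" via super(...). Classic Java wants super(...) as the
    // first statement, so the order is the reverse of the Swift version,
    // where self.number is assigned before super.init is called.
    Student(String name, int age, String number) {
        super(name, age);
        this.number = number;
    }
}

class InitializerDemo {
    public static void main(String[] args) {
        Person p = new Person();
        Student s = new Student("Lisa", 8, "S-001");
        System.out.println(p.name + " " + p.age);
        System.out.println(s.name + " " + s.age + " " + s.number);
    }
}
```

One difference worth noticing: Java lets a subclass constructor call any accessible superclass constructor, whereas Swift restricts a subclass's designated initializer to the parent's designated initializers.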
August 7, 2014
by Ricky Yim
· 9,969 Views
Java Unit Testing Interview Questions
This article presents some frequently asked interview questions about unit testing Java code. Please suggest other questions that you have come across and I will include them in the list below.

  • What is unit testing? Which unit testing framework did you use? What are some of the common Java unit testing frameworks?
    Ans: Read the definition of unit testing on the Wikipedia page for unit testing. Simply speaking, unit testing is about testing a block of code in isolation. Two popular unit testing frameworks in Java are JUnit and TestNG.
  • In the SDLC, when is the right time to start writing unit tests?
    Ans: Test-along, if not test-driven. Writing unit tests towards the end is not very effective. The test-along technique recommends that developers write the unit tests as they go with their development.
  • With JUnit 4, do we still need methods such as setUp and tearDown?
    Ans: No. This is taken care of with the @Before and @After annotations respectively.
  • What do the following JUnit test annotations mean?
    Ans: Following is a list of frequently used JUnit 4 annotations:
    @Test (identifies a test method)
    @Before (the method will execute before every JUnit 4 test)
    @After (the method will execute after every JUnit 4 test)
    @BeforeClass (the method will be executed before the JUnit tests for a class start)
    @AfterClass (the method will be executed after the JUnit tests for a class are completed)
    @Ignore (the method will not be executed)
  • How does one test exception handling in unit tests using the @Test annotation?
    Ans: @Test(expected={exception class}). For example: @Test(expected=IllegalArgumentException.class)
  • Write a sample unit test method for testing the exception IndexOutOfBoundsException when working with ArrayList.
    @Test(expected=IndexOutOfBoundsException.class) public void outOfBounds() { new ArrayList
August 6, 2014
by Ajitesh Kumar
· 48,393 Views · 3 Likes
Using the OpenXML SDK Productivity Tool to "decompile" Office Documents
Ode To Code - Easily Generate Microsoft Office Files From C#

"... These days, Office files are no longer in a proprietary binary format, and we can create the files directly without using COM automation. A .docx Word file, for example, is a collection of XML documents zipped into a single file. The official name of the format is Open XML. There is an SDK to help with reading and writing OpenXML, and a Productivity Tool that can generate C# code for a given file. All you need to do is load a document, presentation, or workbook into the tool and press the "Reflect Code" button. The downside to this tool is that even a simple document will generate 4,000 lines of code. Another downside is that the generated code assumes it will write directly to the file system, however it is easy to pass in an abstract Stream object instead. So while this code isn't perfect, the code does produce valid document and..."

I've been blogging about the OpenXML SDK for years now, but I think this is the first time I've seen this part of it, this utility. And like he says, 4K LoC is, well, a lot, but it does look like an awesome way to learn the low-level OpenXML SDK ins and outs.

Related Past Post XRef:

  • Open Sesame - Open XML SDK is now open source
  • Using OpenXML to load an Excel Worksheet into a DataTable (or just how different OpenXML is from the old Excel API we're used too)
  • Using OpenXML SDK to generate Word documents via templates (and without Word being installed)
  • Checking for Microsoft Word DocX/DocM Revisions/Track Changes without using Word... (via OpenXML SDK, LINQ to XML or XML DOM)
  • LINQ to XlsX... Using VB.Net, LINQ, the OpenXML SDK and a little C# helper, to query an Excel XlsX
  • Using native OpenXML to create an XlsX (Which provides an example of why I highlight tools that make OpenXML easier...)
  • Generating Xlsx's on the Server? You're using OpenXML, right? With help from the PowerTools for OpenXML?
  • Official boat-load, as in supertanker, sized OpenXML content list (Insert "One OpenXML content list to rule them all" here)
  • So how do I get from here to OpenXML? Got a map for you, an Open XML SDK Blog Map…
  • Where to go to scratch your OpenXML dev info itch…
  • "Open XML Explained" Free eBook (PDF)
  • The Noob's Guide to Open XML Dev (If you know how to spell OpenXML but that's about it, this is your Getting Started guide...)
  • Reusing the PowerShell PowerTools for Open XML in your C# or VB.Net world
  • PowerShell, OpenXML, WMI and the PowerTools for OpenXML = Doc generation for our inner geek
  • Because it's a PowerShell kind of day… PowerTools for Open XML V1.1 Released
  • OpenXML PowerTools updated – Cell your Excel via PowerShell
  • Powering into OpenXML with PowerShell
  • Open XML SDK 2.0 for Microsoft Office Released – Automate Office documents without Office
  • Open XML 2.0 Code Snippets for VS2010 (and VS2008 too)
  • Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding
  • Open XML File Format Code Snippets for Visual Studio 2005 (Office 2007 NOT required)
  • Open XML SDK v1 Released
  • OpenXML Viewer 1.0 Released – Open source DocX to HTML conversion, with IE, Firefox and Opera (and/or command line) support
July 31, 2014
by Greg Duncan
· 16,515 Views
Java JMX Shutdown Gracefully with ShutdownHook example
For JMX you need an interface:

    package com.bos.jmx;

    public interface ShutdownMBean {
        /**
         * Shutdown operation
         */
        public void shutdown() throws Exception;
    }

The implementation is the Shutdown Java class shown at the end of this article.

Include this code on server startup (used in my Main.java). The methods are static so that they can be called from main:

    public static void initServerJMX() throws MalformedObjectNameException,
            InstanceAlreadyExistsException, MBeanRegistrationException,
            NotCompliantMBeanException {
        // Initialise JMX: get the platform MBean server
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Construct the ObjectName for the Shutdown MBean we will register
        ObjectName mbeanName = new ObjectName("com.bos.jmx:type=Shutdown");
        // Create the Shutdown MBean
        final Shutdown mbean = new Shutdown();
        // Register the Shutdown MBean
        mbs.registerMBean(mbean, mbeanName);
        // Also run the shutdown logic when the JVM itself is exiting
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                try {
                    mbean.shutdown();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }

    public static void useArg0(String arg) {
        // sample code using arg
    }

Include this code in the client used to shut down the server (used in my Main.java):

    public static void clientExecuteShutdown(int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://:" + port + "/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
        // Construct the ObjectName for the Shutdown MBean
        ObjectName mbeanName = new ObjectName("com.bos.jmx:type=Shutdown");
        // Create a dedicated proxy for the MBean instead of
        // going directly through the MBean server connection
        ShutdownMBean mbeanProxy =
                JMX.newMBeanProxy(mbsc, mbeanName, ShutdownMBean.class, true);
        logger.info("Executing shutdown...");
        mbeanProxy.shutdown();
        logger.info("Shutdown done.");
    }

My sample start/shutdown code with main (used in my Main.java):

    public static void main(String args[]) throws NumberFormatException, Exception {
        if (args.length < 1) {
            throw new RuntimeException(
                    "1. Please specify 1st argument."
                    + "\n2. Please specify 1st argument as shutdown and 2nd argument as port of JMX.");
        }
        if ("shutdown".equalsIgnoreCase(args[0])) {
            if (args.length < 2) {
                throw new RuntimeException(
                        "Please specify 1st argument as shutdown and 2nd argument as port of JMX.");
            }
            clientExecuteShutdown(Integer.valueOf(args[1]));
        } else {
            useArg0(args[0]);
            initServerJMX();
        }
    }

The server startup code above also registers a hook with Runtime.getRuntime().addShutdownHook().

The startup line for the server is:

    java -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -classpath . com.bos.jmx.Main arg0

The command to shut down the server is:

    java -classpath . com.bos.jmx.Main shutdown 9999

The Shutdown class:

    package com.bos.jmx;

    public class Shutdown implements ShutdownMBean {
        private static volatile boolean flagRunOne = false;

        @Override
        public void shutdown() throws Exception {
            if (!flagRunOne) {
                System.out.println("Server shutting down...");
                // Sample shutdown code here
                Main.shutdown();
                System.out.println("Server shutdown.");
                flagRunOne = true;
            }
        }
    }
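The Shutdown class above guards against running twice with a volatile boolean that is only set after the shutdown work finishes, so the JMX call and the shutdown hook could both get past the check. A minimal sketch of one alternative, assuming you are free to change the guard: claim the flag atomically with `AtomicBoolean.compareAndSet` before doing any work. The names `OnceOnlyShutdown` and `OnceOnlyShutdownDemo` are hypothetical and not part of the article's code, and the MBean registration is left out to keep the sketch short.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the article's Shutdown class; not the author's code.
class OnceOnlyShutdown {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Runnable cleanup;

    OnceOnlyShutdown(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    // Returns true only for the single caller that actually ran the cleanup.
    boolean shutdown() {
        // compareAndSet flips the flag atomically, and does so BEFORE the
        // cleanup runs, so a second caller (another thread, or the shutdown
        // hook firing during cleanup) returns immediately.
        if (!started.compareAndSet(false, true)) {
            return false;
        }
        cleanup.run();
        return true;
    }
}

class OnceOnlyShutdownDemo {
    public static void main(String[] args) {
        OnceOnlyShutdown stopper = new OnceOnlyShutdown(
                () -> System.out.println("Releasing resources..."));

        // Path 1: the JVM is exiting (Ctrl+C, System.exit, normal end of main).
        Runtime.getRuntime().addShutdownHook(new Thread(stopper::shutdown));

        // Path 2: an explicit request, such as the call a JMX operation would make.
        stopper.shutdown();
    }
}
```

In the demo the explicit call runs the cleanup, and the hook that fires when main returns finds the flag already set and does nothing.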
July 30, 2014
by Bosco Ferrao
· 7,626 Views
Glassfish 4 - Performance Tuning, Monitoring and Troubleshooting
This is the third blog in C2B2 series looking at Glassfish 4. The previous two are available here: Part 1 - Getting started with Glassfish 4 Part 2 - Glassfish 4 - Features For High Availability In this blog I will look at 3 areas: Performance Tuning, where I will look at some of the areas to look at when setting up a system for production. Monitoring, where I will look at some of the tools we use for monitoring a system both during performance testing and tuning and once a system is up and running. Troubleshooting, where I will look at some of the tools you can use to help diagnose and detect performance issues. Performance Tuning Glassfish out of the box (as with most app servers) is optimised for development purposes. Developers want the ability to deploy and undeploy continuously, create and remove resources, debug, etc. However, this configuration is not suitable for a production system. When configuring any application server you have to take into account what you are trying to achieve and what is best suited for the applications you intend to run. One size does not fit all! It can be a long and complex process and I'm afraid I can't give you a one-stop solution. However, I can give you some pointers to some of the things you can do to prepare your system for production. So, what kind of things do we look at when we are looking to performance tune a Glassfish system. Some of the most common things are: JVM Settings Garbage Collection Glassfish Settings Logging JVM Settings The standard JVM defaults are not suitable for a production system. One of the simplest changes that can be made is to use the -server flag, rather than the default -client. Although the Server and Client VMs are similar, the Server VM has been specially tuned to maximise peak operating speed. It is intended for executing long-running server applications, which need the fastest possible operating speed more than a fast start-up time or smaller runtime memory footprint. 
Allocate more memory to the JVM by modifying the value of the -Xmx flag. How much depends on the size and complexity of your enterprise application and how much memory you have available. In addition we also want to make sure we allocate all of the memory on startup. This is done with the -Xms flag. We set the minimum and maximum perm gen to the same value in order to avoid allocation failures and subsequent full garbage collections.

Garbage Collection

There are a number of settings that can be tweaked regarding Garbage Collection. I'm not going to cover GC tuning, as that is a whole topic of its own, but here are some of the settings we would always recommend regarding GC in a production environment.

Firstly, we want to ensure we log all Garbage Collection information, as this can prove extremely useful in diagnosing issues.

    -verbose:gc

Next we want to make sure we log GC information to a file. This will make it easier to separate the GC from other details in the log files.

    -Xloggc:/path_to_log_file/gc.log

We also want to ensure we have as much detail as possible,

    -XX:+PrintGCDetails

and that the information is timestamped, for easier diagnosis of long-running errors and to be able to ascertain what normal levels are over time.

    -XX:+PrintGCDateStamps

Finally, we want to ensure that developers aren't making explicit calls to System.gc(). Hopefully they don't anyway, and if they are you need to look into why (doing so is a bad idea since this forces major collections), but this will disable it just in case.

    -XX:+DisableExplicitGC

Heap Dumps

Heap dumps can be extremely useful for diagnosing memory issues. There are two settings we would definitely recommend. These tell the JVM to generate a heap dump when an allocation from the Java heap or the permanent generation cannot be satisfied. There is no overhead in running with these options, but they can be useful for production systems where OutOfMemoryErrors can take a long time to surface.

    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/opt/dumps/glassfish.hprof

Configuring Glassfish

There are three ways to configure Glassfish:

  • Through the admin console
  • By directly editing the config files
  • Using the asadmin tool

Although making changes through the admin console can often be the easiest way, we'd recommend scripting all changes where possible so you have a repeatable production server build. You should also ensure copies of all config files are kept in Config Control so you know you have a working copy and can roll back to a previous version when needed.

Turn off development features

Turn off auto-deploy and dynamic application reloading. Both of these features are great for development, but can affect performance. Configure the JSP servlet not to check JSP files for changes on every request. Also, set the parameter genStrAsCharArray to true. This will ensure all String values are declared as static char arrays. One reason for this is that the array has less memory overhead than String. These changes mean you cannot change JSP pages on your production server without redeploying the application, but on a production system this is generally what you want.

Acceptor Threads and Request Threads

There are two main thread values we would recommend setting: acceptor threads and request threads. Acceptor threads are used to accept new connections to the server and to schedule existing connections when a new request comes in. Set this value equal to the number of CPU cores in your server. So, if you have two quad core CPUs, this value should be set to eight. Request threads run HTTP requests. You want enough of these to keep the machine busy, but not so many that they compete for CPU resources, which would cause your throughput to suffer greatly.

Static resources

By default, GlassFish does not tell the client to cache static resources.
It is recommended to cache static resources, like CSS files and images particularly if you have a lot of them. Thread pools Max thread pool and min pool size should be set to the same value. Specifying the same value will allow GlassFish to use a slightly more optimised thread pool. This configuration should be considered unless the load on the server varies significantly. Increasing this value will reduce HTTP response latency times. What to set these values to depends heavily on what your application is doing. In order to get this value right you should look to incrementally increase the thread count and to monitor performance after each incremental increase. When performance stops improving stop increasing the thread count. Logging You should look to turn off as much logging as possible. In a production environment we would generally recommend logging at WARN and above. This includes the logging done by Glassfish as well as your own applications. Monitoring The fewer monitoring options that are enabled, the better the server's performance. All Glassfish monitoring is turned off by default. Switching monitoring on can be very useful when diagnosing issues and when doing initial system testing and performance tuning for monitoring what changes. What to monitor Used Heap Size - Compare this number with the maximum allowed heap size to see what portion of the heap is in use. If the used heap size nears the max heap size, the garbage collector urgently attempts to free memory and this is something that should be avoided where possible. Number of loaded classes - Useful for detecting performance and application development trends. JVM Threads - Important for performance tuning and for troubleshooting JVM crashes. Some of the most essential indicators are the current active JVM thread count and the peak values. Thread pools - You should compare a pools current usage with the maximum number allowed. 
Problems can start to occur when the current count nears the max threads number. JVM Tools for Monitoring The following is a list of a a few of the tools that come with the JDK that are useful for monitoring information from the JVM. jstat - This tool displays performance statistics regarding usage of the perm gen, new gen and old gen. It also provides class loading and compilation statistics jmap - Gives you visibility of memory usage, can produce a class histogram and can dump the memory to a file jconsole/jvisualvm - These tools can display all the previously mentioned monitoring indicators and graph them over time. This allows you to spot trends and to get a better overall picture of your normal performance levels and changes over time. Note - These should NOT be left running permanently on a production system! Troubleshooting Unfortunately, no matter how much tuning and testing you do all systems WILL go wrong from time to time. So, what should you do when your production server bursts into flames? Well, in that situation you should call the fire service but for more general problems: Gather data - get as much data as you can, there is no such thing as too much! Analyse that data - Data is worthless when you don’t know what it means. Visualise where possible – graphs and charts reveal trends and patterns over time Make educated decisions - Only make decisions based on data. If you go with your “gut instinct” and what “feels right” you will probably make things worse Gathering data First up, for most of the JVM tools you will need the process ID of the server. You can get this information in various ways. Two of the simplest are: jps -v This will list all current running Java processes. The -v flag is for verbose output. ps aux | grep glassfish The ps command with the options aux will show all processes from all users. 
This will display a LOT of information, so pipe it through grep to filter for the glassfish process.
As mentioned earlier, the jstat tool can be used for gathering info on JVM performance. Other useful tools include:
jstack
This will produce thread stack dumps for all threads running in the JVM. This can be very useful for discovering stuck threads or long-running threads.
jmap
This tool can be used to create a heap dump. It outputs to a file in .hprof format, which can be read by a number of analysis tools.
jrcmd and jrmc
These tools are only available with the JRockit JDK. I won't go into any detail here as I have previously blogged about jrcmd here: http://blog.c2b2.co.uk/2012/11/troubleshooting-jrockit-using-jrcmd.html and my colleague has blogged about jrmc here: http://blog.c2b2.co.uk/2012/10/weblogic-troubleshooting-with-jrockit.html
GlassFish asadmin
The GlassFish asadmin tool has a built-in command which provides similar functionality to the above tools but without the need for the PID.
asadmin generate-jvm-report --type=[type]

Analysing the data
There are various tools available for analysing performance data. The following are some of the most useful:
IBM Support Assistant is a free troubleshooting application that helps you research, analyze, and resolve problems using various support features and tools. It contains a Garbage Collection and Memory Visualiser as well as a Heap Analyser. It will also provide a report telling you where issues might exist, and listing red flags with advice on what to change in your applications.
JRockit Mission Control is a very powerful tool which can be used to monitor live systems or analyse historical data in the form of flight recordings.
JVisualVM GCViewer is an optional plugin for jVisualVM which can transform a tool which is already great for live monitoring into a powerful analysis tool.
jhat is a Java Heap Analysis Tool. It processes heap dump files and produces HTML reports.
There are better analysis tools, but it's always freely available if you're running a JDK.
Others - There are many open source and freely available tools and projects to help you. Here we've covered some very common and widely used ones, but our list is by no means exhaustive!

Conclusion
Remember, GlassFish out of the box (or out of the zip file!) is not designed to be run 'as is'. You should also note that there is no ideal configuration that will work for all systems. It will take time and effort to get the best configuration for what you require. Hopefully in this blog I have given you some useful guidelines and pointers.
You should take time to work out what you want in terms of services, then strip back your config to match that.
You should test, test and test again to ensure that your configuration matches the requirements with regards to the applications you will be running on your server.
You should tune your JVM to ensure you have the best settings for your particular configuration.
You should ensure you have monitoring in place to keep a check on everything and ensure that if your server does crash you have as much information as possible at hand to diagnose what caused it.

The next blog in this series looks at Migrating to GlassFish 4: http://blog.c2b2.co.uk/2013/07/glassfish-4-migrating-to-glassfish.html
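The indicators listed under "What to monitor" (used heap versus max heap, number of loaded classes, current and peak JVM threads) can also be read from inside a JVM through the standard java.lang.management MXBeans. The following is only a minimal sketch of that idea and is not GlassFish-specific; the class name JvmSnapshot and its method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of GlassFish or the JDK tools): reads a few
// JVM indicators for the current process via the standard platform MXBeans.
class JvmSnapshot {

    // Bytes of heap currently in use.
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Used heap as a fraction of the maximum heap, or -1 if the JVM reports no maximum.
    static double heapUsageRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }

    // Number of classes currently loaded.
    static int loadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    // Current number of live threads.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Peak live thread count since the JVM started (or since the peak was reset).
    static int peakThreadCount() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("used heap (bytes): " + usedHeapBytes());
        System.out.println("heap usage ratio:  " + heapUsageRatio());
        System.out.println("loaded classes:    " + loadedClassCount());
        System.out.println("live threads:      " + liveThreadCount());
        System.out.println("peak threads:      " + peakThreadCount());
    }
}
```

A ratio that stays close to 1.0 corresponds to the "used heap size nears the max heap size" situation described above. Tools such as jconsole read the same beans remotely over JMX.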
July 30, 2014
by Andy Overton
· 24,807 Views
AngularJS + TypeScript – How To Setup a Watch (And 2 Ways to Do it Wrong)
Introduction
After setting up my initial application as described in my previous post, I went about setting up a watch. For those who don't know what that is: it's basically a function that gets triggered when a scope object, or part of it, changes. I have found 4 ways to set it up, and only one seems to be (completely) right.

In JavaScript, you would set up a watch like this sample I nicked from Stack Overflow:

    function MyController($scope) {
        $scope.myVar = 1;

        $scope.$watch('myVar', function() {
            alert('hey, myVar has changed!');
        });

        $scope.buttonClicked = function() {
            $scope.myVar = 2; // This will trigger $watch expression to kick in
        };
    }

So how would you go about it in TypeScript? Turns out there are a couple of ways that compile but don't work, partially work, or have unexpected side effects. For my demonstration, I am going to use the DemoController that I made in my previous post.

Incorrect method #1: 1:1 translation

    module App.Controllers {
        "use strict";
        export class DemoController {
            static $inject = ["$scope"];

            constructor(private $scope: Scope.IDemoScope) {
                if (this.$scope.person === null || this.$scope.person === undefined) {
                    this.$scope.person = new Scope.Person();
                }
                // Wrong: this passes the current value of firstName, not something to watch
                this.$scope.$watch(this.$scope.person.firstName, () => {
                    alert("person.firstName changed to " + this.$scope.person.firstName);
                });
            }

            public clear(): void {
                this.$scope.person.firstName = "";
                this.$scope.person.lastName = "";
            }
        }
    }

The new part is the $watch call in the constructor. Very cool: we even use the inline 'delegate-like' notation to define the handler inline. This seems plausible, but does not work. What it does is, on startup, give the message "person.firstName changed to undefined" and then it never, ever does anything again. The reason is that the first argument is evaluated immediately, so $watch receives the value undefined rather than an expression it can re-evaluate later. I have spent quite some time looking at this. Don't do the same; read on.
Incorrect method #2: not catching the first call
To fix the problem above, you need to use the delegate notation for the first argument as well:

    this.$scope.$watch(() => this.$scope.person.firstName, () => {
        alert("person.firstName changed to " + this.$scope.person.firstName);
    });

See the difference? As you now type a "J" in the top text box, you immediately get a "person.firstName changed to J" alert, making it almost impossible to type. But you get the drift. But then we arrive at the next problem. This is still not correct: it goes off initially, when nothing has changed yet. This is undesirable on most occasions.

The correct way
It appears the callback actually receives a few parameters, of which I usually only use newValue and oldValue (in that order) to detect a real change. On the initial call both hold the same value, so comparing them filters that call out. Kinda like you do in an INotifyPropertyChanged property:

    this.$scope.$watch(() => this.$scope.person.firstName, (newValue: string, oldValue: string) => {
        if (oldValue !== newValue) {
            alert("person.firstName changed to " + this.$scope.person.firstName);
        }
    });

Now it only goes off when there's a real change in the watched property.

...and possibly an even better way
I am not really a fan of a lambda calling a lambda in a method call, so I would most probably refactor this to

    constructor(private $scope: Scope.IDemoScope) {
        if (this.$scope.person === null || this.$scope.person === undefined) {
            this.$scope.person = new Scope.Person();
        }
        this.$scope.$watch(() => this.$scope.person.firstName, (newValue: string, oldValue: string) => {
            this.tellmeItChanged(newValue, oldValue);
        });
    }

    private tellmeItChanged(newValue: string, oldValue: string) {
        if (oldValue !== newValue) {
            alert("person.firstName changed to " + this.$scope.person.firstName);
        }
    }

as I think this is just a bit more readable, especially if you are going to do more complex things in the callback.

Demo solution can be found here
July 28, 2014
by Joost van Schaik
· 14,831 Views
Data-driven Unit Testing in Java
Data-driven testing is a powerful way of testing a given scenario with different combinations of values. In this article, we look at several ways to do data-driven unit testing in JUnit.

Suppose, for example, you are implementing a Frequent Flyer application that awards status levels (Bronze, Silver, Gold, Platinum) based on the number of status points you earn. The number of points needed for each level is shown here:

    level    minimum status points    result level
    Bronze   0                        Bronze
    Bronze   300                      Silver
    Bronze   700                      Gold
    Bronze   1500                     Platinum

Our unit tests need to check that we can correctly calculate the status level achieved when a frequent flyer earns a certain number of points. This is a classic problem where data-driven tests would provide an elegant, efficient solution.

Data-driven testing is well-supported in modern JVM unit testing libraries such as Spock and Specs2. However, some teams don't have the option of using a language other than Java, or are limited to using JUnit. In this article, we look at a few options for data-driven testing in plain old JUnit.

Parameterized Tests in JUnit
JUnit provides some support for data-driven tests, via the Parameterized test runner.
A simple data-driven test in JUnit using this approach might look like this:

    import java.util.Arrays;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.junit.runners.Parameterized;
    import org.junit.runners.Parameterized.Parameters;
    // Imports for Status, FrequentFlyer and assertThat depend on your project and assertion library

    @RunWith(Parameterized.class)
    public class WhenEarningStatus {

        // Each row: initial status, initial points, earned points, expected final status
        @Parameters(name = "{index}: {0} initially had {1} points, earns {2} points, should become {3} ")
        public static Iterable<Object[]> data() {
            return Arrays.asList(new Object[][]{
                    {Bronze, 0, 100, Bronze},
                    {Bronze, 0, 300, Silver},
                    {Bronze, 100, 200, Silver},
                    {Bronze, 0, 700, Gold},
                    {Bronze, 0, 1500, Platinum},
            });
        }

        private Status initialStatus;
        private int initialPoints;
        private int earnedPoints;
        private Status finalStatus;

        // JUnit calls this constructor once per row of test data
        public WhenEarningStatus(Status initialStatus, int initialPoints, int earnedPoints, Status finalStatus) {
            this.initialStatus = initialStatus;
            this.initialPoints = initialPoints;
            this.earnedPoints = earnedPoints;
            this.finalStatus = finalStatus;
        }

        @Test
        public void shouldUpgradeStatusBasedOnPointsEarned() {
            FrequentFlyer member = FrequentFlyer.withFrequentFlyerNumber("12345678")
                    .named("Joe", "Jones")
                    .withStatusPoints(initialPoints)
                    .withStatus(initialStatus);

            member.earns(earnedPoints).statusPoints();

            assertThat(member.getStatus()).isEqualTo(finalStatus);
        }
    }

You provide the test data in the form of a list of Object arrays, identified by the @Parameters annotation. These object arrays contain the rows of test data that you use for your data-driven test. Each row is used to instantiate member variables of the class, via the constructor. When you run the test, JUnit will instantiate and run a test for each row of data. You can use the name attribute of the @Parameters annotation to provide a more meaningful title for each test.

There are a few limitations to the JUnit parameterized tests. The most important is that, since the test data is defined at a class level and not at a test level, you can only have one set of test data per test class. Not to mention that the code is somewhat cluttered: you need to define member variables, a constructor, and so forth. Fortunately, there is a better option.
Using JUnitParams
A more elegant way to do data-driven testing in JUnit is to use JUnitParams (https://code.google.com/p/junitparams/). JUnitParams (see Maven Central, http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22JUnitParams%22, to find the latest version) is an open source library that makes data-driven testing in JUnit easier and more explicit. A simple data-driven test using JUnitParams looks like this:

    @RunWith(JUnitParamsRunner.class)
    public class WhenEarningStatusWithJUnitParams {

        @Test
        @Parameters({
                "Bronze, 0, 100, Bronze",
                "Bronze, 0, 300, Silver",
                "Bronze, 100, 200, Silver",
                "Bronze, 0, 700, Gold",
                "Bronze, 0, 1500, Platinum"
        })
        public void shouldUpgradeStatusBasedOnPointsEarned(Status initialStatus, int initialPoints,
                                                           int earnedPoints, Status finalStatus) {
            FrequentFlyer member = FrequentFlyer.withFrequentFlyerNumber("12345678")
                    .named("Joe", "Jones")
                    .withStatusPoints(initialPoints)
                    .withStatus(initialStatus);

            member.earns(earnedPoints).statusPoints();

            assertThat(member.getStatus()).isEqualTo(finalStatus);
        }
    }

Test data is defined in the @Parameters annotation, which is associated with the test itself, not the class, and passed to the test via method parameters. This makes it possible to have different sets of test data for different tests in the same class, or to mix data-driven tests with normal tests in the same class, which is a much more logical way of organizing your classes.
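The tests above exercise a FrequentFlyer class that the article never shows. To make the data rows concrete, here is a minimal plain-Java sketch of the rule they check, using the thresholds from the points table at the start of the article. Everything in it is an assumption for illustration: the StatusLevels class, the statusFor method, and the simplification that status depends only on total points (the real class also tracks an initial status).

```java
// Hypothetical sketch: the article does not show FrequentFlyer's internals.
class StatusLevels {

    enum Status { Bronze, Silver, Gold, Platinum }

    // Thresholds taken from the points table at the start of the article.
    static Status statusFor(int totalPoints) {
        if (totalPoints >= 1500) return Status.Platinum;
        if (totalPoints >= 700) return Status.Gold;
        if (totalPoints >= 300) return Status.Silver;
        return Status.Bronze;
    }

    // Same data as the test rows above: initial points, earned points, expected status.
    static final Object[][] ROWS = {
            {0, 100, Status.Bronze},
            {0, 300, Status.Silver},
            {100, 200, Status.Silver},
            {0, 700, Status.Gold},
            {0, 1500, Status.Platinum},
    };

    // Runs every row through the rule and counts the rows that do not match.
    static int countFailures() {
        int failures = 0;
        for (Object[] row : ROWS) {
            int total = (Integer) row[0] + (Integer) row[1];
            if (statusFor(total) != row[2]) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failing rows: " + countFailures());
    }
}
```

This hand-rolled loop is essentially what the Parameterized and JUnitParams runners do for you, with the added benefit that each row is reported as its own named test.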
JUnitParams also lets you get test data from other methods, as illustrated here:

    @Test
    @Parameters(method = "sampleData")
    public void shouldUpgradeStatusFromEarnedPoints(Status initialStatus, int initialPoints,
                                                    int earnedPoints, Status finalStatus) {
        FrequentFlyer member = FrequentFlyer.withFrequentFlyerNumber("12345678")
                .named("Joe", "Jones")
                .withStatusPoints(initialPoints)
                .withStatus(initialStatus);

        member.earns(earnedPoints).statusPoints();

        assertThat(member.getStatus()).isEqualTo(finalStatus);
    }

    private Object[] sampleData() {
        return $(
                $(Bronze, 0, 100, Bronze),
                $(Bronze, 0, 300, Silver),
                $(Bronze, 100, 200, Silver)
        );
    }

The $ method provides a convenient short-hand to convert test data to the Object arrays that need to be returned. You can also externalize the test data into a separate class:

    @Test
    @Parameters(source = StatusTestData.class)
    public void shouldUpgradeStatusFromEarnedPoints(Status initialStatus, int initialPoints,
                                                    int earnedPoints, Status finalStatus) {
        ...
    }

The test data here comes from a method in the StatusTestData class:

    public class StatusTestData {
        public static Object[] provideEarnedPointsTable() {
            return $(
                    $(Bronze, 0, 100, Bronze),
                    $(Bronze, 0, 300, Silver),
                    $(Bronze, 100, 200, Silver)
            );
        }
    }

This method needs to be static, return an object array, and start with the word "provide". Getting test data from external methods or classes in this way opens the way to retrieving test data from external sources such as CSV or Excel files.

JUnitParams provides a simple and clean way to implement data-driven tests in JUnit, without the overhead and limitations of the traditional JUnit parameterized tests.

Testing with non-Java languages
If you are not constrained to Java and/or JUnit, more modern tools such as Spock (https://code.google.com/p/spock/) and Specs2 provide great ways of writing clean, expressive unit tests in Groovy and Scala respectively.
In Groovy, for example, you could write a test like the following:

    class WhenEarningStatus extends Specification {

        def "should earn status based on the number of points earned"() {
            given:
            def member = FrequentFlyer.withFrequentFlyerNumber("12345678")
                    .named("Joe", "Jones")
                    .withStatusPoints(initialPoints)
                    .withStatus(initialStatus);

            when:
            member.earns(earnedPoints).statusPoints()

            then:
            member.status == finalStatus

            where:
            initialStatus | initialPoints | earnedPoints | finalStatus
            Bronze        | 0             | 100          | Bronze
            Bronze        | 0             | 300          | Silver
            Bronze        | 100           | 200          | Silver
            Silver        | 0             | 700          | Gold
            Gold          | 0             | 1500         | Platinum
        }
    }

John Ferguson Smart is a specialist in BDD, automated testing, and software life cycle development optimization, and author of BDD in Action and other books. John runs regular courses in Australia, London and Europe on related topics such as Agile Requirements Gathering, Behaviour Driven Development, Test Driven Development, and Automated Acceptance Testing.
July 27, 2014
by John Ferguson Smart
· 24,670 Views · 1 Like
How to Instantly Improve Your Java Logging With 7 Logback Tweaks
The benchmark tests to help you discover how Logback performs under pressure

Logging is essential for server-side applications but it comes at a cost. It's surprising to see, though, how much impact small changes and configuration tweaks can have on an app's logging throughput. In this post we will benchmark Logback's performance in terms of log entries per minute. We'll find out which appenders perform best, what prudent mode is, and what some of the awesome side effects of async methods, sifting and console logging are. Let's get to it.

The groundwork for the benchmark
At its core, Logback is based on Log4j with tweaks and improvements under Ceki Gülcü's vision. Or as they say, a better Log4j. It features a native SLF4J API, faster implementation, XML configuration, prudent mode, and a set of useful appenders which I will elaborate on shortly. Having said that, there are quite a few ways to log with the different sets of appenders, patterns and modes available in Logback. We took a set of commonly used combinations and put them to a test on 10 concurrent threads to find out which can run faster. The more log entries written per minute, the more efficient the method is and the more resources are free to serve users. It's not an exact science, but to be more precise, we ran each test 5 times, removed the top and bottom outliers and took the average of the results. To try and be fair, all log lines written also had an equal length of 200 characters.

** All code is available on GitHub right here. The test was run on a Debian Linux machine running on an Intel i7-860 (4 cores @ 2.80 GHz) with 8 GB of RAM.

First benchmark: what's the cost of synchronous log files?
First we took a look at the difference between synchronous and asynchronous logging. Both write to a single log file: the FileAppender writes entries directly to the file, while the AsyncAppender feeds them to a queue which is then written to the file.
The default queue size is 256, and when it's 80% full it stops letting in new entries of lower levels (only WARN and ERROR are still accepted). The table compares the FileAppender and different queue sizes for the AsyncAppender. Async came out on top with the 500 queue size.

Tweak #1: AsyncAppender can be 3.7x faster than the synchronous FileAppender. Actually, it's the fastest way to log across all appenders.

It performed way better than the default configuration, which even trails behind the sync FileAppender that was supposed to finish last. So what might have happened? Since we're writing INFO messages, and doing so from 10 concurrent threads, the default queue size might have been too small and messages could have been lost to the default threshold. Looking at the results of the 500 and 1,000,000 queue sizes, you'll notice that their throughput was similar, so queue size and threshold weren't an issue for them.

Tweak #2: The default AsyncAppender can cause a 5-fold performance cut and even lose messages. Make sure to customize the queueSize and discardingThreshold according to your needs.

    <queueSize>500</queueSize>
    <discardingThreshold>0</discardingThreshold>

** Setting an AsyncAppender's queueSize and discardingThreshold

Second benchmark: do message patterns really make a difference?
Now we want to see the effect of log entry patterns on the speed of writing. To make this fair we kept the log line's length equal (200 characters) even when using different patterns. The default Logback entry includes the date, thread, level, logger name and message; by playing with it we tried to see what the effects on performance might be. This benchmark demonstrates and helps see up close the benefit of logger naming conventions. Just remember to change the logger's name according to the class you use it in.

Tweak #3: Naming the logger by class name provides a 3x performance boost.

Taking the logger's or the thread's name off added some 40k-50k entries per minute. No need to write information you're not going to use.
Going minimal also proved to be a bit more effective.

Tweak #4: Compared to the default pattern, using only the level and message fields provided 127k more entries per minute.

Third benchmark: dear prudence, won't you come out to play?
In prudent mode a single log file can be accessed from multiple JVMs. This of course takes a hit on performance because of the need to handle another lock. We tested prudent mode on 2 JVMs writing to a single file using the same benchmark we ran earlier. Prudent mode takes a hit as expected, although my first guess was that the impact would be stronger.

Tweak #5: Use prudent mode only when you absolutely need it, to avoid a throughput decrease.

    <file>logs/test.log</file>
    <prudent>true</prudent>

** Configuring prudent mode on a FileAppender

Fourth benchmark: how to speed up synchronous logging?
Let's see how synchronous appenders other than the FileAppender perform. The ConsoleAppender writes to System.out or System.err (defaults to System.out) and of course can also be piped to a file. That's how we were able to count the results. The SocketAppender writes to a specified network resource over a TCP socket. If the target is offline, the message is dropped. Otherwise, it's received as if it was generated locally. For the benchmark, the socket was sending data to the same machine, so we avoided network issues and concerns.

To our surprise, explicit file access through FileAppender is more expensive than writing to the console and piping it to a file. The same result, a different approach, and some 200k more log entries per minute. SocketAppender performed similarly to FileAppender in spite of adding serialization in between; a remote network resource, had one existed, would have borne most of the overhead.

Tweak #6: Piping ConsoleAppender to a file provided 13% higher throughput than using FileAppender.

Fifth benchmark: now can we kick it up a notch?
Another useful method we have in our toolbelt is the SiftingAppender. Sifting allows you to break the log into multiple files.
Our logic here was to create 4 separate logs, each holding the logs of 2 or 3 out of the 10 threads we run in the test. This is done by indicating a discriminator, in our case logid, which determines the file name of the logs:

    <key>logid</key>
    <defaultValue>unknown</defaultValue>
    <file>logs/sift-${logid}.log</file>
    <append>false</append>

** Configuring a SiftingAppender

Once again our FileAppender takes a beating. The more output targets, the less stress on the locks and the less context switching. The main bottleneck in logging, same as with the async example, proves to be synchronising on a file.

Tweak #7: Using a SiftingAppender can allow a 3.1x improvement in throughput.

Conclusion
We found that the way to achieve the highest throughput is by using a customized AsyncAppender. If you must use synchronous logging, it's better to sift through the results and use multiple files by some logic. I hope you've found the insights from the Logback benchmark useful and look forward to hearing your thoughts in the comments below.

Originally posted in Takipi's blog
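The discarding behaviour described in the first benchmark (once the queue is about 80% full, entries below WARN are dropped) can be illustrated with a small standalone sketch. This is not Logback's code: the DiscardingQueue class and its Level enum are hypothetical names, it is not thread-safe, and unlike the real AsyncAppender it never blocks when the queue is completely full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical, simplified model of an async appender's bounded queue.
class DiscardingQueue {

    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    private final BlockingQueue<String> queue;
    private final int discardingThreshold;

    DiscardingQueue(int queueSize, int discardingThreshold) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.discardingThreshold = discardingThreshold;
    }

    // Returns true if the message was queued, false if it was dropped.
    boolean offer(Level level, String message) {
        boolean nearlyFull = queue.remainingCapacity() < discardingThreshold;
        boolean discardable = level.compareTo(Level.WARN) < 0; // TRACE, DEBUG, INFO
        if (nearlyFull && discardable) {
            return false; // low-priority entry dropped to keep room for WARN/ERROR
        }
        return queue.offer(message); // also false when the queue is completely full
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        // Queue of 10 with a threshold of 2 free slots, i.e. discarding starts at about 80% full
        DiscardingQueue q = new DiscardingQueue(10, 2);
        int accepted = 0;
        for (int i = 0; i < 10; i++) {
            if (q.offer(Level.INFO, "info " + i)) {
                accepted++;
            }
        }
        System.out.println("INFO entries accepted: " + accepted);
        System.out.println("WARN accepted: " + q.offer(Level.WARN, "warn"));
    }
}
```

With a threshold of 0 the nearlyFull check can never be true, so no entry is dropped because of its level. That is the effect of the discardingThreshold of 0 used in tweak #2.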
July 25, 2014
by Chen Harel
· 20,570 Views
Swiss Java Knife - A useful tool to add to your diagnostic tool-kit?
As a support consultant at C2B2 I am always looking for handy tools that may be able to help me or my team in diagnosing our customers' middleware issues. So, when I came across a project called Swiss Java Knife promising tools for 'JVM monitoring, profiling and tuning' I figured I should take a look. It's basically a single jar file that allows you to run a number of tools, most of which are similar to the ones that come bundled with the JDK. If you're interested in those tools, my colleague Matt Brasier did a good introductory webinar which is available here: http://www.c2b2.co.uk/jvm_webinar_video

Downloading
Firstly I downloaded the latest jar file from GitHub: https://github.com/aragozin/jvm-tools
The source code is also available, but for the purposes of this look into what it can offer the jar will suffice.

What does it offer?
Swiss Java Knife offers a number of commands:
jps - Similar to the jps tool that comes with the JDK.
ttop - Similar to the Linux top command.
hh - Similar to running the jmap tool that comes with the JDK with the -histo option.
gc - Reports information about GC in real time.
mx - Allows you to do basic operations with MBeans from the command line.
mxdump - Dumps all MBeans of the target Java process to JSON.

Testing
In order to test out the commands that are available I set up a WebLogic server and deployed an app containing a number of servlets that have known issues. These are then called via JMeter to show certain server behaviour:
excessive garbage collection
high CPU usage
a memory leak

Finding the process ID
Normally to find the process ID I'd use the jps command that comes with the JDK. Swiss Java Knife has its own version of the jps command so I tried that instead.
Running the command:

    java -jar sjk-plus-0.1-2013-09-06.jar jps

gives the following output:

    5402 org.apache.derby.drda.NetworkServerControl start
    3250 weblogic.Server
    4032 ./ApacheJMeter.jar
    3172 weblogic.NodeManager -v
    5427 weblogic.Server
    6523 sjk-plus-0.1-2013-09-06.jar jps

Which is basically the same as running the jps command with the -l option. There are a couple of additions where you can add filter options, allowing you to pass in wildcards to match process descriptions or JVM system properties, but overall it adds very little to the standard jps tool. jps -lv will generally give you everything you need.

OK, so now we've got the process ID of our server we can start to look at what is going on. First of all, let's check garbage collection.

Checking garbage collection
OK. Now this one looks more promising. Swiss Java Knife has a command for collecting real-time GC statistics. Let's give it a go. So, running the following command without my dodgy servlet running should give us a 'standard' reading:

    java -jar sjk-plus-0.1-2013-09-06.jar gc -p 3016

    [GC: PS Scavenge#10471 time: 6ms interval: 113738ms mem: PS Survivor Space: 0k+96k->96k[max:128k,rate:0.84kb/s] PS Old Gen: 78099k+0k->78099k[max:349568k,rate:0.00kb/s] PS Eden Space: 1676k-1676k->0k[max:174464k,rate:-14.74kb/s]]
    [GC: PS MarkSweep#10436 time: 192ms interval: 40070ms mem: PS Survivor Space: 96k-96k->0k[max:128k,rate:-2.40kb/s] PS Old Gen: 78099k+7k->78106k[max:349568k,rate:0.19kb/s] PS Eden Space: 0k+0k->0k[max:174400k,rate:0.00kb/s]]

    PS Scavenge[ collections: 31 | avg: 0.0057 secs | total: 0.2 secs ]
    PS MarkSweep[ collections: 9 | avg: 0.1980 secs | total: 1.8 secs ]

OK. Looks good. It is useful to be able to get runtime GC info without having to rely on GC logs, which are often not available.
After running my dodgy servlet (containing a number of System.gc() calls) we see the following:

    [GC: PS Scavenge#9787 time: 5ms interval: 38819ms mem: PS Survivor Space: 0k+64k->64k[max:192k,rate:1.65kb/s] PS Old Gen: 78062k+0k->78062k[max:349568k,rate:0.00kb/s] PS Eden Space: 204k-204k->0k[max:174336k,rate:-5.28kb/s]]
    [GC: PS MarkSweep#10200 time: 155ms interval: 112488ms mem: PS Survivor Space: 64k-64k->0k[max:192k,rate:-0.57kb/s] PS Old Gen: 78071k+0k->78071k[max:349568k,rate:0.00kb/s] PS Eden Space: 0k+0k->0k[max:174336k,rate:0.00kb/s]]

    PS Scavenge[ collections: 666 | avg: 0.0046 secs | total: 3.1 secs ]
    PS MarkSweep[ collections: 689 | avg: 0.1588 secs | total: 109.4 secs ]

A big difference, and although it is not a particularly realistic scenario, it's certainly a useful tool for being able to quickly view runtime GC info. Next up we'll take a look at CPU usage.

Checking CPU usage
Swiss Java Knife has a command that works in a similar way to the Linux top command, which displays the top CPU processes.
Running the following command should give us the top 10 CPU processes when running normally:

    java -jar sjk-plus-0.1-2013-09-06.jar ttop -n 10 -p 5427 -o CPU

    2014-03-11T08:56:33.120-0700 Process summary
      process cpu=2.21%
      application cpu=0.67% (user=0.30% sys=0.37%)
      other: cpu=1.54%
      heap allocation rate 245kb/s
    [000001] user= 0.00% sys= 0.00% alloc= 0b/s - main
    [000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler
    [000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer
    [000004] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher
    [000010] user= 0.00% sys= 0.00% alloc= 0b/s - Timer-0
    [000011] user= 0.00% sys= 0.01% alloc= 96b/s - Timer-1
    [000012] user= 0.00% sys= 0.01% alloc= 20b/s - [ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'
    [000013] user= 0.00% sys= 0.00% alloc= 0b/s - weblogic.time.TimeEventGenerator
    [000014] user= 0.00% sys= 0.04% alloc= 245b/s - weblogic.timers.TimerThread
    [000017] user= 0.00% sys= 0.00% alloc= 0b/s - Thread-7

So far so good, minimal CPU usage. Now I'll run my dodgy servlet and run it again. Hmmm, not so good:

    Unexpected error: java.lang.IllegalArgumentException: Comparison method violates its general contract!
Try once again and we get the following:

    2014-03-11T09:00:10.625-0700 Process summary
      process cpu=199.14%
      application cpu=189.87% (user=181.57% sys=8.30%)
      other: cpu=9.27%
      heap allocation rate 4945kb/s
    [000040] user=83.95% sys= 2.82% alloc= 0b/s - [ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'
    [000038] user=93.71% sys=-0.44% alloc= 0b/s - [ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'
    [000044] user= 3.90% sys= 4.91% alloc= 4855kb/s - RMI TCP Connection(5)-127.0.0.1
    [000001] user= 0.00% sys= 0.00% alloc= 0b/s - main
    [000002] user= 0.00% sys= 0.00% alloc= 0b/s - Reference Handler
    [000003] user= 0.00% sys= 0.00% alloc= 0b/s - Finalizer
    [000004] user= 0.00% sys= 0.00% alloc= 0b/s - Signal Dispatcher
    [000010] user= 0.00% sys= 0.00% alloc= 0b/s - Timer-0
    [000011] user= 0.00% sys= 0.04% alloc= 1124b/s - Timer-1
    [000012] user= 0.00% sys= 0.00% alloc= 0b/s - [STANDBY] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'

So, the CPU usage is now through the roof (as expected). The main issue with this is that, similar to the jps command, it doesn't really offer much more than the top command. It also threw the exception above many times when trying to run commands ordered by CPU. Overall, it doesn't really add much to the command already available, and unexpected errors are never good. Finally, we'll take a look at memory usage.

Checking memory usage
For checking memory usage Swiss Java Knife has a tool called hh, which it claims is an extended version of jmap -histo. For those not familiar with jmap, it's another of the tools that come with the JDK; it prints shared object memory maps or heap memory details for a process. So, first of all I run my JMeter test that repeatedly calls my dodgy servlet. This time it is one that allocates multiple byte arrays each time it's called, to simulate a memory leak.
Although it claims to be an extended version of jmap -histo, the only real addition is the ability to state how many buckets to view, but this can be easily achieved by piping the output of jmap -histo through head. Aside from that the output is virtually identical.

Output from jmap:

    num   #instances   #bytes      class name
    ----------------------------------------------
     1:   42124        234260776   [B
     2:   161472       24074512
     3:   161472       21970928
     4:   12853        15416848
     5:   12853        10250656
     6:   84735        9020400     [C
     7:   10896        8943104
     8:   91873        2939936     java.lang.String
     9:   14021        1675576     java.lang.Class
    10:   10311        1563520     [Ljava.lang.Object;

Output from sjk:

    java -jar sjk-plus-0.1-2013-09-06.jar hh -n 10 -p 5427

     1:   56626        386286072   [B
     2:   161493       24076192
     3:   161493       21973784
     4:   12850        15409912
     5:   12850        10249384
     6:   10891        8936672
     7:   83336        8577720     [C
     8:   90525        2896800     java.lang.String
     9:   14018        1675264     java.lang.Class
    10:   9819         1579400     [Ljava.lang.Object;
    Total 996089       500086120

The only other tools available are the commands mxdump and mx, which allow access to MBean attributes and operations. However, trying to run either of these resulted in a NullPointerException. At this point I would generally download the code and start to poke about, but by now I'd seen enough.

Conclusion
Although a nice idea, it's very limited in what it offers. Under the covers it uses the Attach API, so it requires the JDK and not just the JRE in order to run, which means the majority of the tools it offers are already provided with the standard JDK. There are a few additions to those tools but nothing that really makes it worthwhile using this instead. The only tool I could see myself using would be the real-time GC data gathering tool, but this would only be of use where GC logs were unavailable and no other monitoring tools were available. The number of errors seen when running basic commands was also a concern, although this is just a project on GitHub, not a commercial offering, and doesn't appear to be a particularly active project.
So, a useful tool to add to your diagnostic tool-kit? Not in my opinion. It's certainly an interesting idea and with further work could be useful but for now I'd stick with the tools that are already available.
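For reference, the per-collector summary lines printed by the gc command (collections, average and total time) can be approximated with the JDK's own GarbageCollectorMXBean API. The sketch below is an assumption-laden illustration, not how Swiss Java Knife is implemented: GcSummary and its methods are made-up names, and it reports on the JVM it runs in rather than attaching to another process by PID.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarises GC activity of the current JVM via standard MXBeans.
class GcSummary {

    // Average pause in seconds, or 0 when no collections have happened yet.
    static double avgSeconds(long collections, long totalMillis) {
        return collections <= 0 ? 0.0 : (totalMillis / 1000.0) / collections;
    }

    // One line per collector, loosely modelled on the summary format shown earlier.
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector
            long count = Math.max(gc.getCollectionCount(), 0);
            long millis = Math.max(gc.getCollectionTime(), 0);
            lines.add(String.format("%s[ collections: %d | avg: %.4f secs | total: %.1f secs ]",
                    gc.getName(), count, avgSeconds(count, millis), millis / 1000.0));
        }
        return lines;
    }

    public static void main(String[] args) {
        System.gc(); // request a collection so the counters are likely to be non-zero
        for (String line : summarize()) {
            System.out.println(line);
        }
    }
}
```

To watch a remote server in the same way you would read these beans over a JMX connection instead, which is what tools like jconsole do.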
July 24, 2014
by Andy Overton
· 13,465 Views
DocFlex/XML - XML Schema Documentation Generator and Toolkit
A powerful multi-format XML schema (XSD) documentation generator and a tool for rapid development of custom XSD documentation generators according to user needs.

About DocFlex/XML | "XSDDoc" Template Set | Template Processor | Template Designer | Integrations | Generation of XSD Diagrams | Apache Ant & Maven | Links

About DocFlex/XML
DocFlex/XML is a Java-based software system for the development and execution of high-performance template-driven documentation generators from any data stored in XML files. The actual doc/report generators are programmed in the form of special templates using a graphic template designer, which represents the templates visually in a form resembling the output they generate. The templates are then interpreted by a template processor, which takes the XML files as input and produces the resulting documentation from them.

This article describes an application of DocFlex/XML to the task of generating high-quality XML schema documentation. That includes the following features of the DocFlex/XML system:
- "XSDDoc" template set, which implements the ready-to-use XML schema documentation generator itself.
- Template processor, which makes the templates work. Currently, it provides three interchangeable output generators for the HTML, RTF and TXT (plain text) formats.
- Template designer, which provides a high-quality GUI to design/modify templates. If you need a special XML schema doc generator, the simplest way to create it is to modify the standard XSDDoc templates. The template designer enables you to do that.
- Integrations with Altova XMLSpy and Oxygen XML Editor. If you are a user of one of those popular XML editors, you can also turn it into a dynamically linked diagramming engine for DocFlex, so as to automatically include the XSD diagrams generated by XMLSpy/OxygenXML in the XML schema documentation generated by DocFlex (with full support of hyperlinks).

"XSDDoc" template set
It is the implementation of the XML schema documentation generator itself, which provides the following functionality:
- Generation of single documentation from any number of XML schema (XSD) files together, in particular:
  - highly navigable framed (Javadoc-like) HTML documentation
  - single-file HTML documentation
  - RTF documentation (further convertible to PDF)
- Processing of any referenced XML schemas, in particular:
  - correct processing of all <import>, <include> and <redefine> elements found across all involved XSD files
  - automatic loading and processing (i.e. inclusion in the documentation scope) of all directly/indirectly referenced XSD files
- Sophisticated documenting of XSD components, including:
  - component diagrams (with hyperlinks to everything depicted on them; see also Integrations)
  - XML representation summary (a textual alternative to diagrams)
  - lists of related components. For elements this also includes the list of possible containing elements. (Such a list is never present in the output generated by XSLT-based doc generators.)
  - list of usage locations
- Support of any XML schema design patterns. This comes down mainly to the following:
  - special treatment of local elements (see below)
  - support and documenting of substitution groups
  - support of importing, inclusion and redefinition of schema files
- Special documenting of local elements. Local elements are those components that are declared locally within other XSD components. The W3C XML Schema spec allows you to declare any number of local elements that may share the same name but have different content. That's because their meaning is local and there will be no collisions with other declarations. That, however, creates a problem for documenting, because in the documentation both global and local elements may appear simultaneously in various lists according to their common properties. If each element component is identified only by its name, you will get lists with multiple repeating names but little clue as to what they mean. Moreover, some XML schemas may contain lots of identical local element declarations (that is, they have both the same name and the same content). So, you'll get in those lists a mess of repeating names, some of which refer to effectively the same entities, whereas others refer to completely different ones. In XSDDoc, those problems are solved in two ways:
  - Adding extensions to local element names. The extension provides more information about the element (e.g. where it can be inserted, its global type, or where it is defined). That makes the whole string identifying the element unique. Here is how it looks. The grey text is the name extension:
  - Unifying local elements by type. On the left you can see documentation generated with such unification. On the right, all local elements are documented straight as they are. We believe the first documentation (on the left) is easier to understand and use.
- Processing of XHTML markup. You can format your XML schema annotations with XHTML tags, which will be recognized and rendered with the appropriate formatting in both HTML and RTF output, as shown on the following screenshots. Here, on the left you can see the XML source of an XML schema, whose annotations are heavily laden with XHTML markup (including insertion of images). Next is the HTML documentation generated from that schema. On the right is a page of RTF documentation also generated from that schema.
- Possibility of unlimited customization: XSDDoc is controlled by more than 400 parameters, which allow you to adjust the generated documentation within a huge range of included details. Template parameters serve the same role as options in traditional doc generators. The difference is that the DocFlex template architecture makes the support/implementation of template parameters very cheap (typically, most of the effort goes into writing their descriptions).
so, there may be hundreds of parameters controlling a large template application. if parameters are not enough, you can modify the templates themselves using the template designer . in case of html output, you can also apply your own css styles to change how the generated documentation looks. template processor the template processor (also called simply "generator") makes everything work. it consists of two logical parts: 1. template interpreter 2. output generator the output generator actually has three different implementations for each currently supported output format: html, rtf, txt (plain text). the plain-text output can be used to generate documentation in formats not supported directly by docflex. the template processor is started directly from java command line with the following arguments: ● main template ● template parameters ● initial xsd files to be processed (documented) ● xml catalogs (to redirect physical location of input files) ● destination directory/file ● output format (this selects which output generator will be used) ● output format options (specify settings to control the selected output generator) actually, the number of settings may be so large that the template processor provides a special gui to specify everything interactively (click to enlarge): template designer although docflex templates are stored as plain-text files (with an xml-like format), they are not supposed for editing manually. rather, a special graphic template designer must be used, which visualizes the templates in the form of template components they are made of. those components are the actual constructs of the template language (not some textual statements, operators, blocks etc.) 
the following screenshots show templates open in the template designer (click to see a lot more): that approach has a number of advantages, among them: the processing structures represented by template components may be displayed in a way that visually expresses what a component does (for instance, it may resemble the output it generates). that representation may be both expressive and compact (after all, it is not just a text), which allows you easily to navigate a template, understand what it does and modify anything you need. as template components are visual and interactive, they may have very complex internal structure, for instance, contain lots of properties and nested components. at that, you don't need to scroll and navigate some kind of enormous text, which encodes all of this (as it would be in case of a script). rather, you just need to invoke some property dialogs and expand/collapse some component sections. a template component may be easily copied, pasted and deleted as a whole. at that, you don't need to bother that the template syntax is restored after that. the template designer will also ensure that each component is created, copied or moved only in the allowed place. the highly structured nature of templates eliminates the need for most of various named identifiers. many connections between different template components are also maintained by the template designer (i.e. modified automatically when necessary). as template files are stored and read only programmatically, there is no need to know and understand their syntax. there will be no syntax errors either. the actual syntax of template files may be optimized not for human programmers, but for faster loading and processing of templates by the template processor . there is no need in a compilation phase. the separation of template semantics from the particular structure of template files helps for faster and easier evolution of the template language. 
the obsolete constructs of older template versions can be automatically converted into new structures. both old and new templates will look and work up-to-date. integrations generation of xsd diagrams docflex/xml is able to work with any kind of diagrams (i.e. inserting them automatically in the generated output). that is supported on the level of templates, along with the generation of hypertext imagemaps, as shown on the following screenshot (click to see a lot more): docflex/xml provides no diagramming engine of its own. instead, it includes integrations with two most popular xml editors that do generate xsd diagrams: ● altova xmlspy ● oxygen xml editor effectively, the third-party software is used as dynamically linked diagramming engine. the advantage of such integrations is that when you are the user of one of those xml editors, you will get in the documentation generated by docflex the same diagrams as you see in your xml editor. here is how such a documentation with diagrams looks (click on a screenshot to view the real html): apache ant & maven as a pure java application, docflex/xml can be run in any environment that runs java itself. the template processor can be easily integrated with ant (that can be specified just in the ant build file). in case of maven, docflex/xml includes a simple maven plugin. it is possible also to use all diagraming integrations with both ant and maven. links docflex/xml (home page): http://www.filigris.com/docflex-xml/ docflex/xml xsddoc: http://www.filigris.com/docflex-xml/xsddoc/ xsddoc examples: http://www.filigris.com/docflex-xml/xsddoc/examples/ xmlspy integration: http://www.filigris.com/docflex-xml/xmlspy/ oxygenxml integration: http://www.filigris.com/docflex-xml/oxygenxml/ free downloads: http://www.filigris.com/downloads/ this original article: http://www.filigris.com/ann/docflex-xsd/
July 23, 2014
by Leonid Rudy
· 7,625 Views
Building Extremely Large In-Memory InputStream for Testing Purposes
For some reason I needed an extremely large, possibly even infinite, InputStream that would simply return the same byte[] over and over. This way I could produce an insanely big stream of data by repeating a small sample. Similar functionality can be found in Guava: Iterable<T> Iterables.cycle(Iterable<T>) and Iterator<T> Iterators.cycle(Iterable<T>). For example, if you need an infinite source of 0 and 1, simply say Iterables.cycle(0, 1) and get 0, 1, 0, 1, 0, 1... infinitely. Unfortunately I haven't found such a utility for InputStream, so I jumped into writing my own. This article documents the many mistakes I made during that process, mostly due to overcomplicating and overengineering a straightforward solution.

We don't really need an infinite InputStream; being able to create a very large one (say, 32 GiB) is enough. So we are after the following method:

    public static InputStream repeat(byte[] sample, int times)

It takes a sample array of bytes and returns an InputStream returning these bytes. When sample runs out, it rolls over, returning the same bytes again. This process is repeated the given number of times, until the InputStream signals end. One solution that I haven't really tried but which seems most obvious:

    public static InputStream repeat(byte[] sample, int times) {
        final byte[] allBytes = new byte[sample.length * times];
        for (int i = 0; i < times; i++) {
            System.arraycopy(sample, 0, allBytes, i * sample.length, sample.length);
        }
        return new ByteArrayInputStream(allBytes);
    }

I see you laughing there! If sample is 100 bytes and we need 32 GiB of input repeating these 100 bytes, the generated InputStream shouldn't really allocate 32 GiB of memory; we must be more clever here. As a matter of fact, repeat() above has another subtle bug. Arrays in Java are limited to 2^31-1 entries (int), and 32 GiB is way above that. The program still compiles, and at run time the expression sample.length * times silently overflows, because the product doesn't fit in an int.
OK, let's try something that at least theoretically can work. My first idea was as follows: what if I create many ByteArrayInputStreams sharing the same byte[] sample (they don't do an eager copy) and somehow join them together? Thus I needed some InputStream adapter that could take an arbitrary number of underlying InputStreams and chain them together: when the first stream is exhausted, switch to the next one. That awkward moment when you look for something in Apache Commons or Guava and apparently it was in the JDK forever... java.io.SequenceInputStream is almost ideal. However, its two-argument constructor chains precisely two underlying InputStreams. Of course, since SequenceInputStream is an InputStream itself, we can use it recursively as an argument to an outer SequenceInputStream. Repeating this process we can chain an arbitrary number of ByteArrayInputStreams together:

    public static InputStream repeat(byte[] sample, int times) {
        if (times <= 1) {
            return new ByteArrayInputStream(sample);
        } else {
            return new SequenceInputStream(
                    new ByteArrayInputStream(sample),
                    repeat(sample, times - 1)
            );
        }
    }

If times is 1, just wrap sample in a ByteArrayInputStream. Otherwise use SequenceInputStream recursively. I think you can immediately spot what's wrong with this code: too deep recursion. The nesting level is the same as the times argument, which will reach millions or even billions. There must be a better way. Luckily a minor improvement changes the recursion depth from O(n) to O(log n):

    public static InputStream repeat(byte[] sample, int times) {
        if (times <= 1) {
            return new ByteArrayInputStream(sample);
        } else {
            return new SequenceInputStream(
                    repeat(sample, times / 2),
                    repeat(sample, times - times / 2)
            );
        }
    }

Honestly this was the first implementation I tried. It's a simple application of the divide and conquer principle, where we produce the result by evenly splitting it into two smaller sub-problems.
Looks clever, but there is one issue: it's easy to prove we create t (t = times) ByteArrayInputStreams and O(t) SequenceInputStreams. While the sample byte array is shared, millions of various InputStream instances are wasting memory. This leads us to an alternative implementation, creating just one InputStream regardless of the value of times:

    import com.google.common.collect.Iterators;
    import org.apache.commons.lang3.ArrayUtils;

    public static InputStream repeat(byte[] sample, int times) {
        final Byte[] objArray = ArrayUtils.toObject(sample);
        final Iterator<Byte> infinite = Iterators.cycle(objArray);
        final Iterator<Byte> limited = Iterators.limit(infinite, sample.length * times);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return limited.hasNext() ? limited.next() & 0xFF : -1;
            }
        };
    }

We will use Iterators.cycle() after all. But first we have to translate byte[] into Byte[], since iterators can only work with objects, not primitives. There is no idiomatic way to turn an array of primitives into an array of boxed types, so I use ArrayUtils.toObject(byte[]) from Apache Commons Lang. Having an array of objects, we can create an infinite iterator that cycles through the values of sample. Since we don't want an infinite stream, we cut off the infinite iterator using Iterators.limit(Iterator, int), again from Guava. Now we just have to bridge from Iterator to InputStream; after all, semantically they represent the same thing.

This solution suffers from two problems. First of all, it produces tons of garbage due to unboxing. Garbage collection is not that much concerned about dead, short-living objects, but it still seems wasteful. The second issue we already faced previously: the sample.length * times multiplication can cause integer overflow. It can't be fixed because Iterators.limit() takes int, not long, for no good reason. BTW we avoided a third problem by doing a bitwise and with 0xFF; otherwise a byte with value -1 would signal end of stream, which is not the case.
With the mask, a byte of -1 is correctly translated to unsigned 255 (int). So even though the implementation above is short and sweet, declarative rather than imperative, it's too slow and limited. If you have a C background, I can imagine how uncomfortable you were seeing me struggle. After all, the most straightforward, painfully simple and low-level implementation was the one I came up with last:

    public static InputStream repeat(final byte[] sample, final int times) {
        return new InputStream() {
            private long pos = 0;
            // long arithmetic: sample.length * times may not fit in an int
            private final long total = (long) sample.length * times;

            @Override
            public int read() throws IOException {
                // & 0xFF keeps the result in 0..255, so a negative byte is never mistaken for end of stream (-1)
                return pos < total ?
                        sample[(int) (pos++ % sample.length)] & 0xFF :
                        -1;
            }
        };
    }

GC free, pure JDK, fast and simple to understand. Let this be a lesson for you: start with the simplest solution that jumps to your mind, don't overengineer and don't be too smart. My previous solutions (declarative, functional, immutable, etc.) maybe looked clever, but they were neither fast nor easy to understand. The utility we just developed was not just a toy project; it will be used later in a subsequent article.
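As a self-contained sketch of the last approach, the class below can be compiled and run on its own. Two things here are additions for illustration and not part of the article: the class name RepeatingStreams, and the bulk read(byte[], int, int) override with the drain() helper, which are only there so that large streams can be consumed quickly without going byte by byte.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical holder class; the name is an assumption, not from the article.
final class RepeatingStreams {

    private RepeatingStreams() {}

    /** Returns a stream that yields the bytes of sample, repeated the given number of times. */
    static InputStream repeat(final byte[] sample, final int times) {
        // Widen to long before multiplying so that large totals do not overflow int.
        final long total = (long) sample.length * Math.max(times, 0);
        return new InputStream() {
            private long pos = 0;

            @Override
            public int read() {
                if (pos >= total) {
                    return -1;
                }
                // Mask to 0..255 so that a negative byte is not confused with end of stream.
                return sample[(int) (pos++ % sample.length)] & 0xFF;
            }

            // Added for illustration: copy whole chunks instead of single bytes.
            @Override
            public int read(byte[] b, int off, int len) {
                if (off < 0 || len < 0 || len > b.length - off) {
                    throw new IndexOutOfBoundsException();
                }
                if (len == 0) {
                    return 0;
                }
                if (pos >= total) {
                    return -1;
                }
                final int n = (int) Math.min(len, total - pos);
                int copied = 0;
                while (copied < n) {
                    final int from = (int) (pos % sample.length);
                    final int chunk = Math.min(sample.length - from, n - copied);
                    System.arraycopy(sample, from, b, off + copied, chunk);
                    copied += chunk;
                    pos += chunk;
                }
                return n;
            }
        };
    }

    /** Reads the stream to its end and returns how many bytes it produced. */
    static long drain(InputStream in) {
        try {
            final byte[] buffer = new byte[8192];
            long count = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                count += read;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 1 MiB sample repeated 3000 times: more bytes than an int can count.
        System.out.println(drain(repeat(new byte[1 << 20], 3000)));
    }
}
```

Only the 1 MiB sample is ever allocated; the stream itself keeps a single long position, which is what makes totals beyond 2^31-1 bytes possible.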
July 23, 2014
by Tomasz Nurkiewicz
· 7,534 Views
VelocityEngine Spring Java Config
This is the first post in a series of short code snippets that show how to move the configuration of Spring beans from XML to Java.

XML:

    resource.loader=class
    class.resource.loader.class=org.apache.velocity.runtime.resource.loader.ClasspathResourceLoader

Java:

    @Bean
    public VelocityEngine velocityEngine() throws VelocityException, IOException {
        VelocityEngineFactoryBean factory = new VelocityEngineFactoryBean();
        Properties props = new Properties();
        props.put("resource.loader", "class");
        props.put("class.resource.loader.class",
                "org.apache.velocity.runtime.resource.loader." +
                "ClasspathResourceLoader");
        factory.setVelocityProperties(props);
        return factory.createVelocityEngine();
    }
July 23, 2014
by Adrian Matei
· 13,204 Views
Time - Memory Tradeoff With the Example of Java Maps
This article illustrates the general time-memory tradeoff using different hash table implementations in Java as an example. The more memory a hash table takes, the faster each operation (e.g. getting a value by key or putting an entry) is performed.

Benchmarking method

Hash maps with int keys and int values were tested. The memory measure is relative usage over the theoretical minimum. For example, 1000 entries of int key and value take at least (4 (size of int) + 4) * 1000 = 8000 bytes. If the hash map implementation takes 20 000 bytes, its memory overuse is (20 000 - 8000) / 8000 = 1.5. Each implementation was benchmarked on 9 different load levels (load factors). On each load level, each map was filled with 10 numbers of entries, logarithmically evenly distributed between 1000 and 10 000 000 (to study caching effects). Then, for the same implementation and load level, memory metrics and average operation throughputs were averaged independently over the 3 smallest sizes (small sizes), the 3 largest sizes (large sizes) and all 10 sizes from 1000 to 10 000 000 (all sizes).

Implementations: Higher Frequency Trading Collections (HFTC), High Performance Primitive Collections (HPPC), fastutil collections, Goldman Sachs Collections (GS), Trove collections, Mahout collections, and java.util.HashMap as a reference.

Get value by key (successful)

Looking only at these charts, you might suppose that HFTC, Trove and Mahout on the one side, and fastutil, HPPC and GS on the other, use the same hash table algorithm. (In fact, that is not quite true.) A sparser hash table on average performs fewer lookups during key search, therefore fewer memory reads, therefore the operation finishes earlier. Notice that on small sizes the largest maps are the fastest for all implementations, but on large and all sizes there is no visible progress starting from memory overuse ~4. That's because when the total memory taken by the map goes beyond CPU cache capacity, cache misses become more frequent as the map gets larger. This effect compensates for the algorithmic trend.

Update (increment) value by key

The update operation behaves pretty similarly to get(). fastutil wasn't benchmarked, because there is no fairly performant method for this task in its API.

Put an entry (key was absent)

In this case, maps were gradually filled with entries from size 0 to the target size (1000 - 10 000 000). Rehash shouldn't occur, because maps were constructed with the target size provided. For small sizes the plots still look like hyperbolas, but I can't explain such a dramatic change on large sizes, nor the differences between HFTC and the other primitive implementations.

Internal iteration (forEach)

Iteration gets slower as memory usage grows. An interesting thing about external iteration: for all open hash table implementations, throughput depends only on memory usage, not even on load factor (which differs between implementations for the same memory usage). Also, forEach throughput doesn't depend on open hash table size.

External iteration (via iterator or cursor)

External iteration performance varies more than internal, because there is more freedom for optimization. HFTC and Trove employ their own iteration interfaces; the other libraries use the standard java.util.Iterator.

Footnote

Raw benchmark results from which the pictures were built, with a link to the benchmarking code and information about the run site in the description.
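The memory overuse measure from the benchmarking method can be written down as a small calculation. This is a sketch only: the class and method names are made up for illustration, and the exact spacing of the 10 map sizes is an assumption (a plain geometric progression between the two end points), since the article only says they are logarithmically evenly distributed.

```java
// Hypothetical helper; names are assumptions, not taken from the benchmark code.
final class MemoryOveruse {

    private MemoryOveruse() {}

    /** Relative memory usage over the theoretical minimum for an int-to-int map. */
    static double overuse(long entries, long actualBytes) {
        // Each entry needs at least one int key and one int value.
        final long minimumBytes = entries * (Integer.BYTES + Integer.BYTES);
        return (double) (actualBytes - minimumBytes) / minimumBytes;
    }

    /** Assumed spacing: count sizes forming a geometric progression from min to max. */
    static long[] logSpacedSizes(long min, long max, int count) {
        final long[] sizes = new long[count];
        for (int i = 0; i < count; i++) {
            final double exponent = (double) i / (count - 1);
            sizes[i] = Math.round(min * Math.pow((double) max / min, exponent));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // The worked example from the text: 1000 entries stored in 20 000 bytes.
        System.out.println(overuse(1000, 20_000));
        System.out.println(java.util.Arrays.toString(logSpacedSizes(1000, 10_000_000, 10)));
    }
}
```

An overuse of 0 would mean the map stores nothing but the raw keys and values; the charts in the article plot throughput against this ratio.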
July 22, 2014
by Roman Leventov
· 25,959 Views
New in JAX-RS 2.0 – @BeanParam Annotation
JAX-RS 2.0 (JSR 339) is the latest version of the JAX-RS specification, the successor to JSR 311, and it was released along with Java EE 7.
July 22, 2014
by Abhishek Gupta DZone Core CORE
· 23,404 Views · 2 Likes
5 Reasons to Use a Java Data Grid in Your Application
In this post we explore 5 reasons to use a Java Data Grid for caching Java objects in-memory in your applications. In a later post we will explore some of the other data grid capabilities, beyond data storage, that can revolutionize your Java architectures, like on-grid computation and events.

Memory is Fast

Java Data Grids store Java objects in memory. Memory access is fast, with low latency. So if access to data storage, either disk or database, is the primary bottleneck in your application, then using a data grid as an in-memory cache in front of your storage tier will give you a performance boost.

Scale out your Application Shared State

If you need to share state across JVMs to scale out your application, then using a Java Data Grid rather than a database will increase your scalability. In a typical shared state architecture, the application server tier stores shared Java objects in the data grid and these objects are available to all application server nodes in your architecture. Separating the data grid tier from the application server tier has a number of advantages:

● Applications can be redeployed and restarted without losing the shared state
● Data Grid JVMs and Application JVMs can be tuned separately
● State can be shared across multiple different applications
● Each tier can be scaled horizontally and separately, depending on workload

Typical use cases for shared state include PCI compliant storage of card security codes, in-game state in online games, web session data, and prices and catalogues in ecommerce. Anything that needs low latency access can be stored in the shared data grid.

High Availability for In-Memory Data

As well as low latency access and scaling out shared state, Java Data Grids also provide high availability for your in-memory data. When storing Java objects in a data grid, a primary object is stored in one of the Data Grid JVMs and secondary backup copies of the object are stored in different Data Grid JVM nodes, ensuring that if you lose a node you don't lose any data. Clients of the data grid do not need to know where data is in order to access it, so high availability is transparent to your application.

Scale Out In-Memory Data Volumes

Java objects in data grids aren't fully replicated across all Data Grid JVMs but are stored as a primary object and a secondary object. This means the more Data Grid JVM nodes we add, the more JVM heap we have for storing Java objects in-memory (and remember, memory is fast). For example, if we build a Data Grid with 20 JVMs, each with 4 GB of free heap (after per-JVM overhead), we could theoretically store 80 GB (4 times 20) of shared Java objects. If we assume we have 1 duplicate for high availability, this cuts our storage in half, so we can store 40 GB (0.5 times 4 times 20) of Java objects in memory.

Native Integration with JPA

Java Data Grids have native integration with JPA frameworks like TopLink and Hibernate, whereby the Data Grid can act as a second level cache between JPA and the database. This can give a large performance boost to your database driven application if latency associated with database access is a key performance bottleneck.
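The capacity arithmetic from the "Scale Out In-Memory Data Volumes" section generalizes to any number of backup copies. The sketch below is an illustration only: the class and method names are invented, and it assumes every object is stored once as a primary plus a fixed number of full backup copies, with no other per-object overhead.

```java
// Hypothetical helper; not part of any data grid product's API.
final class GridCapacity {

    private GridCapacity() {}

    /**
     * Usable storage in GB when each object is kept as one primary
     * plus the given number of backup copies.
     */
    static double usableGb(int nodes, double freeHeapPerNodeGb, int backups) {
        final double rawGb = nodes * freeHeapPerNodeGb;
        return rawGb / (1 + backups);
    }

    public static void main(String[] args) {
        // The figures from the text: 20 JVMs with 4 GB free heap each.
        System.out.println(usableGb(20, 4, 0)); // no backups
        System.out.println(usableGb(20, 4, 1)); // one backup copy
    }
}
```

With one backup the divisor is 2, which is the "cuts our storage in half" case from the text; a second backup copy would leave a third of the raw heap for distinct objects.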
July 22, 2014
by Steve Millidge
· 7,391 Views
R: ggplot: Problem automatically picking scale for difftime object
While reading ‘Why The R Programming Language Is Good For Business‘ I came across Udacity’s ‘Data Analysis with R‘ courses, part of which focuses on exploring data sets using visualisations, something I haven’t done much of yet. I thought it’d be interesting to create some visualisations around the times that people RSVP ‘yes’ to the various Neo4j events that we run in London. I started off with the following query, which returns the date time that people replied ‘Yes’ to an event and the date time of the event:

    library(RNeo4j)

    query = "MATCH (e:Event)<-[:TO]-(response {response: 'yes'})
             RETURN response.time AS time, e.time + e.utc_offset AS eventTime"
    allYesRSVPs = cypher(graph, query)
    allYesRSVPs$time = timestampToDate(allYesRSVPs$time)
    allYesRSVPs$eventTime = timestampToDate(allYesRSVPs$eventTime)

    > allYesRSVPs[1:10,]
                      time           eventTime
    1  2011-06-05 12:12:27 2011-06-29 18:30:00
    2  2011-06-05 14:49:04 2011-06-29 18:30:00
    3  2011-06-10 11:22:47 2011-06-29 18:30:00
    4  2011-06-07 15:27:07 2011-06-29 18:30:00
    5  2011-06-06 20:21:45 2011-06-29 18:30:00
    6  2011-07-04 19:49:04 2011-07-27 19:00:00
    7  2011-07-05 16:40:10 2011-07-27 19:00:00
    8  2011-08-19 07:41:10 2011-08-31 18:30:00
    9  2011-08-24 12:47:40 2011-08-31 18:30:00
    10 2011-08-18 09:56:53 2011-08-31 18:30:00

I wanted to create a bar chart showing the amount of time in advance of a meetup that people RSVP’d ‘yes’, so I added the following column to my data frame:

    allYesRSVPs$difference = allYesRSVPs$eventTime - allYesRSVPs$time

    > allYesRSVPs[1:10,]
                      time           eventTime    difference
    1  2011-06-05 12:12:27 2011-06-29 18:30:00 34937.55 mins
    2  2011-06-05 14:49:04 2011-06-29 18:30:00 34780.93 mins
    3  2011-06-10 11:22:47 2011-06-29 18:30:00 27787.22 mins
    4  2011-06-07 15:27:07 2011-06-29 18:30:00 31862.88 mins
    5  2011-06-06 20:21:45 2011-06-29 18:30:00 33008.25 mins
    6  2011-07-04 19:49:04 2011-07-27 19:00:00 33070.93 mins
    7  2011-07-05 16:40:10 2011-07-27 19:00:00 31819.83 mins
    8  2011-08-19 07:41:10 2011-08-31 18:30:00 17928.83 mins
    9  2011-08-24 12:47:40 2011-08-31 18:30:00 10422.33 mins
    10 2011-08-18 09:56:53 2011-08-31 18:30:00 19233.12 mins

I then tried to use ggplot to create a bar chart of that data:

    > ggplot(allYesRSVPs, aes(x=difference)) + geom_histogram(binwidth=1, fill="green")

Unfortunately that resulted in this error:

    Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous
    Error: Discrete value supplied to continuous scale

I couldn’t find anyone who had come across this problem before in my search, but I did find the as.numeric function, which seemed like it would put the difference into an appropriate format:

    allYesRSVPs$difference = as.numeric(allYesRSVPs$eventTime - allYesRSVPs$time, units="days")

    > ggplot(allYesRSVPs, aes(x=difference)) + geom_histogram(binwidth=1, fill="green")

That resulted in the following chart:

We can see there is quite a heavy concentration of people RSVPing yes in the few days before the event, and the rest are scattered across the first 30 days. We usually announce events 3-4 weeks in advance, so I don’t know that it tells us anything interesting other than that it seems like people sign up for events when an email is sent out about them. The date the meetup was announced (by email) isn’t currently exposed by the API, but hopefully one day it will be. The code is on GitHub if you want to have a play; any suggestions welcome.
July 21, 2014
by Mark Needham
· 11,878 Views
Grouping, Sampling and Batching - Custom Collectors in Java 8
Continuing from the first article, this time we will write some more useful custom collectors: for grouping by given criteria, sampling input, batching and sliding over with a fixed-size window.

Grouping (counting occurrences, histogram)

Imagine you have a collection of some items and you want to calculate how many times each item (with respect to equals()) appears in this collection. This can be achieved using CollectionUtils.getCardinalityMap() from Apache Commons Collections. This method takes an Iterable<T> and returns Map<T, Integer>, counting how many times each item appeared in the collection. However, sometimes instead of using equals() we would like to group by an arbitrary attribute of input T. For example, say we have a list of Person objects and we would like to compute the number of males vs. females (i.e. Map<Sex, Integer>) or maybe an age distribution. There is a built-in collector Collectors.groupingBy(Function<T, K> classifier); however, it returns a map from key to all items mapped to that key. See:

    import static java.util.stream.Collectors.groupingBy;

    //...

    final List<Person> people = //...
    final Map<Sex, List<Person>> bySex = people
            .stream()
            .collect(groupingBy(Person::getSex));

It's valuable, but in our case it unnecessarily builds two List<Person>. I only want to know the number of people. There is no such collector built-in, but we can compose it in a fairly simple manner:

    import static java.util.stream.Collectors.counting;
    import static java.util.stream.Collectors.groupingBy;

    //...

    final Map<Sex, Long> bySex = people
            .stream()
            .collect(
                    groupingBy(Person::getSex, HashMap::new, counting()));

This overloaded version of groupingBy() takes three parameters. The first one is the key (classifier) function, as previously. The second argument creates a new map; we'll see shortly why it's useful. counting() is a nested collector that takes all people with the same sex and combines them together, in our case simply counting them as they arrive. Being able to choose the map implementation is useful e.g. when building an age histogram.
We would like to know how many people we have at a given age, but age values should be sorted:

    final TreeMap<Integer, Long> byAge = people
            .stream()
            .collect(
                    groupingBy(Person::getAge, TreeMap::new, counting()));

    byAge
            .forEach((age, count) ->
                    System.out.println(age + ":\t" + count));

We ended up with a TreeMap from age (sorted) to the count of people having that age.

Sampling, batching and sliding window

The IterableLike.sliding() method in Scala allows viewing a collection through a sliding fixed-size window. This window starts at the beginning and in each iteration moves by a given number of items. Such functionality, missing in Java 8, allows several useful operators like computing a moving average, splitting a big collection into batches (compare with Lists.partition() in Guava) or sampling every n-th element. We will implement a collector for Java 8 providing similar behaviour. Let's start from unit tests, which should describe briefly what we want to achieve:

    import static com.nurkiewicz.CustomCollectors.sliding

    @Unroll
    class CustomCollectorsSpec extends Specification {

        def "Sliding window of #input with size #size and step of 1 is #output"() {
            expect:
            input.stream().collect(sliding(size)) == output

            where:
            input  | size | output
            []     | 5    | []
            [1]    | 1    | [[1]]
            [1, 2] | 1    | [[1], [2]]
            [1, 2] | 2    | [[1, 2]]
            [1, 2] | 3    | [[1, 2]]
            1..3   | 3    | [[1, 2, 3]]
            1..4   | 2    | [[1, 2], [2, 3], [3, 4]]
            1..4   | 3    | [[1, 2, 3], [2, 3, 4]]
            1..7   | 3    | [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
            1..7   | 6    | [1..6, 2..7]
        }

        def "Sliding window of #input with size #size and no overlapping is #output"() {
            expect:
            input.stream().collect(sliding(size, size)) == output

            where:
            input | size | output
            []    | 5    | []
            1..3  | 2    | [[1, 2], [3]]
            1..4  | 4    | [1..4]
            1..4  | 5    | [1..4]
            1..7  | 3    | [1..3, 4..6, [7]]
            1..6  | 2    | [[1, 2], [3, 4], [5, 6]]
        }

        def "Sliding window of #input with size #size and some overlapping is #output"() {
            expect:
            input.stream().collect(sliding(size, 2)) == output

            where:
            input | size | output
            []    | 5    | []
            1..4  | 5    | [[1, 2, 3, 4]]
            1..7  | 3    | [1..3, 3..5, 5..7]
            1..6  | 4    | [1..4, 3..6]
            1..9  | 4    | [1..4, 3..6, 5..8, 7..9]
            1..10 | 4    | [1..4, 3..6, 5..8, 7..10]
            1..11 | 4    | [1..4, 3..6, 5..8, 7..10, 9..11]
        }

        def "Sliding window of #input with size #size and gap of #gap is #output"() {
            expect:
            input.stream().collect(sliding(size, size + gap)) == output

            where:
            input | size | gap | output
            []    | 5    | 1   | []
            1..9  | 4    | 2   | [1..4, 7..9]
            1..10 | 4    | 2   | [1..4, 7..10]
            1..11 | 4    | 2   | [1..4, 7..10]
            1..12 | 4    | 2   | [1..4, 7..10]
            1..13 | 4    | 2   | [1..4, 7..10, [13]]
            1..13 | 5    | 1   | [1..5, 7..11, [13]]
            1..12 | 5    | 3   | [1..5, 9..12]
            1..13 | 5    | 3   | [1..5, 9..13]
        }

        def "Sampling #input taking every #nth th element is #output"() {
            expect:
            input.stream().collect(sliding(1, nth)) == output

            where:
            input  | nth | output
            []     | 1   | []
            []     | 5   | []
            1..3   | 5   | [[1]]
            1..6   | 2   | [[1], [3], [5]]
            1..10  | 5   | [[1], [6]]
            1..100 | 30  | [[1], [31], [61], [91]]
        }
    }

Using data driven tests in Spock I managed to write almost 40 test cases in no time, succinctly describing all requirements. I hope these are clear to you, even if you haven't seen this syntax before. I already assumed the existence of handy factory methods:

    public class CustomCollectors {

        public static <T> Collector<T, ?, List<List<T>>> sliding(int size) {
            return new SlidingCollector<>(size, 1);
        }

        public static <T> Collector<T, ?, List<List<T>>> sliding(int size, int step) {
            return new SlidingCollector<>(size, step);
        }

    }

The fact that collectors receive items one after another makes our job harder. Of course, first collecting the whole list and sliding over it would have been easier, but sort of wasteful. Let's build the result iteratively.
I am not even pretending this task can be parallelized in general, so I'll leave combiner() unimplemented:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.EnumSet;
    import java.util.List;
    import java.util.Queue;
    import java.util.Set;
    import java.util.function.BiConsumer;
    import java.util.function.BinaryOperator;
    import java.util.function.Function;
    import java.util.function.Supplier;
    import java.util.stream.Collector;

    import static java.lang.Math.max;
    import static java.util.stream.Collectors.toList;

    public class SlidingCollector<T> implements Collector<T, List<List<T>>, List<List<T>>> {

        private final int size;
        private final int step;
        private final int window;
        // Note: buffer and totalIn live in the collector itself,
        // so each instance should serve a single collect() call.
        private final Queue<T> buffer = new ArrayDeque<>();
        private int totalIn = 0;

        public SlidingCollector(int size, int step) {
            this.size = size;
            this.step = step;
            // With step > size the buffer must also hold the skipped items
            this.window = max(size, step);
        }

        @Override
        public Supplier<List<List<T>>> supplier() {
            return ArrayList::new;
        }

        @Override
        public BiConsumer<List<List<T>>, T> accumulator() {
            return (lists, t) -> {
                buffer.offer(t);
                ++totalIn;
                if (buffer.size() == window) {
                    dumpCurrent(lists);
                    shiftBy(step);
                }
            };
        }

        @Override
        public Function<List<List<T>>, List<List<T>>> finisher() {
            return lists -> {
                // Emit the trailing, possibly shorter, window if one is still due
                if (!buffer.isEmpty()) {
                    final int totalOut = estimateTotalOut();
                    if (totalOut > lists.size()) {
                        dumpCurrent(lists);
                    }
                }
                return lists;
            };
        }

        // Number of windows expected for totalIn items
        private int estimateTotalOut() {
            return max(0, (totalIn + step - size - 1) / step) + 1;
        }

        // Copies up to size of the oldest buffered items into a new window
        private void dumpCurrent(List<List<T>> lists) {
            final List<T> batch = buffer.stream().limit(size).collect(toList());
            lists.add(batch);
        }

        // Discards the oldest items so the window slides forward
        private void shiftBy(int by) {
            for (int i = 0; i < by; i++) {
                buffer.remove();
            }
        }

        @Override
        public BinaryOperator<List<List<T>>> combiner() {
            return (l1, l2) -> {
                throw new UnsupportedOperationException("Combining not possible");
            };
        }

        @Override
        public Set<Characteristics> characteristics() {
            return EnumSet.noneOf(Characteristics.class);
        }
    }

I spent quite some time writing this implementation, especially getting finisher() right, so don't be frightened. The crucial part is a buffer that collects items until it can form one sliding window. Then the "oldest" items are discarded and the window slides forward by step. I am not particularly happy with this implementation, but the tests are passing.

sliding(N) (a synonym for sliding(N, 1)) allows calculating a moving average of N items. sliding(N, N) splits the input into batches of size N. sliding(1, N) takes every N-th element (samples). I hope you'll find this collector useful, enjoy!
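To make the three use cases concrete, here is a small, self-contained sketch. It is not the article's SlidingCollector: the class SlidingUsage and its nested State type are hypothetical names chosen for this example. The logic mirrors the buffer-and-shift idea described above, but as one possible variation it keeps the buffer and the item counter inside the accumulation container created by the supplier, rather than in fields of the collector, so the returned collector holds no mutable state of its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SlidingUsage {

    // Hypothetical container holding all mutable state of one collect() call.
    private static final class State<T> {
        final Queue<T> buffer = new ArrayDeque<>();
        final List<List<T>> windows = new ArrayList<>();
        int totalIn = 0;
    }

    public static <T> Collector<T, ?, List<List<T>>> sliding(int size, int step) {
        final int window = Math.max(size, step);
        return Collector.<T, State<T>, List<List<T>>>of(
                State::new,
                (state, item) -> {
                    state.buffer.offer(item);
                    state.totalIn++;
                    if (state.buffer.size() == window) {
                        dump(state, size);
                        for (int i = 0; i < step; i++) {
                            state.buffer.remove(); // slide forward by step
                        }
                    }
                },
                (left, right) -> {
                    throw new UnsupportedOperationException("Combining not possible");
                },
                state -> {
                    // Same estimate as in the article: windows expected for totalIn items
                    int expected = Math.max(0, (state.totalIn + step - size - 1) / step) + 1;
                    if (!state.buffer.isEmpty() && expected > state.windows.size()) {
                        dump(state, size);
                    }
                    return state.windows;
                });
    }

    private static <T> void dump(State<T> state, int size) {
        state.windows.add(state.buffer.stream().limit(size).collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Moving average over windows of 3 items
        List<Double> averages = Stream.of(10, 20, 30, 40, 50)
                .collect(sliding(3, 1))
                .stream()
                .map(w -> w.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN))
                .collect(Collectors.toList());
        System.out.println(averages); // [20.0, 30.0, 40.0]

        // Batches of 2 items
        System.out.println(Stream.of(1, 2, 3, 4, 5).collect(sliding(2, 2))); // [[1, 2], [3, 4], [5]]

        // Every 3rd element
        System.out.println(Stream.of(1, 2, 3, 4, 5, 6, 7).collect(sliding(1, 3))); // [[1], [4], [7]]
    }
}
```

As in the article's version, the combiner throws, so this sketch is meant for sequential streams only.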
July 18, 2014
by Tomasz Nurkiewicz