The Latest Languages Topics

Using Spring Profiles in XML Config
Learn how to use spring profiles in an XML config with this great tutorial.
August 14, 2012
by Roger Hughes
· 185,667 Views · 4 Likes
Installing Oracle Java 6 on Ubuntu
If you have already installed Ubuntu 12.04, you have probably noticed that Sun Java (now Oracle Java) no longer comes prepackaged with Ubuntu; OpenJDK ships with it instead. Here is how you can install Oracle Java on Ubuntu 12.04 manually.

Download jdk-6u32-linux-x64.bin from this link. If you are running a 32-bit Ubuntu installation, download jdk-6u32-linux-i586.bin instead.

Make the downloaded bin file executable:

    chmod +x jdk-6u32-linux-x64.bin

Extract the bin file:

    ./jdk-6u32-linux-x64.bin

Create a folder called "jvm" inside /usr/lib if it does not already exist:

    sudo mkdir /usr/lib/jvm

Move the extracted folder into the newly created jvm folder:

    sudo mv jdk1.6.0_32 /usr/lib/jvm/

Register the new javac, java, and javaws binaries with the alternatives system:

    sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.6.0_32/bin/javac 1
    sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.6.0_32/bin/java 1
    sudo update-alternatives --install /usr/bin/javaws javaws /usr/lib/jvm/jdk1.6.0_32/bin/javaws 1

Make this the default Java:

    sudo update-alternatives --config javac
    sudo update-alternatives --config java
    sudo update-alternatives --config javaws

Check that the symlinks point to the new Java location:

    ls -la /etc/alternatives/java*

Verify that Java has been installed correctly:

    java -version
August 13, 2012
by Pavithra Gunasekara
· 60,098 Views · 1 Like
JAXB and Root Elements
@XmlRootElement is an annotation that people are used to using with JAXB (JSR-222). It's purpose is to uniquely associate a root element with a class. Since JAXB classes map to complex types, it is possible for a class to correspond to multiple root elements. In this case @XmlRootElement can not be used and people start getting a bit confused. In this post I'll demonstrate how @XmlElementDecl can be used to map this use case. XML Schema The XML schema below contains three root elements: customer, billing-address, and shipping-address. The customer element has an anonymous complex type, while billing-address and shipping-address are of the same named type (address-type). createBillingAddress(AddressType value) { return new JAXBElement(_BillingAddress_QNAME, AddressType.class, null, value); } @XmlElementDecl(namespace = "http://www.example.org/customer", name = "shipping-address") public JAXBElement createShippingAddress(AddressType value) { return new JAXBElement(_ShippingAddress_QNAME, AddressType.class, null, value); } } package-info The package-info class is used to specify the namespace mapping (see JAXB & Namespaces). @XmlSchema(namespace = "http://www.example.org/customer", elementFormDefault = XmlNsForm.QUALIFIED) package org.example.customer; import javax.xml.bind.annotation.*; Unmarshal Operation Now we look at the impact of the type of root element when unmarshalling XML. customer.xml Below is a sample XML document with customer as the root element. Remember the customer element had an anonymous complex type. 1 Any Street 2 Another Road shipping.xml Here is a sample XML document with shipping-address as the root element. The shipping-address element had a named complex type. 2 Another Road Unmarshal Demo When unmarshalling XML that corresponds to a class annotated with @XmlRootElement you get an instance of the domain object. 
But when unmarshalling XML that corresponds to a class annotated with @XmlElementDecl you get the domain object wrapped in an instance of JAXBElement. In this example you may need to use the QName from the JAXBElement to determine if you unmarshalled a billing or shipping address. package org.example.customer; import java.io.File; import javax.xml.bind.*; public class UnmarshalDemo { public static void main(String[] args) throws Exception { JAXBContext jc = JAXBContext.newInstance("org.example.customer"); Unmarshaller unmarshaller = jc.createUnmarshaller(); // Unmarshal Customer File customerXML = new File("src/org/example/customer/customer.xml"); Customer customer = (Customer) unmarshaller.unmarshal(customerXML); // Unmarshal Shipping Address File shippingXML = new File("src/org/example/customer/shipping.xml"); JAXBElement je = (JAXBElement) unmarshaller.unmarshal(shippingXML); AddressType shipping = je.getValue(); } } Unmarshal Demo - JAXBIntrospector If you don't want to deal with remembering whether the result of the unmarshal operation will be a domain object or JAXBElement, then you can use the JAXBIntrospector.getValue(Object) method to always get the domain object. package org.example.customer; import java.io.File; import javax.xml.bind.*; public class JAXBIntrospectorDemo { public static void main(String[] args) throws Exception { JAXBContext jc = JAXBContext.newInstance("org.example.customer"); Unmarshaller unmarshaller = jc.createUnmarshaller(); // Unmarshal Customer File customerXML = new File("src/org/example/customer/customer.xml"); Customer customer = (Customer) JAXBIntrospector.getValue(unmarshaller .unmarshal(customerXML)); // Unmarshal Shipping Address File shippingXML = new File("src/org/example/customer/shipping.xml"); AddressType shipping = (AddressType) JAXBIntrospector .getValue(unmarshaller.unmarshal(shippingXML)); } } Marshal Operation You can directly marshal an object annotated with @XmlRootElement to XML. 
Classes corresponding to @XmlElementDecl annotations must first be wrapped in an instance of JAXBElement. The factory method you annotated with @XmlElementDecl is the easiest way to do this. If you generated your model from an XML schema, the factory method is in the ObjectFactory class.

    package org.example.customer;

    import javax.xml.bind.*;

    public class MarshalDemo {

        public static void main(String[] args) throws Exception {
            JAXBContext jc = JAXBContext.newInstance("org.example.customer");
            Marshaller marshaller = jc.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

            // Create Domain Objects
            AddressType billingAddress = new AddressType();
            billingAddress.setStreet("1 Any Street");
            Customer customer = new Customer();
            customer.setBillingAddress(billingAddress);

            // Marshal Customer
            marshaller.marshal(customer, System.out);

            // Marshal Billing Address
            ObjectFactory objectFactory = new ObjectFactory();
            JAXBElement<AddressType> je = objectFactory.createBillingAddress(billingAddress);
            marshaller.marshal(je, System.out);
        }
    }

Output

Below is the output from running the demo code.

1 Any Street 1 Any Street
August 12, 2012
by Blaise Doughan
· 226,364 Views · 15 Likes
Java Executor Service Types
The ExecutorService feature came with Java 5. It extends the Executor interface and provides thread pools for executing short asynchronous tasks. Java 6 provides five ways to execute tasks asynchronously through the ExecutorService interface; three of them are covered here.

ExecutorService execService = Executors.newCachedThreadPool();

This approach creates a thread pool that creates new threads as needed, but reuses previously constructed threads when they are available. These pools typically improve the performance of programs that execute many short-lived asynchronous tasks. If no existing thread is available, a new thread is created and added to the pool. Threads that have not been used for 60 seconds are terminated and removed from the cache.

ExecutorService execService = Executors.newFixedThreadPool(10);

This approach creates a thread pool that reuses a fixed number of threads. At any point, at most nThreads threads are actively processing tasks. If additional tasks are submitted when all threads are active, they wait in the queue until a thread is available.

ExecutorService execService = Executors.newSingleThreadExecutor();

This approach creates an Executor that uses a single worker thread operating off an unbounded queue. Tasks are guaranteed to execute sequentially, and no more than one task is active at any given time.

Methods of the ExecutorService:

  • execute(Runnable): Executes the given command at some time in the future.
  • submit(Runnable): Returns a Future object that represents the submitted task. The Future's get() method returns null once the task has completed successfully.
  • shutdown(): Initiates an orderly shutdown in which previously submitted tasks are executed, but no new tasks are accepted. Invocation has no additional effect if already shut down.
  • shutdownNow(): Attempts to stop all actively executing tasks, halts the processing of waiting tasks, and returns a list of the tasks that were awaiting execution.
There are no guarantees beyond best-effort attempts to stop processing actively executing tasks. For example, typical implementations will cancel via Thread.interrupt, so any task that fails to respond to interrupts may never terminate.

A sample application is below:

STEP 1 : CREATE MAVEN PROJECT

A Maven project is created as below. (It can be created by using Maven or an IDE plug-in.)

STEP 2 : CREATE A NEW TASK

A new task is created by implementing the Runnable interface as below. The TestTask class specifies the business logic to be executed.

    package com.otv.task;

    import org.apache.log4j.Logger;

    /**
     * @author onlinetechvision.com
     * @since 24 Sept 2011
     * @version 1.0.0
     *
     */
    public class TestTask implements Runnable {

        private static Logger log = Logger.getLogger(TestTask.class);
        private String taskName;

        public TestTask(String taskName) {
            this.taskName = taskName;
        }

        public void run() {
            try {
                log.debug(this.taskName + " is sleeping...");
                Thread.sleep(3000);
                log.debug(this.taskName + " is running...");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

STEP 3 : CREATE TestExecutorService by using newCachedThreadPool

TestExecutorService is created by using the method newCachedThreadPool. In this case, the number of threads created is determined at runtime.

    package com.otv;

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import com.otv.task.TestTask;

    /**
     * @author onlinetechvision.com
     * @since 24 Sept 2011
     * @version 1.0.0
     *
     */
    public class TestExecutorService {

        public static void main(String[] args) {
            ExecutorService execService = Executors.newCachedThreadPool();
            execService.execute(new TestTask("FirstTestTask"));
            execService.execute(new TestTask("SecondTestTask"));
            execService.execute(new TestTask("ThirdTestTask"));
            execService.shutdown();
        }
    }

When TestExecutorService is run, the output will be seen as below:

24.09.2011 17:30:47 DEBUG (TestTask.java:21) - SecondTestTask is sleeping...
24.09.2011 17:30:47 DEBUG (TestTask.java:21) - ThirdTestTask is sleeping... 24.09.2011 17:30:47 DEBUG (TestTask.java:21) - FirstTestTask is sleeping... 24.09.2011 17:30:50 DEBUG (TestTask.java:23) - ThirdTestTask is running... 24.09.2011 17:30:50 DEBUG (TestTask.java:23) - FirstTestTask is running... 24.09.2011 17:30:50 DEBUG (TestTask.java:23) - SecondTestTask is running... STEP 4 : CREATE TestExecutorService by using newFixedThreadPool TestExecutorService is created by using the method newFixedThreadPool. In this case, required thread count has to be set as the following : package com.otv; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import com.otv.task.TestTask; /** * @author onlinetechvision.com * @since 24 Sept 2011 * @version 1.0.0 * */ public class TestExecutorService { public static void main(String[] args) { ExecutorService execService = Executors.newFixedThreadPool(2); execService.execute(new TestTask("FirstTestTask")); execService.execute(new TestTask("SecondTestTask")); execService.execute(new TestTask("ThirdTestTask")); execService.shutdown(); } } When TestExecutorService is run, ThirdTestTask is executed after FirstTestTask and SecondTestTask’ s executions are completed. The output will be seen as below: 24.09.2011 17:33:38 DEBUG (TestTask.java:21) - FirstTestTask is sleeping... 24.09.2011 17:33:38 DEBUG (TestTask.java:21) - SecondTestTask is sleeping... 24.09.2011 17:33:41 DEBUG (TestTask.java:23) - FirstTestTask is running... 24.09.2011 17:33:41 DEBUG (TestTask.java:23) - SecondTestTask is running... 24.09.2011 17:33:41 DEBUG (TestTask.java:21) - ThirdTestTask is sleeping... 24.09.2011 17:33:44 DEBUG (TestTask.java:23) - ThirdTestTask is running... STEP 5 : CREATE TestExecutorService by using newSingleThreadExecutor TestExecutorService is created by using the method newSingleThreadExecutor. In this case, only one thread is created and tasks are executed sequentially. 
package com.otv; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import com.otv.task.TestTask; /** * @author onlinetechvision.com * @since 24 Sept 2011 * @version 1.0.0 * */ public class TestExecutorService { public static void main(String[] args) { ExecutorService execService = Executors.newSingleThreadExecutor(); execService.execute(new TestTask("FirstTestTask")); execService.execute(new TestTask("SecondTestTask")); execService.execute(new TestTask("ThirdTestTask")); execService.shutdown(); } } When TestExecutorService is run, SecondTestTask and ThirdTestTask is executed after FirstTestTask’ s execution is completed. The output will be seen as below : 24.09.2011 17:38:21 DEBUG (TestTask.java:21) - FirstTestTask is sleeping... 24.09.2011 17:38:24 DEBUG (TestTask.java:23) - FirstTestTask is running... 24.09.2011 17:38:24 DEBUG (TestTask.java:21) - SecondTestTask is sleeping... 24.09.2011 17:38:27 DEBUG (TestTask.java:23) - SecondTestTask is running... 24.09.2011 17:38:27 DEBUG (TestTask.java:21) - ThirdTestTask is sleeping... 24.09.2011 17:38:30 DEBUG (TestTask.java:23) - ThirdTestTask is running... STEP 6 : REFERENCES http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html http://tutorials.jenkov.com/java-util-concurrent/executorservice.html
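The behaviour described in this article can be checked without log4j or Maven. The following is a minimal sketch using only java.util.concurrent; the class name ExecutorSketch and its helper methods are hypothetical and not part of the original sample application. It illustrates two claims from the text: a single-thread executor runs tasks in submission order, and the Future returned by submit(Runnable) yields null from get() when the task completes normally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper class, for illustration only.
final class ExecutorSketch {

    // Runs three tasks on a single worker thread and records the order in which they ran.
    public static List<String> runSequentially() {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        ExecutorService exec = Executors.newSingleThreadExecutor();
        for (String name : new String[] {"First", "Second", "Third"}) {
            exec.execute(() -> order.add(name));
        }
        exec.shutdown(); // already-submitted tasks still run; new ones are rejected
        try {
            if (!exec.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(order);
    }

    // Submits a Runnable and returns whatever Future.get() yields for it.
    public static Object submitAndGet() {
        ExecutorService exec = Executors.newFixedThreadPool(2);
        try {
            Future<?> future = exec.submit(() -> { /* a task with no result */ });
            return future.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runSequentially()); // prints [First, Second, Third]
        System.out.println(submitAndGet());    // prints null
    }
}
```

Swapping newSingleThreadExecutor() for newCachedThreadPool() in runSequentially() would remove the ordering guarantee, which matches the interleaved log output shown in STEP 3.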
August 6, 2012
by Eren Avsarogullari
· 23,710 Views · 2 Likes
Using Multiple Versions of JDK and Eclipse in Single Machine
On my office laptop, I have installed two versions of the JDK. For office work, I need JDK6 because the internal framework needs it. I'm using JDK7 for my personal projects and for exploring the latest and greatest in Java. I have two versions of Eclipse too (one for office work and one is the latest Juno). The tricky part is managing these multiple JDKs and IDEs. It's a piece of cake if I just use Eclipse for compiling my code, because the IDE allows me to configure multiple versions of the Java runtime. Unfortunately (or fortunately), I have to use the command line/shell to build my code. So it is important that I have the right version of the JDK present in the PATH and other related environment variables (such as JAVA_HOME).

Manually modifying the environment variables every time I want to switch between JDKs isn't a happy task. But, thanks to Windows PowerShell, I'm able to write a scriptlet that does the heavy lifting for me. Basically, what I want to achieve is to add the Java bin folder to the PATH variable, set the JAVA_HOME environment variable, and then launch the correct Eclipse IDE. And I want to do this with a single command. Let's do it.

Open Windows PowerShell. I prefer writing custom Windows scripts in my profile file so that they are available whenever I open the shell. To edit the profile, run this command:

    notepad.exe $profile

$profile is a special variable that points to your profile file. Write the script below in the profile file and save it.

    function myIDE {
        # Prepend so this JDK takes precedence over any other java already on the PATH
        $env:Path = "C:\vraa\java\jdk7\bin;" + $env:Path
        $env:JAVA_HOME = "C:\vraa\java\jdk7"
        C:\vraa\ide\eclipse\eclipse
        set-location C:\vraa\workspace\myproject
        play
    }

    function officeIDE {
        # Prepend so this JDK takes precedence over any other java already on the PATH
        $env:Path = "C:\vraa\java\jdk6\bin;" + $env:Path
        $env:JAVA_HOME = "C:\vraa\java\jdk6"
        C:\office\eclipse\eclipse
    }

Close and restart PowerShell. Now you can issue the command myIDE, which will set the proper PATH and environment variables and then launch the Eclipse IDE.
As you can see, there are two functions with different configurations. Just call the function name that you want to launch from the Powershell command line (myIDE or officeIDE).
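To confirm which JDK a shell session has actually picked up after calling one of these functions, a tiny Java program can report it. This is only a sketch and not part of the original post; the class name WhichJdk is hypothetical. It reads the standard java.version and java.home system properties and the JAVA_HOME environment variable.

```java
// Hypothetical helper class, for illustration only.
final class WhichJdk {

    // Describes the runtime that is executing this program.
    public static String describeRuntime() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    // Returns JAVA_HOME as seen by this process, or a marker if it is not set.
    public static String javaHomeEnv() {
        String value = System.getenv("JAVA_HOME");
        return value == null ? "(not set)" : value;
    }

    public static void main(String[] args) {
        System.out.println(describeRuntime());
        System.out.println("JAVA_HOME=" + javaHomeEnv());
    }
}
```

Compiling and running it from the same PowerShell window shows whether the java found on the PATH and the JAVA_HOME variable point at the same installation.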
August 4, 2012
by Veera Sundar
· 20,828 Views
JAXB - No Annotations Required
There appears to be a misconception that annotations are required on the model in order to use a JAXB (JSR-222) implementation. The truth is that JAXB is configuration by exception, so annotations are only required when you want to override default behaviour. In this example I'll demonstrate how to use JAXB without providing any metadata. Domain Model I will use the following domain model for this example. Note how there are no annotations of any kind. Customer Customer is the root object in this example. Normally we would annotate it with @XmlRootElement. Later in the demo code you will see how we can use an instance of JAXBElement instead. package blog.defaults; import java.util.List; public class Customer { private String firstName; private String lastName; private List phoneNumbers; public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } public List getPhoneNumbers() { return phoneNumbers; } public void setPhoneNumbers(List phoneNumbers) { this.phoneNumbers = phoneNumbers; } } PhoneNumber I have purposefully given the fields in this class nonsense names, so that later when we look at the XML you will be able to see that by default the element names are derived from the properties and not the fields. package blog.defaults; public class PhoneNumber { private String foo; private String bar; public String getType() { return foo; } public void setType(String type) { this.foo = type; } public String getNumber() { return bar; } public void setNumber(String number) { this.bar = number; } } Demo Code Since we haven't used @XmlRootElement (or @XmlElementDecl) to associate a root element with our Customer class we will need to tell JAXB what class we want to unmarshal the XML document to. This is done by using one of the unmarshal methods that take a Class parameter (line 14). 
This will return a JAXBElement, the Customer object is then accessed by calling getValue on it (line 15). To marshal the object back to XML we need to ensure that it is wrapped in a JAXBElement to supply the root element information (line 17). package blog.defaults; import javax.xml.bind.*; import javax.xml.namespace.QName; import javax.xml.transform.stream.StreamSource; public class Demo { public static void main(String[] args) throws Exception { JAXBContext jc = JAXBContext.newInstance(Customer.class); StreamSource xml = new StreamSource("src/blog/defaults/input.xml"); Unmarshaller unmarshaller = jc.createUnmarshaller(); JAXBElement je1 = unmarshaller.unmarshal(xml, Customer.class); Customer customer = je1.getValue(); JAXBElement je2 = new JAXBElement(new QName("customer"), Customer.class, customer); Marshaller marshaller = jc.createMarshaller(); marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); marshaller.marshal(je2, System.out); } } input.xml/Output The following is the input to and output from running the demo code. The first thing we see is that it is a very reasonable XML representation of the data, there aren't any JAXB artifacts. By default JAXB will marshal everything as XML elements, and based on our PhoneNumber class we see that the element names were derived from the property names. Jane Doe 555-1111 work 555-2222 home Further Reading If you enjoyed this post then you may also be interested in: The majority of the articles on this blog describe how to leverage the power of JAXB's metadata to support different use cases, I invite you to check them out: http://blog.bdoughan.com/search/label/JAXB If you are interested in specifying metadata without using annotations, you may be interested in EclipseLink JAXB (MOXy)'s external mapping document: Extending JAXB - Representing Metadata as XML Extending JAXB - Representing Metadata as JSON
August 3, 2012
by Blaise Doughan
· 21,287 Views · 1 Like
Mustaches in the World of Java
Mustache is a templating system with implementations in many languages, including Java and JavaScript. The templates are also supported by various web frameworks and client-side JS libraries. Mustache is built on the simple idea of a "logic-less" system: it has no explicit control statements such as if, else, or goto, and no for statement either. Looping and conditional evaluation can still be achieved using custom tags that work with lists and lambdas. The name unfortunately has less to do with Tom Selleck and more with the heavy use of curly braces, which look like a mustache. The similarity is more than comparable. Mustache has implementations for most of the widely used languages: Java, JavaScript, Ruby, .NET, and many more.

Client-side templates in JavaScript

Let's say that you have some REST service and you have created a book view object that has an additional function that appends the Amazon Associates ID to the book URL:

    var book = {
        id : 12,
        title : "A Game of Thrones",
        url : "http://www.amazon.com/gp/product/0553573403/",
        amazonId : "myAwesomeness",
        associateUrl : function() {
            return this.url + '?tag=' + this.amazonId;
        },
        author : {
            name : 'George R. R. Martin',
            imdbUrl : 'http://www.imdb.com/name/nm0552333/',
            wikiUrl : 'https://en.wikipedia.org/wiki/George_R._R._Martin'
        },
        haveInStock : true,
        similarBooks : [{
            id : 13,
            title : "Decision Points"
        }, {
            id : 13,
            title : "Spoken from the Heart"
        }],
        comments : []
    };

The standard way of rendering data without templates is to create an output variable, append everything to it, and at the end place the result where it should go.
    jQuery(document).ready(function() {
        var out = '' + book.title + ' is awesome book get it on Amazon';
        jQuery('#content-jquery').html(out);
    });

This is fairly simple, but if you want to change, for example, the span element to a div, it takes a little while to figure out where it should be closed, and it is easy to get wrong whether an element belongs in single or double quotes. The bigger issue is that the content consists of pieces of strings that need to be easy to style via CSS and JavaScript. As the code gets bigger this becomes unmanageable, and any change becomes slower, especially if you add jQuery's manipulation functions like appendTo() or prependTo() on top of it. Building content directly with out += reminds me a lot of the HttpServlet style of using a print writer and calling out.print(), and for the same reason that style was almost abandoned, we should not do this in JavaScript. To simplify the work we can add a template engine like Mustache, one of many client-side templating engines. So what does a Mustache template look like? For the book example above it would look like:
August 1, 2012
by Mite Mitreski
· 39,933 Views
Use Lucene’s MMapDirectory on 64bit Platforms, Please!
Don't be afraid: some clarification of common misunderstandings

Since version 3.1, Apache Lucene and Solr use MMapDirectory by default on 64bit Windows and Solaris systems; since version 3.3 also on 64bit Linux systems. This change led to some confusion among Lucene and Solr users, because suddenly their systems started to behave differently than in previous versions. On the Lucene and Solr mailing lists a lot of posts arrived from users asking why their Java installation is suddenly consuming three times their physical memory, or from system administrators complaining about heavy resource usage. Consultants also started telling people that they should not use MMapDirectory and should change their solrconfig.xml to work instead with the slow SimpleFSDirectory or with NIOFSDirectory (which is much slower on Windows, caused by JVM bug #6265734). From the point of view of the Lucene committers, who carefully decided that using MMapDirectory is the best choice for those platforms, this is rather annoying, because they know that Lucene/Solr can work with much better performance than before. Common misinformation about the background of this change causes suboptimal installations of this great search engine everywhere.

In this blog post, I will try to explain the basic operating system facts regarding virtual memory handling in the kernel and how this can be used to largely improve the performance of Lucene ("VIRTUAL MEMORY for DUMMIES"). It will also clarify why the blog and mailing list posts written by various people are wrong and contradict the purpose of MMapDirectory. In the second part I will show you some configuration details and settings you should take care of to prevent errors like "mmap failed" and suboptimal performance because of stupid Java heap allocation.
Virtual Memory[1] Let’s start with your operating system’s kernel: The naive approach to do I/O in software is the way, you have done this since the 1970s – the pattern is simple: whenever you have to work with data on disk, you execute a syscall to your operating system kernel, passing a pointer to some buffer (e.g. a byte[] array in Java) and transfer some bytes from/to disk. After that you parse the buffer contents and do your program logic. If you don’t want to do too many syscalls (because those may cost a lot processing power), you generally use large buffers in your software, so synchronizing the data in the buffer with your disk needs to be done less often. This is one reason, why some people suggest to load the whole Lucene index into Java heap memory (e.g., by using RAMDirectory). But all modern operating systems like Linux, Windows (NT+), MacOS X, or Solaris provide a much better approach to do this 1970s style of code by using their sophisticated file system caches and memory management features. A feature called “virtual memory” is a good alternative to handle very large and space intensive data structures like a Lucene index. Virtual memory is an integral part of a computer architecture; implementations require hardware support, typically in the form of a memory management unit (MMU) built into the CPU. The way how it works is very simple: Every process gets his own virtual address space where all libraries, heap and stack space is mapped into. This address space in most cases also start at offset zero, which simplifies loading the program code because no relocation of address pointers needs to be done. Every process sees a large unfragmented linear address space it can work on. It is called “virtual memory” because this address space has nothing to do with physical memory, it just looks like so to the process. 
Software can then access this large address space as if it were real memory without knowing that there are other processes also consuming memory and having their own virtual address space. The underlying operating system works together with the MMU (memory management unit) in the CPU to map those virtual addresses to real memory once they are accessed for the first time. This is done using so called page tables, which are backed by TLBs located in the MMU hardware (translation lookaside buffers, they cache frequently accessed pages). By this, the operating system is able to distribute all running processes’ memory requirements to the real available memory, completely transparent to the running programs. Schematic drawing of virtual memory (image from Wikipedia [1], http://en.wikipedia.org/wiki/File:Virtual_memory.svg, licensed by CC BY-SA 3.0) By using this virtualization, there is one more thing, the operating system can do: If there is not enough physical memory, it can decide to “swap out” pages no longer used by the processes, freeing physical memory for other processes or caching more important file system operations. Once a process tries to access a virtual address, which was paged out, it is reloaded to main memory and made available to the process. The process does not have to do anything, it is completely transparent. This is a good thing to applications because they don’t need to know anything about the amount of memory available; but also leads to problems for very memory intensive applications like Lucene. Lucene & Virtual Memory Let’s take the example of loading the whole index or large parts of it into “memory” (we already know, it is only virtual memory). If we allocate a RAMDirectory and load all index files into it, we are working against the operating system: The operating system tries to optimize disk accesses, so it caches already all disk I/O in physical memory. 
We copy all these cache contents into our own virtual address space, consuming horrible amounts of physical memory (and we must wait for the copy operation to take place!). As physical memory is limited, the operating system may, of course, decide to swap out our large RAMDirectory, and where does it land? On disk again (in the OS swap file)! In fact, we are fighting against our O/S kernel, which pages out all the stuff we loaded from disk [2]. So RAMDirectory is not a good idea for optimizing index loading times! Additionally, RAMDirectory has more problems related to garbage collection and concurrency. Because the data resides in swap space, Java's garbage collector has a hard job freeing the memory in its own heap management. This leads to high disk I/O, slow index access times, and minute-long latency in your searching code caused by the garbage collector going crazy.

On the other hand, if we don't use RAMDirectory to buffer our index and use NIOFSDirectory or SimpleFSDirectory instead, we have to pay another price: our code has to do a lot of syscalls to the O/S kernel to copy blocks of data between the disk or file system cache and our buffers residing in the Java heap. This needs to be done on every search request, over and over again.

Memory Mapping Files

The solution to the above issues is MMapDirectory, which uses virtual memory and a kernel feature called "mmap" [3] to access the disk files. In our previous approaches, we were relying on a syscall to copy the data between the file system cache and our local Java heap. How about directly accessing the file system cache? This is what mmap does! Basically, mmap does the same as handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space and make them look like RAM available to our Lucene process.
We can then access our index file on disk just as if it were a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don't need to do any syscalls; the processor's MMU and TLB handle all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into the file system cache. If it is already in the cache, the MMU/TLB map it directly to the physical memory in the file system cache. It is now just a native memory access, nothing more! We don't have to take care of paging buffers in and out; all of this is managed by the O/S kernel. Furthermore, we have no concurrency issue. The only overhead over a standard byte[] array is some wrapping caused by Java's ByteBuffer interface (it is still slower than a real byte[] array, but that is the only way to use mmap from Java, and it is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O/S cache, avoiding all the Java GC issues described before.

What does this all mean to our Lucene/Solr application?

  • We should not work against the operating system anymore, so allocate as little heap space as possible (-Xmx Java option). Remember, our index accesses are passed directly to the O/S cache! This is also very friendly to the Java garbage collector.
  • Free as much physical memory as possible so it is available to the O/S kernel as file system cache. Remember, our Lucene code works directly on it, which reduces the amount of paging/swapping between disk and memory. Allocating too much heap to our Lucene application hurts performance! Lucene does not require it with MMapDirectory.

Why does this only work as expected on operating systems and Java virtual machines with 64bit?

One limitation of 32bit platforms is the size of pointers: they can refer to any address between 0 and 2^32-1, which is 4 Gigabytes.
Most operating systems limit that address space to 3 Gigabytes because the remaining address space is reserved for use by device hardware and similar things. This means the overall linear address space provided to any process is limited to 3 Gigabytes, so you cannot map any file larger than that into this “small” address space to be available as a big byte[] array. And once you have mapped that one large file, there is no virtual space (addresses, like “house numbers”) available anymore. As physical memory sizes in current systems have already gone beyond that size, there is no address space left to use for mapping files without wasting resources (in our case “address space”, not physical memory!). On 64bit platforms this is different: 2^64-1 is a very large number, in excess of 18 quintillion bytes, so there is no real limit in address space. Unfortunately, most hardware (the MMU, CPU’s bus system) and operating systems limit this address space to 47 bits for user mode applications (Windows: 43 bits) [4]. But there is still plenty of address space available to map terabytes of data. Common misunderstandings If you have read carefully what I have told you about virtual memory, you can easily verify that the following is true: MMapDirectory does not consume additional memory and the size of mapped index files is not limited by the physical memory available on your server. By mmap()ing files, we only reserve address space, not memory! Remember, address space on 64bit platforms is for free! MMapDirectory will not load the whole index into physical memory. Why should it do this? We just ask the operating system to map the file into address space for easy access; by no means are we requesting more. Java and the O/S optionally provide the option to try loading the whole file into RAM (if enough is available), but Lucene does not use that option (we may add this possibility in a later version). 
MMapDirectory does not overload the server when “top” reports horrible amounts of memory. “top” (on Linux) has three columns related to memory: “VIRT”, “RES”, and “SHR”. The first one (VIRT, virtual) reports allocated virtual address space (and that one is for free on 64 bit platforms!). This number can be several times your index size or physical memory when merges are running in IndexWriter. If you have only one IndexReader open it should be approximately equal to allocated heap space (-Xmx) plus index size. It does not show physical memory used by the process. The second column (RES, resident memory) shows how much (physical) memory the process has allocated for operating and should be about the size of your Java heap space. The last column (SHR, shared) shows how much of the allocated virtual address space is shared with other processes. If you have several Java applications using MMapDirectory to access the same index, you will see this number going up. Generally, you will see the space needed by shared system libraries, JAR files, and the process executable itself (which are also mmapped). How to configure my operating system and Java VM to make optimal use of MMapDirectory? First of all, default settings in Linux distributions and Solaris/Windows are perfectly fine. But there are some paranoid system administrators around who want to control everything (with a lack of understanding). Those limit the maximum amount of virtual address space that can be allocated by applications. So please check that “ulimit -v” and “ulimit -m” both report “unlimited”, otherwise it may happen that MMapDirectory reports “mmap failed” while opening your index. If this error still happens on systems with lots of very large indexes, each of those with many segments, you may need to tune your kernel parameters in /etc/sysctl.conf: The default value of vm.max_map_count is 65530, you may need to raise it. 
I think, for Windows and Solaris systems there are similar settings available, but it is up to the reader to find out how to use them. For configuring your Java VM, you should rethink your memory requirements: Give only the really needed amount of heap space and leave as much as possible to the O/S. As a rule of thumb: Don’t use more than ¼ of your physical memory as heap space for Java running Lucene/Solr, keep the remaining memory free for the operating system cache. If you have more applications running on your server, adjust accordingly. As usual the more physical memory the better, but you don’t need as much physical memory as your index size. The kernel does a good job of paging in frequently used pages from your index. A good way to check that you have configured your system optimally is to look at both "top" (and correctly interpreting it, see above) and the similar command "iotop" (can be installed, e.g., on Ubuntu Linux by "apt-get install iotop"). If your system does lots of swap in/swap out for the Lucene process, reduce heap size; you have possibly allocated too much. If you see lots of disk I/O, buy more RUM (Simon Willnauer) so mmapped files don't need to be paged in/out all the time, and finally: buy SSDs. Happy mmapping! Bibliography [1] http://en.wikipedia.org/wiki/Virtual_memory [2] https://www.varnish-cache.org/trac/wiki/ArchitectNotes [3] http://en.wikipedia.org/wiki/Memory-mapped_file [4] http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details
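As a rough companion to the “ulimit -v”, “ulimit -m” and vm.max_map_count checks described in the article, the following Python sketch reads the same limits from inside a process. It is only an illustration under assumptions: the helper names `describe_limit` and `mmap_limits` are made up here, the `resource` module is Unix-only, and the `/proc/sys/vm/max_map_count` path exists only on Linux. Note that `getrlimit` reports bytes, while the shell's `ulimit` typically prints kilobytes.

```python
import resource


def describe_limit(value):
    """Render an rlimit value the way `ulimit` would print it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def mmap_limits():
    """Collect the soft limits that can make mapping large files fail."""
    limits = {
        # Virtual address space, the limit behind `ulimit -v`.
        "address_space": describe_limit(resource.getrlimit(resource.RLIMIT_AS)[0]),
        # Resident set size, the limit behind `ulimit -m`.
        "resident_set": describe_limit(resource.getrlimit(resource.RLIMIT_RSS)[0]),
    }
    try:
        # Maximum number of memory map areas per process (Linux only).
        with open("/proc/sys/vm/max_map_count") as f:
            limits["max_map_count"] = int(f.read())
    except (OSError, ValueError):
        limits["max_map_count"] = None  # not Linux, or /proc is not readable
    return limits


if __name__ == "__main__":
    for name, value in mmap_limits().items():
        print(f"{name}: {value}")
```

If either of the first two entries is not "unlimited", that is the situation the article warns may lead to “mmap failed”.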
July 31, 2012
by Uwe Schindler
· 13,947 Views · 1 Like
article thumbnail
Implementing a Command Line With Eval in JavaScript
This blog post explores JavaScript’s eval function by implementing the foundation for an interactive command line. As a bonus, you’ll get to work with ECMAScript.next’s generators (which can already be tried out on current Firefox versions). Writing an evaluator Let’s say you want to implement an interactive command line for JavaScript (such as [1]). On one hand, you would need to get the graphical user interface right: The user inputs JavaScript code, the command line evaluates the code and displays the result. On the other hand, you would have to implement the evaluation. That’s what we will take on here. It is more complex than it initially seems and teaches us a lot about eval. For starters, let’s write a constructor Evaluator: function Evaluator() { } Evaluator.prototype.evaluate = function (str) { return JSON.stringify(eval(str)); }; To use the evaluator, we create an instance and send JavaScript code to it: > var e = new Evaluator(); > e.evaluate("Math.pow(2, 53)") '9007199254740992' > e.evaluate("3 * 7") '21' > e.evaluate("'foo'+'bar'") '"foobar"' JSON.stringify is used so that the evaluation results can be shown to the user and look like the input. Without stringify, things look as follows: > console.log(123) // OK 123 > console.log("abc") // not OK abc With stringify, everything looks OK: > console.log(JSON.stringify(123)) 123 > console.log(JSON.stringify("abc")) "abc" Note that undefined is not valid JSON, but stringify converts it to undefined (the value, not the string), which is fine for our purposes. What we have implemented so far works for basic things, but still has several problems. Let’s tackle them one at a time. Problem: declarations You can evaluate variable and function declarations, but they are forgotten immediately afterwards: > e.evaluate("var x = 12;") undefined > e.evaluate("x") ReferenceError: x is not defined How do we fix this? 
The following code is a solution: function Evaluator() { this.env = {}; } Evaluator.prototype.evaluate = function (str) { str = rewriteDeclarations(str); var __environment__ = this.env; // (1) with (__environment__) { // (2) return JSON.stringify(eval(str)); } }; function rewriteDeclarations(str) { // Prefix a newline so that search and replace is simpler str = "\n" + str; str = str.replace(/\nvar\s+(\w+)\s*=/g, "\n__environment__.$1 ="); // (3) str = str.replace(/\nfunction\s+(\w+)/g, "\n__environment__.$1 = function"); return str.slice(1); // remove prefixed newline } this.env holds all variable declarations and function declarations in its properties. We make it accessible to the input in two steps. Step 1 – declare: We assign this.env to __environment__ (1) and rewrite the input so that, among other things, each var declaration assigns to __environment__ (3). That demonstrates one important aspect of eval: it sees all variables in surrounding scopes. That is, if you invoke eval inside your function, you expose all of its internals. The only way to keep those internals secret is to put the eval call in a separate function and call that function. Step 2 – access: Use a with statement so that the properties of __environment__ appear as variables to the eval-ed code. This is not an ideal solution, more of a compromise: with should be avoided [2] and can’t be used in the advantageous strict mode [3]. But it is a quick solution for us now. A work-around is quite complex [4]. > var e = new Evaluator(); > e.evaluate("var x = 123;") '123' > e.evaluate("x") '123' Minor drawback: Normal var declarations have the result undefined; due to our rewriting we now get the value that is assigned to the variable. 
Problem: exceptions Right now, throwing an exception in evaluate’s input means that the method will throw: > e.evaluate("* 3") SyntaxError: Unexpected token * That is obviously unacceptable: In a graphical user interface, we want to report errors back to the user, not (invisibly) throw an exception. Here is one simple way of doing so: Evaluator.prototype.evaluate = function (str) { try { str = rewriteDeclarations(str); var __environment__ = this.env; with (__environment__) { return JSON.stringify(eval(str)); } } catch (e) { return e.toString(); } }; There is nothing surprising in this code, we simply use try-catch and report back what happened. More sophisticated solutions will want to do more, e.g. display the exception’s stack trace. The new evaluator in action: > var e = new Evaluator(); > e.evaluate("* 3") 'SyntaxError: Unexpected token *' Problem: console.log How do we handle calls to console.log in the input? Logged messages should be shown to the user, not be sent to the browser’s console. The solution is surprisingly easy: function Evaluator(cons) { this.env = {}; this.cons = cons; } Evaluator.prototype.evaluate = function (str) { try { str = rewriteDeclarations(str); var __environment__ = this.env; var console = this.cons; // (1) with (__environment__) { return JSON.stringify(eval(str)); } } catch (e) { return e.toString(); } }; The constructor now receives a custom implementation of console and assigns it to this.cons. By assigning that object to a local variable named console (1), we temporarily shadow the global console for eval; there is no need to replace it. Beware that this shadowing affects all of the function: you won’t be able to use the browser’s console anywhere in evaluate. 
The new evaluator in action: > var cons = { log: function (m) { console.log("### "+m) } }; > var e = new Evaluator(cons); > e.evaluate("console.log('hello')") ### hello undefined Problem: eval creates bindings inside the function One scary feature of eval is that it creates variable bindings inside the function that invokes it: > (function () { eval("var x=3"); return x }()) 3 Fortunately, the fix is easy: use strict mode. > (function () { "use strict"; eval("var x=3"); return x }()) ReferenceError: x is not defined You can’t use with in strict mode, so you’ll have to replace it with a work-around [4]. Keeping declarations in an environment An environment is where JavaScript keeps the parameters and variables of a function. It maps variable names to values and is thus similar to an object. We might be able to avoid rewriting the input and manage declarations via environments. The idea is as follows. eval puts declarations in some environment: Non-strict mode: the environment of the surrounding function. Strict mode: a newly created environment. What if we could reuse that environment for the next invocation of eval, instead of throwing it away? Then eval would properly remember prior declarations. Strict mode gives us no way to access the temporary environment it creates for each invocation. However, in non-strict mode, we might be able to keep the environment of the surrounding function around. The following subsections explore two ways of doing so. Declarations via nested scopes If you create a function g inside another function f, then g permanently retains a reference to f’s current environment envf. Whenever g is called, a new g-specific environment envg is created. But envg points to its parent environment envf. Variables that can’t be found in g’s scope (as managed via envg), are looked up in f’s scope (via envf). Thus, envf is not lost, as long as g exists. That gives us a strategy for keeping the environment of the function that calls eval around. 
In the following code that function is called evalHelper and creates a new function that has to be used for the next call of eval. Hence, declarations made in the former function are accessible in the latter function. function Evaluator() { var that = this; that.evalHelper = function (str) { that.evalHelper = function (str) { return eval(str); }; return eval(str); }; } Evaluator.prototype.evaluate = function (str) { return this.evalHelper(str); }; The fatal problem of this implementation is that you cannot nest to arbitrary depth. But, for the above depth of 2, it works perfectly: > var e = new Evaluator(); > e.evaluate("var x = 7;"); undefined > e.evaluate("x * 3") 21 Declarations via a generator It would be great if we could “restart” the function that calls eval, re-enter it with its previous environment still in place. ECMAScript.next’s generators [5] let you do that. Current versions of Firefox already support generators. Here is a demonstration of how they work in these versions (in ECMAScript.next, you will have to write function*, but apart from that, the code is the same): function mygen() { console.log((yield 0) + " @ 0"); console.log((yield 1) + " @ 1"); console.log((yield 2) + " @ 2"); } The above is a generator function. Invoke it and it will create a generator object. On that object, you first need to invoke the next() method to start execution. A yield x inside the code pauses execution and returns x to the previously called generator object method. After the first next(), you can either call next() or send(y). The latter means that the currently paused yield will continue and produce the value y. The former is equivalent to send(undefined). The following interaction shows mygen in use: > var g = mygen(); > g.next() // can’t use send() the first time 0 > g.send("a") // continue after yield 0, pause again a @ 0 1 > g.send("b") b @ 1 2 The following is an implementation of Evaluator that calls eval via the generator evalGenerator. 
Because of that, eval always sees the same environment and remembers declarations. function evalGenerator(console) { var str = yield; while(true) { try { var result = JSON.stringify(eval(str)); str = yield result; } catch (e) { str = yield e.toString(); } } } function Evaluator(cons) { this.evalGen = evalGenerator(cons); this.evalGen.next(); // start } Evaluator.prototype.evaluate = function (str) { return this.evalGen.send(str); }; The new evaluator works as expected. > var e = new Evaluator(); > e.evaluate("var x = 7;") undefined > e.evaluate("x * 2") "14" > e.evaluate("* syntax_error") "SyntaxError: missing ; before statement" The biggest problem with this solution is that it uses the deprecated features non-strict eval together with the new feature generators. There will probably be a way in ECMAScript.next to make this combination work, but it will be a hack and should thus be avoided. Conclusion We have used eval to implement a helper type for a command line. While doing so, we learned a few interesting things about eval: Letting it remember declarations between invocations is complicated; it can access all variables in the scopes surrounding its invocation; and in non-strict mode, it can even create new variables inside the invoking function. The best solution for remembering declarations would be for eval to have an optional parameter for an environment (to be reused), but that is not in the cards. Therefore, the only truly safe solution in pure JavaScript is to use a full-featured JavaScript parser such as esprima to rewrite critical parts of the input code. That is left as an exercise to the reader. References Combining code editing with a command line JavaScript’s with statement and why it’s deprecated JavaScript’s strict mode: a summary Handing variables to eval Asynchronous programming and continuation-passing style in JavaScript
July 27, 2012
by Axel Rauschmayer
· 5,201 Views
article thumbnail
Threads Versus Greenlets in Python Networking Library Gevent
In a previous post, I gave an introduction to gevent to show some of the benefits your application might get from using gevent greenlets instead of threads. Some people, however, took issue with my benchmark code, saying that the threaded example was contrived. In this post, I'll try to answer some of the objections. (It actually turns out that there was a bug in the version of ab I was using to test, as well, so I re-ran the tests from the previous post, too.) Threads versus Greenlets Initially, I had proposed a dummy webserver that handled incoming requests by creating a thread and delegating communication to that thread. The code in question is below: def threads(port): s = socket.socket() s.bind(('0.0.0.0', port)) s.listen(500) while True: cli, addr = s.accept() t = threading.Thread(target=handle_request, args=(cli, time.sleep)) t.daemon = True t.start() When I could get the code above to actually run the full benchmark (which it didn't often do) it ended up getting around 1300-1400 requests per second. The gevent version looked very similar: import gevent def greenlet(port): from gevent import socket s = socket.socket() s.bind(('0.0.0.0', port)) s.listen(500) while True: cli, addr = s.accept() gevent.spawn(handle_request, cli, gevent.sleep) This code was able to handle closer to 1600 requests per second. Maybe I should have called it out better, but the fact that the gevent version performed better than the threaded version does point out an important aspect of gevent: Greenlets are significantly lighter-weight than true threads, particularly when creating them. However, some folks objected by pointing out that you just don't do that with threads. Nobody does. It's a dumb way to design a server. I agree with all these points, though that wasn't really the point I was going for. 
One thing I will point out is that the reason you don't design threaded servers so they fork a thread each time you get a connection is that threads are expensive to fork, unlike greenlets. Fixing the benchmark So anyway, to "fix" the benchmark so it's a little more fair to threads, we'll use a thread pool to create all the threads up-front and then use a Queue.Queue to send work to them. Our server core now looks like this: def threads(port, N=10): s = socket.socket() s.bind(('0.0.0.0', port)) s.listen(500) q = Queue() for x in xrange(N): t = threading.Thread(target=thread_worker, args=(q,)) t.daemon = True t.start() print 'Ready and waiting with %d threads on port %d' % ( N, port) while True: cli, addr = s.accept() q.put(cli) def thread_worker(q): while True: sock = q.get() handle_request(sock, time.sleep) If I now run this with a thread pool of 200 threads, I can indeed finish the benchmark (ApacheBench as ab -r -n 2000 -c 200...) with around 1300 requests per second (a little less, probably due to the synchronization overhead of the Queue). So updating the benchmark to use a thread pool did not improve the performance. The equivalent gevent code uses gevent.pool.Pool: def greenlet(port, N=10): from gevent.pool import Pool from gevent import socket, sleep pool = Pool(N) s = socket.socket() s.bind(('0.0.0.0', port)) s.listen(500) while True: cli, addr = s.accept() pool.spawn(handle_request, cli, sleep) Running ab with the same parameters I now get... around 1200-1400 requests per second. So why use gevent, again? So yes, if I had designed the benchmark to omit the thread/greenlet creation entirely, threads and greenlets do indeed perform about the same. The big win for greenlets is when your thread pool isn't big enough to handle the concurrent connections. It turns out that there's a clever denial-of-service attack on web servers known as slowloris that consumes threads from your thread pool quickly. 
Once your server's threads are all busy handling the slowloris requests, no further work can be done, and you end up with a very lightly loaded but still unresponsive server. To illustrate this, we can try running our benchmark with the thread pool, but only running 20 threads in the pool, but modifying our request handler to take five seconds to handle a request. We'll go ahead and modify the benchmark line to allow more time for responses as well: $ ab -n 2000 -c 200 -r -t 60 http://127.0.0.1:... Now our threaded example ends up timing out connections as it tries to service 200 concurrent connections, each taking five seconds, with only 20 worker threads. If we go back to our naive (un-pooled) gevent example, however, we're able to achieve 47 requests per second, which is close to the theoretical maximum of 50 requests per second, with a very light server load. The point? A slowloris attack will be able to eat up all the threads in your (finite-sized) thread pool, regardless of how big that pool is. Spawning a greenlet each time you receive a connection means you don't waste (almost) any resources waiting on IO. Conclusion There's a good bit more to gevent that I'd like to cover in future posts, but for now the points I'd like to leave you with are the following: You shouldn't be spawning something expensive like a thread for each incoming connection. It eats up various types of server resources. You shouldn't rely on thread pools to protect you from resource exhaustion, because they can fall victim to the slowloris attack. Gevent greenlets are lightweight enough that you can spawn one for each connection, and you don't have to rely on a pool (which can become exhausted in a slowloris type attack). So what do you think? Have I convinced you? I'd love to hear your reaction in the comments below!
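The pool-exhaustion effect described above can be reproduced on a small scale without any networking. The sketch below is a stand-in, not the article's benchmark: it uses the standard library's `concurrent.futures.ThreadPoolExecutor` instead of the hand-rolled Queue-based pool, `time.sleep` plays the role of a handler stuck on a slow client, and the function names are invented for this example.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def expected_waves(workers, requests):
    """A full pool serves requests in ceil(requests / workers) waves."""
    return math.ceil(requests / workers)


def run_slow_requests(workers, requests, delay):
    """Push `requests` slow jobs through `workers` threads; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each job only sleeps, standing in for a handler waiting on a slow client.
        list(pool.map(time.sleep, [delay] * requests))
    return time.monotonic() - start


if __name__ == "__main__":
    delay = 0.05
    small = run_slow_requests(workers=2, requests=8, delay=delay)
    large = run_slow_requests(workers=8, requests=8, delay=delay)
    print(f"2 workers: {small:.2f}s, at least {expected_waves(2, 8)} waves of {delay}s")
    print(f"8 workers: {large:.2f}s, at least {expected_waves(8, 8)} wave of {delay}s")
```

By the same arithmetic, 200 concurrent five-second requests on 20 worker threads need 10 waves, so the last clients wait about 50 seconds, which is consistent with the timeouts reported for the threaded server.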
July 26, 2012
by Rick Copeland
· 14,038 Views
article thumbnail
11 OPEN NoSQL Document-Oriented Databases
A document-oriented database is designed for storing, retrieving, and managing document-oriented, or semi-structured, data. Document-oriented databases are one of the main categories of NoSQL databases. The central concept of a document-oriented database is the notion of a Document. While each document-oriented database implementation differs on the details of this definition, in general, they all assume documents encapsulate and encode data (or information) in some standard format(s) (or encoding(s)). Encodings in use include XML, YAML, JSON and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on). MongoDB: MongoDB is a collection-oriented, schema-free document database. Data is grouped into sets that are called ‘collections’. Each collection has a unique name in the database, and can contain an unlimited number of documents. Collections are analogous to tables in a RDBMS, except that they don’t have any defined schema. It stores data (in BSON – “Binary Serialized dOcument Notation” format) as a structured collection of key-value pairs, where keys are strings, and values are any of a rich set of data types, including arrays and documents. Home: http://www.mongodb.org/ Quick Start: http://www.mongodb.org/display/DOCS/Quickstart Download: http://www.mongodb.org/downloads CouchDB: CouchDB is a document database server, accessible via a RESTful JSON API. It is ad-hoc and schema-free with a flat address space. It is queryable and indexable, featuring a table-oriented reporting engine that uses JavaScript as a query language. A CouchDB document is an object that consists of named fields. Field values may be strings, numbers, dates, or even ordered lists and associative maps. 
Home: http://couchdb.apache.org/ Quick Start: http://couchdb.apache.org/docs/intro.html Download: http://couchdb.apache.org/downloads.html Terrastore: Terrastore is a modern document store which provides advanced scalability and elasticity features without sacrificing consistency. It is based on Terracotta, so it relies on an industry-proven, fast clustering technology. Home: http://code.google.com/p/terrastore/ Quick Start: http://code.google.com/p/terrastore/wiki/Documentation Download: http://code.google.com/p/terrastore/downloads/list RavenDB: Raven is a .NET Linq-enabled document database, focused on providing a high-performance, schema-less, flexible and scalable NoSQL data store for the .NET and Windows platforms. Raven stores any JSON document inside the database. It is a schema-less database where you can define indexes using C#’s Linq syntax. Home: http://ravendb.net/ Quick Start: http://ravendb.net/tutorials Download: http://ravendb.net/download OrientDB: OrientDB is an open source NoSQL database management system written in Java. Even though it is a document-based database, relationships are managed as in graph databases, with direct connections between records. It supports schema-less, schema-full and schema-mixed modes. It has a strong security profiling system based on users and roles and supports SQL as a query language. Home: http://www.orientechnologies.com/ Quick Start: http://code.google.com/p/orient/wiki/Tutorials Download: http://code.google.com/p/orient/wiki/Download ThruDB: Thrudb is a set of simple services built on top of the Apache Thrift framework that provides indexing and document storage services for building and scaling websites. Its purpose is to offer web developers flexible, fast and easy-to-use services that can enhance or replace traditional data storage and access layers. It supports multiple storage backends such as BerkeleyDB, Disk and MySQL, and also has Memcache and Spread integration. 
Home: http://code.google.com/p/thrudb/ Quick Start: http://thrudb.googlecode.com/svn/trunk/doc/Thrudb.pdf Download: http://code.google.com/p/thrudb/source/checkout SisoDB: SisoDb is a document-oriented db-provider for Sql-Server written in C#. It lets you store object graphs of POCOs (plain old clr objects) without having to configure any mappings. Each entity is treated as an aggregate root and will get separate tables created on the fly. Home: http://www.sisodb.com Quick Start: http://www.sisodb.com/Wiki Download: https://github.com/danielwertheim/SisoDb-Provider/ RaptorDB: RaptorDB is an extremely small and fast embedded, noSql, persisted dictionary database using b+tree or MurMur hash indexing. It was primarily designed to store JSON data (see my fastJSON implementation), but can store any type of data that you give it. Home: http://www.codeproject.com/KB/database/RaptorDB.aspx Quick Start: http://www.codeproject.com/KB/database/RaptorDB.aspx Download: http://www.codeproject.com/KB/database/RaptorDB.aspx CloudKit: CloudKit provides schema-free, auto-versioned, RESTful JSON storage with optional OpenID and OAuth support, including OAuth Discovery. Home: http://getcloudkit.com/ Quick Start: http://getcloudkit.com/api/ Download: https://github.com/jcrosby/cloudkit Persevere: Persevere is an open source set of tools for persistence and distributed computing using intuitive standards-based JSON interfaces of HTTP REST, JSON-RPC, JSONPath, and REST Channels. The core of the Persevere project is the Persevere Server. The Persevere server includes a Persevere JavaScript client, but the standards-based interface is intended to be used with any framework or client. 
Home: http://code.google.com/p/persevere-framework/ Quick Start: http://code.google.com/p/persevere-framework/w/list Download: http://code.google.com/p/persevere-framework/downloads/list Jackrabbit: The Apache Jackrabbit™ content repository is a fully conforming implementation of the Content Repository for Java Technology API (JCR, specified in JSR 170 and 283). A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more. Home: http://jackrabbit.apache.org Quick Start: http://jackrabbit.apache.org/getting-started-with-apache-jackrabbit.html Download: http://jackrabbit.apache.org/downloads.html Conclusion: Document databases store and retrieve documents, and the basic atomic unit of storage is a document. As always, your requirements drive the decision. You need to think about your data-access patterns and use cases to create a smart document model. When your domain model can be split and partitioned across documents, a document database will be a suitable choice. For example, a document database works extremely well for blog software, a CMS or wiki software. At the same time, a non-relational database is not better than a relational one in cases where your data has a lot of relations and normalization. The following Stack Overflow question covers the pros and cons of relational versus document-based databases: http://stackoverflow.com/questions/337344/pros-cons-of-document-based-databases-vs-relational-databases
July 23, 2012
by Lijin Joseji
· 69,215 Views · 2 Likes
article thumbnail
How Does SQL Server Scheduling Work? There's a Flowchart For That
srgolla - SQL Server Scheduling Flowchart This is a basic flowchart explaining SQL Server Scheduling at a very high level. This will appeal to a limited audience, but is still something I thought very informative and not something I see flowcharted every day (week/month/year).
July 21, 2012
by Greg Duncan
· 7,039 Views
article thumbnail
How Many Java developers are There in the World?
Oracle says it’s 9,000,000. Wikipedia claims it’s 10,000,000. And the guys from NumberOf.net seem to be the most precise - they know that there are exactly 9,007,346 Java developers out there. Nice numbers. I have used those articles as reference points while speaking about the potential market size for our memory leak detection tool. But something in these numbers has bothered me for years - there is no trustworthy and public analysis behind them. They are just conjured up from thin air. So I finally thought I would do something about it and try to figure it out for good. It turned out to be a challenging task. After all - with more than seven billion people on our planet I couldn't call everyone and ask them. Well, maybe I could, but if every call took on average 20 seconds I would need at least 4,439 years to complete the survey. And that is if I did not sleep, eat or rest. So I had to use other ways for estimation. After playing around with different sources of information, I decided to dig into four of them for a closer look: Labour statistics provided by different governments Language popularity sites such as Tiobe and Langpop Employment portals using Indeed.com and Monster.com Download numbers on popular Java tools and libraries - namely Eclipse and Tomcat. Using that information I wanted to estimate the number using three different calculations - based on language popularity indexes, labour statistics and download figures. So, here we go. How many programmers could there be in total? World population is currently above seven billion. Out of those seven billion we can leave out sub-Saharan Africa (900M) and rural Asia (about 50% of its 2.2B population) as negligible. This leaves us with approximately 5 billion people living in regions where the overall economic and cultural background can be considered suitable for software industries to spawn. Now, out of those 5,000,000,000 how many could be actually developing software? 
A good answer at StackExchange gives us some pointers as to where we can find information on the percentage of software developers in different countries. Using the US, Japan, Canada, the EU27 and the UK as a baseline we can estimate that 0.86% of the population is employed as a software developer or programmer:
Country    Population     Developers    %
Canada     33,476,688     387,000       1.16%
EU27       502,486,499    5,900,000     1.17%
Japan      127,799,000    1,016,929     0.80%
UK         63,162,000     333,000       0.53%
US         313,931,000    1,336,300     0.43%
Weighted average: 0.86%
0.86% out of five billion is 43,000,000. Let's remember this number, as it will be used as a baseline in the following calculations. Popularity contests In the popularity contest we will use two channels for the source of data - the TIOBE index and the Langpop one. Other sources such as Dataist figures were hard to interpret, so we’ll stick just to those two. For the background - the TIOBE ratings are calculated by counting hits of the most popular search engines. The search query that is used is +"<language> programming", e.g. +“Java programming” in our case. Langpop uses more sources for input besides search engine queries - in equal weights it traces open job positions, book titles, search engine results, the number of open source projects and other data to calculate its popularity score. Simplifying TIOBE and Langpop results, we can conclude that according to TIOBE 17% and according to Langpop ~15% of the programmers in the world are using Java. Averaging those numbers we can say that around 16% out of the 43,000,000 developers in the world use Java. This translates to 6,880,000 Java developers out there. Job portals Job portals, especially when considering both available positions and uploaded resumes, are definitely a good source of information. The larger ones also provide nice reports on the labour market, which we will dig into next. 
Note that we used Indeed.com and Monster.com - if you can point us towards more and/or better sources of information, we would be glad to correct our calculations. But using this analysis from Monster.com and the aggregated statistics from Indeed.com we can say that ~18% of Monster.com applicants can program in Java and ~16% of open engineering / programming positions scanned by Indeed.com are looking for Java talent. Averaging those numbers we arrive at 17%, which out of 43,000,000 programmers in total would translate to 7,310,000 Java guys and girls in the world.

Software downloads

Every Java developer uses something to build the application. Well, we expect them to use at least a JVM and a compiler. If you happen to know anyone who can get away without those two, please let us know. We would hire him immediately. But most of us tend to use more than just a compiler and a virtual machine. We use IDEs, application servers, build tools, etc. So we figured that we would look into the publicly available download numbers of these tools and try to estimate the number of developers from them.

When calculating the total number of developers from the estimated number of users, we take into account the market share of the corresponding software. To estimate the market share we use Zeroturnaround's statistics gathered in the spring of 2012.

Eclipse downloads. Eclipse Juno was released on June 27 and was downloaded 1,200,000 times during the first 20 days. Looking into the historical data published by eclipse.org we can predict that Juno will be downloaded approximately 8,000,000 times in total. The last four major Eclipse releases have all followed a yearly release calendar, and all of them took place in June:

- Juno - 8,000,000 downloads (estimate for a full year, expecting the trend to continue; currently 1,200,000 downloads in the first 20 days)
- Indigo - 6,000,000 downloads
- Helios - 4,100,000 downloads
- Galileo - 2,200,000 downloads

Averaging the Juno estimate and the Indigo result, we can say that Eclipse is downloaded approximately 7,000,000 times a year. Using Zeroturnaround's statistics, we expect 68% of Java developers to use Eclipse as a (primary) IDE. If we now make a bold claim that each Java developer on Eclipse will download the IDE exactly once a year, expect the number of downloads per year to be 7,000,000 and consider that 32% of Java developers do not use Eclipse at all, we come to the conclusion that there should be about 10,300,000 Java developers in total (7,000,000 / 0.68).

Apache Tomcat downloads. Vadim Gritsenko has put together some nice statistics on top of Apache logs. From there we can see that during the last year Tomcat has been downloaded approximately 550,000 times/month. This gives us a yearly total of 6,600,000 Tomcat downloads. Applying statistics from the same report used for calculating Eclipse's market share, we can estimate that 59% of Java developers are using Tomcat as one of their development platforms. If we now again make a bold claim that each Java developer on Tomcat will download every major release exactly once and consider that 41% of Java developers do not use Tomcat, we reach the conclusion that there should be 11,186,000 Java developers out there (6,600,000 / 0.59).

Averaging the numbers from Eclipse and Tomcat downloads, we end up with 10,743,000 Java developers.

Conclusions

We used three different sources for estimation - popularity contests, job market analysis and download numbers of popular Java development infrastructure products. The numbers varied quite a bit - from 6,880,000 to 10,743,000. Aggressively averaging the three numbers we can conclude that there are 8,311,000 Java developers out there. Not quite as many as Oracle or Wikipedia think, but still enough to build a business that provides development tools for the Java community.

Lies. Damn lies. And statistics.
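The arithmetic behind the three estimates can be cross-checked with a short Java sketch. The class and method names are made up for illustration; the inputs are the figures quoted in the article. Note that the unrounded download estimates differ slightly from the rounded figures used in the text.

```java
public class JavaDeveloperEstimate {

    // Baseline from the article: 0.86% of roughly five billion people.
    static final long TOTAL_DEVELOPERS = 43_000_000L;

    /** Share of developers in the combined population of the baseline countries. */
    static double weightedShare(long[] developers, long[] population) {
        long devSum = 0;
        long popSum = 0;
        for (int i = 0; i < developers.length; i++) {
            devSum += developers[i];
            popSum += population[i];
        }
        return (double) devSum / popSum;
    }

    /** Developers implied by a language share given in percent. */
    static long byShare(int sharePercent) {
        return TOTAL_DEVELOPERS * sharePercent / 100;
    }

    /** Developers implied by yearly downloads of a tool with the given market share in percent. */
    static long byDownloads(long downloadsPerYear, int marketSharePercent) {
        return Math.round(downloadsPerYear * 100.0 / marketSharePercent);
    }

    public static void main(String[] args) {
        // Canada, EU27, Japan, UK, US (figures from the table in the article)
        long[] developers = {387_000L, 5_900_000L, 1_016_929L, 333_000L, 1_336_300L};
        long[] population = {33_476_688L, 502_486_499L, 127_799_000L, 63_162_000L, 313_931_000L};
        System.out.printf("Weighted share: %.2f%%%n", weightedShare(developers, population) * 100);

        long popularity = byShare(16);                  // 6,880,000
        long jobPortals = byShare(17);                  // 7,310,000
        long eclipse = byDownloads(7_000_000L, 68);
        long tomcat = byDownloads(6_600_000L, 59);
        long downloads = (eclipse + tomcat) / 2;

        System.out.println("Popularity contests: " + popularity);
        System.out.println("Job portals:         " + jobPortals);
        System.out.println("Downloads:           " + downloads);
        System.out.println("Average of three:    " + (popularity + jobPortals + downloads) / 3);
    }
}
```

Integer percentages keep the share calculations exact; only the download estimates need rounding.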
July 20, 2012
by Nikita Salnikov-Tarnovski
· 24,441 Views
Replacing Query String Elements in C# .NET and JavaScript
While writing list navigation and search features in websites today there is a constant need to find, replace and play with query string elements, so that you can easily manipulate these mystical items while you're carrying them around in your website's URLs. I have a few little methods I've used over the years and carry with me from project to project, and this post is putting them on the record for easy access later.

I have a secret. This post is actually aimed more at an audience of 'myself', and my ability to have an easy bit of source code to call upon when I'm on the go looking for a quick solution to cut and paste – as most of my blog posts are. But you, dear reader, get to share in this benefit with me by pulling from the awesomeness within this post as well.

Solution: .Net c#

When doing this with c# you have a few pretty cool features up your sleeve. One of these is the HttpUtility.ParseQueryString(urlPath) framework method. This static method allows you to extract an editable NameValueCollection from a given query string. Why is this cool? Because it allows you to very easily play with the query string collection like it is any other NameValueCollection – with Add() and Remove() methods. This makes it incredibly powerful.

Quick & Dirty code beware! The code I'm pasting below is far from being the most elegant solution. I seem to have misplaced my nicer piece of code and am in too much of a rush to find it right now (sorry). Until I find my nicer solution, the method below will get you by – whether you have a hatred for ternaries or not.

    public static string ReplaceQueryStringParam(string currentPageUrl, string paramToReplace, string newValue)
    {
        // Split the URL into the part before '?' and the query string itself
        string urlWithoutQuery = currentPageUrl.IndexOf('?') >= 0
            ? currentPageUrl.Substring(0, currentPageUrl.IndexOf('?'))
            : currentPageUrl;
        string queryString = currentPageUrl.IndexOf('?') >= 0
            ? currentPageUrl.Substring(currentPageUrl.IndexOf('?'))
            : null;

        var queryParamList = queryString != null
            ? HttpUtility.ParseQueryString(queryString)
            : HttpUtility.ParseQueryString(string.Empty);

        // Replace the value if the parameter exists, otherwise add it
        if (queryParamList[paramToReplace] != null)
        {
            queryParamList[paramToReplace] = newValue;
        }
        else
        {
            queryParamList.Add(paramToReplace, newValue);
        }

        return String.Format("{0}?{1}", urlWithoutQuery, queryParamList);
    }

To call this, you can do the following:

    // var currentUrl = HttpContext.Current.Request.Url;
    var currentUrl = "http://www.mysite.com/mypage?category=cool-products&sort=price&page=3";
    // change the sort-by param named "sort" to "name"
    var newUrlWithChangedSort = ReplaceQueryStringParam(currentUrl, "sort", "name");

Solution: JavaScript

The second part of this post includes a JavaScript solution, as you never know when you have to do this on the client side.

    function replaceQueryString(url, param, value) {
        // Make sure there is a query string to work with
        if (url.lastIndexOf('?') < 0)
            url = url + "?";

        // Match the parameter after '?' or '&', up to the next '&' or the end
        var re = new RegExp("([?&])" + param + "=.*?(&|$)", "i");
        if (url.match(re))
            return url.replace(re, '$1' + param + "=" + value + '$2');
        else
            return url.substring(url.length - 1) == '?'
                ? url + param + "=" + value
                : url + '&' + param + "=" + value;
    }

And to use the above code in your client-side JavaScript simply write something along the lines of:

    // var currentUrl = self.location;
    var currentUrl = "http://www.mysite.com/mypage?category=cool-products&sort=price&page=3";
    // change the sort-by param named "sort" to "name"
    var newUrlWithChangedSort = replaceQueryString(currentUrl, "sort", "name");

Easy – now next time you need to knock something together, instead of writing it yourself, you can simply cut & paste mine!
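For readers on the JVM, the same replace-or-add idea can be sketched in Java. This is not part of the original post: the class and method names are hypothetical, it assumes Java 10 or later (for the Charset overloads of URLEncoder and URLDecoder), and it deliberately ignores URL fragments and repeated parameter names.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStrings {

    /** Returns the URL with the given parameter replaced, or appended if it was absent. */
    public static String replaceQueryParam(String url, String param, String newValue) {
        int q = url.indexOf('?');
        String base = q >= 0 ? url.substring(0, q) : url;
        String query = q >= 0 ? url.substring(q + 1) : "";

        // LinkedHashMap keeps the original parameter order.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(key), decode(value));
        }

        // put() replaces an existing entry in place and appends a new one at the end.
        params.put(param, newValue);

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joined.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
        }
        return base + "?" + joined;
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Decoding on the way in and encoding on the way out means a value containing '&' or '=' cannot corrupt the rebuilt query string, which the plain string concatenation in a quick regex approach does not guard against.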
July 20, 2012
by Douglas Rathbone
· 16,510 Views
How Changing Java Package Names Transformed my System Architecture
Changing your perspective even a small amount can have profound effects on how you approach your system. Let's say you're writing a web application in Java. In the system you deal with orders, customers and products. As a web application, your classes include staples like PersonController, PersonRepository, CustomerController and OrderService. How do you organize your classes into packages?

There are two fundamental ways to structure your packages. Either you can focus on the logical tiers, like com.brodwall.myapp.controllers, com.brodwall.myapp.domain or perhaps com.brodwall.myapp.services.customer. Or you can focus on the domain contexts, like com.brodwall.myapp.customer, com.brodwall.myapp.orders and com.brodwall.myapp.products. The first approach is by far the most prevalent. In my view, it's also the least helpful. Here are some ways your thinking changes if you structure your packages around domain concepts, rather than technological tiers:

First, and most fundamentally, your mental model will now be aligned with that of the users of your system. If you're asked to implement a typical feature, it is now more likely to be focused around a strict subset of the packages of your system. For example, adding a new field to a form will at least affect the presentation logic, entity and persistence layer for the corresponding domain concept. If your packages are organized around tiers, this change will hit all over your system. In a word: a system organized around features, rather than technologies, has higher coherence. This technical term means that a large percentage of the dependencies of a class are located close to that class.

Secondly, organizing around domain concepts will give you more options when your software grows. When a package contains tens of classes, you may want to split it up into several packages. The discussion can itself be enlightening.
“Maybe we should separate out the customer address classes into a com.brodwall.myapp.customer.address package. It seems to have a bit of a life of its own.” “Yeah, and maybe we can use the same classes for other places we need addresses, such as suppliers?” “Cool, so com.brodwall.myapp.address, then?” Or maybe you decide that order status codes and payment status codes deserve to be in the “com.brodwall.myapp.order.codes” package. On the other hand, what options do you have for splitting up com.brodwall.myapp.controllers? You could create subpackages for customer, orders and products, but these subpackages may only have one or possibly two classes each.

Finally, and perhaps most intriguingly, using domain concepts for packages allows you to vary the design on a case-by-case basis. Maybe you really need an OrderService which coordinates the payment and shipping of an order, while ProductController only needs basic create-retrieve-update-delete functionality with a repository. A ProductService would just get in the way. If ProductService is missing from the com.brodwall.myapp.services package, this may be confusing or at the very least give you a nagging feeling that something is wrong. On the other hand, if there’s no Controller in the com.brodwall.myapp.product package, it doesn’t matter much.

Also, most systems have some good parts and some not-so-good parts. If your Services package is not working for you, there’s not much you can do. But if the Products package is rotten, you can throw it out and reimplement it without the whole system being thrown into a state of chaos. By putting the classes needed to implement a feature together with each other and apart from the classes needed to implement other features, developers can be pragmatic and innovative when developing one feature without negatively affecting other features.
The flip side of this is that most developers are more comfortable with some technologies in the application and less comfortable with others. Organizing around features instead of technologies forces each developer to consider a larger set of technological challenges. Some programmers take this as a motivating challenge to learn, while others, it seems, would rather not have to learn something new. If it were my money being spent to create features, I know what kind of developer I would want.

Trivial changes can have large effects. By organizing your software around features, you get a more coherent system that allows for growth. It may challenge your developers, but it drives down the number of hand-offs needed to implement a feature and it challenges the developers to improve the parts of the application they are working on.

See also my blog post on Architecture as tidying up.
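To make the feature-oriented layout concrete, here is a minimal sketch of what a customer package could contain. CustomerController is named in the article; CustomerRepository and Customer are hypothetical names chosen for illustration, and the in-memory map stands in for real persistence. The point is that only the controller is public, so everything else stays a private detail of the feature.

```java
// In a real project this file would begin with:
//   package com.brodwall.myapp.customer;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** The only type that other packages need to see. */
public class CustomerController {
    private final CustomerRepository repository = new CustomerRepository();

    public void create(long id, String name) {
        repository.save(new Customer(id, name));
    }

    public String show(long id) {
        return repository.find(id)
                .map(customer -> "Customer: " + customer.name)
                .orElse("Not found");
    }
}

/** Package-private: the persistence detail of the customer feature. */
class CustomerRepository {
    private final Map<Long, Customer> store = new HashMap<>();

    void save(Customer customer) {
        store.put(customer.id, customer);
    }

    Optional<Customer> find(long id) {
        return Optional.ofNullable(store.get(id));
    }
}

/** Package-private entity, invisible to orders and products. */
class Customer {
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}
```

With a tier-based layout the repository and entity would have to be public so that classes in the controllers package could reach them; here the compiler enforces the feature boundary.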
July 20, 2012
by Johannes Brodwall
· 17,438 Views
How to Resolve java.lang.NoClassDefFoundError: Part 3
This article is part 3 of our NoClassDefFoundError troubleshooting series. As I mentioned in my first article, there are many possible issues that can lead to a NoClassDefFoundError. This article will focus and describe one of the most common causes of this problem: failure of a Java class static initializer block or variable. A sample Java program will be provided and I encourage you to compile and run this example from your workstation in order to properly replicate and understand this type of NoClassDefFoundError problem. Java static initializer revisited The Java programming language provides you with the capability to “statically” initialize variables or a block of code. This is achieved via the “static” variable identifier or the usage of a static {} block at the header of a Java class. Static initializers are guaranteed to be executed only once in the JVM life cycle and are Thread safe by design which make their usage quite appealing for static data initialization such as internal object caches, loggers etc. What is the problem? I will repeat again, static initializers are guaranteed to be executed only once in the JVM life cycle…This means that such code is executed at the Class loading time and never executed again until you restart your JVM. Now what happens if the code executed at that time (@Class loading time) terminates with an unhandled Exception? Welcome to the java.lang.NoClassDefFoundError problem case #2! NoClassDefFoundError problem case 2 – static initializer failure This type of problem is occurring following the failure of static initializer code combined with successive attempts to create a new instance of the affected (non-loaded) class. 
Sample Java program

The following simple Java program is split as per below:

- The main Java program NoClassDefFoundErrorSimulator
- The affected Java class ClassA
- ClassA provides you with an ON/OFF switch allowing you to replicate the type of problem that you want to study

This program is simply attempting to create a new instance of ClassA 3 times (one after the other). It will demonstrate that an initial failure of either a static variable or static block initializer, combined with successive attempts to create a new instance of the affected class, triggers java.lang.NoClassDefFoundError.

#### NoClassDefFoundErrorSimulator.java

    package org.ph.javaee.tools.jdk7.training2;

    /**
     * NoClassDefFoundErrorSimulator
     * @author Pierre-Hugues Charbonneau
     *
     */
    public class NoClassDefFoundErrorSimulator {

        /**
         * @param args
         */
        public static void main(String[] args) {
            System.out.println("java.lang.NoClassDefFoundError Simulator - Training 2");
            System.out.println("Author: Pierre-Hugues Charbonneau");
            System.out.println("http://javaeesupportpatterns.blogspot.com\n\n");

            try {
                // Create a new instance of ClassA (attempt #1)
                System.out.println("FIRST attempt to create a new instance of ClassA...\n");
                ClassA classA = new ClassA();
            } catch (Throwable any) {
                any.printStackTrace();
            }

            try {
                // Create a new instance of ClassA (attempt #2)
                System.out.println("\nSECOND attempt to create a new instance of ClassA...\n");
                ClassA classA = new ClassA();
            } catch (Throwable any) {
                any.printStackTrace();
            }

            try {
                // Create a new instance of ClassA (attempt #3)
                System.out.println("\nTHIRD attempt to create a new instance of ClassA...\n");
                ClassA classA = new ClassA();
            } catch (Throwable any) {
                any.printStackTrace();
            }

            System.out.println("\n\ndone!");
        }
    }

#### ClassA.java

    package org.ph.javaee.tools.jdk7.training2;

    /**
     * ClassA
     * @author Pierre-Hugues Charbonneau
     *
     */
    public class ClassA {

        private final static String CLAZZ = ClassA.class.getName();

        // Problem replication switch ON/OFF
        private final static boolean REPLICATE_PROBLEM1 = true;  // static variable initializer
        private final static boolean REPLICATE_PROBLEM2 = false; // static block{} initializer

        // Static variable executed at Class loading time
        private static String staticVariable = initStaticVariable();

        // Static initializer block executed at Class loading time
        static {
            // Static block code execution...
            if (REPLICATE_PROBLEM2) throw new IllegalStateException("ClassA.static{}: Internal Error!");
        }

        public ClassA() {
            System.out.println("Creating a new instance of " + ClassA.class.getName() + "...");
        }

        /**
         *
         * @return
         */
        private static String initStaticVariable() {
            String stringData = "";
            if (REPLICATE_PROBLEM1) throw new IllegalStateException("ClassA.initStaticVariable(): Internal Error!");
            return stringData;
        }
    }

Problem reproduction

In order to replicate the problem, we will simply “voluntarily” trigger a failure of the static initializer code. Please simply enable the problem type that you want to study e.g. either static variable or static block initializer failure:

    // Problem replication switch ON (true) / OFF (false)
    private final static boolean REPLICATE_PROBLEM1 = true;  // static variable initializer
    private final static boolean REPLICATE_PROBLEM2 = false; // static block{} initializer

Now, let’s run the program with both switches at OFF (both boolean values at false)

## Baseline (normal execution)

    java.lang.NoClassDefFoundError Simulator - Training 2
    Author: Pierre-Hugues Charbonneau
    http://javaeesupportpatterns.blogspot.com

    FIRST attempt to create a new instance of ClassA...
    Creating a new instance of org.ph.javaee.tools.jdk7.training2.ClassA...

    SECOND attempt to create a new instance of ClassA...
    Creating a new instance of org.ph.javaee.tools.jdk7.training2.ClassA...

    THIRD attempt to create a new instance of ClassA...
    Creating a new instance of org.ph.javaee.tools.jdk7.training2.ClassA...

    done!
For the initial run (baseline), the main program was able to create 3 instances of ClassA successfully with no problem.

## Problem reproduction run (static variable initializer failure)

    java.lang.NoClassDefFoundError Simulator - Training 2
    Author: Pierre-Hugues Charbonneau
    http://javaeesupportpatterns.blogspot.com

    FIRST attempt to create a new instance of ClassA...
    java.lang.ExceptionInInitializerError
        at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:21)
    Caused by: java.lang.IllegalStateException: ClassA.initStaticVariable(): Internal Error!
        at org.ph.javaee.tools.jdk7.training2.ClassA.initStaticVariable(ClassA.java:37)
        at org.ph.javaee.tools.jdk7.training2.ClassA.<clinit>(ClassA.java:16)
        ... 1 more

    SECOND attempt to create a new instance of ClassA...
    java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA
        at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:30)

    THIRD attempt to create a new instance of ClassA...
    java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA
        at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:39)

    done!

## Problem reproduction run (static block initializer failure)

    java.lang.NoClassDefFoundError Simulator - Training 2
    Author: Pierre-Hugues Charbonneau
    http://javaeesupportpatterns.blogspot.com

    FIRST attempt to create a new instance of ClassA...
    java.lang.ExceptionInInitializerError
        at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:21)
    Caused by: java.lang.IllegalStateException: ClassA.static{}: Internal Error!
        at org.ph.javaee.tools.jdk7.training2.ClassA.<clinit>(ClassA.java:22)
        ... 1 more

    SECOND attempt to create a new instance of ClassA...
    java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA
        at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:30)

    THIRD attempt to create a new instance of ClassA...
    java.lang.NoClassDefFoundError: Could not initialize class org.ph.javaee.tools.jdk7.training2.ClassA
        at org.ph.javaee.tools.jdk7.training2.NoClassDefFoundErrorSimulator.main(NoClassDefFoundErrorSimulator.java:39)

    done!

What happened?

As you can see, the first attempt to create a new instance of ClassA did trigger a java.lang.ExceptionInInitializerError. This exception indicates the failure of our static initializer (static variable or static block), which is exactly what we wanted to achieve.

The key point to understand is that this failure prevented ClassA from being initialized. As you can see, attempt #2 and attempt #3 both generated a java.lang.NoClassDefFoundError. Why? Since the first attempt failed, the JVM marked ClassA as unusable within the current ClassLoader and does not run its static initializer again. Every successive attempt to create a new instance of ClassA within that ClassLoader therefore fails with java.lang.NoClassDefFoundError ("Could not initialize class"), over and over.

As you can see, in this problem context, the NoClassDefFoundError is just a symptom or consequence of another problem. The original problem is the ExceptionInInitializerError triggered by the failure of the static initializer code. This clearly demonstrates the importance of proper error handling and logging when using Java static initializers.
Recommendations and resolution strategies

Find below my recommendations and resolution strategies for NoClassDefFoundError problem case 2:

- Review the java.lang.NoClassDefFoundError error and identify the missing Java class
- Perform a code walkthrough of the affected class and determine if it contains static initializer code (variables & static block)
- Review your server and application logs and determine if any error (e.g. ExceptionInInitializerError) originates from the static initializer code
- Once confirmed, analyze the code further and determine the root cause of the initializer code failure. You may need to add some extra logging along with proper error handling to prevent and better handle future failures of your static initializer code going forward

Please feel free to post any question or comment. Part 4 will start coverage of NoClassDefFoundError problems related to class loader problems.
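The last recommendation (extra logging and error handling around static initializer code) could look like the sketch below. It is an illustration under assumptions, not the author's code: the class name, the fallback value and the simulated failure are hypothetical, and falling back to a default is only appropriate when the application can genuinely run without the real value.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeStaticInit {

    private static final Logger LOG = Logger.getLogger(SafeStaticInit.class.getName());

    // Hypothetical fallback used when the real value cannot be loaded.
    static final String DEFAULT_VALUE = "default";

    private static final String STATIC_VARIABLE;
    private static final Throwable INIT_FAILURE;

    static {
        String value = DEFAULT_VALUE;
        Throwable failure = null;
        try {
            value = loadValue();
        } catch (RuntimeException e) {
            // Log the root cause once, where it happens, instead of letting it
            // escape the static block as an ExceptionInInitializerError.
            LOG.log(Level.SEVERE, "Static initialization failed, using default value", e);
            failure = e;
        }
        STATIC_VARIABLE = value;
        INIT_FAILURE = failure;
    }

    /** Stands in for real initialization work; always fails in this sketch. */
    private static String loadValue() {
        throw new IllegalStateException("simulated static initializer failure");
    }

    public static String value() {
        return STATIC_VARIABLE;
    }

    /** Lets callers and health checks find out that the fallback is in use. */
    public static boolean initializedCleanly() {
        return INIT_FAILURE == null;
    }
}
```

Because no exception escapes the static block, the class initializes normally and later uses do not fail with NoClassDefFoundError; the logged stack trace still points at the original cause.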
July 19, 2012
by Pierre - Hugues Charbonneau
· 91,100 Views · 3 Likes
5 Tips for Proper Java Heap Size
Determination of proper Java Heap size for a production system is not a straightforward exercise. In my Java EE enterprise experience, I have seen multiple performance problem cases due to inadequate Java Heap capacity and tuning. This article will provide you with 5 tips that can help you determine optimal Java Heap size, as a starting point, for your current or new production environment. Some of these tips are also very useful regarding the prevention and resolution of java.lang.OutOfMemoryError problems; including memory leaks. Please note that these tips are intended to “help you” determine proper Java Heap size. Since each IT environment is unique, you are actually in the best position to determine precisely the required Java Heap specifications of your client’s environment. Some of these tips may also not be applicable in the context of a very small Java standalone application but I still recommend you to read the entire article. Future articles will include tips on how to choose the proper Java VM garbage collector type for your environment and applications. #1 – JVM: you always fear what you don't understand How can you expect to configure, tune and troubleshoot something that you don’t understand? You may never have the chance to write and improve Java VM specifications but you are still free to learn its foundation in order to improve your knowledge and troubleshooting skills. Some may disagree, but from my perspective, the thinking that Java programmers are not required to know the internal JVM memory management is an illusion. Java Heap tuning and troubleshooting can especially be a challenge for Java & Java EE beginners. Find below a typical scenario: - Your client production environment is facing OutOfMemoryError on a regular basis and causing lot of business impact. 
Your support team is under pressure to resolve this problem
- A quick Google search allows you to find examples of similar problems and you now believe (and assume) that you are facing the same problem
- You then grab JVM -Xms and -Xmx values from another person's OutOfMemoryError problem case, hoping to quickly resolve your client’s problem
- You then proceed and apply the same tuning to your environment. 2 days later you realize the problem is still happening (even worse or a little better)…the struggle continues…

What went wrong?

- You failed to first acquire a proper understanding of the root cause of your problem
- You may also have failed to properly understand your production environment at a deeper level (specifications, load situation etc.). Web searches are a great way to learn and share knowledge but you have to perform your own due diligence and root cause analysis
- You may also be lacking some basic knowledge of the JVM and its internal memory management, preventing you from connecting all the dots

My #1 tip and recommendation to you is to learn and understand the basic JVM principles along with its different memory spaces. Such knowledge is critical as it will allow you to make valid recommendations to your clients and properly understand the possible impact and risk associated with future tuning considerations.

Now find below a quick high level reference guide for the Java VM. The Java VM memory is split into 3 memory spaces:

- The Java Heap. Applicable for all JVM vendors, usually split between YoungGen (nursery) & OldGen (tenured) spaces.
- The PermGen (permanent generation). Applicable to the Sun HotSpot VM only (PermGen space will be removed in future Java 7 or Java 8 updates)
- The Native Heap (C-Heap). Applicable for all JVM vendors.

I recommend that you review each article below, including the Sun white paper on HotSpot Java memory management. I also encourage you to download and look at the OpenJDK implementation.
## Sun HotSpot VM
http://javaeesupportpatterns.blogspot.com/2011/08/java-heap-space-hotspot-vm.html
## IBM VM
http://javaeesupportpatterns.blogspot.com/2012/02/java-heap-space-ibm-vm.html
## Oracle JRockit VM
http://javaeesupportpatterns.blogspot.com/2012/02/java-heap-space-jrockit-vm.html
## Sun (Oracle) – Java memory management white paper
http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf
## OpenJDK – Open-source Java implementation
http://openjdk.java.net/

As you can see, Java VM memory management is more complex than just setting up the biggest value possible via -Xmx. You have to look at all angles, including your native and PermGen space requirements along with physical memory availability (and # of CPU cores) from your physical host(s).

It can get especially tricky for a 32-bit JVM since the Java Heap and native Heap are in a race. The bigger your Java Heap, the smaller the native Heap. Attempting to set up a large Heap for a 32-bit VM, e.g. 2.5 GB+, increases the risk of native OutOfMemoryError depending on your application(s) footprint, number of Threads etc. A 64-bit JVM resolves this problem but you are still limited by physical resource availability and garbage collection overhead (the cost of major GC collections goes up with size). The bottom line is that bigger is not always better, so please do not assume that you can run all your 20 Java EE applications on a single 16 GB 64-bit JVM process.

#2 – Data and application is king: review your static footprint requirement

Your application(s) along with its associated data will dictate the Java Heap footprint requirement. By static memory, I mean “predictable” memory requirements as per below.

- Determine how many different applications you are planning to deploy to a single JVM process e.g. number of EAR files, WAR files, jar files etc.
The more applications you deploy to a single JVM, the higher the demand on the native Heap
- Determine how many Java classes will potentially be loaded at runtime, including third-party APIs. The more class loaders and classes that you load at runtime, the higher the demand on the HotSpot VM PermGen space and internal JIT related optimization objects
- Determine the data cache footprint e.g. internal cache data structures loaded by your application (and third-party APIs) such as cached data from a database, data read from a file etc. The more data caching that you use, the higher the demand on the Java Heap OldGen space
- Determine the number of Threads that your middleware is allowed to create. This is very important since Java threads require enough native memory or OutOfMemoryError will be thrown

For example, you will need much more native memory and PermGen space if you are planning to deploy 10 separate EAR applications on a single JVM process vs. only 2 or 3. Data caching not serialized to a disk or database will require extra memory from the OldGen space.

Try to come up with reasonable estimates of the static memory footprint requirement. This will be very useful for setting up some starting point JVM capacity figures before your true measurement exercise (e.g. tip #4). For a 32-bit JVM, I usually do not recommend a Java Heap size higher than 2 GB (-Xms2048m, -Xmx2048m) since you need enough memory for PermGen and native Heap for your Java EE applications and threads.

This assessment is especially important since too many applications deployed in a single 32-bit JVM process can easily lead to native Heap depletion, especially in a multi-threaded environment.

For a 64-bit JVM, a Java Heap size of 3 GB or 4 GB per JVM process is usually my recommended starting point.

#3 – Business traffic set the rules: review your dynamic footprint requirement

Your business traffic will typically dictate your dynamic memory footprint.
Concurrent users & requests generate the JVM GC “heartbeat” that you can observe from various monitoring tools due to very frequent creation and garbage collection of short & long lived objects. As you saw from the above JVM diagram, a typical ratio of YoungGen vs. OldGen is 1:3 or 33%.

For a typical 32-bit JVM, a Java Heap size set at 2 GB (using a generational & concurrent collector) will typically allocate 500 MB for the YoungGen space and 1.5 GB for the OldGen space.

Minimizing the frequency of major GC collections is a key aspect for optimal performance so it is very important that you understand and estimate how much memory you need during your peak volume.

Again, your type of application and data will dictate how much memory you need. Shopping cart type applications (long lived objects) involving large and non-serialized session data typically need a large Java Heap and a lot of OldGen space. Stateless and XML processing heavy applications (lots of short lived objects) require proper YoungGen space in order to minimize the frequency of major collections.

Example:

- You have 5 EAR applications (~2,000 Java classes) to deploy (which include middleware code as well…)
- Your native heap requirement is estimated at 1 GB (has to be large enough to handle Thread creation etc.)
- Your PermGen space is estimated at 512 MB
- Your internal static data caching is estimated at 500 MB
- Your total forecast traffic is 5000 concurrent users at peak hours
- Each user session data footprint is estimated at 500 KB
- Total footprint requirement for session data alone is 2.5 GB under peak volume (5000 x 500 KB)

As you can see, with such requirements, there is no way you can have all this traffic sent to a single 32-bit JVM process. A typical solution involves splitting (tip #5) traffic across a few JVM processes and / or physical hosts (assuming you have enough hardware and CPU cores available).
However, for this example, given the high demand on static memory and to ensure a scalable environment in the long run, I would also recommend a 64-bit VM, but with a smaller Java Heap as a starting point, such as 3 GB, to minimize the GC cost. You definitely want to have extra buffer for the OldGen space, so I typically recommend an OldGen footprint of no more than about 50% after a major collection, in order to keep the frequency of Full GC low and leave enough buffer for fail-over scenarios.

Most of the time, your business traffic will drive most of your memory footprint, unless you need a significant amount of data caching to achieve proper performance, which is typical for portal (media) heavy applications. Too much data caching should raise a yellow flag that you may need to revisit some design elements sooner rather than later.

#4 – Don’t guess it, measure it!

At this point you should:
- Understand the basic JVM principles and memory spaces
- Have a deep view and understanding of all applications along with their characteristics (size, type, dynamic traffic, stateless vs. stateful objects, internal memory caches etc.)
- Have a very good view or forecast of the business traffic (# of concurrent users etc.) for each application
- Have some idea whether you need a 64-bit VM or not and which JVM settings to start with
- Have some idea whether you need more than one JVM (middleware) process

But wait, your work is not done yet. While the above information is crucial for coming up with “best guess” Java Heap settings, it is always best to simulate your application(s) behaviour and validate the Java Heap memory requirement via proper profiling, load & performance testing.

You can learn and take advantage of tools such as JProfiler (future articles will include tutorials on JProfiler). From my perspective, learning how to use a profiler is the best way to properly understand your application memory footprint.
Another approach I use for existing production environments is heap dump analysis using the Eclipse MAT tool. Heap dump analysis is very powerful and allows you to view and understand the entire memory footprint of the Java Heap, including class loader related data. It is a must-do exercise in any memory footprint analysis, especially for memory leaks.

Java profilers and heap dump analysis tools allow you to understand and validate your application memory footprint, including detection and resolution of memory leaks. Load and performance testing is also a must, since it allows you to validate your earlier estimates by simulating your forecast concurrent users. It will also expose your application bottlenecks and allow you to further fine tune your JVM settings. You can use tools such as Apache JMeter, which is very easy to learn and use, or explore other commercial products.

Finally, I have quite often seen Java EE environments running perfectly fine until the day one piece of the infrastructure starts to fail, e.g. a hardware failure. Suddenly the environment is running at reduced capacity (reduced # of JVM processes) and the whole environment goes down. What happened?

There are many scenarios that can lead to domino effects, but a lack of JVM tuning and of capacity to handle fail-over (short term extra load) is very common. If your JVM processes are running at 80%+ OldGen space capacity with frequent garbage collections, how can you expect to handle any fail-over scenario?

Your load and performance testing exercise performed earlier should simulate such a scenario, and you should adjust your tuning settings so that your Java Heap has enough buffer to handle extra load (extra objects) in the short term. This is mainly applicable to the dynamic memory footprint, since fail-over means redirecting a certain % of your concurrent users to the remaining JVM processes (middleware instances).
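Alongside a profiler or a heap dump, the JVM's own management API offers a quick first look at how full each heap space is after a collection. The following is only a minimal sketch under stated assumptions: the class name `HeapAfterGc` and its helper methods are made up for illustration, the 50% threshold is simply the rule of thumb quoted in tip #3, and the pool names printed (e.g. an OldGen pool) depend on the JVM and GC policy in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Illustrative helper (hypothetical name): reports heap pool usage as seen
// after the most recent garbage collection of each pool.
class HeapAfterGc {

    // Rule of thumb from tip #3: keep the footprint after a major collection at or below ~50%.
    static final double FAILOVER_BUFFER_LIMIT_PERCENT = 50.0;

    /** Percentage of the pool maximum in use; -1 if the maximum is undefined. */
    static double percentUsed(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    /** True when the post-GC footprint is known and within the rule-of-thumb limit. */
    static boolean leavesFailoverBuffer(double percentUsedAfterGc) {
        return percentUsedAfterGc >= 0 && percentUsedAfterGc <= FAILOVER_BUFFER_LIMIT_PERCENT;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // only Java Heap pools are inspected here
            }
            // Usage measured after the last collection of this pool; null if unsupported.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                System.out.println(pool.getName() + ": post-GC usage not available");
                continue;
            }
            double pct = percentUsed(afterGc.getUsed(), afterGc.getMax());
            System.out.printf("%s: %.1f%% used after last GC, fail-over buffer ok: %b%n",
                    pool.getName(), pct, leavesFailoverBuffer(pct));
        }
    }
}
```

Running this inside a load test gives a rough indication only; the values stay at zero until the pool has actually been collected, so it complements rather than replaces profiling and heap dump analysis.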
#5 – Divide and conquer

At this point you have performed dozens of load testing iterations. You know that your JVM is not leaking memory. Your application memory footprint cannot be reduced any further. You tried several tuning strategies, such as using a large 64-bit Java Heap space of 10 GB+ and multiple GC policies, but you still do not find your performance level acceptable?

In my experience, with current JVM specifications, proper vertical and horizontal scaling, which involves creating a few JVM processes per physical host and across several hosts, will give you the throughput and capacity that you are looking for. Your IT environment will also be more fault tolerant if you break your application list into a few logical silos, each with its own JVM process, Threads and tuning values.

This “divide and conquer” strategy involves splitting your application(s) traffic across multiple JVM processes and will provide you with:
- Reduced Java Heap size per JVM process (both static & dynamic footprint)
- Reduced complexity of JVM tuning
- Reduced GC elapsed and pause time per JVM process
- Increased redundancy and fail-over capabilities
- Alignment with the latest Cloud and IT virtualization strategies

The bottom line is that when you find yourself spending too much time tuning that single elephant 64-bit JVM process, it is time to revisit your middleware and JVM deployment strategy and take advantage of vertical & horizontal scaling. This implementation strategy is more taxing on the hardware but will really pay off in the long run.

Please provide any comments and share your experience on JVM Heap sizing and tuning.
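As a quick cross-check of the tip #3 example and of the effect of splitting traffic described in tip #5, the arithmetic can be sketched in a few lines of Java. This is an illustration only: the class name `FootprintEstimate`, the choice of 4 JVM processes, and the even distribution of sessions are assumptions, and decimal units (1 MB = 1000 KB) are used so that the result matches the 2.5 GB figure quoted in the example.

```java
// Illustrative sizing arithmetic (hypothetical class, not from the article).
class FootprintEstimate {

    /** Total session data at peak, in KB: concurrent users x per-session footprint. */
    static long totalSessionKb(long concurrentUsers, long sessionKb) {
        return concurrentUsers * sessionKb;
    }

    /** Session data each JVM holds when sessions are spread evenly (integer division). */
    static long perJvmSessionKb(long totalKb, int jvmCount) {
        if (jvmCount <= 0) {
            throw new IllegalArgumentException("jvmCount must be positive");
        }
        return totalKb / jvmCount;
    }

    /** 2^32 bytes = 4096 MB is the theoretical ceiling of a 32-bit process; the practical limit is lower. */
    static boolean exceeds32BitAddressSpace(long totalMb) {
        return totalMb > 4096;
    }

    public static void main(String[] args) {
        long sessionKb = totalSessionKb(5000, 500);   // 2,500,000 KB, i.e. about 2.5 GB
        long sessionMb = sessionKb / 1000;

        // Estimates from the example: native heap 1 GB, PermGen 512 MB, static cache 500 MB.
        long totalMb = 1000 + 512 + 500 + sessionMb;  // 4512 MB
        System.out.println("Total estimated footprint: " + totalMb + " MB");
        System.out.println("Exceeds 32-bit address space: " + exceeds32BitAddressSpace(totalMb));

        int jvms = 4;                                 // assumed number of JVM processes
        System.out.println("Session data per JVM: "
                + perJvmSessionKb(sessionKb, jvms) / 1000 + " MB");       // 625 MB
        System.out.println("Session data per JVM after losing one: "
                + perJvmSessionKb(sessionKb, jvms - 1) / 1000 + " MB");   // 833 MB
    }
}
```

The last line is the fail-over case: with one of four processes down, each survivor carries roughly a third more session data, which is the buffer the OldGen space has to absorb. These figures are starting estimates to be validated by load testing, as tip #4 stresses.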
July 19, 2012
by Pierre-Hugues Charbonneau
· 143,259 Views · 7 Likes
My Experience Moving Data from MySQL to Cassandra
I had a relational database that I wanted to migrate to Cassandra. Cassandra's sstableloader provides an option to load existing data from flat files into a Cassandra ring. Hence it can be used as a way to migrate data from relational databases to Cassandra, as most relational databases let us export data into flat files. Sqoop gives the option to do this effectively. Interestingly, DataStax Enterprise provides everything we want in the big data space as a package. This includes Cassandra, Hadoop, Hive, Pig, Sqoop, and Mahout, which comes in handy in this case.

Under the resources directory, you will find the cassandra, dse, hadoop, hive, log4j-appender, mahout, pig, solr, sqoop, and tomcat specific configurations. For example, from resources/hadoop/bin, you may format the Hadoop name node using ./hadoop namenode -format as usual.

* Download and extract the DataStax Enterprise binary archive (dse-2.1-bin.tar.gz).
* Follow the documentation, which is also available as a PDF.
* Migrating a relational database to Cassandra is documented and is also blogged.
* Before starting DataStax, make sure that JAVA_HOME is set. It can also be set directly in conf/hadoop-env.sh.
* Include the connector for the relational database in a location reachable by Sqoop. I put mysql-connector-java-5.1.12-bin.jar under resources/sqoop.
* Set the environment.
    $ bin/dse-env.sh
* Start DataStax Enterprise as an Analytics node.
    $ sudo bin/dse cassandra -t
  Here cassandra starts the Cassandra process plus CassandraFS, and the -t option starts the Hadoop JobTracker and TaskTracker processes. If you start without the -t flag, the exception below will be thrown during the operations discussed later.
    No jobtracker found
    Unable to run : jobtracker not found
  Hence do not miss the -t flag.
* Start the Cassandra CLI to view the Cassandra keyspaces. You will be able to view the data in Cassandra once you migrate it using Sqoop as given below.
    $ bin/cassandra-cli -host localhost -port 9160
Confirm that it is connected to the test cluster created on port 9160 by running the following from the CLI.
    [default@unknown] describe cluster;
    Cluster Information:
       Snitch: com.datastax.bdp.snitch.DseDelegateSnitch
       Partitioner: org.apache.cassandra.dht.RandomPartitioner
       Schema versions:
          f5a19a50-b616-11e1-0000-45b29245ddff: [127.0.1.1]
If you omit the host/port (starting the CLI with just bin/cassandra-cli) or give it wrongly, you will get the response "Not connected to a cassandra instance."
    $ bin/dse sqoop import --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --table Category --split-by categoryName --cassandra-keyspace shopping_cart_db --cassandra-column-family Category_cf --cassandra-row-key categoryName --cassandra-thrift-host localhost --cassandra-create-schema
The above command migrates the table "Category" in shopping_cart_db, whose primary key is categoryName, into a Cassandra keyspace named shopping_cart_db, with the Cassandra row key categoryName. You may use the MySQL-specific --direct option, which is faster. In the above command, everything runs on localhost. The Category table is defined as follows.
    +--------------+-------------+------+-----+---------+-------+
    | Field        | Type        | Null | Key | Default | Extra |
    +--------------+-------------+------+-----+---------+-------+
    | categoryName | varchar(50) | NO   | PRI | NULL    |       |
    | description  | text        | YES  |     | NULL    |       |
    | image        | blob        | YES  |     | NULL    |       |
    +--------------+-------------+------+-----+---------+-------+
This also creates the respective Java class (Category.java) inside the working directory.
To import all the tables in the database instead of a single table:
    $ bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct
Here the "-m 1" option ensures a sequential import.
If it is not specified, the exception below will be thrown.
    ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table Category. Please specify one with --split-by or perform a sequential import with '-m 1'.
To check whether the keyspace is created:
    [default@unknown] show keyspaces;
    ................
    Keyspace: shopping_cart_db:
      Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
      Durable Writes: true
        Options: [replication_factor:1]
      Column Families:
        ColumnFamily: Category_cf
          Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
          Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
          Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
          Row cache size / save period in seconds / keys to save : 0.0/0/all
          Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
          Key cache size / save period in seconds: 200000.0/14400
          GC grace seconds: 864000
          Compaction min/max thresholds: 4/32
          Read repair chance: 1.0
          Replicate on write: true
          Bloom Filter FP chance: default
          Built indexes: []
          Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
    .............
    [default@unknown] describe shopping_cart_db;
    Keyspace: shopping_cart_db:
      Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
      Durable Writes: true
        Options: [replication_factor:1]
      Column Families:
        ColumnFamily: Category_cf
          Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
          Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
          Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
          Row cache size / save period in seconds / keys to save : 0.0/0/all
          Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
          Key cache size / save period in seconds: 200000.0/14400
          GC grace seconds: 864000
          Compaction min/max thresholds: 4/32
          Read repair chance: 1.0
          Replicate on write: true
          Bloom Filter FP chance: default
          Built indexes: []
          Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
You may also use Hive to view the databases created in Cassandra, in an SQL-like manner.
* Start Hive
    $ bin/dse hive
    hive> show databases;
    OK
    default
    shopping_cart_db
When the entire database is imported as above, a separate Java class is created for each of the tables.
    $ bin/dse sqoop import-all-tables -m 1 --connect jdbc:mysql://127.0.0.1:3306/shopping_cart_db --username root --password root --cassandra-thrift-host localhost --cassandra-create-schema --direct
    12/06/15 15:42:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    12/06/15 15:42:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    12/06/15 15:42:11 INFO tool.CodeGenTool: Beginning code generation
    12/06/15 15:42:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Category` AS t LIMIT 1
    12/06/15 15:42:11 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/..
    Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Category.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Category.jar 12/06/15 15:42:13 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:13 INFO mapreduce.ImportJobBase: Beginning import of Category 12/06/15 15:42:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 12/06/15 15:42:15 INFO mapred.JobClient: Running job: job_201206151241_0007 12/06/15 15:42:16 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:25 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:25 INFO mapred.JobClient: Job complete: job_201206151241_0007 12/06/15 15:42:25 INFO mapred.JobClient: Counters: 18 12/06/15 15:42:25 INFO mapred.JobClient: Job Counters 12/06/15 15:42:25 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6480 12/06/15 15:42:25 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:25 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:25 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:25 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:25 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:25 INFO mapred.JobClient: Bytes Written=2848 12/06/15 15:42:25 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21419 12/06/15 15:42:25 INFO mapred.JobClient: CFS_BYTES_WRITTEN=2848 12/06/15 15:42:25 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:25 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:25 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:25 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:25 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:25 INFO mapred.JobClient: Physical memory (bytes) snapshot=119435264 
12/06/15 15:42:25 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:25 INFO mapred.JobClient: CPU time spent (ms)=630 12/06/15 15:42:25 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:25 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2085318656 12/06/15 15:42:25 INFO mapred.JobClient: Map output records=36 12/06/15 15:42:25 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:42:25 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 11.4492 seconds (0 bytes/sec) 12/06/15 15:42:25 INFO mapreduce.ImportJobBase: Retrieved 36 records. 12/06/15 15:42:25 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Customer` AS t LIMIT 1 12/06/15 15:42:25 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Customer.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 
12/06/15 15:42:25 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Customer.jar 12/06/15 15:42:26 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:26 INFO mapreduce.ImportJobBase: Beginning import of Customer 12/06/15 15:42:26 INFO mapred.JobClient: Running job: job_201206151241_0008 12/06/15 15:42:27 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:35 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:35 INFO mapred.JobClient: Job complete: job_201206151241_0008 12/06/15 15:42:35 INFO mapred.JobClient: Counters: 17 12/06/15 15:42:35 INFO mapred.JobClient: Job Counters 12/06/15 15:42:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6009 12/06/15 15:42:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:35 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:35 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:35 INFO mapred.JobClient: Bytes Written=0 12/06/15 15:42:35 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21489 12/06/15 15:42:35 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:35 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:35 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:35 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:35 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:35 INFO mapred.JobClient: Physical memory (bytes) snapshot=164855808 12/06/15 15:42:35 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:35 INFO mapred.JobClient: CPU time spent (ms)=510 12/06/15 15:42:35 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:35 INFO mapred.JobClient: Virtual memory 
(bytes) snapshot=2082869248 12/06/15 15:42:35 INFO mapred.JobClient: Map output records=0 12/06/15 15:42:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:42:35 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.3143 seconds (0 bytes/sec) 12/06/15 15:42:35 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:42:35 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `OrderEntry` AS t LIMIT 1 12/06/15 15:42:35 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderEntry.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderEntry.jar 12/06/15 15:42:36 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:36 INFO mapreduce.ImportJobBase: Beginning import of OrderEntry 12/06/15 15:42:36 INFO mapred.JobClient: Running job: job_201206151241_0009 12/06/15 15:42:37 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:45 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:45 INFO mapred.JobClient: Job complete: job_201206151241_0009 12/06/15 15:42:45 INFO mapred.JobClient: Counters: 17 12/06/15 15:42:45 INFO mapred.JobClient: Job Counters 12/06/15 15:42:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6381 12/06/15 15:42:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:45 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:45 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:45 INFO mapred.JobClient: Bytes 
Written=0 12/06/15 15:42:45 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21569 12/06/15 15:42:45 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:45 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:45 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:45 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:45 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=137252864 12/06/15 15:42:45 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:45 INFO mapred.JobClient: CPU time spent (ms)=520 12/06/15 15:42:45 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2014703616 12/06/15 15:42:45 INFO mapred.JobClient: Map output records=0 12/06/15 15:42:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:42:45 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2859 seconds (0 bytes/sec) 12/06/15 15:42:45 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:42:45 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `OrderItem` AS t LIMIT 1 12/06/15 15:42:45 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderItem.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/OrderItem.jar 12/06/15 15:42:46 WARN manager.CatalogQueryManager: The table OrderItem contains a multi-column primary key. Sqoop will default to the column orderNumber only for this job. 
12/06/15 15:42:46 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:46 INFO mapreduce.ImportJobBase: Beginning import of OrderItem 12/06/15 15:42:46 INFO mapred.JobClient: Running job: job_201206151241_0010 12/06/15 15:42:47 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:42:55 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:42:55 INFO mapred.JobClient: Job complete: job_201206151241_0010 12/06/15 15:42:55 INFO mapred.JobClient: Counters: 17 12/06/15 15:42:55 INFO mapred.JobClient: Job Counters 12/06/15 15:42:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5949 12/06/15 15:42:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:42:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:42:55 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:42:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:42:55 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:42:55 INFO mapred.JobClient: Bytes Written=0 12/06/15 15:42:55 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:42:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21524 12/06/15 15:42:55 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:42:55 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:42:55 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:42:55 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:42:55 INFO mapred.JobClient: Map input records=1 12/06/15 15:42:55 INFO mapred.JobClient: Physical memory (bytes) snapshot=116674560 12/06/15 15:42:55 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:42:55 INFO mapred.JobClient: CPU time spent (ms)=590 12/06/15 15:42:55 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:42:55 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2014703616 12/06/15 15:42:55 INFO mapred.JobClient: Map output records=0 12/06/15 15:42:55 INFO mapred.JobClient: 
SPLIT_RAW_BYTES=87 12/06/15 15:42:55 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2539 seconds (0 bytes/sec) 12/06/15 15:42:55 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:42:55 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:42:55 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Payment` AS t LIMIT 1 12/06/15 15:42:55 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Payment.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 12/06/15 15:42:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Payment.jar 12/06/15 15:42:56 WARN manager.CatalogQueryManager: The table Payment contains a multi-column primary key. Sqoop will default to the column orderNumber only for this job. 12/06/15 15:42:56 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:42:56 INFO mapreduce.ImportJobBase: Beginning import of Payment 12/06/15 15:42:56 INFO mapred.JobClient: Running job: job_201206151241_0011 12/06/15 15:42:57 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:43:05 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:43:05 INFO mapred.JobClient: Job complete: job_201206151241_0011 12/06/15 15:43:05 INFO mapred.JobClient: Counters: 17 12/06/15 15:43:05 INFO mapred.JobClient: Job Counters 12/06/15 15:43:05 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5914 12/06/15 15:43:05 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:43:05 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:43:05 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:43:05 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:43:05 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:43:05 
INFO mapred.JobClient: Bytes Written=0 12/06/15 15:43:05 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:43:05 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21518 12/06/15 15:43:05 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:43:05 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:43:05 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:43:05 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:43:05 INFO mapred.JobClient: Map input records=1 12/06/15 15:43:05 INFO mapred.JobClient: Physical memory (bytes) snapshot=137998336 12/06/15 15:43:05 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:43:05 INFO mapred.JobClient: CPU time spent (ms)=520 12/06/15 15:43:05 INFO mapred.JobClient: Total committed heap usage (bytes)=121241600 12/06/15 15:43:05 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2082865152 12/06/15 15:43:05 INFO mapred.JobClient: Map output records=0 12/06/15 15:43:05 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 12/06/15 15:43:05 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2642 seconds (0 bytes/sec) 12/06/15 15:43:05 INFO mapreduce.ImportJobBase: Retrieved 0 records. 12/06/15 15:43:05 INFO tool.CodeGenTool: Beginning code generation 12/06/15 15:43:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `Product` AS t LIMIT 1 12/06/15 15:43:06 INFO orm.CompilationManager: HADOOP_HOME is /home/pradeeban/programs/dse-2.1/resources/hadoop/bin/.. Note: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Product.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 
12/06/15 15:43:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-pradeeban/compile/926ddf787c73be06c4e2ad1f8fc522f1/Product.jar 12/06/15 15:43:06 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import 12/06/15 15:43:06 INFO mapreduce.ImportJobBase: Beginning import of Product 12/06/15 15:43:07 INFO mapred.JobClient: Running job: job_201206151241_0012 12/06/15 15:43:08 INFO mapred.JobClient: map 0% reduce 0% 12/06/15 15:43:16 INFO mapred.JobClient: map 100% reduce 0% 12/06/15 15:43:16 INFO mapred.JobClient: Job complete: job_201206151241_0012 12/06/15 15:43:16 INFO mapred.JobClient: Counters: 18 12/06/15 15:43:16 INFO mapred.JobClient: Job Counters 12/06/15 15:43:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5961 12/06/15 15:43:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/06/15 15:43:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 12/06/15 15:43:16 INFO mapred.JobClient: Launched map tasks=1 12/06/15 15:43:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 12/06/15 15:43:16 INFO mapred.JobClient: File Output Format Counters 12/06/15 15:43:16 INFO mapred.JobClient: Bytes Written=248262 12/06/15 15:43:16 INFO mapred.JobClient: FileSystemCounters 12/06/15 15:43:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21527 12/06/15 15:43:16 INFO mapred.JobClient: CFS_BYTES_WRITTEN=248262 12/06/15 15:43:16 INFO mapred.JobClient: CFS_BYTES_READ=87 12/06/15 15:43:16 INFO mapred.JobClient: File Input Format Counters 12/06/15 15:43:16 INFO mapred.JobClient: Bytes Read=0 12/06/15 15:43:16 INFO mapred.JobClient: Map-Reduce Framework 12/06/15 15:43:16 INFO mapred.JobClient: Map input records=1 12/06/15 15:43:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=144871424 12/06/15 15:43:16 INFO mapred.JobClient: Spilled Records=0 12/06/15 15:43:16 INFO mapred.JobClient: CPU time spent (ms)=1030 12/06/15 15:43:16 INFO mapred.JobClient: Total committed heap usage 
(bytes)=121241600
    12/06/15 15:43:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2085318656
    12/06/15 15:43:16 INFO mapred.JobClient: Map output records=300
    12/06/15 15:43:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
    12/06/15 15:43:16 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 9.2613 seconds (0 bytes/sec)
    12/06/15 15:43:16 INFO mapreduce.ImportJobBase: Retrieved 300 records.
I found DataStax an interesting project to explore further. I have also blogged about the issues that I faced as a learner and how easily they can be fixed: Issues that you may encounter during the migration to Cassandra using DataStax/Sqoop and the fixes.
July 16, 2012
by Pradeeban Kathiravelu
· 20,424 Views · 2 Likes
Apache Thrift with Java Quickstart
Apache Thrift is an RPC framework originally developed at Facebook and now an Apache project. Thrift lets you define data types and service interfaces in a language-neutral definition file. That definition file is the input to the compiler, which generates code for building RPC clients and servers that communicate across different programming languages. You can also refer to the Thrift white paper.

According to the official web site, Apache Thrift is a software framework for scalable cross-language services development that combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.

Installing Apache Thrift in Windows
Installing Thrift can be a tiresome process, but for Windows the compiler is available as a prebuilt exe. Download thrift.exe and add its location to your PATH environment variable.

Writing Thrift definition file (.thrift file)
Writing the Thrift definition file becomes really easy once you get used to it. I found this tutorial quite useful to begin with.

Example definition file (add.thrift)
    namespace java com.eviac.blog.samples.thrift.server  // defines the namespace

    typedef i32 int  // typedefs to get convenient names for your types

    service AdditionService {  // defines the service to add two numbers
        int add(1:int n1, 2:int n2),  // defines a method
    }

Compiling Thrift definition file
To compile the .thrift file use the following command.
    thrift --gen <language> <Thrift filename>
For my example the command is:
    thrift --gen java add.thrift
After running the command, you will find the generated source code for building RPC clients and servers inside the gen-java directory. In my example it creates a Java file called AdditionService.java.

Writing a service handler
The service handler class is required to implement the AdditionService.Iface interface.

Example service handler (AdditionServiceHandler.java)
    package com.eviac.blog.samples.thrift.server;

    import org.apache.thrift.TException;

    public class AdditionServiceHandler implements AdditionService.Iface {

        // Implements the add method declared in add.thrift
        @Override
        public int add(int n1, int n2) throws TException {
            return n1 + n2;
        }
    }

Writing a simple server
Following is example code to start a simple Thrift server. To enable the multithreaded server, uncomment the commented parts of the example code (and import org.apache.thrift.server.TThreadPoolServer).

Example server (MyServer.java)
    package com.eviac.blog.samples.thrift.server;

    import org.apache.thrift.transport.TServerSocket;
    import org.apache.thrift.transport.TServerTransport;
    import org.apache.thrift.server.TServer;
    import org.apache.thrift.server.TServer.Args;
    import org.apache.thrift.server.TSimpleServer;

    public class MyServer {

        public static void startSimpleServer(AdditionService.Processor processor) {
            try {
                // Listen on port 9090
                TServerTransport serverTransport = new TServerSocket(9090);
                TServer server = new TSimpleServer(
                        new Args(serverTransport).processor(processor));

                // Use this for a multithreaded server
                // TServer server = new TThreadPoolServer(new
                // TThreadPoolServer.Args(serverTransport).processor(processor));

                System.out.println("Starting the simple server...");
                server.serve();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            startSimpleServer(new AdditionService.Processor(new AdditionServiceHandler()));
        }
    }

Writing the client
Following is example Java client code which consumes the service provided by AdditionService.

Example client code (AdditionClient.java)
    package com.eviac.blog.samples.thrift.client;

    import org.apache.thrift.TException;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.protocol.TProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;
    import org.apache.thrift.transport.TTransportException;

    // The generated service class lives in the server package
    import com.eviac.blog.samples.thrift.server.AdditionService;

    public class AdditionClient {

        public static void main(String[] args) {
            try {
                // Connect to the server started by MyServer
                TTransport transport;
                transport = new TSocket("localhost", 9090);
                transport.open();

                TProtocol protocol = new TBinaryProtocol(transport);
                AdditionService.Client client = new AdditionService.Client(protocol);

                System.out.println(client.add(100, 200));

                transport.close();
            } catch (TTransportException e) {
                e.printStackTrace();
            } catch (TException x) {
                x.printStackTrace();
            }
        }
    }

Run the server code (MyServer.java). It should output the following and will then listen for requests.
    Starting the simple server...
Then run the client code (AdditionClient.java). It should output the following.
    300
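Since the service handler is ordinary Java, its logic can also be exercised in-process, without a socket or a running server. The snippet below is a minimal sketch under loud assumptions: `AdditionIface` is a hand-written, hypothetical stand-in for the generated AdditionService.Iface, and the `throws TException` clause is dropped so that the example compiles without the Thrift library on the classpath.

```java
// Hypothetical stand-in for the generated AdditionService.Iface (assumption:
// same method shape, minus the Thrift-specific TException).
interface AdditionIface {
    int add(int n1, int n2);
}

// Same logic as AdditionServiceHandler, written against the stand-in interface.
class LocalAdditionHandler implements AdditionIface {
    @Override
    public int add(int n1, int n2) {
        return n1 + n2;
    }
}

class HandlerCheck {
    public static void main(String[] args) {
        AdditionIface handler = new LocalAdditionHandler();

        // Same call the client makes over the wire.
        System.out.println(handler.add(100, 200)); // prints 300

        // Thrift's i32 maps to a Java int, so the sum wraps on overflow.
        System.out.println(handler.add(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // prints true
    }
}
```

Checking the handler this way keeps business logic separate from transport concerns; the real server and client are still needed to verify the end-to-end RPC path.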
July 16, 2012
by Pavithra Gunasekara
· 43,532 Views · 2 Likes
JMS With ActiveMQ
Java Message Service is a mechanism for integrating applications in a loosely coupled, flexible manner and delivers data asynchronously across applications.
July 14, 2012
by Pavithra Gunasekara
· 165,036 Views · 13 Likes
