DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Performance Topics

article thumbnail
How to Optimize MySQL UNION for High Speed
There are two ways to speedup UNIONs in a MySQL database. First use UNION ALL if at all possible, and second try to push down your conditions. 1. UNION ALL is much faster than UNION How does a UNION work? Imagine you have two tables for shirts. The short_sleeve table looks like this: blue green gray black And long_sleeve another that looks like this: red green yellow blue Related: Why Generalists are Better at Scaling the Web If you UNION those two tables, first MySQL will sort the combined set into a temp table like this: black blue blue gray green green red yellow Once it’s done this sort, it can easily remove the duplicate blue & duplicate green for this resulting set: black blue gray green red yellow See also: Mythical MySQL DBA – the talent drought. Why does it do this? UNION is defined that way in SQL. Duplicates must be removed and this is an efficient way for the MySQL engine to remove them. Combine results, sort, remove duplicates and return the set. Queries with UNION can be accelerated in two ways. Switch to UNION ALL or try to push ORDER BY, LIMIT and WHERE conditions inside each subquery. You’ll be glad you did! What if we did UNION ALL? The result would look like this: blue green gray black red green yellow blue Read this: MySQL DBA Interview & Hiring Guide. It doesn’t have to sort, and doesn’t have to remove duplicates. If you imagine combining two 10 million row tables, and don’t have to sort, this speedup can be HUGE. 2. Use Push-down Conditions to speedup UNION in MySQL Imagine with our example above the shirts have a design date, the year they were released. Yes we’re keeping this example very simple to illustrate the concept. Here is the short_sleeve table: blue 2013 green 2013 green 2012 gray 2011 black 2009 black 2011 And long_sleeve table looks like this: red 2012 red 2013 green 2011 yellow 2010 blue 2011 For 2013 designs could combine them like this: (SELECT type, release FROM short_sleeve) UNION (SELECT type, release FROM long_sleeve); WHERE release >=2013; See also: 5 More Things Deadly to Scalability and the original 5 Things Toxic to Scalability.. Here the WHERE clause works on this 11 record temp table: black 2009 black 2011 blue 2011 blue 2013 gray 2011 green 2013 green 2012 green 2011 red 2012 red 2013 yellow 2010 But it would be much faster to move the WHERE inside each subquery like this: (SELECT type, release FROM short_sleeve WHERE release >=2013) UNION (SELECT type, release FROM long_sleeve WHERE release >=2013); That would be operating on a combined 3 record table. Faster to sort & remove duplicates. Smaller result sets cache better too, providing a pay forward dividend. That’s what performance optimization is all about! Read this: RDS or MySQL – 10 Use Cases. Remember multi-million row sets in each part of this query will quickly illustrate the optimization. We’re using very small results to make visualizing easier. You can also use this optimization for ORDER BY and for LIMIT conditions. By reducing the number of records returned by EACH PART of the UNION, you reduce the work that happens at the stage where they are all combined. If you’re seeing some UNION queries in your slow query log, I suggest you try this optimization out and see if you can tweak it.
June 17, 2013
by Sean Hull
· 24,041 Views
article thumbnail
The Benefits of a Reverse Proxy
A typical ASP.NET public website hosted on IIS is usually configured in such a way that the server that IIS is installed on is visible to the public internet. HTTP requests from a browser or web service client are routed directly to IIS which also hosts the ASP.NET worker process. All the functionality needed to produce the web site is embodied in a single server. This includes caching, SSL termination, authentication, serving static files and compression. This approach is simple and straightforward for small sites, but is hard to scale, both in terms of performance, and in terms of managing the complexity of a large complex application. This is especially true if you have a distributed service oriented architecture with multiple HTTP endpoints that appear and disappear frequently. A reverse proxy is server component that sits between the internet and your web servers. It accepts HTTP requests, provides various services, and forwards the requests to one or many servers. Having a point at which you can inspect, transform and route HTTP requests before they reach your web servers provides a whole host of benefits. Here are some: Load Balancing This is the reverse proxy function that people are most familiar with. Here the proxy routes incoming HTTP requests to a number of identical web servers. This can work on a simple round-robin basis, or if you have statefull web servers (it’s better not to) there are session-aware load balancers available. It’s such a common function that load balancing reverse proxies are usually just referred to as ‘load balancers’. There are specialized load balancing products available, but many general purpose reverse proxies also provide load balancing functionality. Security A reverse proxy can hide the topology and characteristics of your back-end servers by removing the need for direct internet access to them. You can place your reverse proxy in an internet facing DMZ, but hide your web servers inside a non-public subnet. Authentication You can use your reverse proxy to provide a single point of authentication for all HTTP requests. SSL Termination Here the reverse proxy handles incoming HTTPS connections, decrypting the requests and passing unencrypted requests on to the web servers. This has several benefits: Removes the need to install certificates on many back end web servers. Provides a single point of configuration and management for SSL/TLS Takes the processing load of encrypting/decrypting HTTPS traffic away from web servers. Makes testing and intercepting HTTP requests to individual web servers easier. Serving Static Content Not strictly speaking ‘reverse proxying’ as such. Some reverse proxy servers can also act as web servers serving static content. The average web page can often consist of megabytes of static content such as images, CSS files and JavaScript files. By serving these separately you can take considerable load from back end web servers, leaving them free to render dynamic content. Caching The reverse proxy can also act as a cache. You can either have a dumb cache that simply expires after a set period, or better still a cache that respects Cache-Control and Expires headers. This can considerably reduce the load on the back-end servers. Compression In order to reduce the bandwidth needed for individual requests, the reverse proxy can decompress incoming requests and compress outgoing ones. This reduces the load on the back-end servers that would otherwise have to do the compression, and makes debugging requests to, and responses from, the back-end servers easier. Centralised Logging and Auditing Because all HTTP requests are routed through the reverse proxy, it makes an excellent point for logging and auditing. URL Rewriting Sometimes the URL scheme that a legacy application presents is not ideal for discovery or search engine optimisation. A reverse proxy can rewrite URLs before passing them on to your back-end servers. For example, a legacy ASP.NET application might have a URL for a product that looks like this: http://www.myexampleshop.com/products.aspx?productid=1234 You can use a reverse proxy to present a search engine optimised URL instead: http://www.myexampleshop.com/products/1234/lunar-module Aggregating Multiple Websites Into the Same URL Space In a distributed architecture it’s desirable to have different pieces of functionality served by isolated components. A reverse proxy can route different branches of a single URL address space to different internal web servers. For example, say I’ve got three internal web servers: http://products.internal.net/ http://orders.internal.net/ http://stock-control.internal.net/ I can route these from a single external domain using my reverse proxy: http://www.example.com/products/ -> http://products.internal.net/ http://www.example.com/orders/ -> http://orders.internal.net/ http://www.example.com/stock/ -> http://stock-control.internal.net/ To an external customer it appears that they are simply navigating a single website, but internally the organisation is maintaining three entirely separate sites. This approach can work extremely well for web service APIs where the reverse proxy provides a consistent single public facade to an internal distributed component oriented architecture. So … So, a reverse proxy can off load much of the infrastructure concerns of a high-volume distributed web application. We’re currently looking at Nginx for this role. Expect some practical Nginx related posts about how to do some of this stuff in the very near future. Happy proxying!
May 6, 2013
by Mike Hadlow
· 32,884 Views · 2 Likes
article thumbnail
HotSpot GC Thread CPU footprint on Linux
The following question will test your knowledge on garbage collection and high CPU troubleshooting for Java applications running on Linux OS. This troubleshooting technique is especially crucial when investigating excessive GC and / or CPU utilization. It will assume that you do not have access to advanced monitoring tools such as Compuware dynaTrace or even JVisualVM. Future tutorials using such tools will be presented in the future but please ensure that you first master the base troubleshooting principles. Question: How can you monitor and calculate how much CPU % each of the Oracle HotSpot or JRockit JVM garbage collection (GC) threads is using at runtime on Linux OS? Answer: On the Linux OS, Java threads are implemented as native Threads, which results in each thread being a separate Linux process. This means that you are able to monitor the CPU % of any Java thread created by the HotSpot JVM using the top –H command (Threads toggle view). That said, depending of the GC policy that you are using and your server specifications, the HotSpot & JRockit JVM will create a certain number of GC threads that will be performing young and old space collections. Such threads can be easily identified by generating a JVM thread dump. As you can see below in our example, the Oracle JRockit JVM did create 4 GC threads identified as "(GC Worker Thread X)”. ===== FULL THREAD DUMP =============== Fri Nov 16 19:58:36 2012 BEA JRockit(R) R27.5.0-110-94909-1.5.0_14-20080204-1558-linux-ia32 "Main Thread" id=1 idx=0x4 tid=14911 prio=5 alive, in native, waiting -- Waiting for notification on: weblogic/t3/srvr/T3Srvr@0xfd0a4b0[fat lock] at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method) at java/lang/Object.wait(J)V(Native Method) at java/lang/Object.wait(Object.java:474) at weblogic/t3/srvr/T3Srvr.waitForDeath(T3Srvr.java:730) ^-- Lock released while waiting: weblogic/t3/srvr/T3Srvr@0xfd0a4b0[fat lock] at weblogic/t3/srvr/T3Srvr.run(T3Srvr.java:380) at weblogic/Server.main(Server.java:67) at jrockit/vm/RNI.c2java(IIIII)V(Native Method) -- end of trace "(Signal Handler)" id=2 idx=0x8 tid=14920 prio=5 alive, in native, daemon "(GC Main Thread)" id=3 idx=0xc tid=14921 prio=5 alive, in native, native_waiting, daemon "(GC Worker Thread 1)" id=? idx=0x10 tid=14922 prio=5 alive, in native, daemon "(GC Worker Thread 2)" id=? idx=0x14 tid=14923 prio=5 alive, in native, daemon "(GC Worker Thread 3)" id=? idx=0x18 tid=14924 prio=5 alive, in native, daemon "(GC Worker Thread 4)" id=? idx=0x1c tid=14925 prio=5 alive, in native, daemon ……………………… Now let’s put all of these principles together via a simple example. Step #1 - Monitor the GC thread CPU utilization The first step of the investigation is to monitor and determine: Identify the native Thread ID for each GC worker thread shown via the Linux top –H command. Identify the CPU % for each GC worker thread. Step #2 – Generate and analyze JVM Thread Dumps At the same time of Linux top –H, generate 2 or 3 JVM Thread Dump snapshots via kill -3 . Open the JVM Thread Dump and locate the JVM GC worker threads. Now correlate the "top -H" output data with the JVM Thread Dump data by looking at the native thread id (tid attribute). As you can see in our example, such analysis did allow us to determine that all our GC worker threads were using around 20% CPU each. This was due to major collections happening at that time. Please note that it is also very useful to enable verbose:gc as it will allow you to correlate such CPU spikes with minor and major collections and determine how much your JVM GC process is contributing to the overall server CPU utilization.
April 17, 2013
by Pierre - Hugues Charbonneau
· 14,506 Views
article thumbnail
Stepping Backwards while Debugging: Move To Line
it happens to me many times: i’m stepping with the debugger through my code, and ups! i made one step too far! debugging, and made one step over too far what now? restart the whole debugging session? actually, there is a way to go ‘backwards’ gdb has a ‘reverse debugging’ feature, described here . i’m using the eclipse based codewarrior debugger, and this debug engine is not using gdb. the codewarrior debugger in mcu10.3 supports an eclipse feature: i select a code line in the editor view and use move to line : move to line what it does: it changes the current pc (program counter) of the program to that line: performed move to line now i can continue debugging from that line, e.g. stepping into that function call. yes, this is not true backward debugging. but it is simple and very effective. to perform true backward stepping, the debugger would need to reverse all operations, typically with a rather heavy state machine and data recording. but for the usual case where i simply need to go back a few lines, the ‘move to line’ is perfect. of course there are a few points to consider: this only changes the program counter. any variable changes/etc are not affected or reverted. in case of highly optimized code, there might be multiple sequence points per source line. so doing this for highly optimized code might not work correctly. it works ok within a function. it is not recommended to use it e.g. to set the pc outside of a function. because the context/stack frame is not set up. i use the ‘move to line’ frequently to ‘advance’ the program execution. e.g. to bypass some long sequences i’m not interested in, or to get out of an ‘endless’ loop. the same ‘move to line’ as available while doing assembly stepping too. see this post for details. happy line moving
April 15, 2013
by Erich Styger
· 9,861 Views
article thumbnail
Java EE 7 and EJB 3.2 support in JBoss AS 8
Some of you might be aware that the Public Final Draft version of Java EE 7 spec has been released. Among various other new things, this version of Java EE, brings in EJB 3.2 version of the EJB specification. EJB 3.2 has some new features compared to the EJB 3.1 spec. I'm quoting here the text present in the EJB 3.2 spec summarizing what's new: The Enterprise JavaBeans 3.2 architecture extends Enterprise JavaBeans to include the following new functionality and simplifications to the earlier EJB APIs: Support for the following features has been made optional in this release and their description is moved to a separate EJB Optional Features document: EJB 2.1 and earlier Entity Bean Component Contract for Container-Managed Persistence EJB 2.1 and earlier Entity Bean Component Contract for Bean-Managed Persistence Client View of an EJB 2.1 and earlier Entity Bean EJB QL: Query Language for Container-Managed Persistence Query Methods JAX-RPC Based Web Service Endpoints JAX-RPC Web Service Client View Added support for local asynchronous session bean invocations and non-persistent EJB Timer Service to EJB 3.2 Lite. Removed restriction on obtaining the current class loader; replaced ‘must not’ with ‘should exercise caution’ when using the Java I/O package. Added an option for the lifecycle callback interceptor methods of stateful session beans to be executed in a transaction context determined by the lifecycle callback method's transaction attribute. Added an option to disable passivation of stateful session beans. Extended the TimerService API to query all active timers in the same EJB module. Removed restrictions on javax.ejb.Timer and javax.ejb.TimerHandle references to be used only inside a bean. Relaxed default rules for designating implemented interfaces for a session bean as local or as remote business interfaces. Enhanced the list of standard activation properties. Enhanced embeddable EJBContainer by implementing AutoClosable interface. As can be seen, some of the changes proposed are minor. But there are some which are useful major changes. We'll have a look at a couple of such changes in this article. 1) New API TimerService.getAllTimers() EJB 3.2 version introduces a new method on the javax.ejb.TimerService interface, named getAllTimers. Previously the TimerService interface had (and still has) a getTimers method. The getTimers method was expected to return the active timers that are applicable for the bean on whose TimerService, the method had been invoked (remember: there's one TimerService per EJB). In this new EJB 3.2 version, the newly added getAllTimers() method is expected to return all the active timers that are applicable to *all beans within the same EJB module*. Typically, an EJB module corresponds to a EJB jar, but it could also be a .war deployment if the EJBs are packaged within the .war. This new getAllTimers() method is a convenience API for user applications which need to find all the active timers within the EJB module to which that bean belongs. 2) Ability to disable passivation of stateful beans Those familiar with EJBs will know that the EJB container provides passivation (storing the state of the stateful bean to some secondary store) and activation (loading the saved state of the stateful bean) capability to stateful beans. However, previous EJB versions didn't have a portable way of disabling passivation of stateful beans, if the user application desired to do that. The new EJB 3.2 version introduces a way where the user application can decide whether the stateful bean can be passivated or not. By default, the stateful bean is considered to be "passivation capable" (like older versions of EJB). However, if the user wants to disable passivation support for certain stateful bean, then the user has the option to either disable it via annotation or via the ejb-jar.xml deployment descriptor. Doing it the annotation way is as simple as setting the passivationCapable attribute on the @javax.ejb.Stateful annotation to false. Something like: @javax.ejb.Stateful(passivationCapable=false) // the passivationCapable attribute takes a boolean value public class MyStatefulBean { .... } Doing it in the ejb-jar.xml is as follows: foo-bar-bean org.myapp.FooBarStatefulBean Stateful false ... Two important things to note in that ejb-jar.xml are the version=3.2 attribute (along with the http://xmlns.jcp.org/xml/ns/javaee/ejb-jar_3_2.xsd schema location) on the ejb-jar root element and the passivation-capable element under the session element. So, using either of these approaches will allow you to disable passivation on stateful beans, if you want to do so. Java EE 7 and EJB 3.2 support in JBoss AS8: JBoss AS8 has been adding support for Java EE 7 since the Public Final Draft version of the spec has been announced. Support for EJB 3.2 is already added and made available. Some other Java EE 7 changes have also made it to the latest JBoss AS 8 builds. To keep track of the Java EE 7 changes in JBoss AS8, keep an eye on this JIRA https://issues.jboss.org/browse/AS7-6553. To use the already implemented features of Java EE 7 in general or EJB 3.2 in particular, you can download the latest nightly build/binary of JBoss AS from here. Give it a try and let us know how it goes. For any feedback, questions or if you run into any kind of issues, feel free to open a discussion thread in our user forum here.
April 5, 2013
by Jaikiran Pai
· 8,808 Views
article thumbnail
Algorithm of the Week: Aho-Corasick String Matching Algorithm in Haskell
let’s say you have a large piece of text and a dictionary of keywords. how do you quickly locate all the keywords? aho-corasick algorithm diagram well, there are many ways really, you could even iterate through the whole thing and compare words to keywords. but it turns out that’s going to be very slow. at least o(n_keywords * n_words) complexity. essentially you’re making as many passes over the text as your dictionary is big. in 1975 a couple of ibm researchers – alfred aho and margaret corasick – discovered an algorithm that can do this in a single pass. the aho-corasick string matching algorithm . i implemented it in haskell and it takes 0.005s to find 8 different keywords in oscar wilde’s the nightingale and the rose – a 12kb text. a quick naive keyword search implemented in python takes 0.023s . not a big difference practically speaking, but imagine a situation with megabytes of text and thousands of words in the dictionary. the authors mention printing out the result as a major bottleneck in their assessment of the algorithm. yep, printing . the aho-corasick algorithm at the core of this algorithm are three functions: the three functions of aho-corasick algorithm a parser based on a state machine, which maps (state, char) pairs to states and occasionally emits an output. this is called the goto function a failure function, which tells the goto function which state to jump into when the character it just read doesn’t match anything an output function, which maps states to outputs – potentially more than one per state the algorithm works in two stages. it will first construct the goto, failure and output functions. the complexity of this operation hinges solely on the size of our dictionary. then it iterates over the input text to produce all the matches. using state machines for parsing text is a well known trick – the real genius of this algorithm rests in that failure function if you ask me. it makes lateral transitions between states when the algorithm climbs itself into a wall. say you have she and hers in the dictionary. the goto machine eats your input string one character at the time. let’s say it’s already read s h . the next input is an e so it outputs she and reaches a final state. next it reads an r , but the state didn’t expect any more inputs, so the failure function puts us on the path towards hers . this is a bit tricky to explain in text, i suggest you look at the picture from the original article and look at what’s happening. my haskell implementation the first implementation i tried, relied on manully mapping inputs to outputs for the goto, failure and output functions by using pattern recognition. not very pretty, extremely hardcoded, but it worked and was easy to make. building the functions dynamically proved a bit trickier. type goto = map (int, char) int type failure = map int int type output = map int [string] first off, we build the goto function. -- builds the goto function build_goto::goto -> string -> (goto, string) build_goto m s = (add_one 0 m s, s) -- adds one string to goto function add_one::int -> goto -> [char] -> goto add_one _ m [] = m add_one state m (c:rest) | member key m = add_one (frommaybe 0 $ map.lookup key m) m rest | otherwise = add_one max (map.insert key max m) rest where key = (state, c) max = (size m)+1 essentially this builds a flattened prefix tree in a hashmap of (state, char) pairs mapping to the next state. it makes sure to avoid adding new edges to the three as much as possible. the reason it’s not simply a prefix tree are those lateral transitions; doing them in a tree would require backtracking and repeating of steps, so we haven’t achieved anything. once we have the goto function, building the output is trivial. -- builds the output function build_output::(?m::goto) => [string] -> output build_output [] = empty build_output (s:rest) = map.insert (fin 0 s) (list.filter (\x -> elem x dictionary) $ list.tails s) $ build_output rest -- returns the state in which an input string ends without using failures fin::(?m::goto) => int -> [char] -> int fin state [] = state fin state (c:rest) = fin next rest where next = frommaybe 0 $ map.lookup (state, c) ?m we are essentially going over the dictionary, finding the final state for each word and building a hash table mapping final states to their outputs. building the failure function was trickiest, because we need a way to iterate over the depths at which nodes are position in the goto state machine. but we threw that info away by using a hashmap. -- tells us which nodes in the goto state machine are at which traversal depth nodes_at_depths::(?m::goto) => [[int]] nodes_at_depths = list.map (\i -> list.filter (>0) $ list.map (\l -> if i < length l then l!!i else -1) paths) [0..(maximum $ list.map length paths)-1] where paths = list.map (path 0) dictionary we now have a list of lists, that tells us at which depth certain nodes are. -- builds the failure function build_fail::(?m::goto) => [[int]] -> int -> failure build_fail nodes 0 = fst $ mapaccuml (\f state -> (map.insert state 0 f, state)) empty (nodes!!0) build_fail nodes d = fst $ mapaccuml (\f state -> (map.insert state (decide_fail state lower) f, state)) lower (nodes!!d) where lower = build_fail nodes (d-1) -- inner step of building the failure function decide_fail::(?m::goto) => int -> failure -> int decide_fail state lower = findwithdefault 0 (s, c) ?m where (s', c) = key' state $ assocs ?m s = findwithdefault 0 s' lower -- gives us the key associated with a certain state (how to get there) key'::int -> [((int, char), int)] -> (int, char) key' _ [] = (-1, '_') -- this is ugly, being of maybe type would be better key' state ((k, v):rest) | state == v = k | otherwise = key' state rest here we are going over the list of nodes at depths and deciding what the failure should be for each depth based on the failures of depth-1. at depth zero, all failures go to the zeroth state. an important part of this process was inverting the goto hashmap so values point to keys, which is essentially what the key’ function does. finally, we can use the whole algorithm like this: main = do let ?m = fst $ mapaccuml build_goto empty dictionary let ?f = build_fail nodes_at_depths $ (length $ nodes_at_depths)-1 ?out = build_output dictionary print $ ahocorasick text a bit more involved than the usual example of haskell found online, it’s still pretty cool you can see the whole code on github here .
March 19, 2013
by Swizec Teller
· 21,932 Views
article thumbnail
Maven's Non-Resolvable Parent POM Problem
Need help dealing with Maven's non-resolvable parent problem? Check out this post to learn how.
March 12, 2013
by Roger Hughes
· 462,919 Views · 8 Likes
article thumbnail
Where is My Datastore in Hyper-V? Server Virtualization - Part 4
The term 'datastore’ is one that many of you who work with VMware are familiar, but which doesn’t really translate to the world of Microsoft’s Hyper-V. “Since Hyper-V does not require a different formatting of the underlying physical disk structure like VMFS(VMware’s proprietary disk format) we are able to browse the ‘datastore’ with File Explorer(In Windows 8/Server 2012…formerly known as Windows Explorer).” “Who said that?” That quote was from my friend Tommy Patterson, who writes about datastores and how they compare to the file system structures used in Hyper-V in Part 4 of our “20+ Days of Server Virtualization” series. In his article he describes the locations of the various components that define and make up a Hyper-V virtual machine, and even provides a script to help you quickly locate filesystem locations for your virtual machine bits. READ HIS ARTICLE HERE
March 9, 2013
by Kevin Remde
· 9,025 Views
article thumbnail
JUnit testing of Spring MVC application: Testing the Service Layer
In continuation of my earlier blogs on Introduction to Spring MVC and Testing DAO layer in Spring MVC, in this blog I will demonstrate how to test Service layer in Spring MVC. The objective of this demo is 2 fold, to build the Service layer using TDD and increase the code coverage during JUnit testing of Service layer. For people in hurry, get the latest code from Github and run the below command mvn clean test -Dtest=com.example.bookstore.service.AccountServiceTest Since in my earlier blog, we have already tested the DAO layer, in this blog we only need to focus on testing service layer. We need to mock the DAO layer so that we can control the behavior in Service layer and cover various scenarios. Mockito is a good framework which is used to mock a method and return known data and assert that in the JUnit. As a first step we define the AccountServiceTestContextConfiguration class with AccountServiceTest class. If you notice there are 2 beans defined in that class and we marked the as a @Configuration which shows that it is a Spring Context class. In the JUnit test we @Autowired AccountService class. And AccountServiceImpl @Autowired the AccountRepository class. When creating the Bean in the configuration file we also stubbed the AccountRepository class using Mockito, @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration public class AccountServiceTest { @Configuration static class AccountServiceTestContextConfiguration { @Bean public AccountService accountService() { return new AccountServiceImpl(); } @Bean public AccountRepository accountRepository() { return Mockito.mock(AccountRepository.class); } } //We Autowired the AccountService bean so that it is injected from the configuration @Autowired private AccountService accountService; @Autowired private AccountRepository accountRepository; During the setup of the JUnit we use Mockito mock findByUsername method to return a predefined account object as below @Before public void setup() { Account account = new AccountBuilder() { { address("Herve", "4650", "Rue de la gare", "1", null, "Belgium"); credentials("john", "secret"); name("John", "Doe"); } }.build(true); Mockito.when(accountRepository.findByUsername("john")).thenReturn(account); } Now we write the tests as below and test both the positive and negative scenarios, @Test(expected = AuthenticationException.class) public void testLoginFailure() throws AuthenticationException { accountService.login("john", "fail"); } @Test() public void testLoginSuccess() throws AuthenticationException { Account account = accountService.login("john", "secret"); assertEquals("John", account.getFirstName()); assertEquals("Doe", account.getLastName()); } } Finally we verify if the findByUsername method is called only once successfully as below in the teardown, @After public void verify() { Mockito.verify(accountRepository, VerificationModeFactory.times(1)).findByUsername(Mockito.anyString()); // This is allowed here: using container injected mocks Mockito.reset(accountRepository); } AccountService class looks as below, @Service @Transactional(readOnly = true) public class AccountServiceImpl implements AccountService { @Autowired private AccountRepository accountRepository; @Override public Account login(String username, String password) throws AuthenticationException { Account account = this.accountRepository.findByUsername(username, password); } else { throw new AuthenticationException("Wrong username/password", "invalid.username"); } return account; } } I hope this blog helped you. In my next blog, I will demo how to build a controller JUnit test. Reference: Pro Spring MVC: With Web Flow by by Marten Deinum, Koen Serneels
March 3, 2013
by Krishna Prasad
· 81,605 Views · 3 Likes
article thumbnail
Understanding TCP/IP Network Stack & Writing Network Apps
We cannot imagine Internet service without TCP/IP. All Internet services we have developed and used at NHN are based on a solid basis, TCP/IP. Understanding how data is transferred via the network will help you to improve performance through tuning, troubleshooting, or introduction to a new technology. This article will describe the overall operation scheme of the network stack based on data flow and control flow in Linux OS and the hardware layer. Key Characteristics of TCP/IP How should I design a network protocol to transmit data quickly while keeping the data order without any data loss? TCP/IP has been designed with this consideration. The following are the key characteristics of TCP/IP required to understand the concept of the stack. TCP and IP Technically, since TCP and IP have different layer structures, it would be correct to describe them separately. However, here we will describe them as one. 1. Connection-oriented First, a connection is made between two endpoints (local and remote) and then data is transferred. Here, the "TCP connection identifier" is a combination of addresses of the two endpoints, having type. 2. Bidirectional Byte Stream Bidirectional data communication is made by using byte stream. 3. In-order Delivery A receiver receives data in the order of sending data from a sender. For that, the order of data is required. To mark the order, 32-bit integer data type is used. 4. Reliability through ACK When a sender did not receive ACK (acknowledgement) from a receiver after sending data to the receiver, the sender TCP re-sends the data to the receiver. Therefore, the sender TCP buffers unacknowledged data from the receiver. 5. Flow Control A sender sends as much data as a receiver can afford. A receiver sends the maximum number of bytes that it can receive (unused buffer size, receive window) to the sender. The sender sends as much data as the size of bytes that the receiver's receive window allows. 6. Congestion Control The congestion window is used separately from the receive window to prevent network congestion by limiting the volume of data flowing in the network. Like the receive window, the sender sends as much data as the size of bytes that the receiver's congestion window allows by using a variety of algorithms such as TCP Vegas, Westwood, BIC, and CUBIC. Different from flow control, congestion control is implemented by the sender only. Data Transmission As indicated by its name, a network stack has many layers. The following Figure 1 shows the layer types. Figure 1: Operation Process by Each Layer of TCP/IP Network Stack for Data Transmission. There are several layers and the layers are briefly classified into three areas: User area Kernel area Device area Tasks at the user area and the kernel area are performed by the CPU. The user area and the kernel area are called "host" to distinguish them from the device area. Here, the device is the Network Interface Card (NIC) that sends and receives packets. It is a more accurate term than the commonly used "LAN card". Let's take a look at the user area. First, the application creates data to send (the "User data" box in Figure 1) and then calls the write() system call to send the data. Assume that the socket (fd in Figure 1) has been already created. When the system call is called, the area is switched to the kernel area. POSIX-series operating systems including Linux and Unix expose the socket to the application by using a file descriptor. In the POSIX-series operating system, the socket is a kind of a file. The file layer executes a simple examination and calls the socket function by using the socket structure connected to the file structure. The kernel socket has two buffers. One is the send socket buffer for sending; And the other is the receive socket buffer for receiving. When the write system call is called, the data in the user area is copied to the kernel memory and then added to the end of the send socket buffer. This is to send data in order. In the Figure 1, the light-gray box refers to the data in the socket buffer. Then, TCP is called. There is the TCP Control Block (TCB) structure connected to the socket. The TCB includes data required for processing the TCP connection. Data in the TCB are connection state (LISTEN, ESTABLISHED, TIME_WAIT),receive window, congestion window, sequence number, resending timer, etc. If the current TCP state allows for data transmission, a new TCP segment (in other words, a packet) is created. If data transmission is impossible due to flow control or such a reason, the system call is ended here and then the mode is returned to the user mode (in other words, the control is passed to the application). There are two TCP segments as shown in Figure 2: TCP header; And payload. Figure 2: TCP Frame Structure (source). The payload includes the data saved in the unacknowledged send socket buffer. The maximum length of the payload is the maximum value among the receive window, congestion window, and maximum segment size (MSS). Then, TCP checksum is computed. In this checksum computation, pseudo header information (IP addresses, segment length, and protocol number) is included. One or more packets can be transmitted according to the TCP state. In fact, since the current network stack uses the checksum offload, the TCP checksum is computed by NIC, not by the kernel. However, we assume that the kernel computes the TCP checksum for convenience. The created TCP segment goes down to the IP layer. The IP layer adds an IP header to the TCP segment and performs IP routing. IP routing is a procedure of searching the next hop IP in order to go to the destination IP. After the IP layer has computed and added the IP header checksum, it sends the data to the Ethernet layer. The Ethernet layer searches for the MAC address of the next hop IP by using the Address Resolution Protocol (ARP). It then adds the Ethernet header to the packet. The host packet is completed by adding the Ethernet header. After IP routing is performed, the transmit interface (NIC) is known as the result of IP routing. The interface is used for transmitting a packet to the next hop IP and the IP. Therefore, the transmit NIC driver is called. At this time, if a packet capture program such as tcpdump or Wireshark is running, the kernel copies the packet data onto the memory buffer that the program uses. In that way, the receiving packet is directly captured on the driver. Generally, the traffic shaper function is implemented to run on this layer. The driver requests packet transmission according to the driver-NIC communication protocol defined by the NIC manufacturer. After receiving the packet transmission request, the NIC copies the packets from the main memory to its memory and then sends it to the network line. At this time, by complying with the Ethernet standard, it adds the IFG (Inter-Frame Gap), preamble, and CRC to the packet. The IFG and preamble are used to distinguish the start of the packet (as a networking term, framing), and the CRC is used to protect the data (the same purpose as TCP and IP checksum). Packet transmission is started based on the physical speed of the Ethernet and the condition of Ethernet flow control. It is like getting the floor and speaking in a conference room. When an NIC sends a packet, the NIC generates interrupts on the host CPU. Every interrupt has its own interrupt number and the OS searches an adequate driver to handle the interrupt by using the number. The driver registers a function to handle the interrupt (an interrupt handler) when the driver is started. The OS calls the interrupt handler and then the interrupt handler returns the transmitted packet to the OS. So far we have discussed the procedure of data transmission through the kernel and the device when an application performs write. However, without a direct write request from the application, the kernel can transmit a packet by directly calling TCP. For example, when an ACK is received and the receive window is expanded, the kernel creates a TCP segment including the data left in the socket buffer and sends the TCP segment to the receiver. Data Receiving Now, let's take a look at how data is received. Data receiving is a procedure for how the network stack handles a packet coming in. Figure 3 shows how the network stack handles a packet received. Figure 3: Operation Process by Each Layer of TCP/IP Network Stack for Handling Data Received. First, the NIC writes the packet onto its memory. It checks whether the packet is valid by performing the CRC check and then sends the packet to the memory buffer of the host. This buffer is a memory that has already been requested by the driver to the kernel and allocated for receiving packets. After the buffer has been allocated, the driver tells the memory address and size to the NIC. When there is no host memory buffer allocated by the driver even though the NIC receives a packet, the NIC may drop the packet. After sending the packet to the host memory buffer, the NIC sends an interrupt to the host OS. Then, the driver checks whether it can handle the new packet or not. So far, the driver-NIC communication protocol defined by the manufacturer is used. When the driver should send a packet to the upper layer, the packet must be wrapped in a packet structure that the OS uses for the OS to understand the packet. For example, sk_buff of Linux, mbuf of BSD-series kernel, and NET_BUFFER_LIST of Microsoft Windows are the packet structures of the corresponding OS. The driver sends the wrapped packets to the upper layer. The Ethernet layer checks whether the packet is valid and then de-multiplexes the upper protocol (network protocol). At this time, it uses the ethertype value of the Ethernet header. The IPv4 ethertype value is 0x0800. It removes the Ethernet header and then sends the packet to the IP layer. The IP layer also checks whether the packet is valid. In other words, it checks the IP header checksum. It logically determines whether it should perform IP routing and make the local system handle the packet, or send the packet to the other system. If the packet must be handled by the local system, the IP layer de-multiplexes the upper protocol (transport protocol) by referring to the proto value of the IP header. The TCP proto value is 6. It removes the IP header and then sends the packet to the TCP layer. Like the lower layer, the TCP layer checks whether the packet is valid. It also checks the TCP checksum. As mentioned before, since the current network stack uses the checksum offload, the TCP checksum is computed by NIC, not by the kernel. Then it searches the TCP control block where the packet is connected. At this time, of the packet is used as an identifier. After searching the connection, it performs the protocol to handle the packet. If it has received new data, it adds the data to the receive socket buffer. According to the TCP state, it can send a new TCP packet (for example, an ACK packet). Now TCP/IP receiving packet handling has completed. The size of the receive socket buffer is the TCP receive window. To a certain point, the TCP throughput increases when the receive window is large. In the past, the socket buffer size had been adjusted on the application or the OS configuration. The latest network stack has a function to adjust the receive socket buffer size, i.e., the receive window, automatically. When the application calls the read system call, the area is changed to the kernel area and the data in the socket buffer is copied to the memory in the user area. The copied data is removed from the socket buffer. And then the TCP is called. The TCP increases the receive window because there is new space in the socket buffer. And it sends a packet according to the protocol status. If no packet is transferred, the system call is terminated. Network Stack Development Direction The functions of network stack layers described so far are the most basic functions. The network stack in the early 1990s had few more functions than the functions described above. However, the latest network stack has many more functions and complexity as the network stack implementation structure gets higher. The latest network stack is classified by purpose as follows. Packet Processing Procedure Manipulation It is a function like Netfilter (firewall, NAT) and traffic control. By inserting the user-controllable code to the basic processing flow, the function can work differently according to the user configuration. Protocol Performance It aims to improve the throughput, latency, and stability that the TCP protocol can achieve within the given network environment. Various congestion control algorithms and additional TCP functions such as SACK are the typical examples. The protocol improvement will not be discussed here since it is out of the scope. Packet Processing Efficiency The packet processing efficiency aims to improve the maximum number of packets that can be processed per second by reducing the CPU cycle, memory usage, and memory accesses that one system consumes to process packets. There have been several attempts to reduce the latency in the system. The attempts include stack parallel processing, header prediction, zero-copy, single-copy, checksum offload, TSO, LRO, RSS, etc. Control Flow in the Stack Now, we will take a more detailed look at the internal flow of the Linux network stack. Like a subsystem which is not a network stack, a network stack basically runs as the event-driven way that reacts when the event occurs. Therefore, there is no separated thread to execute the stack. Figure 1 and Figure 3 showed the simplified diagrams of control flow. Figure 4 below illustrates more exact control flow. Figure 4: Control Flow in the Stack. At Flow (1) in Figure 4, an application calls a system call to execute (use) the TCP. For example, calls the read system call and the write system call and then executes TCP. However, there is no packet transmission. Flow (2) is same as Flow (1) if it requires packet transmission after executing TCP. It creates a packet and sends down the packet to the driver. A queue is in front of the driver. The packet comes into the queue first, and then the queue implementation structure decides the time to send the packet to the driver. This is queue discipline (qdisc) of Linux. The function of Linux traffic control is to manipulate the qdisc. The default qdisc is a simple First-In-First-Out (FIFO) queue. By using another qdisc, operators can achieve various effects such as artificial packet loss, packet delay, transmission rate limit, etc. At Flow (1) and Flow (2), the process thread of the application also executes the driver. Flow (3) shows the case in which the timer used by the TCP has expired. For example, when the TIME_WAITtimer has expired, the TCP is called to delete the connection. Like Flow (3), Flow (4) is the case in which the timer used by the TCP has expired and the TCP execution result packet should be transmitted. For example, when the retransmit timer has expired, the packet of which ACK has not been received is transmitted. Flow (3) and Flow (4) show the procedure of executing the timer softirq that has processed the timer interrupt. When the NIC driver receives an interrupt, it frees the transmitted packet. In most cases, execution of the driver is terminated here. Flow (5) is the case of packet accumulation in the transmit queue. The driver requests softirq and the softirq handler executes the transmit queue to send the accumulated packet to the driver. When the NIC driver receives an interrupt and finds a newly received packet, it requests softirq. The softirq that processes the received packet calls the driver and transmits the received packet to the upper layer. In Linux, processing the received packet as shown above is called New API (NAPI). It is similar to polling because the driver does not directly transmit the packet to the upper layer, but the upper layer directly gets the packet. The actual code is called NAPI poll or poll. Flow (6) shows the case that completes execution of TCP, and Flow (7) shows the case that requires additional packet transmission. All of Flow (5), (6), and (7) are executed by the softirq which has processed the NIC interrupt. How to Process Interrupt and Received Packet Interrupt processing is complex; however, you need to understand the performance issue related to processing of packets received. Figure 5 shows the procedure of processing an interrupt. Figure 5: Processing Interrupt, softirq, and Received Packet. Assume that the CPU 0 is executing an application program (user program). At this time, the NIC receives a packet and generates an interrupt for the CPU 0. Then the CPU executes the kernel interrupt (called irq) handler. This handler refers to the interrupt number and then calls the driver interrupt handler. The driver frees the packet transmitted and then calls the napi_schedule() function to process the received packet. This function requests the softirq (software interrupt). After execution of the driver interrupt handler has been terminated, the control is passed to the kernel handler. The kernel handler executes the interrupt handler for the softirq. After the interrupt context has been executed, the softirq context will be executed. The interrupt context and the softirq context are executed by an identical thread. However, they use different stacks. And, the interrupt context blocks hardware interrupts; however, the softirq context allows for hardware interrupts. The softirq handler that processes the received packet is the net_rx_action() function. This function calls thepoll() function of the driver. The poll() function calls the netif_receive_skb() function and then sends the received packets one by one to the upper layer. After processing the softirq, the application restarts execution from the stopped point in order to request a system call. Therefore, the CPU that has received the interrupt processes the received packets from the first to the last. In Linux, BSD, and Microsoft Windows, the processing procedure is basically the same on this wise. When you check the server CPU utilization, sometimes you can check that only one CPU executes the softirq hard among the server CPUs. The phenomenon occurs due to the way of processing received packets explained so far. To solve the problem, multi-queue NIC, RSS, and RPS have been developed. Data Structure The followings are some key data structures. Take a look at them and review the code. sk_buff structure First, there is the sk_buff structure or skb structure that means a packet. Figure 6 shows some of the sk_buffstructure. As the functions have been advanced, they get more complicated. However, the basic functions are very common that anyone can think. Figure 6: Packet Structure sk_buff. Including Packet Data and meta data The structure directly includes the packet data or refers to it by using a pointer. In Figure 6, some of the packets (from Ethernet to buffer) refer to using the data pointer and the additional data (frags) refer to the actual page. The necessary information such as header and payload length is saved in the meta data area. For example, inFigure 6, the mac_header, the network_header, and the transport_header have the corresponding pointer data that points the starting position of the Ethernet header, IP header and TCP header, respectively. This way makes TCP protocol processing easy. How to Add or Delete a Header The header is added or deleted as up and down each layer of the network stack. Pointers are used for more efficient processing. For example, to remove the Ethernet header, just increase the head pointer. How to Combine and Divide Packet The linked list is used for efficient execution of tasks such as adding or deleting packet payload data to the socket buffer, or packet chain. The next pointer and the prev pointer are used for this purpose. Quick Allocation and Free As a structure is allocated whenever creating a packet, the quick allocator is used. For example, if data is transmitted at the speed of 10-Gigabit Ethernet, more than one million packets per second must be created and deleted. TCP Control Block Second, there is a structure that represents the TCP connection. Previously, it was abstractly called a TCP control block. Linux uses tcp_sock for the structure. In Figure 7, you can see the relationship among the file, the socket, and the tcp_sock. Figure 7: TCP Connection Structure. When a system call has occurred, it searches the file in the file descriptor used by the application that has called the system call. For the Unix-series OS, the socket, the file and the device for general file system for storage are abstracted to a file. Therefore, the file structure includes the least information. For a socket, a separate socket structure saves the socket-related information and the file refers to the socket as a pointer. The socket refers to the tcp_sock again. The tcp_sock is classified into sock, inet_sock, etc to support various protocols except TCP. It may be considered as a kind of polymorphism. All status information used by the TCP protocol is saved in the tcp_sock. For example, the sequence number, receive window, congestion control, and retransmit timer are saved in the tcp_sock. The send socket buffer and the receive socket buffer are the sk_buff lists and they include the tcp_sock. The dst_entry, the IP routing result, is referred to in order to avoid too frequent routing. The dst_entry allows for easy search of the ARP result, i.e., the destination MAC address. The dst_entry is part of the routing table. The structure of the routing table is very complex that it will not be discussed in this document. The NIC to be used for packet transmission is searched by using the dst_entry. The NIC is expressed as the net_device structure. Therefore, by searching just the file, it is very easy to find all structures (from the file to the driver) required to process the TCP connection with the pointer. The size of the structures is the memory size used by one TCP connection. The memory size is a few KBs (excluding the packet data). As more functions have been added, the memory usage has been gradually increased. Finally, let's see the TCP connection lookup table. It is a hash table used to search the TCP connection where the received packet belongs. The hash value is calculated by using the input data of of the packet and the Jenkins hash algorithm. It is told that the hash function has been selected by considering defense against attacks to the hash table. Following Code: How to Transmit Data We will check the key tasks performed by the stack by following the actual Linux kernel source code. Here, we will observe two paths which are frequently used. First, this is a path used to transmit data when an application calls the write system call. SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, ...) { struct file *file; [...] file = fget_light(fd, &fput_needed); [...] ===> ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos); struct file_operations { [...] ssize_t (*aio_read) (struct kiocb *, const struct iovec *, ...) ssize_t (*aio_write) (struct kiocb *, const struct iovec *, ...) [...] }; static const struct file_operations socket_file_ops = { [...] .aio_read = sock_aio_read, .aio_write = sock_aio_write, [...] }; When the application calls the write system call, the kernel performs the write() function of the file layer. First, the actual file structure of the file descriptor fd is fetched. And then the aio_write is called. This is the function pointer. In the file structure, you will see the file_operations structure pointer. The structure is generally called function table and includes the function pointers such as aio_read and aio_write. The actual table for the socket is socket_file_ops. The aio_write function used by the socket is sock_aio_write. The function table is used for the purpose that is similar to the Java interface. It is generally used for the kernel to perform code abstraction or refactoring. static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov, ..) { [...] struct socket *sock = file->private_data; [...] ===> return sock->ops->sendmsg(iocb, sock, msg, size); struct socket { [...] struct file *file; struct sock *sk; const struct proto_ops *ops; }; const struct proto_ops inet_stream_ops = { .family = PF_INET, [...] .connect = inet_stream_connect, .accept = inet_accept, .listen = inet_listen, .sendmsg = tcp_sendmsg, .recvmsg = inet_recvmsg, [...] }; struct proto_ops { [...] int (*connect) (struct socket *sock, ...) int (*accept) (struct socket *sock, ...) int (*listen) (struct socket *sock, int len); int (*sendmsg) (struct kiocb *iocb, struct socket *sock, ...) int (*recvmsg) (struct kiocb *iocb, struct socket *sock, ...) [...] }; The sock_aio_write() function gets the socket structure from the file and then calls sendmsg. It is also the function pointer. The socket structure includes the proto_ops function table. The proto_ops implemented by the IPv4 TCP is inet_stream_ops and the sendmsg is implemented by tcp_sendmsg. int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, size_t size) { struct sock *sk = sock->sk; struct iovec *iov; struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; [...] mss_now = tcp_send_mss(sk, &size_goal, flags); /* Ok commence sending. */ iovlen = msg->msg_iovlen; iov = msg->msg_iov; copied = 0; [...] while (--iovlen >= 0) { int seglen = iov->iov_len; unsigned char __user *from = iov->iov_base; iov++; while (seglen > 0) { int copy = 0; int max = size_goal; [...] skb = sk_stream_alloc_skb(sk, select_size(sk, sg), sk->sk_allocation); if (!skb) goto wait_for_memory; /* * Check whether we can use HW checksum. */ if (sk->sk_route_caps & NETIF_F_ALL_CSUM) skb->ip_summed = CHECKSUM_PARTIAL; [...] skb_entail(sk, skb); [...] /* Where to copy to? */ if (skb_tailroom(skb) > 0) { /* We have some space in skb head. Superb! */ if (copy > skb_tailroom(skb)) copy = skb_tailroom(skb); if ((err = skb_add_data(skb, from, copy)) != 0) goto do_fault; [...] if (copied) tcp_push(sk, flags, mss_now, tp->nonagle); [...] } tcp_sengmsg gets tcp_sock (i.e.,TCP control block) from the socket and copies the data that the application has requested to transmit to the send socket buffer. When copying data to sk_buff, how many bytes will one sk_buff include? One sk_buff copies and includes MSS (tcp_send_mss) bytes to help the code that actually creates packets. Maximum Segment Size (MSS) stands for the maximum payload size that one TCP packet includes. By using TSO and GSO, one sk_buff can save more data than MSS. This will be discussed later, not in this document. The sk_stream_alloc_skb function creates a new sk_buff, and skb_entail adds the new sk_buff to the tail of the send_socket_buffer. The skb_add_data function copies the actual application data to the data buffer of thesk_buff. All the data is copied by repeating the procedure (creating an sk_buff and adding it to the send socket buffer) several times. Therefore, sk_buffs at the size of the MSS are in the send socket buffer as a list. Finally, the tcp_push is called to make the data which can be transmitted now as a packet, and the packet is sent. static inline void tcp_push(struct sock *sk, int flags, int mss_now, ...) [...] ===> static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, ...) int nonagle, { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; [...] while ((skb = tcp_send_head(sk))) { [...] cwnd_quota = tcp_cwnd_test(tp, skb); if (!cwnd_quota) break; if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) break; [...] if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) break; /* Advance the send_head. This one is sent out. * This call will increment packets_out. */ tcp_event_new_data_sent(sk, skb); [...] The tcp_push function transmits as many of the sk_buffs in the send socket buffer as the TCP allows in sequence. First, the tcp_send_head is called to get the first sk_buff in the socket buffer and thetcp_cwnd_test and the tcp_snd_wnd_test are performed to check whether the congestion window and the receive window of the receiving TCP allow new packets to be transmitted. Then, the tcp_transmit_skb function is called to create a packet. static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, gfp_t gfp_mask) { const struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet; struct tcp_sock *tp; [...] if (likely(clone_it)) { if (unlikely(skb_cloned(skb))) skb = pskb_copy(skb, gfp_mask); else skb = skb_clone(skb, gfp_mask); if (unlikely(!skb)) return -ENOBUFS; } [...] skb_push(skb, tcp_header_size); skb_reset_transport_header(skb); skb_set_owner_w(skb, sk); /* Build TCP header and checksum it. */ th = tcp_hdr(skb); th->source = inet->inet_sport; th->dest = inet->inet_dport; th->seq = htonl(tcb->seq); th->ack_seq = htonl(tp->rcv_nxt); [...] icsk->icsk_af_ops->send_check(sk, skb); [...] err = icsk->icsk_af_ops->queue_xmit(skb); if (likely(err <= 0)) return err; tcp_enter_cwr(sk, 1); return net_xmit_eval(err); } tcp_transmit_skb creates the copy of the given sk_buff (pskb_copy). At this time, it does not copy the entire data of the application but the metadata. And then it calls skb_push to secure the header area and records the header field value. Send_check computes the TCP checksum. With the checksum offload, the payload data is not computed. Finally, queue_xmit is called to send the packet to the IP layer. Queue_xmit for IPv4 is implemented by the ip_queue_xmit function. int ip_queue_xmit(struct sk_buff *skb) [...] rt = (struct rtable *)__sk_dst_check(sk, 0); [...] /* OK, we know where to send it, allocate and build IP header. */ skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0)); skb_reset_network_header(skb); iph = ip_hdr(skb); *((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff)); if (ip_dont_fragment(sk, &rt->dst) && !skb->local_df) iph->frag_off = htons(IP_DF); else iph->frag_off = 0; iph->ttl = ip_select_ttl(inet, &rt->dst); iph->protocol = sk->sk_protocol; iph->saddr = rt->rt_src; iph->daddr = rt->rt_dst; [...] res = ip_local_out(skb); [...] ===> int __ip_local_out(struct sk_buff *skb) [...] ip_send_check(iph); return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, skb, NULL, skb_dst(skb)->dev, dst_output); [...] ===> int ip_output(struct sk_buff *skb) { struct net_device *dev = skb_dst(skb)->dev; [...] skb->dev = dev; skb->protocol = htons(ETH_P_IP); return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, skb, NULL, dev, ip_finish_output, [...] ===> static int ip_finish_output(struct sk_buff *skb) [...] if (skb->len > ip_skb_dst_mtu(skb) && !skb_is_gso(skb)) return ip_fragment(skb, ip_finish_output2); else return ip_finish_output2(skb); The ip_queue_xmit function executes tasks required by the IP layers. __sk_dst_check checks whether the cached route is valid. If there is no cached route or the cached route is invalid, it performs IP routing. And then it calls skb_push to secure the IP header area and records the IP header field value. After that, as following the function call, ip_send_check computes the IP header checksum and calls the netfilter function. IP fragment is created when ip_finish_output function needs IP fragmentation. No fragmentation is generated when TCP is used. Therefore, ip_finish_output2 is called and it adds the Ethernet header. Finally, a packet is completed. int dev_queue_xmit(struct sk_buff *skb) [...] ===> static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, ...) [...] if (...) { .... } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) && qdisc_run_begin(q)) { [...] if (sch_direct_xmit(skb, q, dev, txq, root_lock)) { [...] ===> int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, ...) [...] HARD_TX_LOCK(dev, txq, smp_processor_id()); if (!netif_tx_queue_frozen_or_stopped(txq)) ret = dev_hard_start_xmit(skb, dev, txq); HARD_TX_UNLOCK(dev, txq); [...] } int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, ...) [...] if (!list_empty(&ptype_all)) dev_queue_xmit_nit(skb, dev); [...] rc = ops->ndo_start_xmit(skb, dev); [...] } The completed packet is transmitted through the dev_queue_xmit function. First, the packet passes via the qdisc. If the default qdisc is used and the queue is empty, the sch_direct_xmit function is called to directly send down the packet to the driver, skipping the queue. Dev_hard_start_xmit function calls the actual driver. Before calling the driver, the device TX is locked first. This is to prevent several threads from accessing the device simultaneously. As the kernel locks the device TX, the driver transmission code does not need an additional lock. It is closely related to the parallel processing that will be discussed next time. Ndo_start_xmit function calls the driver code. Just before, you will see ptype_all and dev_queue_xmit_nit. The ptype_all is a list that includes the modules such as packet capture. If a capture program is running, the packet is copied by ptype_all to the separate program. Therefore, the packet that tcpdump shows is the packet transmitted to the driver. When checksum offload or TSO is used, the NIC manipulates the packet. So the tcpdump packet is different from the packet transmitted to the network line. After completing packet transmission, the driver interrupt handler returns the sk_buff. Following Code: How to Receive Data The general executed path is to receive a packet and then to add the data to the receive socket buffer. After executing the driver interrupt handler, follow the napi poll handle first. static void net_rx_action(struct softirq_action *h) { struct softnet_data *sd = &__get_cpu_var(softnet_data); unsigned long time_limit = jiffies + 2; int budget = netdev_budget; void *have; local_irq_disable(); while (!list_empty(&sd->poll_list)) { struct napi_struct *n; [...] n = list_first_entry(&sd->poll_list, struct napi_struct, poll_list); if (test_bit(NAPI_STATE_SCHED, &n->state)) { work = n->poll(n, weight); trace_napi_poll(n); } [...] } int netif_receive_skb(struct sk_buff *skb) [...] ===> static int __netif_receive_skb(struct sk_buff *skb) { struct packet_type *ptype, *pt_prev; [...] __be16 type; [...] list_for_each_entry_rcu(ptype, &ptype_all, list) { if (!ptype->dev || ptype->dev == skb->dev) { if (pt_prev) ret = deliver_skb(skb, pt_prev, orig_dev); pt_prev = ptype; } } [...] type = skb->protocol; list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) { if (ptype->type == type && (ptype->dev == null_or_dev || ptype->dev == skb->dev || ptype->dev == orig_dev)) { if (pt_prev) ret = deliver_skb(skb, pt_prev, orig_dev); pt_prev = ptype; } } if (pt_prev) { ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); static struct packet_type ip_packet_type __read_mostly = { .type = cpu_to_be16(ETH_P_IP), .func = ip_rcv, [...] }; As mentioned before, the net_rx_action function is the softirq handler that receives a packet. First, the driver that has requested the napi poll is retrieved from the poll_list and the poll handler of the driver is called. The driver wraps the received packet with sk_buff and then calls netif_receive_skb. When there is a module that requests all packets, the netif_receive_skb sends packets to the module. Like packet transmission, the packets are transmitted to the module registered to the ptype_all list. The packets are captured here. Then, the packets are transmitted to the upper layer based on the packet type. The Ethernet packet includes 2-byte ethertype field in the header. The value indicates the packet type. The driver records the value in sk_buff(skb->protocol). Each protocol has its own packet_type structure and registers the pointer of the structure to the ptype_base hash table. IPv4 uses ip_packet_type. The Type field value is the IPv4 ethertype (ETH_P_IP) value. Therefore, the IPv4 packet calls the ip_rcv function. int ip_rcv(struct sk_buff *skb, struct net_device *dev, ...) { struct iphdr *iph; u32 len; [...] iph = ip_hdr(skb); [...] if (iph->ihl < 5 || iph->version != 4) goto inhdr_error; if (!pskb_may_pull(skb, iph->ihl*4)) goto inhdr_error; iph = ip_hdr(skb); if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) goto inhdr_error; len = ntohs(iph->tot_len); if (skb->len < len) { IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS); goto drop; } else if (len < (iph->ihl*4)) goto inhdr_error; [...] return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL, ip_rcv_finish); [...] ===> int ip_local_deliver(struct sk_buff *skb) [...] if (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) { if (ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER)) return 0; } return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN, skb, skb->dev, NULL, ip_local_deliver_finish); [...] ===> static int ip_local_deliver_finish(struct sk_buff *skb) [...] __skb_pull(skb, ip_hdrlen(skb)); [...] int protocol = ip_hdr(skb)->protocol; int hash, raw; const struct net_protocol *ipprot; [...] hash = protocol & (MAX_INET_PROTOS - 1); ipprot = rcu_dereference(inet_protos[hash]); if (ipprot != NULL) { [...] ret = ipprot->handler(skb); [...] ===> static const struct net_protocol tcp_protocol = { .handler = tcp_v4_rcv, [...] }; The ip_rcv function executes tasks required by the IP layers. It examines packets such as the length and header checksum. After passing through the netfilter code, it performs the ip_local_deliver function. If required, it assembles IP fragments. Then, it calls ip_local_deliver_finish through the netfilter code. Theip_local_deliver_finish function removes the IP header by using the __skb_pull and then searches the upper protocol whose value is identical to the IP header protocol value. Similar to the Ptype_base, each transport protocol registers its own net_protocol structure in inet_protos. IPv4 TCP uses tcp_protocol and callstcp_v4_rcv that has been registered as a handler. When packets come into the TCP layer, the packet processing flow varies depending on the TCP status and the packet type. Here, we will see the packet processing procedure when the expected next data packet has been received in the ESTABLISHED status of the TCP connection. This path is frequently executed by the server receiving data when there is no packet loss or out-of-order delivery. int tcp_v4_rcv(struct sk_buff *skb) { const struct iphdr *iph; struct tcphdr *th; struct sock *sk; [...] th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) goto bad_packet; if (!pskb_may_pull(skb, th->doff * 4)) goto discard_it; [...] th = tcp_hdr(skb); iph = ip_hdr(skb); TCP_SKB_CB(skb)->seq = ntohl(th->seq); TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + skb->len - th->doff * 4); TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); TCP_SKB_CB(skb)->when = 0; TCP_SKB_CB(skb)->flags = iph->tos; TCP_SKB_CB(skb)->sacked = 0; sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest); [...] ret = tcp_v4_do_rcv(sk, skb); First, the tcp_v4_rcv function validates the received packets. When the header size is larger than the data offset (th->doff < sizeof(struct tcphdr) / 4), it is the header error. And then __inet_lookup_skb is called to look for the connection where the packet belongs from the TCP connection hash table. From the sock structure found, all required structures such as tcp_sock and socket can be got. int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) [...] if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */ sock_rps_save_rxhash(sk, skb->rxhash); if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) { [...] ===> int tcp_rcv_established(struct sock *sk, struct sk_buff *skb, [...] /* * Header prediction. */ if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags && TCP_SKB_CB(skb)->seq == tp->rcv_nxt && !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt))) { [...] if ((int)skb->truesize > sk->sk_forward_alloc) goto step5; NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITS); /* Bulk data transfer: receiver */ __skb_pull(skb, tcp_header_len); __skb_queue_tail(&sk->sk_receive_queue, skb); skb_set_owner_r(skb, sk); tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq; [...] if (!copied_early || tp->rcv_nxt != tp->rcv_wup) __tcp_ack_snd_check(sk, 0); [...] step5: if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0) goto discard; tcp_rcv_rtt_measure_ts(sk, skb); /* Process urgent data. */ tcp_urg(sk, skb, th); /* step 7: process the segment text */ tcp_data_queue(sk, skb); tcp_data_snd_check(sk); tcp_ack_snd_check(sk); return 0; [...] } The actual protocol is executed from the tcp_v4_do_rcv function. If the TCP is in the ESTABLISHED status,tcp_rcv_esablished is called. Processing of the ESTABLISHED status is separately handled and optimized since it is the most common status. The tcp_rcv_established first executes the header prediction code. The header prediction is also quickly processed to detect in the common state. The common case here is that there is no data to transmit and the received data packet is the packet that must be received next time, i.e., the sequence number is the sequence number that the receiving TCP expects. In this case, the procedure is completed by adding the data to the socket buffer and then transmitting ACK. Go forward and you will see the sentence comparing truesize with sk_forward_alloc. It is to check whether there is any free space in the receive socket buffer to add new packet data. If there is, header prediction is "hit" (prediction succeeded). Then __skb_pull is called to remove the TCP header. After that, __skb_queue_tail is called to add the packet to the receive socket buffer. Finally, __tcp_ack_snd_check is called for transmitting ACK if necessary. In this way, packet processing is completed. If there is not enough free space, a slow path is executed. The tcp_data_queue function newly allocates the buffer space and adds the data packet to the socket buffer. At this time, the receive socket buffer size is automatically increased if possible. Different from the quick path, tcp_data_snd_check is called to transmit a new data packet if possible. Finally, tcp_ack_snd_check is called to create and transmit the ACK packet if necessary. The amount of code executed by the two paths is not much. This is accomplished by optimizing the common case. In other words, it means that the uncommon case will be processed significantly more slowly. The out-of-order delivery is one of the uncommon cases. How to Communicate between Driver and NIC Communication between a driver and the NIC is the bottom of the stack and most people do not care about it. However, the NIC is executing more and more tasks to solve the performance issue. Understanding the basic operation scheme will help you understand the additional technology. A driver and the NIC asynchronously communicate. First, a driver requests packet transmission (call) and the CPU performs another task without waiting for the response. And then the NIC sends packets and notifies the CPU of that, the driver returns the received packets (returns the result). Like packet transmission, packet receiving is asynchronous. First, a driver requests packet receiving and the CPU performs another task (call). Then, the NIC receives packets and notifies the CPU of that, and the driver processes the received packets received (returns the result). Therefore, a space to save the request and the response is necessary. In most cases, the NIC uses the ring structure. The ring is similar to the common queue structure. With the fixed number of entries, one entry saves one request data or one response data. The entries are sequentially used in turn. The name "ring" is generally used since the fixed entries are reused in turn. As following the packet transmission procedure shown in the following Figure 8, you will see how the ring is used. Figure 8: Driver-NIC Communication: How to Transmit Packet. The driver receives packets from the upper layer and creates the send descriptor that the NIC can understand. The send descriptor includes the packet size and the memory address by default. As the NIC needs the physical address to access the memory, the driver should change the virtual address of the packets to the physical address. Then, it adds the send descriptor to the TX ring (1). The TX ring is the send descriptor ring. Next, it notifies the NIC of the new request (2). The driver directly writes the data to a specific NIC memory address. In this way, Programmed I/O (PIO) is the data transmission method in which the CPU directly sends data to the device. The notified NIC gets the send descriptor of the TX ring from the host memory (3). Since the device directly accesses the memory without intervention of the CPU, the access is called Direct Memory Access (DMA). After getting the send descriptor, the NIC determines the packet address and the size and then gets the actual packets from the host memory (4). With the checksum offload, the NIC computes the checksum when the NIC gets the packet data from the memory. Therefore, overhead rarely occurs. The NIC sends packets (5) and then writes the number of packets that are sent to the host memory (6). Then, it sends an interrupt (7). The driver reads the number of packets that are sent and then returns the packets that have been sent so far. In the following Figure 9, you will see the procedure of receiving packets. Figure 9: Driver-NIC Communication: How to Receive Packets. First, the driver allocates the host memory buffer for receiving packets and then creates the receive descriptor. The receive descriptor includes the buffer size and the memory address by default. Like the send descriptor, it saves the physical address that the DMA uses in the receive descriptor. Then, it adds the receive descriptor to the RX ring (1). It is the receive request and the RX ring is the receive request ring. Through the PIO, the driver notifies that there is a new descriptor in the NIC (2). The NIC gets the new descriptor of the RX ring. And then it saves the size and location of the buffer included in the descriptor to the NIC memory (3). After the packets have been received (4), the NIC sends the packets to the host memory buffer (5). If the checksum offload function is existing, the NIC computes the checksum at this time. The actual size of received packets, the checksum result, and any other information are saved in the separate ring (the receive return ring) (6). The receive return ring saves the result of processing the receive request, i.e., the response. And then the NIC sends an interrupt (7). The driver gets packet information from the receive return ring and processes the received packets. If necessary, it allocates new memory buffer and repeats Step (1) and Step (2). To tune the stack, most people say that the ring and interrupt setting should be adjusted. When the TX ring is large, a lot of send requests can be made at once. When the RX ring is large, a lot of packet receives can be done at once. A large ring is useful for the workload that has a huge burst of packet transmission/receiving. In most cases, the NIC uses a timer to reduce the number of interrupts since the CPU may suffer from large overhead to process interrupts. To avoid flooding the host system with too many interrupts, interrupts are collected and sent regularly(interrupt coalescing) while sending and receiving the packets. Stack Buffer and Flow Control Flow control is executed in several stages in the stack. Figure 10 shows buffers used to transmit data. First, an application creates data and adds it to the send socket buffer. If there is no free space in the buffer, the system call is failed or the blocking occurs in the application thread. Therefore, the application data rate flowing into the kernel must be controlled by using the socket buffer size limit. Figure 10: Buffers Related to Packet Transmission. The TCP creates and sends packets to the driver through the transmit queue (qdisc). It is a typical FIFO queue type and the maximum length of the queue is the value of txqueuelen which can be checked by executing the ifconfig command. Generally, it is thousands of packets. The TX ring is between the driver and the NIC. As mentioned before, it is considered as a transmission request queue. If there is no free space in the queue, no transmission request is made and the packets are accumulated in the transmit queue. If too many packets are accumulated, packets are dropped. The NIC saves the packets to transmit in the internal buffer. The packet rate from this buffer is affected by the physical rate (ex: 1 Gb/s NIC cannot offer performance of 10 Gb/s). And with the Ethernet flow control, packet transmission is stopped if there is no free space in the receive NIC buffer. When the packet rate from the kernel is faster than the packet rate from the NIC, packets are accumulated in the buffer of the NIC. If there is no free space in the buffer, processing of transmission request from the TX ring is stopped. More and more requests are accumulated in the TX ring and finally there is no free space in the queue. The driver cannot make any transmission request and the packets are accumulated in the transmit queue. Like this, backpressure is sent from the bottom to the top through many buffers. Figure 11 shows the buffers that the receive packets are passing. The packets are saved in the receive buffer of the NIC. From the view of flow control, the RX ring between the driver and the NIC is considered as a packet buffer. The driver gets packets coming into the RX ring and then sends them to the upper layer. There is no buffer between the driver and the upper layer since the NIC driver that is used by the server system uses NAPI by default. Therefore, it can be considered as the upper layer directly gets packets from the RX ring. The payload data of packets is saved in the receive socket buffer. The application gets the data from the socket buffer later. Figure 11: Buffers Related to Packet Receiving. The driver that does not support NAPI saves packets in the backlog queue. Later, the NAPI handler gets packets. Therefore, the backlog queue can be considered as a buffer between the upper layer and the driver. If the packet processing rate of the kernel is slower than the packet flow rate into the NIC, the RX ring space is full. And the space of the buffer in the NIC is full, too. When the Ethernet flow control is used, the NIC sends a request to stop transmission to the transmission NIC or makes the packet drop. There is no packet drop due to lack of space in the receive socket buffer because the TCP supports end-to-end flow control. However, packet drop occurs due to lack of space in the socket buffer when the application rate is slow because the UDP does not support flow control. The sizes of the TX ring and the RX ring used by the driver in Figure 10 and Figure 11 are the sizes of the rings shown by the ethtool. For most workloads which regard throughput as important, it will be helpful to increase the ring size and the socket buffer size. Increasing the sizes reduces the possibility of failures caused by lack of space in the buffer while receiving and transmitting a lot of packets at a fast rate. Conclusion Initially, I planned to explain only the things that would be helpful for you to develop network programs, execute performance tests, and perform troubleshooting. In spite of my initial plan, the amount of description included in this document is not small. I hope this document will help you to develop network applications and monitor their performance. The TCP/IP protocol itself is very complicated and has many exceptions. However, you don't need to understand every line of TCP/IP-related code of the OS to understand performance and analyze the phenomena. Just understanding its context will be very helpful for you. With continuous advancement of system performance and implementation of the OS network stack, the latest server can offer 10-20 Gb/s TCP throughput without any problem. These days, there are too many technology types related to performance, such as TSO, LRO, RSS, GSO, GRO, UFO, XPS, IOAT, DDIO, and TOE, just like alphabet soup, to make us confused. In the next article, I will explain about the network stack from the performance perspective and discuss the problems and effects of this technology. By Hyeongyeop Kim, Senior Engineer at Performance Engineering Lab, NHN Corporation.
February 27, 2013
by Esen Sagynov
· 13,679 Views · 1 Like
article thumbnail
High Availability Load Balancing with HAProxy on FreeBSD
To achieve high availability load balancing with HAProxy on FreeBSD you can use a CARP to setup backup node and using that configuration to avoid SPOF.
January 18, 2013
by Alen Komljen
· 20,001 Views
article thumbnail
Optimization in R
Optimization is a very common problem in data analytics. Given a set of variables (which one has control), how to pick the right value such that the benefit is maximized. More formally, optimization is about determining a set of variables x1, x2, … that maximize or minimize an objective function f(x1, x2, …). Unconstrained optimization In an unconstraint case, the variable can be freely selected within its full range. A typical solution is to compute the gradient vector of the objective function [∂f/∂x1, ∂f/∂x2, …] and set it to [0, 0, …]. Solve this equation and output the result x1, x2 … which will give the local maximum. In R, this can be done by a numerical analysis method. > f <- function(x){(x[1] - 5)^2 + (x[2] - 6)^2} > initial_x <- c(10, 11) > x_optimal <- optim(initial_x, f, method="CG") > x_min <- x_optimal$par > x_min [1] 5 6 Equality constraint optimization Moving onto the constrained case, lets say x1, x2 … are not independent and then have to related to each other in some particular way: g1(x1, x2, …) = 0, g2(x1, x2, …) = 0. The optimization problem can be expressed as … Maximize objective function: f(x1, x2, …) Subjected to equality constraints: g1(x1, x2, …) = 0 g2(x1, x2, …) = 0 A typical solution is to turn the constraint optimization problem into an unconstrained optimization problem using Lagrange multipliers. Define a new function F as follows ... F(x1, x2, …, λ1, λ2, …) = f(x1, x2, …) + λ1.g1(x1, x2, …) + λ2.g2(x1, x2, …) + … Then solve for ... [∂F/∂x1, ∂F/∂x2, …, ∂F/∂λ1, ∂F/∂λ2, …] = [0, 0, ….] Inequality constraint optimization In this case, the constraint is inequality. We cannot use the Lagrange multiplier technique because it requires equality constraint. There is no general solution for arbitrary inequality constraints. However, we can put some restriction in the form of constraint. In the following, we study two models where constraint is restricted to be a linear function of the variables: w1.x1 + w2.x2 + … >= 0 Linear Programming Linear programming is a model where both the objective function and constraint function is restricted as linear combination of variables. The Linear Programming problem can be defined as follows ... Maximize objective function: f(x1, x2) = c1.x1 + c2.x2 Subjected to inequality constraints: a11.x1 + a12.x2 <= b1 a21.x1 + a22.x2 <= b2 a31.x1 + a32.x2 <= b3 x1 >= 0, x2 >=0 As an illustrative example, lets walkthrough a portfolio investment problem. In the example, we want to find an optimal way to allocate the proportion of asset in our investment portfolio. StockA gives 5% return on average StockB gives 4% return on average StockC gives 6% return on average To set some constraints, lets say my investment in C must be less than sum of A, B. Also, investment in A cannot be more than twice of B. Finally, at least 10% of investment in each stock. To formulate this problem: Variable: x1 = % investment in A, x2 = % in B, x3 = % in C Maximize expected return: f(x1, x2, x3) = x1*5% + x2*4% + x3*6% Subjected to constraints: 10% < x1, x2, x3 < 100% x1 + x2 + x3 = 1 x3 < x1 + x2 x1 < 2 * x2 > library(lpSolve) > library(lpSolveAPI) > # Set the number of vars > model <- make.lp(0, 3) > # Define the object function: for Minimize, use -ve > set.objfn(model, c(-0.05, -0.04, -0.06)) > # Add the constraints > add.constraint(model, c(1, 1, 1), "=", 1) > add.constraint(model, c(1, 1, -1), ">", 0) > add.constraint(model, c(1, -2, 0), "<", 0) > # Set the upper and lower bounds > set.bounds(model, lower=c(0.1, 0.1, 0.1), upper=c(1, 1, 1)) > # Compute the optimized model > solve(model) [1] 0 > # Get the value of the optimized parameters > get.variables(model) [1] 0.3333333 0.1666667 0.5000000 > # Get the value of the objective function > get.objective(model) [1] -0.05333333 > # Get the value of the constraint > get.constraints(model) [1] 1 0 0 Quadratic Programming Quadratic programming is a model where both the objective function is a quadratic function (contains up to two term products) while constraint function is restricted as linear combination of variables. The Quadratic Programming problem can be defined as follows ... Minimize quadratic objective function: f(x1, x2) = c1.x12 + c2. x1x2 + c2.x22 - (d1. x1 + d2.x2) Subject to constraints a11.x1 + a12.x2 == b1 a21.x1 + a22.x2 == b2 a31.x1 + a32.x2 >= b3 a41.x1 + a42.x2 >= b4 a51.x1 + a52.x2 >= b5 Express the problem in Matrix form: Minimize objective ½(DTx) - dTx Subject to constraint ATx >= b First k columns of A is equality constraint As an illustrative example, lets continue on the portfolio investment problem. In the example, we want to find an optimal way to allocate the proportion of asset in our investment portfolio. StockA gives 5% return on average StockB gives 4% return on average StockC gives 6% return on average We also look into the variance of each stock (known as risk) as well as the covariance between stocks. For investment, we not only want to have a high expected return, but also a low variance. In other words, stocks with high return but also high variance is not very attractive. Therefore, maximize the expected return and minimize the variance is the typical investment strategy. One way to minimize variance is to pick multiple stocks (in a portfolio) to diversify the investment. Diversification happens when the stock components within the portfolio moves from their average in different direction (hence the variance is reduced). The Covariance matrix ∑ (between each pairs of stocks) is given as follows: 1%, 0.2%, 0.5% 0.2%, 0.8%, 0.6% 0.5%, 0.6%, 1.2% In this example, we want to achieve a overall return of at least 5.2% of return while minimizing the variance of return To formulate the problem: Variable: x1 = % investment in A, x2 = % in B, x3 = % in C Minimize variance: xT∑x Subjected to constraint: x1 + x2 + x3 == 1 X1*5% + x2*4% + x3*6% >= 5.2% 0% < x1, x2, x3 < 100% > library(quadprog) > mu_return_vector <- c(0.05, 0.04, 0.06) > sigma <- matrix(c(0.01, 0.002, 0.005, + 0.002, 0.008, 0.006, + 0.005, 0.006, 0.012), + nrow=3, ncol=3) > D.Matrix <- 2*sigma > d.Vector <- rep(0, 3) > A.Equality <- matrix(c(1,1,1), ncol=1) > A.Matrix <- cbind(A.Equality, mu_return_vector, diag(3)) > b.Vector <- c(1, 0.052, rep(0, 3)) > out <- solve.QP(Dmat=D.Matrix, dvec=d.Vector, Amat=A.Matrix, bvec=b.Vector, meq=1) > out$solution [1] 0.4 0.2 0.4 > out$value [1] 0.00672 > Integration with Predictive Analytics Optimization is usually integrated with predictive analytics, which is another important part of data analytics. For an overview of predictive analytics, here is my previous blog about it. The overall processing can be depicted as follows: In this diagram, we use machine learning to train a predictive model in batch mode. Once the predictive model is available, observation data is fed into it at real time and a set of output variables is predicted. Such output variable will be fed into the optimization model as the environment parameters (e.g. return of each stock, covariance ... etc.) from which a set of optimal decision variable is recommended.
January 15, 2013
by Ricky Ho
· 9,851 Views
article thumbnail
Code Coverage Tools Comparison in Sonar
for those that are not familiar with sonar , ( i hope this post will make you at least try it or see it in action at http://nemo.sonarsource.org ) you can take a look at an earlier post i’ve written some time ago. in one sentence sonar is an open source platform that allows you to track and improve the quality of your source code. one of the key aspects when talking about software quality is the test coverage or code coverage which is how much of your source code is tested by unit tests. sonar integrates with the most popular open source code coverage tools ( jacoco , cobetura , emma ) and the well-known commercial clover by attlassian. by default it uses the jacoco (java code coverage) engine and you’ll shortly find out why before we move on, i’d like to give many kudos to evgeny mandrikov . this article is inspired by one of his older post s and its intention is to present a more updated comparison of the supported code coverage tools by sonar and point out some differences regarding their results and the way they work. recently sonar changed its default code coverage tool to jacoco and this post tries to explain the reasons behind that decision. some of the information is borrowed by evgeny’s post and the image is also taken from evgeny’s presentation about jacoco . so thanks a lot evgeny! now let’s go to the meat. for the comparison you’ll see, i’ve used the latest available sonar version 3.3, maven 2.2.1, java 1.6 and all analysis launched in a windows 7 machine (intel core i3-2120 cpu @ 3.30ghz) with 8gb ram. the projects were carefully selected ( a small, medium-sized and a large one – not that large as java code base but large enough to extract some results ). i ran five analysis for each open source code coverage tool ( i excluded the commercial clover from my comparison version ) and another five by disabling the code coverage mechanism. so that’s a total of 60 analysis ). in the following tables you can find some information about the code coverage tools and some basic metrics about the selected projects. pay attention to the date of the latest stable release. emma hasn’t been updated since dinosaurs era and cobertura is almost three years inactive. one might think that this isn’t an issue if they are stable and don’t need any new release. well, the truth is that both of them have bugs that frustrate end-users and there’s no one to fix them. on the other hand jacoco is continuously evolving and improving… the results of the analysis are displayed next. some important notices. emma doesn’t support branch coverage that’s why you’re not seeing any metrics. furthermore there are differences in the results of line and branch coverage, which are more concrete for larger projects. for instance in sonar jira plugin all three tools produce the same results whereas in sonar analysis and commons lang projects you can see that the numbers are not the same. now take a look at a graph that illustrates in a more readable way which tool is the fastest. it seems that emma and jacoco need the same amount of time to compute their metrics… but… as we already mentioned there’s a huge difference. there’s no branch coverage in emma reports. cobertura is always slower than jacoco so again the winner is jacoco. of course you can get even faster results by running a sonar analysis without computing code coverage metrics one last thing: jacoco, as the following figure shows is the only tool that analyses bytecode on-the-fly which is more . cobertura and emma run an offline analysis and use a class loader whereas jacoco has its own java agent for analysis code. this configuration allows jacoco to be very flexible, possible integrated with many other tools and frameworks and can be used with any language in a jvm environment. so, to sum up, if you’re using sonar ( if you don’t , you should ), then it strongly advisable to keep the default code coverage engine ( jacoco) , unless you have really important reasons for that. finally don’t forge to check sonar’s community 2013 unofficial survey and the upcoming book about sonar by manning publications. the release date is in about 3-4 months but you can get an early access version here . as always, feel free to comment or suggest improvements about the article and its content.
December 28, 2012
by Patroklos Papapetrou
· 109,507 Views · 1 Like
article thumbnail
When you should and should NOT use ENUM data type
ENUM is a new enumerated data type introduced in CUBRID 9.0. Like in all programming languages, the ENUM type is a data type composed of a set of static, ordered values. Users can define numeric and string values for ENUM columns. Working with ENUM types Creating an ENUM column is done by specifying a static list of possible values: CREATE TABLE person( name VARCHAR(255), gender ENUM('Male', 'Female') ); CUBRID understands the ENUM type as an ordered set of constants which, in the above example, is a set of {NULL: NULL, 1: 'Male', 2: 'Female”}. To assign a value to the gender column, users may either use the index of the value ({NULL, 1, 2}) or the actual constant literal ({NULL}, {'Male'}, {'Female'}). CUBRID restricts the values that can be assigned to this column to only values from this set + NULL. Moreover, ENUM column is case-sensitive, i.e. it will raise an error if you try to enter 'female' in lower case. Also, an empty string is allowed if it is defined as one of the elements of the ENUM column. In our examples, it is not allowed. csql> INSERT INTO person(name, gender) VALUES('Eugene', 'Male'); 1 row affected. 1 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('Anne', 2); 1 row affected. 1 command(s) successfully processed. csql> SELECT * FROM person; === === name gender ============================================ 'Anne' 'Female' 'Eugene' 'Male' 2 rows selected. Any attempt to insert a value outside of the defined set will result in a coercion error. In the below case, trying to insert an empty string raises an error because it is not in the set of allowed values defined in the person table. csql> INSERT INTO person(name, gender) VALUES('John', 'N/A'); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce 'N/A' TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', 4); IN line 1, COLUMN 45, ERROR: before ' ); ' Cannot coerce 4 TO type enum. 0 command(s) successfully processed. csql> INSERT INTO person(name, gender) VALUES('John', ''); IN line 1, COLUMN 44, ERROR: before ' ); ' Cannot coerce '' TO type enum. 0 command(s) successfully processed. Why you should use the ENUM type There are three important reasons for which you should consider using the ENUM type: Reduce storage space. Reduce join complexity. Create cheap values constraints. Storage Space CUBRID uses only 1 byte per tuple when 255 or less ENUM elements are defined or 2 bytes for 256~65535 elements. This is because, rather that storing the constant literal of the value, CUBRID stores the index in the ordered set of that value. For very large tables, this might prove to be a significant storage space save. Take, for example, a table with 1,000,000,000 records which has an ENUM column defined as ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'). If you use a VARCHAR type instead of the ENUM type to store these values, the column would require anywhere between 5GB and 9GB of storage space. Using the ENUM type, you can reduce the required space to 2 bytes per tuple, adding up to a total of 2GB. Reduce join complexity JOIN way The same effect of the ENUM type can be achieved by creating a one to many relationship on two or more tables. Considering the example above, you can store values for days of the week like this: CREATE TABLE days_of_week( id SHORT PRIMARY KEY, name VARCHAR(9) ); CREATE TABLE opening_hours( week_day SHORT, opening_time TIME, closing_time TIME, FOREIGN KEY fk_dow (week_day) REFERENCES days_of_week(id) ); Then, when you wish to display the name of the week day, you would execute a query like: SELECT d.name day_name, o.opening_time, o.closing_time FROM days_of_week d, opening_hours o WHERE d.id = o.week_day ORDER BY d.id; === === day_name opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ... ENUM way You can achieve the same effect using an ENUM column: CREATE TABLE opening_hours( week_day ENUM ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), opening_time TIME, closing_time TIME ); And there’s no JOIN required to select opening hours: SELECT week_day, opening_time, closing_time FROM opening_hours ORDER BY week_day; === === week_day opening_time closing_time ================================================== 'Monday' 09:00:00 AM 06:00:00 PM 'Tuesday' 09:00:00 AM 06:00:00 PM 'Wednesday' 09:00:00 AM 06:00:00 PM ... This can prove to be very useful, especially if your queries join several tables. Value constraints ENUM columns behave like foreign key relationships in the sense that values from an ENUM column are restricted to the values specified in the column definition. For a short list of values, this is more efficient than creating foreign key relationships. While foreign key relationships use index scans to enforce the restriction, ENUM columns just have to go through a list of predefined values which is faster even for small indexes. Why/When you should NOT use the ENUM type Even though ENUM is a great feature, there are cases when you’d better not use it. For example: When ENUM type is not fixed When ENUM type has a long list of values When your application does not know the list of ENUM values ENUM type is not reusable Portability is a concern When ENUM type is not fixed If you’re not sure if the ENUM type holds all possible values for that column, you should consider using a one to many relationship instead. The only way in which an ENUM column can be changed to handle more values is by using an ALTER statement. This is a very expensive operation in any RDBMS and requires administrator rights. Also, ALTER statements are maintenance operations and should, as much as possible, be performed offline. When ENUM type has a long list of values ENUM types should not be used if you cannot limit a set of possible values to a few elements. When your application does not know the list of ENUM values There are only two ways of getting a list of values you have defined for an ENUM type: parsing the output of SHOW CREATE TABLE statement: csql> SHOW CREATE TABLE opening_hours; === === TABLE CREATE TABLE ============================================ 'opening_hours' 'CREATE TABLE [opening_hours] ([week_day] ENUM('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'), [opening_time] TIME, [closing_time] TIME) selecting information from CUBRID system tables: csql> SELECT d.enumeration FROM _db_domain d, _db_attribute a WHERE a.attr_name = 'week_day' AND a.class_of.class_name = 'opening_hours' AND d IN a.domains; === === enumeration ====================== {'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'} Both might require complex coding and selecting from system tables requires administrator privileges. ENUM type is not reusable If you have several tables which require the names of week days, you will have to create an ENUM type for each of them. If you create a table to hold week days names, you can join this table with whichever other table that requires this information. Portability is a concern The ENUM type is only supported by a few RDBMSs and each one has its own idea as to how ENUM type is supposed to work. Below is a list of a few notable differences between CUBRID, MySQL and PostgreSQL: CUBRID PostgreSQL MySQL Inserting out of range value Throws error Throws error Inserts special value index 0 Comparing to char literals Compare as strings Compare as ENUM elements Compare as strings Comparing to values outside of the ENUM domain Compare as strings Throws error Compare as strings These subtle differences will most probably break your application in interesting and hard to understand ways. If you’re migrating from PostgreSQL to CUBRID for example, and you expect comparisons with char literals to be performed as ENUM comparisons, you’ll have a hard time understanding why your query returns weird results.
December 19, 2012
by Esen Sagynov
· 60,330 Views
article thumbnail
How to Monitor Java Garbage Collection
This is the second article in the series of "Become a Java GC Expert". In the first issue Understanding Java Garbage Collection we have learned about the processes for different GC algorithms, about how GC works, what Young and Old Generation is, what you should know about the 5 types of GC in the new JDK 7, and what the performance implications are for each of these GC types. In this article, I will explain how JVM is actually running Garbage Collection in the real time. What is GC Monitoring? Garbage Collection Monitoring refers to the process of figuring out how JVM is running GC. For example, we can find out: when an object in young has moved to old and by how much, or when stop-the-world has occurred and for how long. GC monitoring is carried out to see if JVM is running GC efficiently, and to check if additional GC tuning is necessary. Based on this information, the application can be edited or GC method can be changed (GC tuning). How to Monitor GC? There are different ways to monitor GC, but the only difference is how the GC operation information is shown. GC is done by JVM, and since the GC monitoring tools disclose the GC information provided by JVM, you will get the same results no matter how you monitor GC. Therefore, you do not need to learn all methods to monitor GC, but since it only requires a little amount of time to learn each GC monitoring method, knowing a few of them can help you use the right one for different situations and environments. The tools or JVM options listed below cannot be used universally regardless of the HVM vendor. This is because there is no need for a "standard" for disclosing GC information. In this example we will use HotSpot JVM (Oracle JVM). Since NHN is using Oracle (Sun) JVM, there should be no difficulties in applying the tools or JVM options that we are explaining here. First, the GC monitoring methods can be separated into CUI and GUI depending on the access interface. The typical CUI GC monitoring method involves using a separate CUI application called "jstat", or selecting a JVM option called "verbosegc" when running JVM. GUI GC monitoring is done by using a separate GUI application, and three most commonly used applications would be "jconsole", "jvisualvm" and "Visual GC". Let's learn more about each method. jstat jstat is a monitoring tool in HotSpot JVM. Other monitoring tools for HotSpot JVM are jps and jstatd. Sometimes, you need all three tools to monitor a Java application. jstat does not provide only the GC operation information display. It also provides class loader operation information or Just-in-Time compiler operation information. Among all the information jstat can provide, in this article we will only cover its functionality to monitor GC operating information. jstat is located in $JDK_HOME/bin, so if java or javac can run without setting a separate directory from the command line, so can jstat. You can try running the following in the command line. $> jstat –gc $ 1000 S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT 3008.0 3072.0 0.0 1511.1 343360.0 46383.0 699072.0 283690.2 75392.0 41064.3 2540 18.454 4 1.133 19.588 3008.0 3072.0 0.0 1511.1 343360.0 47530.9 699072.0 283690.2 75392.0 41064.3 2540 18.454 4 1.133 19.588 3008.0 3072.0 0.0 1511.1 343360.0 47793.0 699072.0 283690.2 75392.0 41064.3 2540 18.454 4 1.133 19.588 $> Just like in the example, the real type data will be output along with the following columns: S0C S1C S0U S1U EC EU OC OU PC. vmid (Virtual Machine ID), as its name implies, is the ID for the VM. Java applications running either on a local machine or on a remote machine can be specified using vmid. The vmid for Java application running on a local machine is called lvmid (Local vmid), and usually is PID. To find out the lvmid, you can write the PID value using a ps command or Windows task manager, but we suggest jps because PID and lvmid does not always match. jps stands for Java PS. jps shows vmids and main method information. Just like ps shows PIDs and process names. Find out the vmid of the Java application that you want to monitor by using jps, then use it as a parameter in jstat. If you use jps alone, only bootstrap information will show when several WAS instances are running in one equipment. We suggest that you use ps -ef | grep java command along with jps. GC performance data needs constant observation, therefore when running jstat, try to output the GC monitoring information on a regular basis. For example, running "jstat –gc 1000" (or 1s) will display the GC monitoring data on the console every 1 second. "jstat –gc 1000 10" will display the GC monitoring information once every 1 second for 10 times in total. There are many options other than -gc, among which GC related ones are listed below. Option Name Description gc It shows the current size for each heap area and its current usage (Ede, survivor, old, etc.), total number of GC performed, and the accumulated time for GC operations. gccapactiy It shows the minimum size (ms) and maximum size (mx) of each heap area, current size, and the number of GC performed for each area. (Does not show current usage and accumulated time for GC operations.) gccause It shows the "information provided by -gcutil" + reason for the last GC and the reason for the current GC. gcnew Shows the GC performance data for the new area. gcnewcapacity Shows statistics for the size of new area. gcold Shows the GC performance data for the old area. gcoldcapacity Shows statistics for the size of old area. gcpermcapacity Shows statistics for the permanent area. gcutil Shows the usage for each heap area in percentage. Also shows the total number of GC performed and the accumulated time for GC operations. Only looking at frequency, you will probably use -gcutil (or -gccause), -gc and -gccapacity the most in that order. -gcutil is used to check the usage of heap areas, the number of GC performed, and the total accumulated time for GC operations, while -gccapacity option and others can be used to check the actual size allocated. You can see the following output by using the -gc option: S0C S1C … GCT 1248.0 896.0 … 1.246 1248.0 896.0 … 1.246 … … … … Different jstat options show different types of columns, which are listed below. Each column information will be displayed when you use the "jstat option" listed on the right. Column Description Jstat Option S0C Displays the current size of Survivor0 area in KB -gc -gccapacity -gcnew -gcnewcapacity S1C Displays the current size of Survivor1 area in KB -gc -gccapacity -gcnew -gcnewcapacity S0U Displays the current usage of Survivor0 area in KB -gc -gcnew S1U Displays the current usage of Survivor1 area in KB -gc -gcnew EC Displays the current size of Eden area in KB -gc -gccapacity -gcnew -gcnewcapacity EU Displays the current usage of Eden area in KB -gc -gcnew OC Displays the current size of old area in KB -gc -gccapacity -gcold -gcoldcapacity OU Displays the current usage of old area in KB -gc -gcold PC Displays the current size of permanent area in KB -gc -gccapacity -gcold -gcoldcapacity -gcpermcapacity PU Displays the current usage of permanent area in KB -gc -gcold YGC The number of GC event occurred in young area -gc -gccapacity -gcnew -gcnewcapacity -gcold -gcoldcapacity -gcpermcapacity -gcutil -gccause YGCT The accumulated time for GC operations for Yong area -gc -gcnew -gcutil -gccause FGC The number of full GC event occurred -gc -gccapacity -gcnew -gcnewcapacity -gcold -gcoldcapacity -gcpermcapacity -gcutil -gccause FGCT The accumulated time for full GC operations -gc -gcold -gcoldcapacity -gcpermcapacity -gcutil -gccause GCT The total accumulated time for GC operations -gc -gcold -gcoldcapacity -gcpermcapacity -gcutil -gccause NGCMN The minimum size of new area in KB -gccapacity -gcnewcapacity NGCMX The maximum size of max area in KB -gccapacity -gcnewcapacity NGC The current size of new area in KB -gccapacity -gcnewcapacity OGCMN The minimum size of old area in KB -gccapacity -gcoldcapacity OGCMX The maximum size of old area in KB -gccapacity -gcoldcapacity OGC The current size of old area in KB -gccapacity -gcoldcapacity PGCMN The minimum size of permanent area in KB -gccapacity -gcpermcapacity PGCMX The maximum size of permanent area in KB -gccapacity -gcpermcapacity PGC The current size of permanent generation area in KB -gccapacity -gcpermcapacity PC The current size of permanent area in KB -gccapacity -gcpermcapacity PU The current usage of permanent area in KB -gc -gcold LGCC The cause for the last GC occurrence -gccause GCC The cause for the current GC occurrence -gccause TT Tenuring threshold. If copied this amount of times in young area (S0 ->S1, S1->S0), they are then moved to old area. -gcnew MTT Maximum Tenuring threshold. If copied this amount of times inside young arae, then they are moved to old area. -gcnew DSS Adequate size of survivor in KB -gcnew The advantage of jstat is that it can always monitor the GC operation data of Java applications running on local/remote machine, as long as a console can be used. From these items, the following result is output when –gcutil is used. At the time of GC tuning, pay careful attention to YGC, YGCT, FGC, FGCT and GCT. S0 S1 E O P YGC YGCT FGC FGCT GCT 0.00 66.44 54.12 10.58 86.63 217 0.928 2 0.067 0.995 0.00 66.44 54.12 10.58 86.63 217 0.928 2 0.067 0.995 0.00 66.44 54.12 10.58 86.63 217 0.928 2 0.067 0.995 These items are important because they show how much time was spent in running GC. In this example, YGC is 217 and YGCT is 0.928. So, after calculating the arithmetical average, you can see that it required about 4 ms (0.004 seconds) for each young GC. Likewise, the average full GC time us 33ms. But the arithmetical average often does not help analyzing the actual GC problem. This is due to the severe deviations in GC operation time. (In other words, if the average time is 0.067 seconds for a full GC, one GC may have lasted 1 ms while the other one lasted 57 ms.) In order to check the individual GC time instead of the arithmetical average time, it is better to use -verbosegc. -verbosegc -verbosegc is one of the JVM options specified when running a Java application. While jstat can monitor any JVM application that has not specified any options, -verbosegc needs to be specified in the beginning, so it could be seen as an unnecessary option (since jstat can be used instead). However, as -verbosegc displays easy to understand output results whenever a GC occurs, it is very helpful for monitoring rough GC information. jstat -verbosegc Monitoring Target Java application running on a machine that can log in to a terminal, or a remote Java application that can connect to the network by using jstatd Only when -verbogc was specified as a JVM starting option Output information Heap status (usage, maximum size, number of times for GC/time, etc.) Size of ew and old area before/after GC, and GC operation time Output Time Every designated time Whenever GC occurs Whenever useful When trying to observe the changes of the size of heap area When trying to see the effect of a single GC The followings are other options that can be used with -verbosegc. -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps (from JDK 6 update 4) If only -verbosegc is used, then -XX:+PrintGCDetails is applied by default. Additional options for –verbosgc are not exclusive and can be mixed and used together. When using -verbosegc, you can see the results in the following format whenever a minor GC occurs. [GC [: -> , secs] -> , secs] ] Collector Name of Collector Used for minor gc starting occupancy1 The size of young area before GC ending occupancy1 The size of young area after GC pause time1 The time when the Java application stopped running for minor GC starting occupancy3 The total size of heap area before GC ending occupancy3 The total size of heap area after GC pause time3 The time when the Java application stopped running for overall heap GC, including major GC This is an example of -verbosegc output for minor GC: S0 S1 E O P YGC YGCT FGC FGCT GCT 0.00 66.44 54.12 10.58 86.63 217 0.928 2 0.067 0.995 0.00 66.44 54.12 10.58 86.63 217 0.928 2 0.067 0.995 0.00 66.44 54.12 10.58 86.63 217 0.928 2 0.067 0.995 This is the example of output results after an Full GC occurred. [Full GC [Tenured: 3485K->4095K(4096K), 0.1745373 secs] 61244K->7418K(63104K), [Perm : 10756K->10756K(12288K)], 0.1762129 secs] [Times: user=0.19 sys=0.00, real=0.19 secs] If a CMS collector is used, then the following CMS information can be provided as well. As -verbosegc option outputs a log every time a GC event occurs, it is easy to see the changes of the heap usage rates caused by GC operation. (Java) VisualVM + Visual GC Java Visual VM is a GUI profiling/monitoring tool provided by Oracle JDK. Figure 1: VisualVM Screenshot. Instead of the version that is included with JDK, you can download Visual VM directly from its website. For the sake of convenience, the version included with JDK will be referred to as Java VisualVM (jvisualvm), and the version available from the website will be referred to as Visual VM (visualvm). The features of the two are not exactly identical, as there are slight differences, such as when installing plug-ins. Personally, I prefer the Visual VM version, which can be downloaded from the website. After running Visual VM, if you select the application that you wish to monitor from the window on the left side, you can find the "Monitoring" tab there. You can get the basic information about GC and Heap from this Monitoring tab. Though the basic GC status is also available through the basic features of VisualVM, you cannot access detailed information that is available from either jstat or -verbosegc option. If you want the detailed information provided by jstat, then it is recommended to install the Visual GC plug-in. Visual GC can be accessed in real time from the Tools menu. Figure 2: Viusal GC Installation Screenshot. By using Visual GC, you can see the information provided by running jstatd in a more intuitive way. Figure 3: Visual GC execution screenshot. HPJMeter HPJMeter is convenient for analyzing -verbosegc output results. If Visual GC can be considered as the GUI equivalent of jstat, then HPJMeter would be the GUI equivalent of -verbosgc. Of course, GC analysis is just one of the many features provided by HPJMeter. HPJMeter is a performance monitoring tool developed by HP. It can be used in HP-UX, as well as Linux and MS Windows. Originally, a tool called HPTune used to provide the GUI analysis feature for -verbosegc. However, since the HPTune feature has been integrated into HPJMeter since version 3.0, there is no need to download HPTune separately. When executing an application, the -verbosegc output results will be redirected to a separate file. You can open the redirected file with HPJMeter, which allows faster and easier GC performance data analysis through the intuitive GUI. Figure 4: HPJMeter.
October 24, 2012
by Esen Sagynov
· 99,677 Views · 7 Likes
article thumbnail
Understanding JVM Internals, from Basic Structure to Java SE 7 Features
Learn about the structure of JVM, how it works, executes Java bytecode, the order of execution, examples of common mistakes and their solutions, new Java SE 7 features.
October 19, 2012
by Esen Sagynov
· 179,980 Views · 20 Likes
article thumbnail
SQL Query Optimization and Normalization
Explore SQL query optimization and normalization.
October 4, 2012
by Michael Georgiou
· 37,738 Views · 2 Likes
article thumbnail
Resolve Circular Dependency in Spring Autowiring
I would consider this post as best practice for using Spring in enterprise application development.
September 27, 2012
by Gal Levinsky
· 127,391 Views · 24 Likes
article thumbnail
Building a Simple TCP Proxy Server with node.js
Today we're going to build a simple TCP proxy server. The scenario: we've got one host (the client) that establishes a TCP connection to another one (the remote). client —> remote We want to set up a proxy server in the middle, so the client will establish the connection with the proxy and the proxy will forward it to the remote, keeping in mind the remote response also. With node.js is really simple to perform those kind of network operations. client —> proxy -> remote var net = require('net'); var LOCAL_PORT = 6512; var REMOTE_PORT = 6512; var REMOTE_ADDR = "192.168.1.25"; var server = net.createServer(function (socket) { socket.on('data', function (msg) { console.log(' ** START **'); console.log('<< From client to proxy ', msg.toString()); var serviceSocket = new net.Socket(); serviceSocket.connect(parseInt(REMOTE_PORT), REMOTE_ADDR, function () { console.log('>> From proxy to remote', msg.toString()); serviceSocket.write(msg); }); serviceSocket.on("data", function (data) { console.log('<< From remote to proxy', data.toString()); socket.write(data); console.log('>> From proxy to client', data.toString()); }); }); }); server.listen(LOCAL_PORT); console.log("TCP server accepting connection on port: " + LOCAL_PORT); Simple, isn’t it? Source code in github
September 20, 2012
by Gonzalo Ayuso
· 23,586 Views
article thumbnail
Java 7: HashMap vs ConcurrentHashMap
As you may have seen from my past performance related articles and HashMap case studies, Java thread safety problems can bring down your Java EE application and the Java EE container fairly easily. One of most common problems I have observed when troubleshooting Java EE performance problems is infinite looping triggered from the non-thread safe HashMap get() and put() operations. This problem is known since several years but recent production problems have forced me to revisit this issue one more time. This article will revisit this classic thread safety problem and demonstrate, using a simple Java program, the risk associated with a wrong usage of the plain old java.util.HashMap data structure involved in a concurrent threads context. This proof of concept exercise will attempt to achieve the following 3 goals: Revisit and compare the Java program performance level between the non-thread safe and thread safe Map data structure implementations (HashMap, Hashtable, synchronized HashMap, ConcurrentHashMap) Replicate and demonstrate the HashMap infinite looping problem using a simple Java program that everybody can compile, run and understand Review the usage of the above Map data structures in a real-life and modern Java EE container implementation such as JBoss AS7 For more detail on the ConcurrentHashMap implementation strategy, I highly recommend the great article from Brian Goetz on this subject. Tools and server specifications As a starting point, find below the different tools and software’s used for the exercise: Sun/Oracle JDK & JRE 1.7 64-bit Eclipse Java EE IDE Windows Process Explorer (CPU per Java Thread correlation) JVM Thread Dump (stuck thread analysis and CPU per Thread correlation) The following local computer was used for the problem replication process and performance measurements: Intel(R) Core(TM) i5-2520M CPU @ 2.50Ghz (2 CPU cores, 4 logical cores) 8 GB RAM Windows 7 64-bit * Results and performance of the Java program may vary depending of your workstation or server specifications. Java program In order to help us achieve the above goals, a simple Java program was created as per below: The main Java program is HashMapInfiniteLoopSimulator.java A worker Thread class WorkerThread.java was also created The program is performing the following: Initialize different static Map data structures with initial size of 2 Assign the chosen Map to the worker threads (you can chose between 4 Map implementations) Create a certain number of worker threads (as per the header configuration). 3 worker threads were created for this proof of concept NB_THREADS = 3; Each of these worker threads has the same task: lookup and insert a new element in the assigned Map data structure using a random Integer element between 1 – 1 000 000. Each worker thread perform this task for a total of 500K iterations The overall program performs 50 iterations in order to allow enough ramp up time for the HotSpot JVM The concurrent threads context is achieved using the JDK ExecutorService As you can see, the Java program task is fairly simple but complex enough to generate the following critical criteria’s: Generate concurrency against a shared / static Map data structure Use a mix of get() and put() operations in order to attempt to trigger internal locks and / or internal corruption (for the non-thread safe implementation) Use a small Map initial size of 2, forcing the internal HashMap to trigger an internal rehash/resize Finally, the following parameters can be modified at your convenience: ## Number of worker threads private static final int NB_THREADS = 3; ## Number of Java program iterations private static final int NB_TEST_ITERATIONS = 50; ## Map data structure assignment. You can choose between 4 structures // Plain old HashMap (since JDK 1.2) nonThreadSafeMap = new HashMap(2); // Plain old Hashtable (since JDK 1.0) threadSafeMap1 = new Hashtable(2); // Fully synchronized HashMap threadSafeMap2 = new HashMap(2); threadSafeMap2 = Collections.synchronizedMap(threadSafeMap2); // ConcurrentHashMap (since JDK 1.5) threadSafeMap3 = new ConcurrentHashMap(2); /*** Assign map at your convenience ****/ assignedMapForTest = threadSafeMap3; Now find below the source code of our sample program. #### HashMapInfiniteLoopSimulator.java package org.ph.javaee.training4; import java.util.Collections; import java.util.Map; import java.util.HashMap; import java.util.Hashtable; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; /** * HashMapInfiniteLoopSimulator * @author Pierre-Hugues Charbonneau * */ public class HashMapInfiniteLoopSimulator { private static final int NB_THREADS = 3; private static final int NB_TEST_ITERATIONS = 50; private static Map assignedMapForTest = null; private static Map nonThreadSafeMap = null; private static Map threadSafeMap1 = null; private static Map threadSafeMap2 = null; private static Map threadSafeMap3 = null; /** * Main program * @param args */ public static void main(String[] args) { System.out.println("Infinite Looping HashMap Simulator"); System.out.println("Author: Pierre-Hugues Charbonneau"); System.out.println("http://javaeesupportpatterns.blogspot.com"); for (int i=0; i(2); // Plain old Hashtable (since JDK 1.0) threadSafeMap1 = new Hashtable(2); // Fully synchronized HashMap threadSafeMap2 = new HashMap(2); threadSafeMap2 = Collections.synchronizedMap(threadSafeMap2); // ConcurrentHashMap (since JDK 1.5) threadSafeMap3 = new ConcurrentHashMap(2); // ConcurrentHashMap /*** Assign map at your convenience ****/ assignedMapForTest = threadSafeMap3; long timeBefore = System.currentTimeMillis(); long timeAfter = 0; Float totalProcessingTime = null; ExecutorService executor = Executors.newFixedThreadPool(NB_THREADS); for (int j = 0; j < NB_THREADS; j++) { /** Assign the Map at your convenience **/ Runnable worker = new WorkerThread(assignedMapForTest); executor.execute(worker); } // This will make the executor accept no new threads // and finish all existing threads in the queue executor.shutdown(); // Wait until all threads are finish while (!executor.isTerminated()) { } timeAfter = System.currentTimeMillis(); totalProcessingTime = new Float( (float) (timeAfter - timeBefore) / (float) 1000); System.out.println("All threads completed in "+totalProcessingTime+" seconds"); } } } #### WorkerThread.java package org.ph.javaee.training4; import java.util.Map; /** * WorkerThread * * @author Pierre-Hugues Charbonneau * */ public class WorkerThread implements Runnable { private Map map = null; public WorkerThread(Map assignedMap) { this.map = assignedMap; } @Override public void run() { for (int i=0; i<500000; i++) { // Return 2 integers between 1-1000000 inclusive Integer newInteger1 = (int) Math.ceil(Math.random() * 1000000); Integer newInteger2 = (int) Math.ceil(Math.random() * 1000000); // 1. Attempt to retrieve a random Integer element Integer retrievedInteger = map.get(String.valueOf(newInteger1)); // 2. Attempt to insert a random Integer element map.put(String.valueOf(newInteger2), newInteger2); } } } Performance comparison between thread safe Map implementations The first goal is to compare the performance level of our program when using different thread safe Map implementations: Plain old Hashtable (since JDK 1.0) Fully synchronized HashMap (via Collections.synchronizedMap()) ConcurrentHashMap (since JDK 1.5) Find below the graphical results of the execution of the Java program for each iteration along with a sample of the program console output. # Output when using ConcurrentHashMap Infinite Looping HashMap Simulator Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com All threads completed in 0.984 seconds All threads completed in 0.908 seconds All threads completed in 0.706 seconds All threads completed in 1.068 seconds All threads completed in 0.621 seconds All threads completed in 0.594 seconds All threads completed in 0.569 seconds All threads completed in 0.599 seconds ……………… As you can see, the ConcurrentHashMap is the clear winner here, taking in average only half a second (after an initial ramp-up) for all 3 worker threads to concurrently read and insert data within a 500K looping statement against the assigned shared Map. Please note that no problem was found with the program execution e.g. no hang situation. The performance boost is definitely due to the improved ConcurrentHashMap performance such as the non-blocking get() operation. The 2 other Map implementations performance level was fairly similar with a small advantage for the synchronized HashMap. HashMap infinite looping problem replication The next objective is to replicate the HashMap infinite looping problem observed so often from Java EE production environments. In order to do that, you simply need to assign the non-thread safe HashMap implementation as per code snippet below: /*** Assign map at your convenience ****/ assignedMapForTest = nonThreadSafeMap; Running the program as is using the non-thread safe HashMap should lead to: No output other than the program header Significant CPU increase observed from the system At some point the Java program will hang and you will be forced to kill the Java process What happened? In order to understand this situation and confirm the problem, we will perform a CPU per Thread analysis from the Windows OS using Process Explorer and JVM Thread Dump. 1 - Run the program again then quickly capture the thread per CPU data from Process Explorer as per below. Under explore.exe you will need to right click over the javaw.exe and select properties. The threads tab will be displayed. We can see overall 4 threads using almost all the CPU of our system. 2 – Now you have to quickly capture a JVM Thread Dump using the JDK 1.7 jstack utility. For our example, we can see our 3 worker threads which seems busy/stuck performing get() and put() operations. ..\jdk1.7.0\bin>jstack 272 2012-08-29 14:07:26 Full thread dump Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode): "pool-1-thread-3" prio=6 tid=0x0000000006a3c000 nid=0x18a0 runnable [0x0000000007ebe000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.put(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) "pool-1-thread-2" prio=6 tid=0x0000000006a3b800 nid=0x6d4 runnable [0x000000000805f000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.get(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:29) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) "pool-1-thread-1" prio=6 tid=0x0000000006a3a800 nid=0x2bc runnable [0x0000000007d9e000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.put(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) .............. 3 – CPU per thread correlation It is now time to convert the Process Explorer thread ID DECIMAL format to HEXA format as per below. The HEXA value allows us to map and identify each thread as per below: ## TID: 1748 (nid=0X6D4) Thread name: pool-1-thread-2 CPU @25.71% Task: Worker thread executing a HashMap.get() operation at java.util.HashMap.get(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:29) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) ## TID: 700 (nid=0X2BC) Thread name: pool-1-thread-1 CPU @23.55% Task: Worker thread executing a HashMap.put() operation at java.util.HashMap.put(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) ## TID: 6304 (nid=0X18A0) Thread name: pool-1-thread-3 CPU @12.02% Task: Worker thread executing a HashMap.put() operation at java.util.HashMap.put(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) ## TID: 5944 (nid=0X1738) Thread name: pool-1-thread-1 CPU @20.88% Task: Main Java program execution "main" prio=6 tid=0x0000000001e2b000 nid=0x1738 runnable [0x00000000029df000] java.lang.Thread.State: RUNNABLE at org.ph.javaee.training4.HashMapInfiniteLoopSimulator.main(HashMapInfiniteLoopSimulator.java:75) As you can see, the above correlation and analysis is quite revealing. Our main Java program is in a hang state because our 3 worker threads are using lot of CPU and not going anywhere. They may appear "stuck" performing HashMap get() & put() but in fact they are all involved in an infinite loop condition. This is exactly what we wanted to replicate. HashMap infinite looping deep dive Now let’s push the analysis one step further to better understand this looping condition. For this purpose, we added tracing code within the JDK 1.7 HashMap Java class itself in order to understand what is happening. Similar logging was added for the put() operation and also a trace indicating that the internal & automatic rehash/resize got triggered. The tracing added in get() and put() operations allows us to determine if the for() loop is dealing with circular dependency which would explain the infinite looping condition. #### HashMap.java get() operation public V get(Object key) { if (key == null) return getForNullKey(); int hash = hash(key.hashCode()); /*** P-H add-on- iteration counter ***/ int iterations = 1; for (Entry e = table[indexFor(hash, table.length)]; e != null; e = e.next) { /*** Circular dependency check ***/ Entry currentEntry = e; Entry nextEntry = e.next; Entry nextNextEntry = e.next != null?e.next.next:null; K currentKey = currentEntry.key; K nextNextKey = nextNextEntry != null?(nextNextEntry.key != null?nextNextEntry.key:null):null; System.out.println("HashMap.get() #Iterations : "+iterations++); if (currentKey != null && nextNextKey != null ) { if (currentKey == nextNextKey || currentKey.equals(nextNextKey)) System.out.println(" ** Circular Dependency detected! ["+currentEntry+"]["+nextEntry+"]"+"]["+nextNextEntry+"]"); } /***** END ***/ Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) return e.value; } return null; } HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.resize() in progress... HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 2 HashMap.resize() in progress... HashMap.resize() in progress... HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 2 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 ** Circular Dependency detected! [362565=362565][333326=333326]][362565=362565] HashMap.put() #Iterations : 2 ** Circular Dependency detected! [333326=333326][362565=362565]][333326=333326] HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 ............................. HashMap.put() #Iterations : 56823 Again, the added logging was quite revealing. We can see that following a few internal HashMap.resize() the internal structure became affected, creating circular dependency conditions and triggering this infinite looping condition (#iterations increasing and increasing...) with no exit condition. It is also showing that the resize() / rehash operation is the most at risk of internal corruption, especially when using the default HashMap size of 16. This means that the initial size of the HashMap appears to be a big factor in the risk & problem replication. Finally, it is interesting to note that we were able to successfully run the test case with the non-thread safe HashMap by assigning an initial size setting at 1000000, preventing any resize at all. Find below the merged graph results: The HashMap was our top performer but only when preventing an internal resize. Again, this is definitely not a solution to the thread safe risk but just a way to demonstrate that the resize operation is the most at risk given the entire manipulation of the HashMap performed at that time. The ConcurrentHashMap, by far, is our overall winner by providing both fast performance and thread safety against that test case. JBoss AS7 Map data structures usage We will now conclude this article by looking at the different Map implementations within a modern Java EE container implementation such as JBoss AS 7.1.2. You can obtain the latest source code from the github master branch. Find below the report: Total JBoss AS7.1.2 Java files (August 28, 2012 snapshot): 7302 Total Java classes using java.util.Hashtable: 72 Total Java classes using java.util.HashMap: 512 Total Java classes using synchronized HashMap: 18 Total Java classes using ConcurrentHashMap: 46 Hashtable references were found mainly within the test suite components and from naming and JNDI related implementations. This low usage is not a surprise here. References to the java.util.HashMap were found from 512 Java classes. Again not a surprise given how common this implementation is since the last several years. However, it is important to mention that a good ratio was found either from local variables (not shared across threads), synchronized HashMap or manual synchronization safeguard so “technically” thread safe and not exposed to the above infinite looping condition (pending/hidden bugs is still a reality given the complexity with Java concurrency programming…this case study involving Oracle Service Bus 11g is a perfect example). A low usage of synchronized HashMap was found with only 18 Java classes from packages such as JMS, EJB3, RMI and clustering. Finally, find below a breakdown of the ConcurrentHashMap usage which was our main interest here. As you will see below, this Map implementation is used by critical JBoss components layers such as the Web container, EJB3 implementation etc. ## JBoss Single Sign On Used to manage internal SSO ID's involving concurrent Thread access Total: 1 ## JBoss Java EE & Web Container Not surprising here since lot of internal Map data structures are used to manage the http sessions objects, deployment registry, clustering & replication, statistics etc. with heavy concurrent Thread access. Total: 11 ## JBoss JNDI & Security Layer Used by highly concurrent structures such as internal JNDI security management. Total: 4 ## JBoss domain & managed server management, rollout plans... Total: 7 ## JBoss EJB3 Used by data structures such as File Timer persistence store, application Exception, Entity Bean cache, serialization, passivation... Total: 8 ## JBoss kernel, Thread Pools & protocol management Used by high concurrent Threads Map data structures involved in handling and dispatching/processing incoming requests such as HTTP. Total: 3 ## JBoss connectors such as JDBC/XA DataSources... Total: 2 ## Weld (reference implementation of JSR-299: Contexts and Dependency Injection for the JavaTM EE platform) Used in the context of ClassLoader and concurrent static Map data structures involving concurrent Threads access. Total: 3 ## JBoss Test Suite Used in some integration testing test cases such as an internal Data Store, ClassLoader testing etc. Total: 3 Final words I hope this article has helped you revisit this classic problem and understand one of the common problems and risks associated with a wrong usage of the non-thread safe HashMap implementation. My main recommendation to you is to be careful when using an HashMap in a concurrent threads context. Unless you are a Java concurrency expert, I recommend that you use ConcurrentHashMap instead which offers a very good balance between performance and thread safety. As usual, extra due diligence is always recommended such as performing cycles of load & performance testing. This will allow you to detect thread safety and / or performance problems before you promote the solution to your client production environment. Please provide any comments and share your experience with ConcurrentHashMap or HashMap implementations and troubleshooting.
September 7, 2012
by Pierre - Hugues Charbonneau
· 154,814 Views · 5 Likes
  • Previous
  • ...
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×