Language Resources

The Latest Languages Topics

Groovy Goodness: Drop or Take Elements with Condition

In Groovy we can use the drop() and take() methods to get elements from a collection or String object. Since Groovy 1.8.7 we can also use the dropWhile() and takeWhile() methods, which take a closure that defines the condition for dropping or taking elements. The dropWhile() method drops elements or characters for as long as the condition in the closure is true and returns the rest. The takeWhile() method returns elements from a collection or characters from a String for as long as the condition in the closure is true. In the following example we see how we can use the methods:

    def s = "Groovy Rocks!"

    assert s.takeWhile { it != 'R' } == 'Groovy '
    assert s.dropWhile { it != 'R' } == 'Rocks!'

    def list = 0..10

    assert 0..4 == list.takeWhile { it < 5 }
    assert 5..10 == list.dropWhile { it < 5 }

    def m = [name: 'mrhaki', loves: 'Groovy', worksAt: 'JDriven']

    assert [name: 'mrhaki'] == m.takeWhile { key, value -> key.length() == 4 }
    assert [loves: 'Groovy', worksAt: 'JDriven'] == m.dropWhile { it.key == 'name' }

(Code is written with Groovy 2.0.4)

October 11, 2012

by Hubert Klein Ikkink


How to Tune Java Garbage Collection

This is the third article in the "Become a Java GC Expert" series. In the first article, Understanding Java Garbage Collection, we learned about the processes for different GC algorithms, how GC works, what the Young and Old Generations are, what you should know about the 5 types of GC in the new JDK 7, and what the performance implications are for each of these GC types.

In the second article, How to Monitor Java Garbage Collection, I explained how the JVM actually runs garbage collection in real time, how we can monitor GC, and which tools we can use to make this process faster and more effective.

In this third article, using real cases as examples, I will show some of the best options you can use for GC tuning. I have written this article under the assumption that you have understood the previous articles in this series, so if you haven't read them yet, please do so before reading this one.

Is GC Tuning Required?

Or, more precisely, is GC tuning required for Java-based services? GC tuning is not always required for all Java-based services. It is usually unnecessary when a Java-based system in operation meets the following conditions:

- The memory size has been specified using the -Xms and -Xmx options.
- The -server option is included.
- Logs such as timeout logs are not left in the system.

In other words, if you have not set the memory size and too many timeout logs are printed, you need to perform GC tuning on your system.

But there is one thing to keep in mind: GC tuning is the last task to be done.

Think about the fundamental cause of GC tuning. The garbage collector clears objects created in Java. The number of objects that must be cleared by the garbage collector, as well as the number of GCs to be executed, depends on the number of objects that have been created. Therefore, to control the GC performed by your system, you should first decrease the number of objects created.
There is a saying, "many a little makes a mickle." We need to take care of small things, or they will add up and become something big that is difficult to manage. We need to make using StringBuilder or StringBuffer instead of String a way of life. And it is better to accumulate as few logs as possible.

However, we know that there are some cases we cannot help. We have seen that XML and JSON parsing use the most memory. Even if we use String as little as possible and process logs as well as we can, parsing XML or JSON uses a huge amount of temporary memory, some 10-100 MB. However, it is difficult not to use XML and JSON. Just understand that it takes a lot of memory.

If application memory usage improves after repeated tunings, you can start GC tuning. I classify the purposes of GC tuning into two. One is to minimize the number of objects passed to the Old area; the other is to decrease Full GC execution time.

Minimizing Number of Objects Passed to Old Area

Except for G1 GC, which can be used from JDK 7 onward, the GCs provided by the Oracle JVM are generational. In other words, an object is created in the Eden area and moved back and forth between the Survivor areas. After that, the objects that remain are sent to the Old area. Some objects are created in the Eden area and passed directly to the Old area because of their large size. GC in the Old area takes relatively more time than GC in the New area. Therefore, decreasing the number of objects passed to the Old area can decrease the frequency of Full GC. Decreasing the number of objects passed to the Old area may be misunderstood as choosing to leave objects in the New area. However, this is impossible. Instead, you can adjust the size of the New area.

Decreasing Full GC Time

The execution time of Full GC is relatively longer than that of Minor GC. Therefore, if it takes too much time to execute Full GC (1 second or more), timeouts may occur in several connected parts.
If you try to decrease the Old area size to decrease Full GC execution time, OutOfMemoryError may occur or the number of Full GCs may increase. Alternatively, if you try to decrease the number of Full GCs by increasing the Old area size, the execution time will increase. Therefore, you need to set the Old area size to a "proper" value.

Options Affecting the GC Performance

As I mentioned at the end of Understanding Java Garbage Collection, do not think that "Somebody's got a great performance when he used GC options. Why don't we use that option as he did?" The reason is that the size of the objects created and their lifetime differ from one Web service to another.

Simply consider: if a task is performed under the conditions of A, B, C, D and E, and the same task is performed under the conditions of only A and B, which one will be done quicker? From a common-sense standpoint, the answer is the task performed under the conditions of A and B.

Java GC options are the same. Setting several options does not enhance the speed of executing GC. Rather, it may make it slower. The basic principle of GC tuning is to apply different GC options to two or more servers, compare them, and then adopt the options that have demonstrated enhanced performance or better GC time. Keep this in mind.

The following table shows options related to memory size among the GC options that can affect performance.

Table 1: JVM Options to Be Checked for GC Tuning.

    Classification    Option               Description
    Heap area size    -Xms                 Heap area size when starting JVM
                      -Xmx                 Maximum heap area size
    New area size     -XX:NewRatio         Ratio of New area and Old area
                      -XX:NewSize          New area size
                      -XX:SurvivorRatio    Ratio of Eden area and Survivor area

I frequently use the -Xms, -Xmx, and -XX:NewRatio options for GC tuning. The -Xms and -Xmx options are particularly required. How you set the NewRatio option makes a significant difference to GC performance.
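As a rough illustration of how the heap size and -XX:NewRatio from Table 1 interact, the following sketch estimates the New and Old area sizes. It assumes the simple rule that NewRatio=N makes the Old area N times the size of the New area; the class and method names are hypothetical, and a real JVM may round or adjust the actual sizes.

```java
// Hypothetical helper: estimates generation sizes from heap size and NewRatio.
// Assumption: Old = NewRatio * New, so New = heap / (NewRatio + 1).
final class NewRatioSketch {

    /** Estimated New area size in MB for the given heap size and NewRatio. */
    static long newAreaMb(long heapMb, int newRatio) {
        if (heapMb <= 0 || newRatio < 1) {
            throw new IllegalArgumentException("heapMb must be > 0 and newRatio >= 1");
        }
        return heapMb / (newRatio + 1);
    }

    /** Estimated Old area size in MB: whatever is left after the New area. */
    static long oldAreaMb(long heapMb, int newRatio) {
        return heapMb - newAreaMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 1024; // e.g. -Xms1g -Xmx1g
        for (int ratio = 1; ratio <= 4; ratio++) {
            System.out.printf("-XX:NewRatio=%d -> New %d MB, Old %d MB%n",
                    ratio, newAreaMb(heapMb, ratio), oldAreaMb(heapMb, ratio));
        }
    }
}
```

The estimate is only a starting point for choosing candidate values; the sizes a JVM actually uses are better read from jstat -gccapacity.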
Some people ask how to set the Perm area size. You can set the Perm area size with the -XX:PermSize and -XX:MaxPermSize options, but only when OutOfMemoryError occurs and the cause is the Perm area size.

Another option that may affect GC performance is the GC type. The following table shows available options by GC type (based on JDK 6.0).

Table 2: Available Options by GC Type.

    Classification            Option                                      Remarks
    Serial GC                 -XX:+UseSerialGC
    Parallel GC               -XX:+UseParallelGC
                              -XX:ParallelGCThreads=value
    Parallel Compacting GC    -XX:+UseParallelOldGC
    CMS GC                    -XX:+UseConcMarkSweepGC
                              -XX:+UseParNewGC
                              -XX:+CMSParallelRemarkEnabled
                              -XX:CMSInitiatingOccupancyFraction=value
                              -XX:+UseCMSInitiatingOccupancyOnly
    G1                        -XX:+UnlockExperimentalVMOptions            In JDK 6, these two options
                              -XX:+UseG1GC                                must be used together.

Except for G1 GC, the GC type is changed by setting the option on the first line of each GC type. The most general GC type that does not intrude is Serial GC. It is optimized for client systems.

There are a lot of options that affect GC performance, but you can get a significant effect by setting the options mentioned above. Remember that setting too many options does not promise enhanced GC execution time.

Procedure of GC Tuning

The procedure of GC tuning is similar to the general performance improvement procedure. The following is the GC tuning procedure that I use.

1. Monitoring GC status

You need to monitor the GC status to check the GC status of the system in operation. Please see the various GC monitoring methods in How to Monitor Java Garbage Collection.

2. Deciding whether to tune GC after analyzing the monitoring result

After checking the GC status, you should analyze the monitoring result and decide whether to tune GC or not. If the analysis shows that the time taken to execute GC is just 0.1-0.3 seconds, you don't need to waste your time on tuning the GC. However, if the GC execution time is 1-3 seconds, or more than 10 seconds, GC tuning is necessary.
But if you have allocated about 10 GB of Java memory and it is impossible to decrease the memory size, there is no way to tune GC. Before tuning GC, you need to think about why you need to allocate such a large memory size. If you have allocated 1 GB or 2 GB of memory and OutOfMemoryError occurs, you should take a heap dump to verify and remove the cause.

Note: A heap dump is a file of the memory that is used to check the objects and data in the Java memory. This file can be created by using the jmap command included in the JDK. While the file is being created, the Java process stops. Therefore, do not create this file while the system is operating. Search the Internet for a detailed description of heap dumps. For Korean readers, see the book I published last year: The story of troubleshooting for Java developers and system operators (Sangmin Lee, Hanbit Media, 2011, 416 pages).

3. Setting GC type/memory size

If you have decided on GC tuning, select the GC type and set the memory size. At this time, if you have several servers, it is important to check the difference between GC options by setting different GC options for each server.

4. Analyzing results

Start analyzing the results after collecting data for at least 24 hours after setting the GC options. If you are lucky, you will find the most suitable GC options for the system. If you are not, you should analyze the logs and check how the memory has been allocated. Then you need to find the optimum options for the system by changing the GC type/memory size.

5. If the result is satisfactory, apply the option to all servers and terminate GC tuning.

In the following section, you will see the tasks to be done in each stage.

Monitoring GC Status and Analyzing Results

The best way to check the GC status of the Web Application Server (WAS) in operation is to use the jstat command.
I have explained the jstat command in How To Monitor Java Garbage Collection, so in this article I will describe only the data to check.

The following example shows a JVM for which GC tuning has not been done (however, it is not the operation server).

    $ jstat -gcutil 21719 1s
       S0     S1      E      O      P    YGC     YGCT  FGC    FGCT      GCT
    48.66   0.00  48.10  49.70  77.45   3428  172.623    3  59.050  231.673
    48.66   0.00  48.10  49.70  77.45   3428  172.623    3  59.050  231.673

Here, check the values of YGC and YGCT. Divide YGCT by YGC and you get 0.050 seconds (50 ms). It means that it takes 50 ms on average to execute GC in the Young area. With that result, you don't need to care about GC for the Young area.

Now check the values of FGCT and FGC. Divide FGCT by FGC and you get 19.68 seconds. It means that it takes 19.68 seconds on average to execute Full GC. Each of the three Full GCs may have taken about 19.68 seconds. Alternatively, two of them may have taken around 1 second and the third 58 seconds. In both cases, GC tuning is required.

You can easily check the GC status by using the jstat command; however, the best way to analyze GC is by generating logs with the -verbosegc option. I explained how to generate these logs and which tools can analyze them in the previous article. HPJMeter is my favorite among the tools used to analyze the -verbosegc log. It is easy to use and analyze with. With HPJMeter you can easily check the distribution of GC execution times and the frequency of GC occurrence.

If the GC execution time meets all of the following conditions, GC tuning is not required.

- Minor GC is processed quickly (within 50 ms).
- Minor GC is not frequently executed (about once every 10 seconds).
- Full GC is processed quickly (within 1 second).
- Full GC is not frequently executed (once per 10 minutes).

The values in parentheses are not absolute values; they vary according to the service status. Some services may be satisfied with 0.9 seconds of Full GC processing time, but some may not.
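The division described above can be sketched in a few lines of Java. The class and method names below are hypothetical; the sketch only assumes that a header line and a data line of jstat -gcutil output are available as strings, with the column names used in the sample above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: computes average GC times from one line of
// `jstat -gcutil` output, looking columns up by the names in the header.
final class JstatAverages {

    /** Maps each column name in the header to the value in the data line. */
    static Map<String, Double> parse(String header, String data) {
        String[] names = header.trim().split("\\s+");
        String[] values = data.trim().split("\\s+");
        if (names.length != values.length) {
            throw new IllegalArgumentException("header and data column counts differ");
        }
        Map<String, Double> row = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], Double.parseDouble(values[i]));
        }
        return row;
    }

    /** Average seconds per collection; 0 when no collection has happened yet. */
    static double averageSeconds(double totalSeconds, double count) {
        return count == 0 ? 0.0 : totalSeconds / count;
    }

    public static void main(String[] args) {
        String header = "S0 S1 E O P YGC YGCT FGC FGCT GCT";
        String data = "48.66 0.00 48.10 49.70 77.45 3428 172.623 3 59.050 231.673";
        Map<String, Double> row = parse(header, data);

        double minor = averageSeconds(row.get("YGCT"), row.get("YGC"));
        double full = averageSeconds(row.get("FGCT"), row.get("FGC"));
        System.out.printf("Average Minor GC: %.3f s%n", minor);
        System.out.printf("Average Full GC:  %.3f s%n", full);
    }
}
```

Keep in mind that an average hides the distribution: as noted above, the same Full GC average can come from three similar pauses or from one very long pause, which is why the -verbosegc log is still worth analyzing.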
Therefore, check the values and decide whether to execute GC tuning or not by considering each service.

There is one thing you should be careful of when you check the GC status: do not check only the time of Minor GC and Full GC. You must check the number of GC executions as well. If the New area size is too small, Minor GC will be executed too frequently (sometimes once or more per second). In addition, the number of objects passed to the Old area increases, causing more Full GC executions. Therefore, use the -gccapacity option of the jstat command to check how much of each area is occupied.

Setting GC Type/Memory Size

Setting GC Type

There are five GC types for the Oracle JVM. However, unless you use JDK 7, you should select one among Parallel GC, Parallel Compacting GC and CMS GC. There is no principle or rule to decide which one to select.

If so, how can we select one? The most recommended way is to apply all three. However, one thing is clear: CMS GC is faster than the other Parallel GCs. If that were always true, you could just apply CMS GC. However, CMS GC is not always faster. Generally, Full GC of CMS GC is fast; however, when concurrent mode failure occurs, it is slower than the other Parallel GCs.

Concurrent mode failure

Let's take a deeper look into the concurrent mode failure.

The biggest difference between Parallel GC and CMS GC is the compaction task. The compaction task removes memory fragmentation by compacting memory in order to remove the empty space between allocated memory areas.

In the Parallel GC type, compaction is executed whenever Full GC is executed, which takes a lot of time. However, after Full GC has been executed, memory can be allocated faster since the next memory can be allocated sequentially.

On the contrary, CMS GC does not perform compaction. Therefore, CMS GC is executed faster. However, when compaction is not executed, gaps are left in the memory, much like a disk before running Disk Defragmenter.
Therefore, there may be no space for large objects. For example, 300 MB is left in the Old area, but some 10 MB objects cannot be saved sequentially in the area. In this case, a "Concurrent mode failure" warning occurs and compaction is executed. However, if CMS GC is used, it takes longer to execute compaction than with the other Parallel GCs. And it may cause another problem. For a more detailed description of concurrent mode failure, see Understanding CMS GC Logs, written by Oracle engineers.

In conclusion, each system requires its own proper GC type, so you need to find the best GC type for your system. If you are running six servers, I recommend that you set the same options on each pair of servers, add the -verbosegc option, and then analyze the result.

Setting Memory Size

The following shows the relationship between the memory size, the number of GC executions, and the GC execution time.

- A large memory size decreases the number of GC executions and increases the GC execution time.
- A small memory size decreases the GC execution time and increases the number of GC executions.

There is no "right" answer as to whether the memory size should be small or large. 10 GB is OK if the server resources are good and Full GC can be completed within 1 second even when the memory has been set to 10 GB. But most servers are not in that state. When the memory is set to 10 GB, it takes about 10-30 seconds to execute Full GC. Of course, the time may vary according to the object size.

If so, how should we set the memory size? Generally, I recommend 500 MB. But note that this does not mean that you should set the WAS memory with the -Xms500m and -Xmx500m options. Based on the current status before GC tuning, check the memory size left after Full GC. If there is about 300 MB left after Full GC, it is good to set the memory to 1 GB (300 MB (for default usage) + 500 MB (minimum for the Old area) + 200 MB (for free memory)).
That means you should set the memory space with more than 500 MB for the Old area. Therefore, if you have three operation servers, set one server to 1 GB, one to 1.5 GB, and one to 2 GB, and then check the result.

Theoretically, the smaller the memory, the faster GC will be done, so 1 GB will be the fastest, followed by 1.5 GB and then 2 GB. However, it cannot be guaranteed that it takes 1 second to execute Full GC with 1 GB and 2 seconds with 2 GB. The time depends on the server performance and the object size. Therefore, the best way to create the measurement data is to try as many settings as possible and monitor them.

You should set one more thing when setting the memory size: NewRatio. NewRatio is the ratio of the New area and the Old area. If -XX:NewRatio=1, New area:Old area is 1:1. For 1 GB, New area:Old area is 500 MB:500 MB. If NewRatio is 2, New area:Old area is 1:2. Therefore, as the value gets larger, the Old area size gets larger and the New area size gets smaller.

It may not look important, but the NewRatio value significantly affects the entire GC performance. If the New area size is small, much memory is passed to the Old area, causing frequent Full GC and taking a long time to handle it.

You may simply think that NewRatio=1 would be the best; however, it may not be so. When NewRatio is set to 2 or 3, the entire GC status may be better. And I have seen such cases.

What is the fastest way to complete GC tuning? Comparing the results from performance tests is the fastest way to get the result. If you instead set different options for each server and monitor the status, it is recommended to check the data after at least one or two days. However, when you execute GC tuning through performance tests, you should be prepared to apply the same load as in the operation situation. And the request ratio, such as the URLs that generate the load, must be identical to that of the operation situation.
However, generating an accurate load is not easy even for a professional performance tester and takes a long time to prepare. Therefore, it is more convenient and easier to apply the options to operation and wait for the result, even though it takes longer.

Analyzing GC Tuning Results

After applying the GC option and setting the -verbosegc option, use the tail command to check whether the logs are accumulating as desired. If the option is not set exactly and no log is accumulated, you will waste your time. If logs are accumulating as desired, check the result after collecting data for one or two days. The easiest way is to move the logs to a local PC and analyze the data by using HPJMeter.

In the analysis, focus on the following. The priority is determined by me. The most important item for deciding the GC option is Full GC execution time.

- Full GC execution time
- Minor GC execution time
- Full GC execution interval
- Minor GC execution interval
- Entire Full GC execution time
- Entire Minor GC execution time
- Entire GC execution time
- Full GC execution count
- Minor GC execution count

Finding the most appropriate GC option right away is a very lucky case; in most cases, you will not. Be careful when executing GC tuning, because OutOfMemoryError may occur if you try to complete GC tuning all at once.

Examples of Tuning

So far, we have theoretically discussed GC tuning without any examples. Now we will take a look at examples of GC tuning.

Example 1

The following example is GC tuning for Service S. For the newly developed Service S, it took too much time to execute Full GC.

See the result of jstat -gcutil.

       S0     S1      E      O      P    YGC    YGCT  FGC   FGCT    GCT
    12.16   0.00   5.18  63.78  20.32     54   2.047    5  6.946  8.993

The information to the left, up to the Perm area (P), is not important for the initial GC tuning. At this time, the values from YGC rightward are the important ones.

The average time taken to execute Minor GC and Full GC once is calculated as below.

Table 3: Average Time Taken to Execute Minor GC and Full GC for Service S.
    GC Type     GC Execution Times    GC Execution Time (s)    Average
    Minor GC    54                    2.047                    37 ms
    Full GC     5                     6.946                    1,389 ms

37 ms is not bad for Minor GC. However, 1.389 seconds for Full GC means that timeouts may frequently occur when GC runs in a system whose DB timeout is set to 1 second. In this case, the system requires GC tuning.

First, you should check how the memory is used before starting GC tuning. Use the jstat -gccapacity option to check the memory usage. The result checked from this server is as follows.

    NGCMN     NGCMX     NGC       S0C      S1C      EC        OGCMN      OGCMX      OGC        OC         PGCMN     PGCMX     PGC       PC        YGC  FGC
    212992.0  212992.0  212992.0  21248.0  21248.0  170496.0  1884160.0  1884160.0  1884160.0  1884160.0  262144.0  262144.0  262144.0  262144.0  54   5

The key values are as follows.

- New area usage size: 212,992 KB
- Old area usage size: 1,884,160 KB

Therefore, the total allocated memory size is 2 GB, excluding the Perm area, and New area:Old area is about 1:9. To check the status in more detail than jstat allows, the -verbosegc log was added and three options were set for the three instances as shown below. No other option was added.

- NewRatio=2
- NewRatio=3
- NewRatio=4

After one day, the GC log of the system was checked. Fortunately, no Full GC had occurred in this system after NewRatio was set.

Why? The reason is that most of the objects created by the system are destroyed soon, so the objects are not passed to the Old area but destroyed in the New area.

In this state, it is not necessary to change other options. Just select the best value for NewRatio. So, how can we determine the best value? To get it, analyze the average response time of Minor GC for each NewRatio.

The average response time of Minor GC for each option is as follows:

- NewRatio=2: 45 ms
- NewRatio=3: 34 ms
- NewRatio=4: 30 ms

We concluded that NewRatio=4 is the best option, since the GC time is the shortest even though the New area size is the smallest. After applying the GC option, the server has had no Full GC.
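The memory figures quoted earlier in this example can be checked with a little arithmetic. The sketch below uses hypothetical class and method names and takes the NGC and OGC values (in KB) from the jstat -gccapacity output above.

```java
// Hypothetical helper: derives heap totals from the NGC and OGC columns
// (both in KB) of the `jstat -gccapacity` output quoted in this example.
final class CapacitySketch {

    /** Total of the New and Old areas in KB (Perm area excluded). */
    static double totalKb(double ngcKb, double ogcKb) {
        return ngcKb + ogcKb;
    }

    /** How many times larger the Old area is than the New area. */
    static double oldPerNew(double ngcKb, double ogcKb) {
        return ogcKb / ngcKb;
    }

    public static void main(String[] args) {
        double ngcKb = 212992.0;   // NGC: current New area capacity
        double ogcKb = 1884160.0;  // OGC: current Old area capacity

        double totalGb = totalKb(ngcKb, ogcKb) / (1024.0 * 1024.0);
        System.out.printf("New + Old = %.2f GB%n", totalGb);
        System.out.printf("New:Old is about 1:%.1f%n", oldPerNew(ngcKb, ogcKb));
    }
}
```

The two areas add up to exactly 2,097,152 KB, which is the 2 GB mentioned above, and the Old area is a little under nine times the size of the New area.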
For your information, the following is the result of executing jstat -gcutil some days after the JVM of the service had started.

      S0     S1      E      O      P    YGC    YGCT  FGC   FGCT     GCT
    8.61   0.00  30.67  24.62  22.38   2424  30.219    0  0.000  30.219

You may think that GC has not occurred frequently because the server has few requests. However, Full GC has not been executed at all while Minor GC has been executed 2,424 times.

Example 2

This example is for Service A. In the company's Application Performance Manager (APM), we found that the JVM periodically stopped operating for a long time (8 seconds or more). While searching for the reason, we found that it took a long time to execute Full GC, so we decided to execute GC tuning.

As the starting stage of GC tuning, we added the -verbosegc option and the result is as follows.

Figure 1: Duration Graph before GC Tuning.

The above graph, which shows the duration, is one of the graphs that HPJMeter automatically provides after analysis. The X-axis shows the time since the JVM started and the Y-axis shows the response time of each GC. The green dots, CMS, indicate the Full GC results, and the blue dots, Parallel Scavenge, indicate the Minor GC results.

Previously I said that CMS GC would be the fastest. But the above result shows that there were some cases which took up to 15 seconds. What caused such a result? Please remember what I said before: CMS gets slower when compaction is executed. In addition, the memory of the service had been set by using -Xms1g and -Xmx4g, and the memory allocated was 4 GB.

So I changed the GC type from CMS GC to Parallel GC. I changed the memory size to 2 GB and then set NewRatio to 3. The result of jstat -gcutil after a few hours is as follows.

      S0     S1     E      O      P   YGC    YGCT  FGC    FGCT     GCT
    0.00  30.48  3.31  26.54  37.01   226  11.131    4  11.758  22.890

The Full GC time was faster, about 3 seconds each, compared to 15 seconds for 4 GB. However, 3 seconds is still not so fast.
So I created six cases as follows.

    Case 1: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=2
    Case 2: -XX:+UseParallelGC -Xms1536m -Xmx1536m -XX:NewRatio=3
    Case 3: -XX:+UseParallelGC -Xms1g -Xmx1g -XX:NewRatio=3
    Case 4: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=2
    Case 5: -XX:+UseParallelOldGC -Xms1536m -Xmx1536m -XX:NewRatio=3
    Case 6: -XX:+UseParallelOldGC -Xms1g -Xmx1g -XX:NewRatio=3

Which one would be the fastest? The result showed that the smaller the memory size was, the better the result was. The following figure shows the duration graph of Case 6, which showed the highest GC improvement. The slowest response time was 1.7 seconds and the average had dropped to within 1 second, showing the improved result.

Figure 2: Duration Graph after Applying Case 6.

With this result, I changed all GC options of the service to Case 6. However, this change caused OutOfMemoryError at night each day. It is difficult to detail the reason here, but in short, batch data processing caused a shortage of JVM memory. The related problems are being cleared now.

It is very dangerous to analyze GC logs accumulated over a short time and apply the result to all servers as GC tuning. Keep in mind that GC tuning can be executed without failure only when you analyze the service operation as well as the GC logs.

We have reviewed two GC tuning examples to see how GC tuning is executed. As I mentioned, the GC options set in the examples can be set identically for a server that has the same CPU, OS version and JDK version and runs a service that executes the same functions. However, do not apply the options I used to your services in operation, since they may not work for you.

Conclusion

I execute GC tuning based on my experience, without executing heap dumps and analyzing the memory in detail. Precise memory status analysis may lead to better GC tuning results. However, that kind of analysis may be helpful when the memory is used in a constant and routine pattern.
But if the service is heavily used and there are many memory usage patterns, GC tuning based on reliable previous experience may be the better choice.

I have run performance tests with the G1 GC option set on some servers, but have not applied it to any operation server yet. The G1 GC option shows faster results than any other GC type. However, it requires upgrading to JDK 7, and its stability is still not guaranteed. Nobody knows whether there is a critical bug or not, so the time is not yet ripe for applying the option.

After JDK 7 is stabilized (this does not mean that it is not stable) and WAS is optimized for JDK 7, G1 GC may finally work as expected in stable operation, and some day we may not need GC tuning.

For more detail on GC tuning, search Slideshare.com for related materials. The material I recommend most is Everything I Ever Learned About JVM Performance Tuning @Twitter, written by Attila Szegedi, a Twitter engineer. Please take the time to read it.
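As a complement to the jstat-based monitoring used throughout this article, collection counts and accumulated times can also be read from inside a running JVM through the standard java.lang.management API. The following is a minimal sketch with hypothetical class and method names; which collector name corresponds to Minor GC or Full GC depends on the GC type in use, so that mapping is left to the reader.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: prints per-collector GC counts and accumulated times for the
// current JVM, using the standard java.lang.management API.
final class GcStatsSketch {

    /** Average milliseconds per collection; 0 when nothing was collected. */
    static double averageMillis(long totalMillis, long count) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long millis = gc.getCollectionTime(); // accumulated time in ms, -1 if undefined
            System.out.printf("%s: %d collections, %d ms total, %.1f ms average%n",
                    gc.getName(), count, millis, averageMillis(millis, count));
        }
    }
}
```

This does not replace the -verbosegc log, which records each individual pause, but it can be handy when attaching jstat to the process is not practical.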

October 10, 2012

by Esen Sagynov


No More Excuses to Use Null References in Java 8

Tony Hoare introduced null references in ALGOL W back in 1965 “simply because it was so easy to implement”. After many years he regretted his decision, calling it "my billion dollar mistake". Unfortunately the vast majority of the languages created in the last decades have been built with the same wrong design decision, so language designers and software engineers started to look for workarounds to avoid the infamous NullPointerException.

Functional languages like Haskell or Scala structurally resolve this problem by wrapping nullable values in an Option/Maybe monad. Imperative languages like Groovy introduced a null-safe dereferencing operator (the ?. operator) to safely navigate values that could potentially be null. A similar feature was proposed (and then discarded) as part of Project Coin in Java 7. Honestly, I don't miss a null-safe dereferencing operator in Java, also because I can imagine that the majority of developers would start abusing it "just in case". Moreover, since the upcoming Java 8 will have lambda expressions, it will be straightforward to implement an Option monad that, as I hope to show in the remaining part of the post, is a far more powerful and flexible construct.

I don't want to delve into category theory and explain what a monad is, also because there are already tons of very good articles doing this. My purpose is to quickly implement an Option monad using the Java 8 lambda expression syntax and then show how to use it with a very practical example. In Scala, a monad M is any class having the following 3 methods:

    def map[B](f: A => B): M[B]
    def flatMap[B](f: A => M[B]): M[B]
    def filter(p: A => Boolean): M[A]

In particular, you can think of an Option monad as a wrapper around a possibly absent value.
So an Option of a generic type A could be defined as follows:

    import java.util.functions.Predicate;

    public abstract class Option<A> {

        public static final None NONE = new None();

        public abstract <B> Option<B> map(Func1<A, B> f);
        public abstract <B> Option<B> flatMap(Func1<A, Option<B>> f);
        public abstract Option<A> filter(Predicate<? super A> predicate);
        public abstract A getOrElse(A def);

        public static <A> Some<A> some(A value) {
            return new Some<A>(value);
        }

        public static <A> None<A> none() {
            return NONE;
        }

        public static <A> Option<A> asOption(A value) {
            if (value == null) return none(); else return some(value);
        }
    }

I also added some convenient factory methods for the Some and None concrete implementations of Option that I will implement later. Here Predicate is a single-method interface defined in the new java.util.functions package:

    public interface Predicate<T> {
        boolean test(T t);
    }

that is used to determine whether the input object matches a given criterion, while Func1 is another single-method interface:

    public interface Func1<A1, R> {
        R apply(A1 arg1);
    }

that I defined to represent a more generic function of one argument of type A1 returning a result of type R.
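To make the role of these single-method interfaces concrete, here is a small sketch written against the released Java 8 API, where Predicate lives in java.util.function (singular) rather than in the java.util.functions package of the preview build described above. Func1 is redeclared with the same shape as in the article; the applyIf helper and the class name are hypothetical and exist only for this illustration.

```java
import java.util.function.Predicate;

// Same shape as the Func1 interface described above: one abstract method,
// so a lambda or method reference can be used wherever a Func1 is expected.
interface Func1<A1, R> {
    R apply(A1 arg1);
}

final class FunctionalInterfaceSketch {

    /** Applies f to the input only if the predicate accepts it, else returns fallback. */
    static <A, R> R applyIf(A input, Predicate<A> predicate, Func1<A, R> f, R fallback) {
        return predicate.test(input) ? f.apply(input) : fallback;
    }

    public static void main(String[] args) {
        Func1<String, Integer> length = String::length;  // method reference
        Predicate<String> notEmpty = s -> !s.isEmpty();  // lambda expression

        System.out.println(applyIf("Groovy", notEmpty, length, 0));
        System.out.println(applyIf("", notEmpty, length, 0));
    }
}
```

Because both interfaces declare exactly one abstract method, the compiler can turn the lambda and the method reference into instances of them without any anonymous class boilerplate.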
The abstract class Option then has two concrete implementations, one representing the absence of a value (something that we are used to wrongly modeling with the infamous null reference):

    public class None<A> extends Option<A> {

        None() { }

        @Override
        public <B> Option<B> map(Func1<A, B> f) {
            return NONE;
        }

        @Override
        public <B> Option<B> flatMap(Func1<A, Option<B>> f) {
            return NONE;
        }

        @Override
        public Option<A> filter(Predicate<? super A> predicate) {
            return NONE;
        }

        @Override
        public A getOrElse(A def) {
            return def;
        }
    }

and the other wrapping an actually existing value:

    public class Some<A> extends Option<A> {

        private final A value;

        Some(A value) {
            this.value = value;
        }

        @Override
        public <B> Option<B> map(Func1<A, B> f) {
            return some(f.apply(value));
        }

        @Override
        public <B> Option<B> flatMap(Func1<A, Option<B>> f) {
            return f.apply(value);
        }

        @Override
        public Option<A> filter(Predicate<? super A> predicate) {
            if (predicate.test(value)) return this;
            else return None.NONE;
        }

        @Override
        public A getOrElse(A def) {
            return value;
        }
    }

Now, to put the Option to work with a concrete example, let's suppose we have a Map<String, String> representing a set of named parameters with the corresponding values. We want to develop the method

    int readPositiveIntParam(Map<String, String> params, String name) {
        // TODO ...
    }

that returns the integer if the value associated with a given key is a String representing a positive integer, but returns zero in all other cases.
In other words we want the following test to pass:

    @Test
    public void testMap() {
        Map<String, String> param = new HashMap<String, String>();
        param.put("a", "5");
        param.put("b", "true");
        param.put("c", "-3");
        // the value of the key "a" is a String representing a positive int so return it
        assertEquals(5, readPositiveIntParam(param, "a"));
        // returns zero since the value of the key "b" is not an int
        assertEquals(0, readPositiveIntParam(param, "b"));
        // returns zero since the value of the key "c" is an int but it is negative
        assertEquals(0, readPositiveIntParam(param, "c"));
        // returns zero since there is no key "d" in the map
        assertEquals(0, readPositiveIntParam(param, "d"));
    }

If we couldn't rely on our Option, we would have to accomplish this task with something similar to this:

    int readPositiveIntParam(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null) return 0;
        int i = 0;
        try {
            i = Integer.parseInt(value);
        } catch (NumberFormatException nfe) {
            // not a number: i stays at 0
        }
        if (i < 0) return 0;
        return i;
    }

Too many conditional branches and return points, aren't there? Using the Option monad we can achieve the same result with a single fluent statement:

    int readPositiveIntParam(Map<String, String> params, String name) {
        return asOption(params.get(name))
            .flatMap(FunctionUtils::stringToInt)
            .filter(i -> i > 0)
            .getOrElse(0);
    }

where we used a helper static method FunctionUtils.stringToInt() as a function literal, with the :: syntax also introduced in Java 8, defined as:

    import static Option.*;

    public class FunctionUtils {
        public static Option<Integer> stringToInt(String s) {
            try {
                return some(Integer.parseInt(s));
            } catch (NumberFormatException nfe) {
                return none();
            }
        }
    }

This method tries to convert a String to an int and, if it can't, returns the None Option.
Note that we could also define this behavior inline, while invoking the flatMap() method, using an anonymous lambda expression, but my advice is to develop a small library of utility functions, as I started doing here, in order to leverage the greater reusability allowed by functional programming. I think the comparison of the two readPositiveIntParam methods I provided illustrates well how extensive use of the Option monad can finally allow us to write software that is completely free of NullPointerExceptions and, more generally, how wider use of functional programming can dramatically reduce cyclomatic complexity.
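For readers on a released JDK: Java 8 eventually shipped its own java.util.Optional type, which offers ofNullable, flatMap, filter and orElse. What follows is a minimal sketch of the same readPositiveIntParam pipeline written against that JDK class rather than the Option defined in this article; the class name OptionalParams is only an illustrative assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch: uses the JDK's java.util.Optional, not the article's Option class.
final class OptionalParams {

    // Converts a String to an int, yielding an empty Optional when it is not a number.
    static Optional<Integer> stringToInt(String s) {
        try {
            return Optional.of(Integer.parseInt(s));
        } catch (NumberFormatException nfe) {
            return Optional.empty();
        }
    }

    // Returns the parameter as a positive int, or zero in every other case.
    static int readPositiveIntParam(Map<String, String> params, String name) {
        return Optional.ofNullable(params.get(name))   // missing key -> empty
                .flatMap(OptionalParams::stringToInt)   // not a number -> empty
                .filter(i -> i > 0)                     // not positive -> empty
                .orElse(0);
    }

    public static void main(String[] args) {
        Map<String, String> param = new HashMap<>();
        param.put("a", "5");
        param.put("b", "true");
        param.put("c", "-3");
        for (String key : new String[] {"a", "b", "c", "d"}) {
            System.out.println(key + " -> " + readPositiveIntParam(param, key));
        }
    }
}
```

The shape of the pipeline is unchanged: each step either passes a value along or collapses to "empty", and the default is supplied once at the end.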

October 8, 2012

by Mario Fusco

· 76,303 Views · 1 Like

SQL Query Optimization and Normalization

Explore SQL query optimization and normalization.

October 4, 2012

by Michael Georgiou

· 37,812 Views · 2 Likes

OutOfMemoryError: Unable to Create New Native Thread – Problem Demystified

As you may have seen from my previous tutorials and case studies, Java Heap Space OutOfMemoryError problems can be complex to pinpoint and resolve. One of the common problems I have observed from Java EE production systems is OutOfMemoryError: unable to create new native thread; error thrown when the HotSpot JVM is unable to further create a new Java thread. This article will revisit this HotSpot VM error and provide you with recommendations and resolution strategies. If you are not familiar with the HotSpot JVM, I first recommend that you look at a high level view of its internal HotSpot JVM memory spaces. This knowledge is important in order for you to understand OutOfMemoryError problems related to the native (C-Heap) memory space. OutOfMemoryError: unable to create new native thread – what is it? Let’s start with a basic explanation. This HotSpot JVM error is thrown when the internal JVM native code is unable to create a new Java thread. More precisely, it means that the JVM native code was unable to create a new “native” thread from the OS (Solaris, Linux, MAC, Windows...). We can clearly see this logic from the OpenJDK 1.6 and 1.7 implementations as per below: Unfortunately at this point you won’t get more detail than this error, with no indication of why the JVM is unable to create a new thread from the OS… HotSpot JVM: 32-bit or 64-bit? Before you go any further in the analysis, one fundamental fact that you must determine from your Java or Java EE environment is which version of HotSpot VM you are using e.g. 32-bit or 64-bit. Why is it so important? What you will learn shortly is that this JVM problem is very often related to native memory depletion; either at the JVM process or OS level. For now please keep in mind that: A 32-bit JVM process is in theory allowed to grow up to 4 GB (even much lower on some older 32-bit Windows versions). For a 32-bit JVM process, the C-Heap is in a race with the Java Heap and PermGen space e.g. 
C-Heap capacity = 2-4 GB - Java Heap size (-Xms, -Xmx) - PermGen size (-XX:MaxPermSize)
- A 64-bit JVM process is in theory allowed to use most of the OS virtual memory available, or up to 16 EB (16 million TB).

As you can see, if you allocate a large Java Heap (2 GB+) for a 32-bit JVM process, the native memory space capacity will be reduced automatically, opening the door for JVM native memory allocation failures. For a 64-bit JVM process, your main concern, from a JVM C-Heap perspective, is the capacity and availability of the OS physical, virtual and swap memory.

OK great but how does native memory affect Java threads creation?

Now back to our primary problem. Another fundamental JVM aspect to understand is that Java threads created from the JVM require native memory from the OS. You should now start to understand the source of your problem…

The high level thread creation process is as per below:
1. A new Java thread is requested from the Java program & JDK.
2. The JVM native code then attempts to create a new native thread from the OS.
3. The OS then attempts to create a new native thread as per attributes which include the thread stack size. Native memory is then allocated (reserved) from the OS to the Java process native memory space, assuming the process has enough address space (e.g. 32-bit process) to honour the request.
4. The OS will refuse any further native thread & memory allocation if the 32-bit Java process size has depleted its memory address space, e.g. the 2 GB, 3 GB or 4 GB process size limit.
5. The OS will also refuse any further thread & native memory allocation if the virtual memory of the OS is depleted (including Solaris swap space depletion, since thread access to the stack can generate a SIGBUS error, crashing the JVM; see http://bugs.sun.com/view_bug.do?bug_id=6302804).

In summary:
- Java thread creation requires native memory available from the OS, for both 32-bit & 64-bit JVM processes.
- For a 32-bit JVM, Java thread creation also requires memory available from the C-Heap or process address space.

Problem diagnostic

Now that you understand native memory and JVM thread creation a little better, it is time to look at your problem. As a starting point, I suggest that you follow the analysis approach below:
- Determine if you are using a HotSpot 32-bit or 64-bit JVM.
- When the problem is observed, take a JVM Thread Dump and determine how many threads are active.
- Closely monitor the Java process size utilization before and during the OOM problem replication.
- Closely monitor the OS virtual memory utilization before and during the OOM problem replication, including the swap memory space utilization if using Solaris OS.

Gathering data as per above will give you the data points you need to perform the first level of investigation. The next step will be to look at the possible problem patterns and determine which one is applicable to your problem case.

Problem pattern #1 – C-Heap depletion (32-bit JVM)

From my experience, OutOfMemoryError: unable to create new native thread is quite common for 32-bit JVM processes. This problem is often observed when too many threads are created vs. C-Heap capacity. JVM Thread Dump analysis and Java process size monitoring will allow you to determine if this is the cause.

Problem pattern #2 – OS virtual memory depletion (64-bit JVM)

In this scenario, the OS virtual memory is fully depleted. This could be due to a few 64-bit JVM processes taking a lot of memory e.g. 
10 GB+ and / or other high memory footprint rogue processes. Again, Java process size & OS virtual memory monitoring will allow you to determine if this is the cause. Problem pattern #3 – OS virtual memory depletion (32-bit JVM) The third scenario is less frequent but can still be observed. The diagnostic can be a bit more complex but the key analysis point will be to determine which processes are causing a full OS virtual memory depletion. Your 32-bit JVM processes could be either the source or the victim such as rogue processes using most of the OS virtual memory and preventing your 32-bit JVM processes to reserve more native memory for its thread creation process. Please note that this problem can also manifest itself as a full JVM crash (as per below sample) when running out of OS virtual memory or swap space on Solaris. # # A fatal error has been detected by the Java Runtime Environment: # # java.lang.OutOfMemoryError: requested 32756 bytes for ChunkPool::allocate. Out of swap space? # # Internal Error (allocation.cpp:166), pid=2290, tid=27 # Error: ChunkPool::allocate # # JRE version: 6.0_24-b07 # Java VM: Java HotSpot(TM) Server VM (19.1-b02 mixed mode solaris-sparc ) # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # --------------- T H R E A D --------------- Current thread (0x003fa800): JavaThread "CompilerThread1" daemon [_thread_in_native, id=27, stack(0x65380000,0x65400000)] Stack: [0x65380000,0x65400000], sp=0x653fd758, free space=501k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) ……………… Native memory depletion: symptom or root cause? You now understand your problem and know which problem pattern you are dealing with. You are now ready to provide recommendations to address the problem…are you? Your work is not done yet, please keep in mind that this JVM OOM event is often just a “symptom” of the actual root cause of the problem. 
The root cause is typically much deeper, so before providing recommendations to your client I recommend that you perform a deeper analysis. The last thing you want to do is to simply address and mask the symptoms. Solutions such as increasing OS physical / virtual memory or upgrading all your JVM processes to 64-bit should only be considered once you have a good view of the root cause and of the production environment capacity requirements.

The next fundamental question to answer is how many threads were active at the time of the OutOfMemoryError. In my experience with Java EE production systems, the most common root cause is actually the application and / or Java EE container attempting to create too many threads at a given time when facing non-happy paths such as threads stuck in a remote IO call, thread race conditions etc. In this scenario, the Java EE container can start creating too many threads when attempting to honour incoming client requests, leading to increased pressure on the C-Heap and native memory allocation. Bottom line: before blaming the JVM, please perform your due diligence and determine if you are dealing with an application or Java EE container thread tuning problem as the root cause.

Once you understand and address the root cause (source of thread creations), you can then work on tuning your JVM and OS memory capacity in order to make it more fault tolerant and better “survive” these sudden thread surge scenarios.

Recommendations:
- First perform a JVM Thread Dump analysis and determine the source of all the active threads vs. an established baseline. Determine what is causing your Java application or Java EE container to create so many threads at the time of the failure.
- Please ensure that your monitoring tools closely monitor both your Java VM process size & OS virtual memory. This crucial data will be required in order to perform a full root cause analysis.
- Do not assume that you are dealing with an OS memory capacity problem. Look at all running processes and determine if your JVM processes are actually the source of the problem or the victim of other processes consuming all the virtual memory.
- Revisit your Java EE container thread configuration & JVM thread stack size. Determine if the Java EE container is allowed to create more threads than your JVM process and / or OS can handle.
- Determine if the Java Heap size of your 32-bit JVM is too large, preventing the JVM from creating enough threads to fulfill your client requests. In this scenario, you will have to consider reducing your Java Heap size (if possible), vertical scaling or upgrading to a 64-bit JVM.

Capacity planning analysis to the rescue

As you may have seen from my past article on the Top 10 Causes of Java EE Enterprise Performance Problems, lack of capacity planning analysis is often the source of the problem. Any comprehensive load and performance testing exercise should also properly determine the Java EE container threads, JVM & OS native memory requirements for your production environment, including impact measurements of "non-happy" paths. This approach will allow your production environment to stay away from this type of problem and lead to better system scalability and stability in the long run.

Please provide any comment and share your experience with JVM native thread troubleshooting.
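As a complement to taking a thread dump, the first diagnostic questions above (how many threads are alive, and roughly how much room a 32-bit process leaves for native memory) can be sketched in a few lines of Java. This is only an illustration: the class name ThreadSnapshot and its method names are assumptions, the headroom figure simply applies the article's C-Heap formula, and the thread counts come from the standard java.lang.management API of the JVM running the code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Illustrative helper; names are hypothetical and not part of any JDK tool.
final class ThreadSnapshot {

    // The article's rule of thumb for a 32-bit process:
    // C-Heap capacity = process size limit - Java heap - PermGen.
    static long estimateCHeapMb(long processLimitMb, long javaHeapMb, long permGenMb) {
        return processLimitMb - javaHeapMb - permGenMb;
    }

    // Number of live threads (daemon and non-daemon) in the current JVM.
    static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    // Counts live threads per state, e.g. to spot many BLOCKED or WAITING threads.
    static Map<Thread.State, Integer> threadsByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("Live threads: " + liveThreadCount());
        System.out.println("By state: " + threadsByState());
        // Example figures only: 4 GB limit, 2 GB heap, 256 MB PermGen.
        System.out.println("Estimated C-Heap (MB): " + estimateCHeapMb(4096, 2048, 256));
    }
}
```

Comparing such a snapshot against a baseline taken under normal load gives a first hint of whether thread growth, rather than OS capacity, is the thing to investigate.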

October 4, 2012

by Pierre - Hugues Charbonneau

· 70,868 Views · 2 Likes

Does Immutability Really Mean Thread Safety?

I am going to try to define immutability and its relation to thread safety.

October 3, 2012

by Thibault Delor

· 57,692 Views · 4 Likes

Difference Between Mysql Replace and Insert on Duplicate Key Update

My friend Roshan and I were recently working as support developers for a well-known Australian e-commerce website. Roshan was assigned a bug related to the product synchronization process between the warehouse product table and the e-commerce site. His main task was to compare the site's product table with the warehouse product table and then either insert new data into the site database or update an existing record there. Of course, doing a lookup to see if the record already exists and then either updating or inserting would be an expensive process (existing items are identified by either a unique key or a primary key). Luckily, MySQL offers two statements to deal with this, each taking a very different approach:

1. REPLACE = DELETE + INSERT
2. INSERT ... ON DUPLICATE KEY UPDATE = INSERT or UPDATE

1. REPLACE

REPLACE uses the same syntax as INSERT. When the table has a unique or primary key, REPLACE does either a DELETE followed by an INSERT (when a row with the same key already exists) or just an INSERT (when it does not). In the first case the existing record is removed and a new one is inserted at the end, which can fragment the indexes and decrease the efficiency of the table.

REPLACE INTO ds_product SET pID = 3112, catID = 231, uniCost = 232.50, salePrice = 250.23;

2. ON DUPLICATE KEY UPDATE

ON DUPLICATE KEY UPDATE is a clause of the INSERT statement. It looks for an existing record in the table that has the same UNIQUE or PRIMARY KEY as the one we're trying to insert. If it finds one, it applies the UPDATE you specify to the chosen column(s). Otherwise, it does a normal INSERT.

INSERT INTO ds_product SET pID = 3112, catID = 231, uniCost = 232.50, salePrice = 250.23 ON DUPLICATE KEY UPDATE uniCost = 232.50, salePrice = 250.23;

This should be helpful when writing queries that add and update information without having to go through the extra lookup step. Thanks, and have a nice day.
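One practical consequence of the difference is worth spelling out: because REPLACE deletes the old row and inserts a new one, any column not named in the statement falls back to its default, whereas ON DUPLICATE KEY UPDATE touches only the listed columns. The sketch below is a toy in-memory model of that behaviour in Java, not MySQL itself; the class UpsertModel and its methods are hypothetical, and "default" is modelled simply as a missing (null) value.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of a table keyed by primary key; this is not MySQL.
final class UpsertModel {

    private final Map<Integer, Map<String, Object>> rows = new HashMap<>();

    // Models REPLACE: drop any existing row with this key, then insert only the given columns.
    void replace(int pk, Map<String, Object> columns) {
        rows.remove(pk);
        rows.put(pk, new HashMap<>(columns));
    }

    // Models INSERT ... ON DUPLICATE KEY UPDATE: insert when the key is absent,
    // otherwise update only the columns listed in the UPDATE part.
    void insertOrUpdate(int pk, Map<String, Object> insertColumns,
                        Map<String, Object> updateColumns) {
        Map<String, Object> existing = rows.get(pk);
        if (existing == null) {
            rows.put(pk, new HashMap<>(insertColumns));
        } else {
            existing.putAll(updateColumns);
        }
    }

    // Returns a column value, or null when the row or column is missing.
    Object get(int pk, String column) {
        Map<String, Object> row = rows.get(pk);
        return row == null ? null : row.get(column);
    }
}
```

In this model, replacing row 3112 with only a salePrice leaves its catID empty, while the insert-or-update path keeps the catID that was already stored.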

October 3, 2012

by Prathap Givantha Kalansuriya

· 14,416 Views

Parsing a Connection String With 'Sprache' C# Parser

Sprache is a very cool lightweight parser library for C#. Today I was experimenting with parsing EasyNetQ connection strings, so I thought I’d have a go at getting Sprache to do it. An EasyNetQ connection string is a list of key-value pairs like this:

    key1=value1;key2=value2;key3=value3

The motivation for looking at something more sophisticated than simply chopping strings based on delimiters is that I’m thinking of having more complex values that would themselves need parsing. But that’s for the future; today I’m just going to parse a simple connection string where the values can be strings or numbers (ushort to be exact). So, I want to parse a connection string that looks like this:

    virtualHost=Copa;username=Copa;host=192.168.1.1;password=abc_xyz;port=12345;requestedHeartbeat=3

… into a strongly typed structure like this:

    public class ConnectionConfiguration : IConnectionConfiguration
    {
        public string Host { get; set; }
        public ushort Port { get; set; }
        public string VirtualHost { get; set; }
        public string UserName { get; set; }
        public string Password { get; set; }
        public ushort RequestedHeartbeat { get; set; }
    }

I want it to be as easy as possible to add new connection string items. First let’s define a name for a function that updates a ConnectionConfiguration. An uncommonly used version of the ‘using’ statement allows us to give a short name to a complex type:

    using UpdateConfiguration = Func<ConnectionConfiguration, ConnectionConfiguration>;

Now let’s define a little function that creates a Sprache parser for a key-value pair. We supply the key and a parser for the value and get back a parser that can update the ConnectionConfiguration. 
    public static Parser<UpdateConfiguration> BuildKeyValueParser<T>(
        string keyName,
        Parser<T> valueParser,
        Expression<Func<ConnectionConfiguration, T>> getter)
    {
        return
            from key in Parse.String(keyName).Token()
            from separator in Parse.Char('=')
            from value in valueParser
            select (Func<ConnectionConfiguration, ConnectionConfiguration>)(c =>
            {
                CreateSetter(getter)(c, value);
                return c;
            });
    }

CreateSetter is a little function that turns a property expression (like x => x.Name) into an Action that sets that property on a given instance. Next let’s define parsers for string and number values:

    public static Parser<string> Text = Parse.CharExcept(';').Many().Text();
    public static Parser<ushort> Number = Parse.Number.Select(ushort.Parse);

Now we can chain a series of BuildKeyValueParser invocations and Or them together so that we can parse any of our expected key-values:

    public static Parser<UpdateConfiguration> Part = new List<Parser<UpdateConfiguration>>
    {
        BuildKeyValueParser("host", Text, c => c.Host),
        BuildKeyValueParser("port", Number, c => c.Port),
        BuildKeyValueParser("virtualHost", Text, c => c.VirtualHost),
        BuildKeyValueParser("requestedHeartbeat", Number, c => c.RequestedHeartbeat),
        BuildKeyValueParser("username", Text, c => c.UserName),
        BuildKeyValueParser("password", Text, c => c.Password),
    }.Aggregate((a, b) => a.Or(b));

Each invocation of BuildKeyValueParser defines an expected key-value pair of our connection string. We just give the key name, the parser that understands the value, and the property on ConnectionConfiguration that we want to update. In effect we’ve defined a little DSL for connection strings. If I want to add a new connection string value, I simply add a new property to ConnectionConfiguration and a single line to the above code.
Now let’s define a parser for the entire string, by saying that we’ll parse any number of key-value parts:

    public static Parser<IEnumerable<UpdateConfiguration>> ConnectionStringBuilder =
        from first in Part
        from rest in Parse.Char(';').Then(_ => Part).Many()
        select Cons(first, rest);

All we have to do now is parse the connection string and apply the chain of update functions to a ConnectionConfiguration instance:

    public IConnectionConfiguration Parse(string connectionString)
    {
        var updater = ConnectionStringGrammar.ConnectionStringBuilder.Parse(connectionString);
        return updater.Aggregate(new ConnectionConfiguration(), (current, updateFunction) => updateFunction(current));
    }

We get lots of nice things out of the box with Sprache; one of the best is the excellent error messages:

    Parsing failure: unexpected 'x'; expected host or port or virtualHost or requestedHeartbeat or username or password (Line 1, Column 1).

Sprache is really nice for this kind of task. I’d recommend checking it out.
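The core idea here, a table that maps each expected key to a small "update the configuration" function, does not depend on Sprache. As a rough, hedged illustration, the sketch below expresses that table in plain Java. It deliberately does the simple delimiter chopping that the post wants to move beyond, so it is not a substitute for a real parser; the names ConnectionStringSketch, Config and parseUShort are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hand-rolled illustration of the key -> update-function table; this is not Sprache.
final class ConnectionStringSketch {

    // Plain holder mirroring the properties of ConnectionConfiguration.
    static final class Config {
        String host;
        int port;
        String virtualHost;
        String userName;
        String password;
        int requestedHeartbeat;
    }

    // One entry per supported key: adding a key means adding one line here.
    private static final Map<String, BiConsumer<Config, String>> PARTS = new LinkedHashMap<>();
    static {
        PARTS.put("host", (c, v) -> c.host = v);
        PARTS.put("port", (c, v) -> c.port = parseUShort(v));
        PARTS.put("virtualHost", (c, v) -> c.virtualHost = v);
        PARTS.put("requestedHeartbeat", (c, v) -> c.requestedHeartbeat = parseUShort(v));
        PARTS.put("username", (c, v) -> c.userName = v);
        PARTS.put("password", (c, v) -> c.password = v);
    }

    // Java has no ushort, so the 0..65535 range is checked by hand.
    static int parseUShort(String text) {
        int n = Integer.parseInt(text);
        if (n < 0 || n > 65535) {
            throw new IllegalArgumentException("out of ushort range: " + text);
        }
        return n;
    }

    // Splits on ';' and '=' and applies the matching update function for each key.
    static Config parse(String connectionString) {
        Config config = new Config();
        for (String part : connectionString.split(";")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected key=value but found '" + part + "'");
            }
            String key = part.substring(0, eq).trim();
            BiConsumer<Config, String> update = PARTS.get(key);
            if (update == null) {
                throw new IllegalArgumentException(
                        "unexpected '" + key + "'; expected one of " + PARTS.keySet());
            }
            update.accept(config, part.substring(eq + 1));
        }
        return config;
    }
}
```

What this version lacks is exactly what a parser combinator library brings: composable value parsers and position-aware error messages.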

October 3, 2012

by Mike Hadlow

· 7,572 Views

VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not)

(Disclaimer: Based on personal experience and little research, the information might be incomplete.) VisualVM is a great tool for monitoring JVM (5.0+) regarding memory usage, threads, GC, MBeans etc. Let’s see how to use it over SSH to monitor (or even profile, using its sampler) a remote JVM either with JMX or without it. This post is based on Sun JVM 1.6 running on Ubuntu 10 and VisualVM 1.3.3. 1. Communication: JStatD vs. JMX There are two modes of communication between VisualVM and the JVM: Either over the Java Management Extensions (JMX) protocol or over jstatd. jstatd jstatd is a daemon that is distributed with JDK. You start it from the command line (it’s likely necessary to run it as the user running the target JVM or as root) on the target machine and VisualVM will contact it to fetch information about the remote JVMs. Advantages: Can connect to a running JVM, no need to start it with special parameters Disadvantages: Much more limited monitoring capabilities (f.ex. no CPU usage monitoring, not possible to run the Sampler and/or take thread dumps). Ex.: bash> cat jstatd.all.policy grant codebase "file:${java.home}/../lib/tools.jar" { permission java.security.AllPermission; } bash> sudo /path/to/JDK/bin/jstatd -J-Djava.security.policy=jstatd.all.policy # You can specify port with -p number and get more info with -J-Djava.rmi.server.logCalls=true Note: Replace “${java.home}/../lib/tools.jar” with the absolute “/path/to/jdk/lib/tools.jar” if you have only copied but not installed the JDK. If you get the failure Could not create remote object access denied (java.util.PropertyPermission java.rmi.server.ignoreSubClasses write) java.security.AccessControlException: access denied (java.util.PropertyPermission java.rmi.server.ignoreSubClasses write) at java.security.AccessControlContext.checkPermission(AccessControlContext.java:374) then jstatd likely hasn’t been started with the right java.security.policy file (try to provide fully qualified path to it). 
More info about VisualVM and jstatd from Oracle. JMX Advantages: Using JMX will give you the full power of VisualVM. Disadvantages: Need to start the JVM with some system properties. You will generally want to use something like the following properties when starting the target JVM (though you could also enable SSL and/or require username and password): yourJavaCommand... -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=1098 See Remote JMX Connections. 2. Security: SSH The easiest way to connect to the remote JMX or jstatd over ssh is to use a SOCKS proxy, which standard ssh clients can set up. 2.1 Set Up the SSH Tunnel With SOCKS ssh -v -D 9696 my_server.example.com 2.2 Configure VisualVM to Use the Proxy Tools->Options->Network – Manual Proxy Settings – check it and configure SOCKS Proxy at localhost and port 9696 2.3 Connect VisualVM to the Target File -> Add Remote Host… – type the IP or hostname of the remote machine JStatD Connection You should see logs both in the ssh window (thanks to its “-v”, f.ex. “debug1: Connection to port 9696 forwarding to socks port 0 requested.” and “debug1: channel 3: free: direct-tcpip: listening port 9696 for 10.2.47.71 port 1099, connect from 127.0.0.1 port 61262, nchannels 6“) and in the console where you started jstatd (many, f.ex. “FINER: RMI TCP Connection(23)-10.2.47.71: …“) Wait few minutes after having added the remote host, you should then see the JVMs running there. Available stats: JVM arguments, Monitor: Heap, classes, threads monitoring (but not CPU). Sampler and MBeans require JMX. JMX Right-click on the remote host you have added and select Add JMX Connection …, type the JMX port you have chosen. You should see similar logs as with jstatd. Available stats: Also CPU usage, system properties, detailed Threads report with access to stack traces, CPU sampling (memory sampling not supported). Note: Sampler vs. 
Profiler The VisualVM’s Sampler excludes time spent in Object.wait and Thread.sleep (f.ex. waiting on I/O). Use the NetBeans Profiler to profile or sample a remote application if you want to have more control or want the possibility to include Object.wait and Thread.sleep time. It requires its Remote Pack (a java agent, i.e. a JAR file) to be in the target JVM (NetBeans’ Attach Wizard can generate the remote pack for you in step 4, Manual integration, and show you the options to pass to the target JVM to use it). You can run the profiler over SSH by forwarding its default port (5140) and attaching to the forwarded port at localhost. (NetBeans version 7.1.1.)
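For completeness, the JMX endpoint opened by the -Dcom.sun.management.jmxremote.port=1098 setting above can also be queried from plain Java through the standard javax.management.remote API. The following is a minimal sketch under assumptions: the class RemoteJmxSketch and its method names are hypothetical, the host and port are whatever you configured, authentication and SSL are disabled as in the example properties, and nothing here sets up the SSH tunnel or SOCKS proxy for you.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Illustrative sketch; class and method names are hypothetical.
final class RemoteJmxSketch {

    // Builds the conventional RMI-based JMX service URL for a host and port.
    static JMXServiceURL jmxUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad JMX address: " + host + ":" + port, e);
        }
    }

    // Connects to the remote JVM and reads its live thread count.
    // Requires a reachable JVM started with the jmxremote properties.
    static int remoteThreadCount(String host, int port) throws IOException {
        try (JMXConnector connector = JMXConnectorFactory.connect(jmxUrl(host, port))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            return threads.getThreadCount();
        }
    }
}
```

A call such as remoteThreadCount("my_server.example.com", 1098) would then return the same thread figure VisualVM displays, provided the port is reachable from where the code runs.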

October 3, 2012

by Jakub Holý

· 100,914 Views · 2 Likes

Using Web Workers to Improve Performance of Image Manipulation

Today I would like to talk about picture manipulation. Not the Direct2D way I used in my previous article but the pure JavaScript way.

The test case

The test application is simple. On the left is a picture to manipulate and on the right the updated result (a sepia tone effect is applied). The page itself is simple. The overall process of applying a sepia tone effect requires you to compute a new value for every pixel of the picture:

    finalRed   = (red * 0.393) + (green * 0.769) + (blue * 0.189);
    finalGreen = (red * 0.349) + (green * 0.686) + (blue * 0.168);
    finalBlue  = (red * 0.272) + (green * 0.534) + (blue * 0.131);

To make it more realistic I added a bit of randomness to the formula, so the final JavaScript code to apply to every pixel is:

    function noise() {
        return Math.random() * 0.5 + 0.5;
    };

    function colorDistance(scale, dest, src) {
        return (scale * dest + (1 - scale) * src);
    };

    var processSepia = function (pixel) {
        // Read the source channels first so each result uses the original values
        var r = pixel.r, g = pixel.g, b = pixel.b;
        pixel.r = colorDistance(noise(), (r * 0.393) + (g * 0.769) + (b * 0.189), r);
        pixel.g = colorDistance(noise(), (r * 0.349) + (g * 0.686) + (b * 0.168), g);
        pixel.b = colorDistance(noise(), (r * 0.272) + (g * 0.534) + (b * 0.131), b);
    };

Brute force

Obviously the very first solution is a brute-force approach: a function that applies the previous code to every pixel. To get access to the pixels, you can use the canvas context with the following code:

    var source = document.getElementById("source");
    source.onload = function () {
        var canvas = document.getElementById("target");
        canvas.width = source.clientWidth;
        canvas.height = source.clientHeight;
        var tempContext = canvas.getContext("2d");
        tempContext.drawImage(source, 0, 0, canvas.width, canvas.height);
        var canvasData = tempContext.getImageData(0, 0, canvas.width, canvas.height);
        var binaryData = canvasData.data;
    }

The binaryData object is an array holding four values (red, green, blue, alpha) for every pixel and can be used to quickly read or write data directly to the canvas. 
So with this in mind, we can apply the whole effect with the following code:

    var source = document.getElementById("source");
    source.onload = function () {
        var start = new Date();
        var canvas = document.getElementById("target");
        canvas.width = source.clientWidth;
        canvas.height = source.clientHeight;
        if (!canvas.getContext) {
            log.innerText = "Canvas not supported. Please install a HTML5 compatible browser.";
            return;
        }
        var tempContext = canvas.getContext("2d");
        var len = canvas.width * canvas.height * 4;
        tempContext.drawImage(source, 0, 0, canvas.width, canvas.height);
        var canvasData = tempContext.getImageData(0, 0, canvas.width, canvas.height);
        var binaryData = canvasData.data;
        processSepia(binaryData, len);
        tempContext.putImageData(canvasData, 0, 0);
        var diff = new Date() - start;
        log.innerText = "Process done in " + diff + " ms (no web workers)";
    }

The processSepia function is just a variation of the previous one:

    var processSepia = function (binaryData, l) {
        for (var i = 0; i < l; i += 4) {
            var r = binaryData[i];
            var g = binaryData[i + 1];
            var b = binaryData[i + 2];
            binaryData[i] = colorDistance(noise(), (r * 0.393) + (g * 0.769) + (b * 0.189), r);
            binaryData[i + 1] = colorDistance(noise(), (r * 0.349) + (g * 0.686) + (b * 0.168), g);
            binaryData[i + 2] = colorDistance(noise(), (r * 0.272) + (g * 0.534) + (b * 0.131), b);
        }
    };

With this solution, on my Intel Extreme processor (12 cores), the main process takes 150ms and obviously uses only one processor.

Adding web workers

The best thing you can do when dealing with SIMD-style work (single instruction, multiple data) is to use a parallelization approach, especially when you want to work with low-end hardware (such as phone devices) with limited resources. 
With JavaScript, to enjoy the power of parallelization, you have to use the Web Workers (my friend David Rousset wrote an excellent paper on this subject: http://blogs.msdn.com/b/davrous/archive/2011/07/15/introduction-to-the-html5-web-workers-the-javascript-multithreading-approach.aspx). Picture processing is a really good candidate for parallelization because (in the case of sepia tone) every processing is independent and so the following approach is possible: To do so, first of all you have to create a tools.js file to be used as a reference by other scripts: function noise() { return Math.random() * 0.5 + 0.5; }; function colorDistance(scale, dest, src) { return (scale * dest + (1 - scale) * src); }; var processSepia = function (binaryData, l) { for (var i = 0; i < l; i += 4) { var r = binaryData[i]; var g = binaryData[i + 1]; var b = binaryData[i + 2]; binaryData[i] = colorDistance(noise(), (r * 0.393) + (g * 0.769) + (b * 0.189), r); binaryData[i + 1] = colorDistance(noise(), (r * 0.349) + (g * 0.686) + (b * 0.168), g); binaryData[i + 2] = colorDistance(noise(), (r * 0.272) + (g * 0.534) + (b * 0.131), b); } }; The processSepia function will be applied to every bunch of the picture by a dedicated worker. The code of each worker is included in a pictureprocessor.js file: importScripts("tools.js"); self.onmessage = function (e) { var canvasData = e.data.data; var binaryData = canvasData.data; var l = e.data.length; var index = e.data.index; processSepia(binaryData, l); self.postMessage({ result: canvasData, index: index }); }; The main point here is that the canvas data (actually a part of it according to the current block to process) is cloned by JavaScript and passed to the worker. The worker is not working on the initial source but on a copy of it (using a specified algorithm: the structured clone algorithm). The copy itself is really quick and limited to a specific part of the picture. 
The main client page (default.js) has to create 4 workers and give them the right part of the picture. Then every worker will call back a function in the main thread using the messaging API (postMessage / onmessage) to give back the result:

    var source = document.getElementById("source");

    source.onload = function () {
        var start = new Date();
        var canvas = document.getElementById("target");
        canvas.width = source.clientWidth;
        canvas.height = source.clientHeight;

        // Testing canvas support
        if (!canvas.getContext) {
            log.innerText = "Canvas not supported. Please install a HTML5 compatible browser.";
            return;
        }

        var tempContext = canvas.getContext("2d");
        var len = canvas.width * canvas.height * 4;

        // Drawing the source image into the target canvas
        tempContext.drawImage(source, 0, 0, canvas.width, canvas.height);

        // If workers are not supported
        if (!window.Worker) {
            // Getting all the canvas data
            var canvasData = tempContext.getImageData(0, 0, canvas.width, canvas.height);
            var binaryData = canvasData.data;

            // Processing all the pixels with the main thread
            processSepia(binaryData, len);

            // Copying back canvas data to canvas
            tempContext.putImageData(canvasData, 0, 0);

            var diff = new Date() - start;
            log.innerText = "Process done in " + diff + " ms (no web workers)";
            return;
        }

        // Let's say we want to use 4 workers
        var workersCount = 4;
        var finished = 0;
        var segmentLength = len / workersCount; // This is the length of array sent to the worker
        var blockSize = canvas.height / workersCount; // Height of the picture chunk for every worker

        // Function called when a job is finished
        var onWorkEnded = function (e) {
            // Data is retrieved using a memory clone operation
            var canvasData = e.data.result;
            var index = e.data.index;

            // Copying back canvas data to canvas
            tempContext.putImageData(canvasData, 0, blockSize * index);

            finished++;
            if (finished == workersCount) {
                var diff = new Date() - start;
                log.innerText = "Process done in " + diff + " ms";
            }
        };

        // Launching every worker
        for (var index = 0; index < workersCount; index++) {
            // Same file name as the worker script described above
            var worker = new Worker("pictureprocessor.js");
            worker.onmessage = onWorkEnded;

            // Getting the picture
            var canvasData = tempContext.getImageData(0, blockSize * index, canvas.width, blockSize);

            // Sending canvas data to the worker using a copy memory operation
            worker.postMessage({ data: canvasData, index: index, length: segmentLength });
        }
    };

Using this technique, the complete process lasts only 80ms (from 150ms) on my computer and obviously uses 4 processors. On my low-end hardware (a dual-core system), the process falls to 500ms (from 900ms).

The final code is available here: http://www.catuhe.com/msdn/pictureworkers.zip
And the live version is right there: http://www.catuhe.com/msdn/workers/default.html
(For comparison, the no web workers version: http://www.catuhe.com/msdn/workers/defaultnoworker.html)

An important point to note is that on recent computers the difference can be small, or even in favor of the code without workers. The overhead of the memory copy only pays off when the workers run sufficiently complex code, and the sepia tone may not be enough in some cases. However, web workers will really be useful on low-end hardware.

Porting to Windows 8

Finally, I could not resist the pleasure of porting my JavaScript code to create a Windows 8 application. It took me about 10 minutes to create a blank JavaScript project and copy/paste the JavaScript code inside (feel the power of native JavaScript code for Windows 8!). So feel free to grab the Windows 8 app code here: http://www.catuhe.com/msdn/Win8PictureWorkers.zip
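The partitioning idea (one horizontal band of the picture per worker, then wait until every band is done) is not specific to JavaScript. The following is a minimal sketch of the same idea in Java, under stated assumptions: BandedSepia and its methods are hypothetical names, the random noise is left out so the result is reproducible, values are clamped to 255 by hand, and the threads share one array instead of receiving a structured clone as Web Workers do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BandedSepia {
    // Applies the sepia matrix (without noise) to rows [fromRow, toRow)
    // of an RGBA buffer whose channel values are ints in 0..255.
    static void sepiaRows(int[] rgba, int width, int fromRow, int toRow) {
        for (int i = fromRow * width * 4; i < toRow * width * 4; i += 4) {
            int r = rgba[i], g = rgba[i + 1], b = rgba[i + 2];
            rgba[i]     = Math.min(255, (int) (r * 0.393 + g * 0.769 + b * 0.189));
            rgba[i + 1] = Math.min(255, (int) (r * 0.349 + g * 0.686 + b * 0.168));
            rgba[i + 2] = Math.min(255, (int) (r * 0.272 + g * 0.534 + b * 0.131));
        }
    }

    // Splits the picture into one horizontal band per worker. The last band
    // absorbs the remainder when height is not divisible by the worker count.
    static void sepiaParallel(int[] rgba, int width, int height, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int band = height / workers;
            List<Future<?>> jobs = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = w * band;
                final int to = (w == workers - 1) ? height : from + band;
                jobs.add(pool.submit(() -> sepiaRows(rgba, width, from, to)));
            }
            for (Future<?> job : jobs) {
                job.get(); // plays the role of counting finished workers
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("band processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the bands never overlap, the threads can write into the shared array without locking. Giving the remainder rows to the last band is a design choice made for this sketch only.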

October 2, 2012

by David Catuhe

· 8,076 Views

CSS3 Games Collection

Have you ever thought about creating your own web games? I'm sure that most of you have already heard about the newest technologies like HTML5, Canvas, WebGL, and Node.js. But I think that before you start working with these new technologies, you should start developing games with the simplest DOM technologies (like HTML, CSS, and JavaScript). I would like to provide you with a collection of such games, so you will be able to investigate and try them out. Some of them even work without JavaScript!

1. Whack-a-Rat – CSS-only game
2. Survivor (1982 Commodore 64 game remake)
3. Sumon
4. 3D – CSS puzzle
5. Duck Hunt
6. Dino Pairs Game
7. Cops and Robbers – CSS puzzle
8. Cascading Cube Racer
9. CSS Maze Puzzle
10. One-of-a-kind CSS/JS-based game portfolio (a tutorial about making this portfolio is also available)
11. Anigma
12. Ninja Jarimaru

Conclusion

I hope that our new collection of CSS3 games was interesting for you. Good luck!

September 29, 2012

by Andrei Prikaznov

· 65,751 Views · 4 Likes

Fixing Common Java Security Code Violations in Sonar

This article aims to show you how to quickly fix the most common Java security code violations. It assumes that you are familiar with the concept of code rules and violations and how Sonar reports on them. However, if you haven’t heard these terms before, you might take a look at Sonar Concepts or the forthcoming book about Sonar for a more detailed explanation. To give you an idea: during Sonar analysis, your project is scanned by many tools to ensure that the source code conforms to the rules you’ve created in your quality profile. Whenever a rule is violated… well, a violation is raised. With Sonar you can track these violations with the violations drilldown view or in the source code editor. There are hundreds of rules, categorized based on their importance. I’ll try, in future posts, to cover as many as I can, but for now let’s take a look at some common security rules / violations. There are two pairs of rules (all of them are ranked as critical in Sonar) we are going to examine right now.

1. Array is Stored Directly (PMD) and Method returns internal array (PMD)

These violations appear when an internal array is stored or returned directly from a method. The following example illustrates a simple class that violates these rules.

    public class CalendarYear {
        private String[] months;

        public String[] getMonths() {
            return months;
        }

        public void setMonths(String[] months) {
            this.months = months;
        }
    }

To eliminate them you have to clone the array before storing / returning it, as shown in the following class implementation, so that no one can modify or get the original data of your class, but only a copy of it.

    public class CalendarYear {
        private String[] months;

        public String[] getMonths() {
            return months.clone();
        }

        public void setMonths(String[] months) {
            this.months = months.clone();
        }
    }

2. Nonconstant string passed to execute method on an SQL statement (findbugs) and A prepared statement is generated from a nonconstant String (findbugs)

Both rules are related to database access when using JDBC libraries. Generally there are two ways to execute SQL commands via a JDBC connection: Statement and PreparedStatement. There is a lot of discussion about the pros and cons, but it’s out of the scope of this post. Let’s see how the first violation is raised, based on the following source code snippet.

    Statement stmt = conn.createStatement();
    String sqlCommand = "Select * FROM customers WHERE name = '" + custName + "'";
    stmt.execute(sqlCommand);

You’ve already noticed that the sqlCommand parameter passed to the execute method is created dynamically at run time, which is not accepted by this rule. Similar situations cause the second violation.

    String sqlCommand = "insert into customers (id, name) values (?, ?)";
    PreparedStatement stmt = conn.prepareStatement(sqlCommand);

You can overcome these problems in three different ways. You can use either StringBuilder or the String.format method to create the values of the string variables. If applicable, you can define the SQL commands as constants in the class declaration, but this only works when the SQL command does not need to change at run time. Keep in mind that the first two rewrites only change how the string is built; if custName comes from user input, binding it as a PreparedStatement parameter is the safer option. Let’s rewrite the first code snippet using StringBuilder

    Statement stmt = conn.createStatement();
    stmt.execute(new StringBuilder("Select * FROM customers WHERE name = '").
        append(custName).
        append("'").toString());

and using String.format

    Statement stmt = conn.createStatement();
    String sqlCommand = String.format("Select * from customers where name = '%s'", custName);
    stmt.execute(sqlCommand);

For the second example you can just declare the SQL command as follows

    private static final String SQLCOMMAND = "insert into customers (id, name) values (?, ?)";

There are more security rules, such as the blocker Hardcoded constant database password, but I assume that nobody still hardcodes passwords in source code files… In following articles I’m going to show you how to adhere to performance and bad practice rules. Until then I’m waiting for your comments or suggestions.
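The defensive-copy fix for the first pair of rules can be checked with a few lines of code. The sketch below is only an illustration: CalendarYearDemo and isIsolated are hypothetical names, and the inner class repeats the cloned version of CalendarYear from the article so that the example is self-contained.

```java
import java.util.Arrays;

final class CalendarYearDemo {
    static final class CalendarYear {
        private String[] months;

        String[] getMonths() {
            return months.clone(); // the caller only ever sees a copy
        }

        void setMonths(String[] months) {
            this.months = months.clone(); // the object keeps its own copy
        }
    }

    // Returns true when neither the array passed to the setter nor the array
    // returned by the getter can be used to change the internal state.
    static boolean isIsolated() {
        String[] input = {"Jan", "Feb"};
        CalendarYear year = new CalendarYear();
        year.setMonths(input);

        input[0] = "changed after set";           // mutate the caller's array
        year.getMonths()[1] = "changed via get";  // mutate a returned copy

        return Arrays.equals(new String[] {"Jan", "Feb"}, year.getMonths());
    }
}
```

Note that clone() on an array is a shallow copy. That is enough here because String is immutable; an array of mutable objects would need its elements copied as well. As in the article's version, passing null to the setter would throw a NullPointerException.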

September 26, 2012

by Patroklos Papapetrou

· 27,126 Views

Choosing Static vs. Dynamic Languages for Your Startup

Everyone is thinking: why in the world would anyone pick static when you can be dynamic? Usually the thought process is, "What language am I most proficient in that can do the job?" Totally not a bad way to go about it. Now, does this choice affect anything else? Testing? Speed of development? Robustness?

Dynamic vs. Static

Dynamic languages are languages that don’t necessarily need variables to be declared before they are used. Examples of dynamic languages are Python, Ruby, and PHP. So in dynamic languages the following is possible:

    num = 10

We have successfully assigned a value to a variable without declaring it beforehand. Simple enough; try doing this in Java (you can’t). This can *increase* development speed, since you don’t have to write boilerplate code. It can also be something of a double-edged sword: since types in dynamic languages are checked at runtime, there is no way to tell if there is a bug in the code until it is run. I know you can test, but you can’t test for everything. You can’t test for everything. Here is an example, albeit a trivial one.

    def get_first_problem(problems):
        for problem in problems:
            problam = problem + 1
        return problam

Now if you are raging to some serious dubstep, it’s easy enough to miss that small typo; you go "screw it, do it live" and deploy to production. Python will simply create the new variable and not a single thing will be said. Only you can stop bugs in production!

Static languages are languages in which variables need to be declared before use and type checking is done at compile time. Examples of static languages include Java, C, and C++. So in static languages the following is enforced:

    static int awesomeNumber;
    awesomeNumber = 10;

Many argue this increases robustness and decreases the chance of runtime errors, since the compiler will catch those horrible, horrible mistakes you made throughout your code. Your method contracts are tighter; the downside is a crap ton of boilerplate code.
Weak and strong typing can often be confused with dynamic and static languages. Weakly typed languages can lead to philosophical questions like: what does the number 2 added to the word ‘two’ give you? Things like this are possible with a weakly typed language.

    a = 2
    b = "2"
    concatenate(a, b) // Returns "22"
    add(a, b) // Returns 4

Strongly typed languages place restrictions on which operations may occur between values of different types. For example, in a strongly typed language, adding a string and an integer will result in a type error, as shown below.

    >>> a = 10
    >>> b = 'ten'
    >>> a + b
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'int' and 'str'
    >>>

Conclusion

Regardless of where you land in this discussion, claiming one is better than the other would lead to a flame war, but there are places where each is strong. Dynamic languages are good for fast development cycles and prototyping, while static languages are better suited to longer development cycles where trivial bugs could be extremely costly (telecommunication systems, air traffic control). For example, if some giant company called Moo Corp. spent millions of dollars on QA and testing and a bug somehow got into the field, fixing it would mean another round of testing. When sitting in that chair the choice is clear: static languages FTW. It’s a hard job, but someone has to milk the cows. Test, test, and test.

Just a little food for thought for when you are starting your next project. You never know what limitations you may be placing on yourself and your team. What do you consider when selecting a programming language for a project?
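For comparison with the concatenate/add pseudocode above, here is a small sketch in Java, a statically typed language. TypingDemo and its two methods are hypothetical names chosen for this illustration. Java does allow `+` between an int and a String, but only as text concatenation; treating the string as a number requires an explicit conversion.

```java
final class TypingDemo {
    // String concatenation: the int is converted to text, giving "22" for (2, "2").
    static String concatenate(int a, String b) {
        return a + b;
    }

    // Numeric addition needs an explicit, visible conversion of the string.
    // Writing "int sum = a + b;" instead is rejected by the compiler.
    static int add(int a, String b) {
        return a + Integer.parseInt(b);
    }
}
```

Calling add(2, "two") still fails, but it fails with a NumberFormatException at the explicit conversion rather than by silently producing a value.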

September 25, 2012

by Mahdi Yusuf

· 25,024 Views

Introducing the New Date and Time API for JDK 8

Date and time handling in Java is somewhat tricky when you are new to the language. The time can be accessed via the static method System.currentTimeMillis(), which returns the current time in milliseconds since January 1st, 1970. If you prefer to work with objects instead, you can use java.util.Date, a class whose methods are mostly deprecated in recent versions of Java. To work with time offsets, say to add one month to a date, there is java.util.GregorianCalendar. All in all, the methods described here are not very convenient to work with. Java 7 and below lack a good date and time API. The Joda Time library is a common drop-in if you need to work with dates and times.

With JSR 310 (Java Specification Request) this is about to change. JSR 310 adds a new date, time and calendar API to Java 8. The ThreeTen project provides a reference implementation of this new API and can already be used in current Java projects (I recommend, however, not doing this in production). As the README states:

    The API is currently considered usable and accurate, yet incomplete and subject to change. If you use this API you must be able to handle incompatible changes in later versions.

Building ThreeTen

Building the ThreeTen project is relatively easy. It requires both Git and Ant to be installed on your system.

    git clone git://github.com/ThreeTen/threeten.git
    cd threeten
    ant

This will first fetch the most recent version of ThreeTen and then start the build process using Ant. Note that building the library also requires either OpenJDK 1.6 or Oracle JDK 1.6.

JSR 310

The new API specifies a number of new classes, which are divided into the categories of continuous and human time. Continuous time is based on Unix time and is represented as a single incrementing number.
    Class      Description
    Instant    A point in time in nanoseconds from January 1st 1970
    Duration   An amount of time measured in nanoseconds

Human time is based on fields that we use in our daily lives, such as day, hour, minute and second. It is represented by a group of classes, some of which we will discuss in this article.

    Class                              Description
    LocalDate                          a date, without time of day, offset or zone
    LocalTime                          the time of day, without date, offset or zone
    LocalDateTime                      the date and time, without offset or zone
    OffsetDate                         a date with an offset such as +02:00, without time of day or zone
    OffsetTime                         the time of day with an offset such as +02:00, without date or zone
    OffsetDateTime                     the date and time with an offset such as +02:00, without a zone
    ZonedDateTime                      the date and time with a time zone and offset
    YearMonth                          a year and month
    MonthDay                           month and day
    Year/MonthOfYear/DayOfWeek/...     classes for the important fields
    DateTimeFields                     stores a map of field-value pairs which may be invalid
    Calendrical                        access to the low-level API
    Period                             a descriptive amount of time, such as "2 months and 3 days"

In addition to the above classes, three support classes have been implemented. The Clock class wraps the current time and date, ZoneOffset is a time offset from UTC, and ZoneId defines a time zone such as 'Australia/Brisbane'.

Using the API

Getting the current time

The current time is represented by the Clock class. The class is abstract, so you cannot create instances of it directly. The systemUTC() static method will return the current time based on your system clock and set to UTC.

    import javax.time.Clock;

    Clock clock = Clock.systemUTC();

To use the default time zone of your system, there is also systemDefaultZone().

    Clock clock = Clock.systemDefaultZone();

The millis() method can then be used to access the current time in milliseconds from January 1st, 1970. This shows that the Clock class and all subclasses are wrapped around System.currentTimeMillis().
    Clock clock = Clock.systemDefaultZone();
    long time = clock.millis();

Working with time zones

To work with time zones you need to import the ZoneId class. The class provides a method to get the default system time zone:

    import javax.time.ZoneId;
    import javax.time.Clock;

    ZoneId zone = ZoneId.systemDefault();
    Clock clock = Clock.system(zone);

As seen above, the ZoneId can then be used to get an instance of a Clock with that time zone. Other time zones can be accessed by their name, e.g.:

    ZoneId zone = ZoneId.of("Europe/Berlin");
    Clock clock = Clock.system(zone);

Getting human date and time

Working with a time represented in a single long variable is not what we wanted. We want to work with objects that represent human-readable time. The LocalDate, LocalTime and LocalDateTime classes do just that.

    import javax.time.LocalDate;

    // The now() method returns the current date
    LocalDate date = LocalDate.now();
    System.out.printf("%s-%s-%s",
        date.getYear(), date.getMonthValue(), date.getDayOfMonth()
    );

    Using LocalDate to print the current date

Doing calculations with times and dates

One of the most important features of JSR 310 is that you can do calculations with dates and times. The API makes it very easy to do that.

    import javax.time.LocalTime;
    import javax.time.Period;
    import static javax.time.calendrical.LocalPeriodUnit.HOURS;

    Period p = Period.of(5, HOURS);
    LocalTime time = LocalTime.now();
    LocalTime newTime;
    newTime = time.plus(5, HOURS);
    // or
    newTime = time.plusHours(5);
    // or
    newTime = time.plus(p);

    Three ways of adding 5 hours to the current time

Each class that represents human time implements the AdjustableDateTime interface. The interface requires the plus and minus methods, which take a value and a PeriodUnit as arguments.

Conclusion

This article gave a (very) brief introduction to the new date and time API that will ship with Java 8. The API seems to be very consistent and well thought through, and it provides many ways to interact with dates and times.
Upon release of Java 8 the API will be moved from the javax.time package over to java.time, so there will be no conflict if you start using the current implementation.
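For readers on a released JDK, the "add five hours" example can be restated against the java.time package as it eventually shipped in Java 8. This is a sketch under that assumption, not the preview API used in the article: in the released API the unit constant is ChronoUnit.HOURS, and a time-based amount is a Duration (Period is date-based there). PlusFiveHours, addFiveHours and nowAt are hypothetical names.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

final class PlusFiveHours {
    // Three equivalent ways of adding five hours to a LocalTime.
    static LocalTime[] addFiveHours(LocalTime time) {
        Duration fiveHours = Duration.ofHours(5);
        return new LocalTime[] {
            time.plus(5, ChronoUnit.HOURS),
            time.plusHours(5),
            time.plus(fiveHours)
        };
    }

    // Reads the local time from a fixed Clock in the named zone, which makes
    // "now" reproducible (handy in tests).
    static LocalTime nowAt(String isoInstant, String zoneName) {
        Clock clock = Clock.fixed(Instant.parse(isoInstant), ZoneId.of(zoneName));
        return LocalTime.now(clock);
    }
}
```

Passing a Clock into now() instead of relying on the system clock is the design choice worth noting here: code that needs the current time becomes testable without any mocking.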

September 25, 2012

by Fabian Becker

· 78,603 Views

Nested Data Structures, and non-1NF design in PostgreSQL

This has been adapted from an ongoing series currently running on my blog. It has been adapted to be more self-contained, and rely less on other blog entries. For more see http://ledgersmbdev.blogspot.com PostgreSQL provides a very advanced set of tools for doing data modelling in ways which drift back and forth across a relational and non-relational divide. While it is generally a good idea to make the database relational first, and add objects later, the principles of object-relational database design allow you to do a lot more with PostgreSQL than you can on many other database platforms. This article will discuss the use of non-first-normal-form designs, in particular the storage of arrays of tuples in columns to simulate a nested table. The possible uses and problems of such a design will be discussed in detail. One of the promises of object-relational modelling is the ability to address information modelling on complex and nested data structures. Nested data structures bring considerable richness to the database, which is lost in a pure, flat, relational model. Nested data structures can be used to model tuple constraints in ways that are impossible to do when looking at flat data structures, at least as long as those constraints are limited to the information in a single tuple. At the same time there are cases where they simplify things and cases where they complicate things. This is true both in the case of using these for storage and for interfacing with stored procedures. PostgreSQL allows for nested tuples to be stored in a database, and for arrays of tuples. Other ORDBMS's allow something similar (Informix, DB2, and Oracle all support nested tables). Nested tables in PostgreSQL provide a number of gotchas, and additionally exposing the data in them to relational queries takes some extra work. In this post we will look at modelling general ledger transactions using a nested table approach, and both the benefits and limitations of this approach. 
In general this trades one set of problems for another, and it is important to recognize the problems going in. The storage example came out of a brainstorming session I had with Marc Balmer of Micro Systems, though it is worth noting that this is not the solution they use in their products, nor is it the approach currently used by LedgerSMB.

Basic Table Structure

The basic data schema will end up looking like this:

    CREATE TABLE journal_type (
        id serial not null unique,
        label text primary key
    );

    CREATE TABLE account (
        id serial not null unique,
        control_code text primary key, -- account number
        description text
    );

    CREATE TYPE journal_line_type AS (
        account_id int,
        amount numeric
    );

    CREATE TABLE journal_entry (
        id serial not null unique,
        journal_type int references journal_type(id),
        source_document_id text, -- for example invoice number
        date_posted date not null,
        description text,
        line_items journal_line_type[],
        PRIMARY KEY (journal_type, source_document_id)
    );

This schema has a number of obvious gotchas and cannot, by itself, guarantee the sorts of things we want to do. However, using object-relational modelling we can fix these in ways that are not possible in a purely relational schema. The main problems are:

First, since this is a double entry model, we need a constraint that says that the sum of the amounts of the lines must always equal zero. However, if we just add a sum() aggregate, we will end up with it summing every record in the db every time we do an insert, which is not what we want. We also want to make sure that no account_id's are null and no amounts are null.

Additionally, it is not possible in the schema above to easily expose the journal line information to purely relational tools. However, we can use a VIEW to do this, though this produces yet more problems.

Finally, referential integrity enforcement between the account lines and accounts cannot be done declaratively. We will have to create TRIGGERs to enforce this manually.
These problems are the price of being able to solve the first problem at all, which the relational model does not allow: we accept solutions that are a bit of a pain in exchange for having solutions at all.

Nested Table Constraints

If we simply had a tuple as a column, we could look inside the tuple with check constraints, with something like check((column).subcolumn is not null). However, in this case we cannot do that, because we need to aggregate over a set of tuples attached to the row. Instead we create a set of table methods for managing the constraints:

    CREATE OR REPLACE FUNCTION is_balanced(journal_entry)
    RETURNS BOOL
    LANGUAGE SQL AS $$
        SELECT sum(amount) = 0 FROM unnest($1.line_items);
    $$;

    CREATE OR REPLACE FUNCTION has_no_null_account_ids(journal_entry)
    RETURNS BOOL
    LANGUAGE SQL AS $$
        SELECT bool_and(account_id is not null) FROM unnest($1.line_items);
    $$;

    CREATE OR REPLACE FUNCTION has_no_null_amounts(journal_entry)
    RETURNS BOOL
    LANGUAGE SQL AS $$
        SELECT bool_and(amount is not null) FROM unnest($1.line_items);
    $$;

We can then create our constraints. Note that because the functions take the table's row type as their argument, they can only be defined after the table is constructed, and the constraints can only be added after the functions are defined. I have gone ahead and given these friendly names so that errors are easier for people (and machines) to process and handle.

    ALTER TABLE journal_entry
      ADD CONSTRAINT is_balanced
      CHECK ((journal_entry).is_balanced);

    ALTER TABLE journal_entry
      ADD CONSTRAINT has_no_null_account_ids
      CHECK ((journal_entry).has_no_null_account_ids);

    ALTER TABLE journal_entry
      ADD CONSTRAINT has_no_null_amounts
      CHECK ((journal_entry).has_no_null_amounts);

Now we have integrity constraints reaching into our nested data. So let's test this out.
    insert into journal_type (label) values ('General');

We will re-use the account data from the previous post:

    or_examples=# select * from account;
     id | control_code | description
    ----+--------------+-------------
      1 | 1500         | Inventory
      2 | 4500         | Sales
      3 | 5500         | Purchase
    (3 rows)

Let's try inserting a few meaningless transactions, some of which violate our constraints:

    insert into journal_entry
      (journal_type, source_document_id, date_posted, description, line_items)
    values
      (1, 'ref-10001', now()::date, 'This is a test',
       ARRAY[row(1, 100)::journal_line_type]);

    ERROR: new row for relation "journal_entry" violates check constraint "is_balanced"

So far so good.

    insert into journal_entry
      (journal_type, source_document_id, date_posted, description, line_items)
    values
      (1, 'ref-10001', now()::date, 'This is a test',
       ARRAY[row(1, 100)::journal_line_type,
             row(null, -100)::journal_line_type]);

    ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_account_ids"

Still good.

    insert into journal_entry
      (journal_type, source_document_id, date_posted, description, line_items)
    values
      (1, 'ref-10001', now()::date, 'This is a test',
       ARRAY[row(1, 100)::journal_line_type,
             row(2, -100)::journal_line_type,
             row(3, NULL)::journal_line_type]);

    ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_amounts"

Great. All constraints are working properly. Let's try inserting a valid row:

    insert into journal_entry
      (journal_type, source_document_id, date_posted, description, line_items)
    values
      (1, 'ref-10001', now()::date, 'This is a test',
       ARRAY[row(1, 100)::journal_line_type,
             row(2, -100)::journal_line_type]);

And it works!
    or_examples=# select * from journal_entry;
     id | journal_type | source_document_id | date_posted |  description   |       line_items
    ----+--------------+--------------------+-------------+----------------+------------------------
      5 |            1 | ref-10001          | 2012-08-23  | This is a test | {"(1,100)","(2,-100)"}
    (1 row)

Break-Out Views

A second major problem that we will be facing with this schema is that if someone wants to create a report using a reporting tool that only really supports relational data very well, then the financial data will be opaque and not available. This scenario is one of the reasons why I think it is important generally to push the relational model to its breaking point before looking at object-relational functions. Consequently, I think when doing nested tables it is important to ensure that the data in them is available through a relational interface, in this case a view.

Here we may want to model debits and credits in a way which is re-usable, so we will start by creating two type methods:

    CREATE OR REPLACE FUNCTION debits(journal_line_type)
    RETURNS NUMERIC
    LANGUAGE SQL AS $$
        SELECT CASE WHEN $1.amount < 0 THEN $1.amount * -1 ELSE NULL END
    $$;

    CREATE OR REPLACE FUNCTION credits(journal_line_type)
    RETURNS NUMERIC
    LANGUAGE SQL AS $$
        SELECT CASE WHEN $1.amount > 0 THEN $1.amount ELSE NULL END
    $$;

Now we can use these as virtual columns anywhere a journal_line_type is used. The view definition itself is rather convoluted and this may impact performance. I am waiting for the LATERAL construct to become available, which will make this easier.

    CREATE VIEW journal_line_items AS
    SELECT id AS journal_entry_id, (li).*, (li).debits, (li).credits
      FROM (SELECT je.*, unnest(line_items) li
              FROM journal_entry je) j;

Remember that li.debits and li.credits get turned by the parser into debits(li) and credits(li), allowing for class.method notation here.
Testing this out:

    SELECT * FROM journal_line_items;

gives us

     journal_entry_id | account_id | amount | debits | credits
    ------------------+------------+--------+--------+---------
                    5 |          1 |    100 |        |     100
                    5 |          2 |   -100 |    100 |
                    6 |          1 |    200 |        |     200
                    6 |          3 |   -200 |    200 |

As you can see, this works. Now people with purely relational tools can access the information in the nested table. In general it is almost always worth creating break-out views of this sort where nested data is stored. However, it is important to note that with larger data sets this is insufficient, because indexing considerations make it hard to look up specific information on a row level. This may or may not be the end of the world, depending on data set size.

Referential Integrity Controls

The final problem is that referential integrity is not a well defined concept for nested data. For this reason, if we value referential integrity and foreign keys are involved, we must find ways of enforcing these. The simplest solution is a trigger which runs on insert, update, or delete, and manages another relation which can be used as a proxy for referential integrity checks. For example, we could:

    CREATE TABLE je_account (
        je_id int references journal_entry (id),
        account_id int references account(id),
        primary key (je_id, account_id)
    );

This will be a very narrow table and so should be quick to search. It may also be useful in determining which accounts to look at for transactions if we need to do that. This table could then be used to optimize queries. To maintain the table we need to recognize that never ever will a journal entry's line items be updated or deleted. This is due to the need to maintain clear audit controls and trails.
We may add other flags to the table to indicate transactions, but we can handle insert, update, and delete conditions with a trigger, namely:

    CREATE FUNCTION je_ri_management()
    RETURNS TRIGGER
    LANGUAGE PLPGSQL AS $$
    DECLARE accounts int[];
    BEGIN
        IF TG_OP ILIKE 'INSERT' THEN
            INSERT INTO je_account (je_id, account_id)
            SELECT NEW.id, account_id
              FROM unnest(NEW.line_items)
             GROUP BY account_id;
            RETURN NEW;
        ELSIF TG_OP ILIKE 'UPDATE' THEN
            IF NEW.line_items <> OLD.line_items THEN
                RAISE EXCEPTION 'Cannot update journal entry line items!';
            ELSE
                RETURN NEW;
            END IF;
        ELSIF TG_OP ILIKE 'DELETE' THEN
            RAISE EXCEPTION 'Cannot delete journal entries!';
        ELSE
            RAISE EXCEPTION 'Invalid TG_OP in trigger';
        END IF;
    END;
    $$;

Then we add the trigger with:

    CREATE TRIGGER je_breakout_for_ri
    AFTER INSERT OR UPDATE OR DELETE ON journal_entry
    FOR EACH ROW EXECUTE PROCEDURE je_ri_management();

The final invalid TG_OP branch could be omitted, but this is not a bad check to have. Let's try this out:

    insert into journal_entry
      (journal_type, source_document_id, date_posted, description, line_items)
    values
      (1, 'ref-10003', now()::date, 'This is a test',
       ARRAY[row(1, 200)::journal_line_type,
             row(3, -200)::journal_line_type]);

    or_examples=# select * from je_account;
     je_id | account_id
    -------+------------
        10 |          3
        10 |          1
    (2 rows)

In this way referential integrity can be enforced.

Solution 2.0: Refactoring the above to eliminate the view.

The above solution will work great for small businesses, but for larger businesses querying this data will become slow for certain kinds of reports. Storage here is tied to specific criteria, and indexing is somewhat problematic. There are ways we can address this, but they are not always optimal. At the same time our work is simplified because the actual accounting details are append-only. One solution to this is to refactor the above solution.
Instead of:

    Main table
    Relational view
    Materialized view for referential integrity checking

we can have:

    Main table, with tweaked storage for line items
    Materialized view for RI checking and relational access

Unfortunately this sort of refactoring after the fact isn't simple. Typically you want to convert the journal_line_type type to a journal_line_type table, and inherit this in your materialized view table. You cannot simply drop and recreate the type, since the column you are storing the data in is dependent on its structure. The solution is to rename the type and create a new one in its place. This must be done manually, and there is currently no capability to copy a composite type's structure into a table. You will then need to create a cast and a cast function. Then, when you can afford the downtime, you will want to convert the table to the new type. It is quite possible that the downtime will be delayed and you will have an extended time period where you are half-way through migrating the structure of your database. You can, however, decide to create a cast between the table and the type, perhaps an implicit one (though this is not inherited), and use this to centralize your logic. Unfortunately this leads to duplication-related complexity and in an ideal world would be avoided.

However, assuming that the downtime ends up being tolerable, the resulting structures will end up such that they can be more readily optimized for a variety of workloads. In this regard you would have a main table, most likely with line_items moved to extended storage, whose function is to model journal entries as journal entries and apply relevant constraints, and a second table which models journal entry lines as independent lines. This also simplifies some of the constraint issues on the first table, and makes the modelling easier because we only have to look into the nested storage where we are looking at subset constraints.
This section then provides a warning regarding the use of advanced ORDBMS functionality, namely that it is easy to get tunnel vision and create problems for the future. The complexity cost here is so high that the primary model should generally remain relational, with things like nested storage primarily used to create constraints that cannot be effectively modelled otherwise. However, this becomes a great deal more complicated where values may be updated or deleted. Here, however, we have a relatively simple case regarding data writes, combined with complex constraints that cannot be effectively expressed in normalized, relational SQL. Therefore the standard maintenance concerns that counsel against duplicating information may give way to the fact that such duplication allows for richer constraints.

Now, if we had been aware of the problems going in, we would have chosen this structure all along. Because the line structure needs a primary key and a foreign key, it has to be defined as a table (which also gives us a composite type of the same name). Our design would have been:

    CREATE TABLE journal_line (
        entry_id bigserial primary key, -- only possible key
        je_id int not null,
        account_id int,
        amount numeric
    );

After creating the journal entry table we'd:

    ALTER TABLE journal_line ADD FOREIGN KEY (je_id) REFERENCES journal_entry(id);

If we have to handle purging old data we can make that key ON DELETE CASCADE.

And the lines would have been of this type instead. We can then get rid of all constraints and their supporting functions other than the is_balanced one. Our debit and credit functions then also reference this type.
Our trigger then looks like:

    CREATE FUNCTION je_ri_management() RETURNS TRIGGER
    LANGUAGE PLPGSQL AS
    $$
    BEGIN
        IF TG_OP ILIKE 'INSERT' THEN
            -- Copy the nested line items into the relational table.
            INSERT INTO journal_line (je_id, account_id, amount)
            SELECT NEW.id, account_id, amount
              FROM unnest(NEW.line_items);
            RETURN NEW;
        ELSIF TG_OP ILIKE 'UPDATE' THEN
            RAISE EXCEPTION 'Cannot update journal entry line items!';
        ELSIF TG_OP ILIKE 'DELETE' THEN
            RAISE EXCEPTION 'Cannot delete journal entries!';
        ELSE
            RAISE EXCEPTION 'Invalid TG_OP in trigger';
        END IF;
    END;
    $$;

Approval workflows can be handled with a separate status table with its own constraints. Deletions of old information (up to a specific snapshot) can be handled by a stored procedure which is unit tested and disables this trigger before purging data. This system has the advantage of having several small components which are all complete and easily understood, and it is made possible because the data is exclusively append-only.

As you can see from the above examples, nested data structures greatly complicate the data model and create problems with relational math that must be addressed if the data logic is to remain meaningful. This is a complex field, and it adds a lot of complexity to storage. In general, these structures are best avoided in actual data storage except where this approach makes formerly insurmountable problems manageable. Moreover, they add complexity to optimization once data gets large. Thus, while non-atomic fields make sense as an initial point of entry in some narrow cases, as a point of actual query they are very rarely the right approach. It is possible that, at some point, nested storage will be able to have its own indexes, foreign keys, etc., but I cannot imagine this being a high priority, and so it isn't clear that this will ever happen. In general, it usually makes the most sense to simply store the data in a pseudo-normalized way, with any non-1NF designs being the initial point of entry in a linear write model.
Nested Data Structures as Interfaces

Nested data structures as interfaces to stored procedures are a little more manageable. The main difficulties are in application-side data construction and output parsing. Some languages handle this more easily than others. Upper-level construction and handling of these structures is relatively straightforward on the database side and poses none of these problems. However, they do cause additional complexity, and this must be managed carefully.

The biggest issue when interfacing with an application is that ROW types are not usually constructed automatically by application-level frameworks, even if those frameworks handle arrays. This leaves the programmer to choose between unstructured text arrays, which are fundamentally non-discoverable (and thus brittle), and arrays of tuples, which are discoverable but require a lot of additional application code to handle. At the same time, there is a chicken-and-egg problem: frameworks will not add handling for this sort of thing unless people are already trying to do it.

So my general recommendation is to use nested data types sparingly everywhere in the database, only where the benefits clearly outweigh the complexity costs. Complexity costs are certainly lower at the interface level, and there are many more cases where these techniques are net wins there, but that does not mean that they should be routinely used even there.

September 25, 2012

by Chris Travers

· 20,902 Views

8 Common Code Violations in Java

At work, I recently did a code cleanup of an existing Java project. After that exercise, I could see a common set of code violations that occur again and again in the code. So I came up with a list of such common violations and shared it with my peers, so that awareness of them would help improve code quality and maintainability. I'm sharing the list here with a bigger audience.

The list is not in any particular order, and all of it is derived from the rules enforced by code quality tools such as CheckStyle, FindBugs and PMD. Here we go!

Format source code and organize imports in Eclipse: Eclipse provides the option to auto-format the source code and organize the imports (thereby removing unused ones). You can use the following shortcut keys to invoke these functions.

- Ctrl + Shift + F: formats the source code.
- Ctrl + Shift + O: organizes the imports and removes the unused ones.

Instead of invoking these two functions manually, you can tell Eclipse to auto-format and auto-organize whenever you save a file. To do this, in Eclipse, go to Window -> Preferences -> Java -> Editor -> Save Actions, then enable Perform the selected actions on save and check Format source code + Organize imports.

Avoid multiple returns (exit points) in methods: In your methods, make sure that you have only one exit point. Do not use return in more than one place in a method body. For example, the code below is NOT RECOMMENDED because it has more than one exit point (return statement).

    private boolean isEligible(int age){
        if(age > 18){
            return true;
        }else{
            return false;
        }
    }

The above code can be rewritten like this (of course, the code below can still be improved, but that will come later).

    private boolean isEligible(int age){
        boolean result;
        if(age > 18){
            result = true;
        }else{
            result = false;
        }
        return result;
    }

Simplify if-else methods: We write several utility methods that take a parameter, check for some conditions and return a value based on the condition. For example, consider the isEligible method that you just saw in the previous point.

    private boolean isEligible(int age){
        boolean result;
        if(age > 18){
            result = true;
        }else{
            result = false;
        }
        return result;
    }

The entire method can be rewritten as a single return statement, as below.

    private boolean isEligible(int age){
        return age > 18;
    }

Do not create new instances of Boolean, Integer or String: Avoid creating new instances of Boolean, Integer, String etc. For example, instead of using new Boolean(true), use Boolean.valueOf(true). The latter statement has the same effect as the former, but it performs better because it reuses an existing instance instead of allocating a new one.

Use curly braces around block statements: Never forget to use curly braces around block-level statements such as if, for and while. This reduces the ambiguity of your code and reduces the chance of introducing a new bug when you modify the block-level statement.

NOT RECOMMENDED

    if(age > 18)
        return true;
    else
        return false;

RECOMMENDED

    if(age > 18){
        return true;
    }else{
        return false;
    }

Mark method parameters as final, wherever applicable: Always mark the method parameters as final wherever applicable. If you do so, you'll get a compile-time error when you accidentally reassign the parameter. Do not count on a performance gain from this, though: the benefit is in readability and safety.

RECOMMENDED

    private boolean isEligible(final int age){ ... }

Name public static final fields in UPPERCASE: Always name public static final fields (also known as constants) in UPPERCASE. This lets you easily tell constant fields apart from local variables.

NOT RECOMMENDED

    public static final String testAccountNo = "12345678";

RECOMMENDED

    public static final String TEST_ACCOUNT_NO = "12345678";

Combine multiple if statements into one: Wherever possible, try to combine multiple if statements into a single one. For example, the code below:

    if(age > 18){
        if( voted == false){
            // eligible to vote.
        }
    }

can be combined into a single if statement, as:

    if(age > 18 && !voted){
        // eligible to vote
    }

switch should have default: Always add a default case to switch statements.

Avoid duplicate string literals, instead create a constant: If you have to use a string in several places, avoid using it as a literal. Instead, create a String constant and use it. For example, in the code below:

    private void someMethod(){
        logger.log("My Application" + e);
        ....
        ....
        logger.log("My Application" + f);
    }

the string literal "My Application" can be made a constant and used in the code:

    public static final String MY_APP = "My Application";

    private void someMethod(){
        logger.log(MY_APP + e);
        ....
        ....
        logger.log(MY_APP + f);
    }

Additional Resources:

- A collection of Java best practices.
- List of available Checkstyle checks.
- List of PMD Rule sets
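The "switch should have default" rule above comes without an example, so here is a minimal sketch of what it might look like, combined with the single-exit-point and constant rules. The class name, the UNKNOWN constant and the day-of-week mapping are all made up for illustration; they are not from the project that was cleaned up.

```java
// Hypothetical example: names and values are illustrative only.
class SwitchDefaultSketch {

    // A constant instead of a repeated string literal.
    public static final String UNKNOWN = "UNKNOWN";

    // One exit point, a final parameter, and a default case that
    // handles every value the other cases do not cover.
    static String describe(final int dayOfWeek) {
        final String result;
        switch (dayOfWeek) {
            case 1:
                result = "MONDAY";
                break;
            case 2:
                result = "TUESDAY";
                break;
            default:
                result = UNKNOWN;
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe(1));
        System.out.println(describe(42));
    }
}
```

Without the default case, the compiler would reject this particular method, because result might not be assigned. In switch statements that do not assign a variable, a missing default goes unnoticed, which is why the quality tools flag it.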

September 14, 2012

by Veera Sundar

· 46,024 Views · 1 Like

Perl in Node.js

Yes, Perl5 can be embedded in node.js! First of all, do an npm install perl. (P.S. node-perl requires a perl5 binary built with -fPIC and -Duseshrplib.) This is a synchronous but useful embedded Perl5 for node.js.

If you want to try any version of perl, you must check out node-perl:

    #> git clone git://github.com/hideo55/node-perl.git
    #> cd node-perl
    #> node-waf configure
    #> node-waf build
    #> node-waf install

And then:

    var Perl = require('perl').Perl;
    var perl = new Perl();
    perl.Run({
        opts : ["-Mfeature=say", "-e", "say 'Hello world'"]
    }, function(out, err){
        console.log(out);
    });
    perl.Run({
        script : 'example.pl',
        args : ['foo', 'bar']
    });

If you opted for Perl5:

    var Perl = require('perl-simple').Perl;
    var perl = new Perl();
    var ret = perl.evaluate("reverse 'yoeman'");
    console.log(ret); // => nameoy

    var Perl = require('../index.js').Perl;
    var perl = new Perl();
    perl.use('LWP::UserAgent');
    var ua = perl.getClass('LWP::UserAgent').new();
    var res = ua.get('http://utf-8.jp/');
    console.log(res.as_string());

Happy hacking!

September 14, 2012

by Hemanth HM

· 17,103 Views · 1 Like

Erlang: tuples and lists

You can't seriously program in a language with just scalar types like numbers, strings and atoms. For this reason, now that we have a basic knowledge of Erlang's syntax and variables, we have to delve into two basic vector types: tuples and lists.

Both tuples and lists represent a collection of values; however, some rules of thumb (imho) for choosing between them are:

- Tuples deal with heterogeneous values, while lists are homogeneous. A tuple is usually built as a sequence of values of different types, while all of the values of a list are of the same type. This struct versus array differentiation is true also in Python.
- Tuples and lists are pattern-matched differently (we'll see more of this when writing pattern matching code, of course).
- Tuples have O(1) random access, while lists have O(N) random access, being built of cons cells.

In general, fixed-size structures are modelled as tuples, while sequences of N values (where N varies at runtime) are modelled as lists.

Tuples

Erlang tuples are similar in syntax to Python's ones:

    1> MyTuple = {number, 42}.
    {number,42}
    2> tuple_size(MyTuple).
    2
    3> element(1, MyTuple).
    number
    4> element(2, MyTuple).
    42

And of course they are immutable, like every other value:

    5> setelement(2, MyTuple, 43).
    {number,43}
    6> MyTuple.
    {number,42}

They can have any number of values:

    7> {true, 23, "Hello"}.
    {true,23,"Hello"}

And the empty tuple is:

    8> {}.
    {}

Lists

Lists are built as a sequence of cons cells (one of LISP's basic data structures; cons means construct). Each cons cell is composed of a value and a pointer to another cons cell, which may be empty. Thus the list [1, 2, 3] is composed of three cons cells:

    p1: [1, p2]
    p2: [2, p3]
    p3: [3, p_to_empty_list]

Lists can be built either as sequences or by specifying the cons cells directly. In the first case, values are separated by `,`, while in the second they are separated by `|`:

    1> []
    1> .
    []
    2> [1].
    [1]
    3> [1, 2].
    [1,2]
    4> [1 | [2]].
    [1,2]
    5> [1, 2, 3].
    [1,2,3]
    6> [1 | [2 | [3]]].
    [1,2,3]

Every function operating on lists is defined in terms of two primitives, head and tail, which return respectively the first element of the list and the rest of the list with that element removed. While in other languages these functions are provided as head/1 and tail/1 (car and cdr for friends), in Erlang they are implemented via pattern matching; this means they are built into the language syntax. Our little exercise for today is to write these constructs as ordinary functions, to introduce how pattern matching works on lists.

head/1

Let's start with a simple case: an empty list. If you ask for the first element of such a list, our implementation should raise an error, as there is no such element.

    #!/usr/bin/escript
    head([]) -> throw(error).

    main(_) -> head([]).

When executed, this indeed shows:

    escript: exception throw: error
      in function  erl_eval:local_func/5
      in call from escript:interpret/4
      in call from escript:start/1
      in call from init:start_it/1
      in call from init:start_em/1

What has happened? We can put literal values in the formal arguments of a function, and the body of the function will only be executed if the values match these literals. Of course this also means that when we execute:

    main(_) -> head([1]).

we get:

    escript: exception error: {function_clause,[{local,head,[[1]]}]}

since the function is only defined for [] as an argument. Let's add another clause to make it work also for 1-element lists:

    #!/usr/bin/escript
    head([]) -> throw(error);
    head([Element]) -> Element.

    main(_) ->
        E = head([1]),
        io:format("Head: ~p~n", [E]).

Note that cases are separated by `;` instead of `.` and are evaluated in sequence, so you should put the corner cases (or the base case of recursion) first. Here we didn't use a literal pattern, since we don't know what is in the list in general. We used a variable name, so that Element is filled with the value of the only element of the list.

Now we can extend the code further, so that it deals with multiple-value lists:

    #!/usr/bin/escript
    head([]) -> throw(error);
    head([Element]) -> Element;
    head([Element | _Tail]) -> Element.

    main(_) ->
        io:format("Head of [1]: ~p~n", [head([1])]),
        io:format("Head of [1, 2]: ~p~n", [head([1, 2])]).

We use _Tail instead of Tail to tell Erlang that we don't need the value of this argument, but that it must exist. In fact, we know that [1] is really [1 | []], so we can simplify this code a bit, as the third clause would also match the single-element list case:

    #!/usr/bin/escript
    head([]) -> throw(error);
    head([Element | _Tail]) -> Element.

    main(_) ->
        io:format("Head of [1]: ~p~n", [head([1])]),
        io:format("Head of [1, 2]: ~p~n", [head([1, 2])]).

You're not limited to pattern matching to act on lists: explore the lists module to see how member(), nth() or length() can be used to test an element's presence, read a single value or calculate the length of the list.

I'm out of space for today, so I leave the similar tail/1 implementation as an exercise for the reader.

Conclusions

Tuples and lists are base Erlang data structures. Exercise with them and with pattern matching in the shell to make sure that you know how to manipulate variables before we move from defining functions to making them collaborate.
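For readers who think more easily in Java than in Erlang, the cons-cell structure described above can be sketched as a tiny immutable linked list. This is only an illustration of the data structure, not how Erlang implements lists. The class name ConsList and its methods are invented for this sketch, and the error on an empty list uses NoSuchElementException where the article's Erlang code throws the atom error.

```java
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical sketch: a list is either the shared empty list
// or a cell holding a value (head) and a reference to the rest (tail).
final class ConsList<T> {
    private static final ConsList<Object> EMPTY = new ConsList<>(null, null);

    private final T head;
    private final ConsList<T> tail;

    private ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // The empty list, written [] in Erlang.
    @SuppressWarnings("unchecked")
    static <T> ConsList<T> empty() {
        return (ConsList<T>) EMPTY;
    }

    // Builds one cons cell, written [Head | Tail] in Erlang.
    static <T> ConsList<T> cons(T head, ConsList<T> tail) {
        return new ConsList<>(head, Objects.requireNonNull(tail));
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    // First element; fails on the empty list, like the head([]) clause.
    T head() {
        if (isEmpty()) {
            throw new NoSuchElementException("head of empty list");
        }
        return head;
    }

    // Everything after the first element; fails on the empty list.
    ConsList<T> tail() {
        if (isEmpty()) {
            throw new NoSuchElementException("tail of empty list");
        }
        return tail;
    }

    public static void main(String[] args) {
        // Equivalent of [1 | [2 | [3 | []]]].
        ConsList<Integer> list = cons(1, cons(2, cons(3, ConsList.<Integer>empty())));
        System.out.println(list.head());
        System.out.println(list.tail().head());
    }
}
```

The sketch also shows why random access is O(N): reaching the Nth value means following N tail references.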

September 12, 2012

by Giorgio Sironi

· 21,244 Views

A Better Java Shell Script Wrapper

In many Java projects, you often see wrapper shell scripts that invoke the java command with custom application parameters. For example, $ANT_HOME/bin/ant, $GROOVY_HOME/bin/groovy, or even in our TimeMachine Scheduler you will see $TIMEMACHINE_HOME/bin/scheduler.sh.

Writing these wrapper scripts is boring and error prone. Most of the problems come from setting the correct classpath for the application. If you're working on an in-house project for a company, then you can get away with hardcoding paths and your environment vars. But for open source projects, folks have to make the wrapper more flexible and generic. Most of them even provide a .bat version of it. Windows DOS is really a brutal and limited terminal for scripting your project needs. For this reason, I often encourage others to use Cygwin as much as they can. It at least has a real bash shell to work with. Another common problem with these wrappers is that they can quickly get out of hand, leaving too many near-duplicate scripts littered everywhere in your project.

In this post, I will show you a Java wrapper script that I've written. It's simple to use and very flexible for running just about any Java program. Let's see how it's used first, and then I will print its content at the bottom of the post.

Introducing the run-java wrapper script

If you take a look at $TIMEMACHINE_HOME/bin/scheduler.sh, you will see that it in turn calls a run-java script that lives in the same directory.

    DIR=$(dirname $0)
    SCHEDULER_HOME=$DIR/..
    $DIR/run-java -Dscheduler.home="$SCHEDULER_HOME" timemachine.scheduler.tool.SchedulerServer "$@"

As you can see, our run-java can take -D options. Not only this, it can also take the -cp option! What's more, you can specify these options even after the main class! This makes run-java re-wrappable by other scripts, which can still add additional system properties and classpath entries.
For example, TimeMachine comes with the Groovy library, so instead of downloading its full distribution again, you can simply invoke groovy like this:

    $TIMEMACHINE_HOME/bin/run-java groovy.ui.GroovyMain test.groovy

You can use run-java from any directory you're in, so it's convenient. It will resolve its own directory and load any jars in the lib directory automatically. Now if you want Groovy to run with additional jars, you can use the -cp option like this:

    $TIMEMACHINE_HOME/bin/run-java -cp "$HOME/apps/my-app/lib/*" groovy.ui.GroovyMain test.groovy

Things often go wrong if you are not careful with the Java classpath, but with the run-java script you can perform a dry run first:

    RUN_JAVA_DRY=1 $TIMEMACHINE_HOME/bin/run-java -cp "$HOME/apps/my-app/lib/*" groovy.ui.GroovyMain test.groovy

You would run the above on a single line at a command prompt. It should print out your full java command with all options and arguments for you to inspect.

There are many more options to the script, which you can find out about by reading the comments in it. The current script will work on any Linux bash or on a Windows Cygwin terminal.

Using run-java during development with Maven

The above examples assume you are in a released project structure such as this:

    $TIMEMACHINE_HOME
      +- bin/run-java
      +- lib/*.jar

But what about during development? A frequent use case is that you want to be able to run your latest compiled classes under target/classes without having to package up or release the entire project. You can use our run-java in this scenario as well. First, simply add bin/run-java to your project, then run

    mvn compile dependency:copy-dependencies

which will copy all the dependency jar files into target/dependency. That's all. run-java will automatically detect these directories and create the correct classpath to run your main class.
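The directory detection just described is done in bash by the script itself. Purely as an illustration of the logic, here is a rough Java sketch of the same idea: add each conventional directory to the classpath only if it exists, using a wildcard for the directories that hold jars. The class and method names are invented, and the sketch leaves out the RUN_JAVA_CP prefix and the Cygwin path conversion that the real script handles.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of run-java's automatic classpath setup.
class ClasspathSketch {

    // Builds a classpath from whichever conventional directories exist under appDir.
    static String buildClasspath(Path appDir) {
        List<String> entries = new ArrayList<>();

        // Directories of classes and resources are added as they are.
        for (String dir : new String[] {"config", "target/test-classes", "target/classes"}) {
            Path p = appDir.resolve(dir);
            if (Files.isDirectory(p)) {
                entries.add(p.toString());
            }
        }

        // Directories of jar files get a trailing wildcard so java picks up every jar.
        for (String dir : new String[] {"target/dependency", "lib"}) {
            Path p = appDir.resolve(dir);
            if (Files.isDirectory(p)) {
                entries.add(p + File.separator + "*");
            }
        }

        // ":" on unix, ";" on Windows.
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        Path appDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(buildClasspath(appDir));
    }
}
```

Because missing directories are skipped, the same logic covers both a released layout (only lib) and a Maven working copy (target/classes and target/dependency).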
If you use the Eclipse IDE for development, then your target/classes will always be up to date, and run-java can be a great gem to have in your project even for development.

Get the run-java wrapper script now

    #!/usr/bin/env bash
    #
    # Copyright 2012 Zemian Deng
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # A wrapper script that runs any Java6 application in a unix/cygwin env.
    #
    # This script is assumed to be located in an application's "bin" directory. It will
    # auto resolve any symbolic link and always run relative to this application
    # directory (which is one parent up from the script.) Therefore, this script can be
    # run anywhere in the file system and it will still reference this application
    # directory.
    #
    # This script will by default auto setup a Java classpath that picks up any "config"
    # and "lib" directories under the application directory. It will also add any
    # typical Maven project output directories such as "target/test-classes",
    # "target/classes", and "target/dependency" into the classpath. This can be disabled by
    # setting RUN_JAVA_NO_AUTOCP=1.
    #
    # If the "Default parameters" section below doesn't match the user's env, then the user
    # may override these variables in their terminal session or preset them in the shell's
    # profile startup script. The values of all paths should be in cygwin/unix form,
    # and this script will auto convert them into Windows paths where needed.
    #
    # User may customize the Java classpath by setting RUN_JAVA_CP, which will prefix the existing
    # classpath, or use the "-cp" option, which will postfix the existing classpath.
    #
    # Usage:
    #   run-java [java_opts] <java_main_class> [-cp /more/classpath] [-Dsysprop=value]
    #
    # Example:
    #   run-java example.Hello
    #   run-java example.Hello -Dname=World
    #   run-java org.junit.runner.JUnitCore example.HelloTest -cp "C:\apps\lib\junit4.8.2\*"
    #
    # Created by: Zemian Deng 03/09/2012

    # This run script dir (resolve to absolute path)
    SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)    # This dir is where this script lives.
    APP_DIR=$(cd "$SCRIPT_DIR/.." && pwd)        # Assume the application dir is one level up from script dir.

    # Default parameters
    JAVA_HOME=${JAVA_HOME:=/apps/jdk}               # This is the home directory of the Java development kit.
    RUN_JAVA_CP=${RUN_JAVA_CP:=$CLASSPATH}          # A classpath prefix before -classpath option, default to $CLASSPATH
    RUN_JAVA_OPTS=${RUN_JAVA_OPTS:=}                # Java options (-Xmx512m -XX:MaxPermSize=128m etc)
    RUN_JAVA_DEBUG=${RUN_JAVA_DEBUG:=}              # If not empty, print the full java command line before executing it.
    RUN_JAVA_NO_PARSE=${RUN_JAVA_NO_PARSE:=}        # If not empty, skip the auto parsing of -D and -cp options from script arguments.
    RUN_JAVA_NO_AUTOCP=${RUN_JAVA_NO_AUTOCP:=}      # If not empty, do not auto setup Java classpath
    RUN_JAVA_DRY=${RUN_JAVA_DRY:=}                  # If not empty, do not exec Java command, but just print

    # OS specific support. $var _must_ be set to either true or false.
    CYGWIN=false;
    case "`uname`" in
      CYGWIN*) CYGWIN=true ;;
    esac

    # Define where the java executable is
    JAVA_CMD=java
    if [ -d "$JAVA_HOME" ]; then
        JAVA_CMD="$JAVA_HOME/bin/java"
    fi

    # Auto setup application's Java Classpath (only if they exist)
    if [ -z "$RUN_JAVA_NO_AUTOCP" ]; then
        if $CYGWIN; then
            # Provide Windows directory conversion
            JAVA_HOME_WIN=$(cygpath -aw "$JAVA_HOME")
            APP_DIR_WIN=$(cygpath -aw "$APP_DIR")

            if [ -d "$APP_DIR_WIN\config" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\config" ; fi
            if [ -d "$APP_DIR_WIN\target\test-classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\target\test-classes" ; fi
            if [ -d "$APP_DIR_WIN\target\classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\target\classes" ; fi
            if [ -d "$APP_DIR_WIN\target\dependency" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\target\dependency\*" ; fi
            if [ -d "$APP_DIR_WIN\lib" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\lib\*" ; fi
        else
            if [ -d "$APP_DIR/config" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/config" ; fi
            if [ -d "$APP_DIR/target/test-classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/target/test-classes" ; fi
            if [ -d "$APP_DIR/target/classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/target/classes" ; fi
            if [ -d "$APP_DIR/target/dependency" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/target/dependency/*" ; fi
            if [ -d "$APP_DIR/lib" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/lib/*" ; fi
        fi
    fi

    # Parse additional "-cp" and "-D" after the Java main class from script arguments
    #   This is done for convenience so users do not have to export RUN_JAVA_CP and RUN_JAVA_OPTS
    #   separately, but can pass them at the end of this run-java script instead.
    # This can be disabled by setting RUN_JAVA_NO_PARSE=1.
    if [ -z "$RUN_JAVA_NO_PARSE" ]; then
        # Prepare variables for parsing
        FOUND_CP=
        declare -a NEW_ARGS
        IDX=0

        # Parse all arguments and look for "-cp" and "-D"
        for ARG in "$@"; do
            if [[ -n $FOUND_CP ]]; then
                if [ "$OS" = "Windows_NT" ]; then
                    # Can't use cygpath here, because cygpath will auto expand "*", which we do not
                    # want. User will just have to use OS path when specifying "-cp" option.
                    #ARG=$(cygpath -w -a $ARG)
                    RUN_JAVA_CP="$RUN_JAVA_CP;$ARG"
                else
                    RUN_JAVA_CP="$RUN_JAVA_CP:$ARG"
                fi
                FOUND_CP=
            else
                case $ARG in
                '-cp')
                    FOUND_CP=1
                    ;;
                '-D'*)
                    RUN_JAVA_OPTS="$RUN_JAVA_OPTS $ARG"
                    ;;
                *)
                    NEW_ARGS[$IDX]="$ARG"
                    let IDX=$IDX+1
                    ;;
                esac
            fi
        done

        # Display full Java command.
        if [ -n "$RUN_JAVA_DEBUG" ] || [ -n "$RUN_JAVA_DRY" ]; then
            echo "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "${NEW_ARGS[@]}"
        fi

        # Run Java Main class using parsed variables
        if [ -z "$RUN_JAVA_DRY" ]; then
            "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "${NEW_ARGS[@]}"
        fi
    else
        # Display full Java command.
        if [ -n "$RUN_JAVA_DEBUG" ] || [ -n "$RUN_JAVA_DRY" ]; then
            echo "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "$@"
        fi

        # Run Java Main class
        if [ -z "$RUN_JAVA_DRY" ]; then
            "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "$@"
        fi
    fi
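The argument parsing loop is the part of the script that lets -cp and -D appear anywhere, even after the main class. To make that logic easier to follow, here is the same loop sketched in Java. This is only an illustration with invented class and field names; the script itself is what actually runs, and this sketch does not join the classpath entries or launch anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of run-java's "-cp" / "-D" argument parsing.
class ArgParseSketch {
    final List<String> javaOpts = new ArrayList<>();       // every -Dname=value option
    final List<String> extraClasspath = new ArrayList<>(); // each value that followed a -cp
    final List<String> remaining = new ArrayList<>();      // main class and its own arguments

    static ArgParseSketch parse(String... args) {
        ArgParseSketch parsed = new ArgParseSketch();
        boolean expectClasspath = false; // plays the role of FOUND_CP in the script
        for (String arg : args) {
            if (expectClasspath) {
                // The argument right after -cp is a classpath entry.
                parsed.extraClasspath.add(arg);
                expectClasspath = false;
            } else if (arg.equals("-cp")) {
                expectClasspath = true;
            } else if (arg.startsWith("-D")) {
                parsed.javaOpts.add(arg);
            } else {
                // Anything else is passed through in its original order.
                parsed.remaining.add(arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        ArgParseSketch p = parse("example.Hello", "-Dname=World", "-cp", "/apps/lib/*");
        System.out.println("opts=" + p.javaOpts
            + " cp=" + p.extraClasspath
            + " rest=" + p.remaining);
    }
}
```

One consequence of this design, in the script as in the sketch, is that the wrapped program can never receive a -D or -cp argument of its own unless RUN_JAVA_NO_PARSE is set.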

September 11, 2012

by Zemian Deng

· 14,748 Views

Getting Started: Apache Camel Using Groovy

Their site says that Apache Camel is a versatile open-source integration framework based on known Enterprise Integration Patterns. It might seem like a vague definition, but I want to tell you that this is a very productive Java library that can solve many typical IT problems! You can think of it as a very lightweight ESB framework with "batteries" included.

In every job I've had so far, folks were writing their own solutions in one way or another to solve many common problems (or they would buy some very expensive enterprisey ESB server that takes months and months to learn, configure, and maintain). Things that we commonly solve are integration (glue) code to tie existing business services together, processing data in a certain workflow manner, or moving and transforming data from one place to another. These are very typical needs in many IT environments. Apache Camel can be used in cases like these; not only that, but also in a very productive and effective way!

In this article, I will show you how to get started with Apache Camel using just a few lines of Groovy script. You can certainly also start off with a full Java project to try out Camel, but I find Groovy will give you the shortest working example and learning curve.

Getting started with Apache Camel using Groovy

So let's begin. First let's see a hello world demo with Camel + Groovy.
    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*

    def camelContext = new DefaultCamelContext()
    camelContext.addRoutes(new RouteBuilder() {
        def void configure() {
            from("timer://jdkTimer?period=3000")
                .to("log://camelLogger?level=INFO")
        }
    })
    camelContext.start()

    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Save the above into a file named helloCamel.groovy and then run it like this:

    $ groovy helloCamel.groovy
    388 [main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) is starting
    445 [main] INFO org.apache.camel.management.ManagementStrategyFactory - JMX enabled.
    447 [main] INFO org.apache.camel.management.DefaultManagementLifecycleStrategy - StatisticsLevel at All so enabling load performance statistics
    678 [main] INFO org.apache.camel.impl.converter.DefaultTypeConverter - Loaded 170 type converters
    882 [main] INFO org.apache.camel.impl.DefaultCamelContext - Route: route1 started and consuming from: Endpoint[timer://jdkTimer?period=3000]
    883 [main] INFO org.apache.camel.impl.DefaultCamelContext - Total 1 routes, of which 1 is started.
    887 [main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) started in 0.496 seconds
    898 [Camel (camel-1) thread #1 - timer://jdkTimer] INFO camelLogger - Exchange[ExchangePattern:InOnly, BodyType:null, Body:[Body is null]]
    3884 [Camel (camel-1) thread #1 - timer://jdkTimer] INFO camelLogger - Exchange[ExchangePattern:InOnly, BodyType:null, Body:[Body is null]]
    6884 [Camel (camel-1) thread #1 - timer://jdkTimer] INFO camelLogger - Exchange[ExchangePattern:InOnly, BodyType:null, Body:[Body is null]]
    ...

The little script above is simple, but it presents a few key features of Camel Groovyness. The first and last sections of the helloCamel.groovy script are just Groovy features.
The @Grab annotation will automatically download the dependency jars you specify. We import Java packages to use their classes later. At the end we make sure to shut down Camel before exiting the JVM, through the Java Shutdown Hook mechanism. The program will sit and wait until the user presses CTRL+C, just like a typical server process.

The middle section is where the Camel action is. You always create a Camel context to begin with (think of it as the server or manager for the process). Then you add a Camel route (think of it as a workflow or pipeflow) through which you would like to process data (Camel likes to call these data "messages"). The route consists of a "from" starting point (where data is generated), and one or more "to" points (where data is going to be processed). Camel calls these destination points "Endpoints". Endpoints can be expressed in a simple URI string format such as "timer://jdkTimer?period=3000". Here we are generating a timer message every 3 seconds into the pipeflow, which is then processed by a logger URI that simply prints to console output.

After the Camel context has started, it will start processing data through the workflow, as you can observe from the example output above. Now try pressing CTRL+C to end the process. Notice how Camel shuts down everything very gracefully.
    7312 [Thread-2] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) is shutting down
    7312 [Thread-2] INFO org.apache.camel.impl.DefaultShutdownStrategy - Starting to graceful shutdown 1 routes (timeout 300 seconds)
    7317 [Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Route: route1 shutdown complete, was consuming from: Endpoint[timer://jdkTimer?period=3000]
    7317 [Thread-2] INFO org.apache.camel.impl.DefaultShutdownStrategy - Graceful shutdown of 1 routes completed in 0 seconds
    7321 [Thread-2] INFO org.apache.camel.impl.converter.DefaultTypeConverter - TypeConverterRegistry utilization[attempts=2, hits=2, misses=0, failures=0] mappings[total=170, misses=0]
    7322 [Thread-2] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) is shutdown in 0.010 seconds. Uptime 7.053 seconds.

So that's our first taste of a Camel ride! However, we called this a "Hello World!" demo, and yet we haven't seen one. You might also have noticed that the script above is mostly boilerplate setup code. No user logic has been added yet. Not even the part that logs the message! We are simply configuring the route.

Now let's modify the script a little bit so that we actually add our own user logic to process the timer message.

    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*

    def camelContext = new DefaultCamelContext()
    camelContext.addRoutes(new RouteBuilder() {
        def void configure() {
            from("timer://jdkTimer?period=3000")
                .to("log://camelLogger?level=INFO")
                .process(new Processor() {
                    def void process(Exchange exchange) {
                        println("Hello World!")
                    }
                })
        }
    })
    camelContext.start()

    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Notice how I can simply append the process code right after the to("log...") line.
I have added a "processor" code block to process the timer message. The logic is simple: we greet the world on each tick.

Making Camel route more concise and practical

Now, do I have you at Hello yet? If not, then I hope you will be patient and continue to follow along for a few more practical features of Camel. First, if you were to put Camel to real use, I would recommend you set up your business logic separately from the workflow route definition. This is so that you can clearly express and see your entire pipeflow at a glance. To do this, you want to move the "processor" into a service bean.

    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*
    import org.apache.camel.util.jndi.*

    class SystemInfoService {
        def void run() {
            println("Hello World!")
        }
    }
    def jndiContext = new JndiContext();
    jndiContext.bind("systemInfoPoller", new SystemInfoService())

    def camelContext = new DefaultCamelContext(jndiContext)
    camelContext.addRoutes(new RouteBuilder() {
        def void configure() {
            from("timer://jdkTimer?period=3000")
                .to("log://camelLogger?level=INFO")
                .to("bean://systemInfoPoller?method=run")
        }
    })
    camelContext.start()

    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Now, see how compact this workflow route has become? Camel's Java DSL, such as "from().to().to()" for defining a route, is clean and simple to use. You can even show this code snippet to your Business Analysts, and they would likely be able to verify your business flow easily! Wouldn't that alone be worth a million dollars?

How about another demo: FilePoller Processing

File polling is a very common and effective way to solve many business problems. If you have worked for commercial companies long enough, you might have written a file poller before.
A typical file poller picks up incoming files from a directory, processes their content, and then moves each file into an output directory. Let's make a Camel route to do just that.

    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*
    import org.apache.camel.util.jndi.*

    // Service that converts the file content to upper case
    class UpperCaseTextService {
        def String transform(String text) {
            return text.toUpperCase()
        }
    }

    def jndiContext = new JndiContext();
    jndiContext.bind("upperCaseTextService", new UpperCaseTextService())

    def dataDir = "/${System.properties['user.home']}/test/file-poller-demo"
    def camelContext = new DefaultCamelContext(jndiContext)
    camelContext.addRoutes(new RouteBuilder() {
        def void configure() {
            from("file://${dataDir}/in")
                .to("log://camelLogger")
                .to("bean://upperCaseTextService?method=transform")
                .to("file://${dataDir}/out")
        }
    })
    camelContext.start()
    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Here you see I defined a route that polls the $HOME/test/file-poller-demo/in directory for text files. Once a file is found, the route logs it to the console and then passes it to a service that transforms the text content into upper case. After this, it writes the file into the $HOME/test/file-poller-demo/out directory. My goodness, reading the Camel route above probably expresses what I just wrote down equally well. Do you see the benefits here?

What's the "batteries included" part?

If you've programmed in Python before, you might have heard the phrase its community often uses: Python has "batteries included". This means the interpreter comes with a rich set of libraries for most common programming needs. You can often write a Python program without having to download separate external libraries. I am making a similar analogy here with Apache Camel.
The Camel project comes with so many ready-to-use components that you can find one for just about any transport protocol that can carry data. These Camel "components" are the ones that support the different 'Endpoint URI' schemes we have seen in our demos above. We have only shown you the timer, log, bean, and file components, but there are over 120 more. You will find jms, http, ftp, cxf, and tcp, just to name a few. The Camel project also gives you the option to define routes in a declarative XML format. The XML is just an extension of a Spring XML config with Camel's namespace handler added on top. Spring is optional in Camel, but you can use the two together in a very powerful way.
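To give a rough idea of the declarative XML format, here is a minimal sketch of how the timer route from the service bean demo might look as a Spring XML config. It assumes the camel-spring module is on the classpath alongside camel-core, and that SystemInfoService has been compiled into a package; the name com.example is hypothetical and used only for illustration.

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd
         http://camel.apache.org/schema/spring
         http://camel.apache.org/schema/spring/camel-spring.xsd">

  <!-- Assumption: SystemInfoService lives in a hypothetical com.example package -->
  <bean id="systemInfoPoller" class="com.example.SystemInfoService"/>

  <!-- Same flow as the Java DSL route: timer, then log, then bean -->
  <camelContext xmlns="http://camel.apache.org/schema/spring">
    <route>
      <from uri="timer://jdkTimer?period=3000"/>
      <to uri="log://camelLogger?level=INFO"/>
      <to uri="bean://systemInfoPoller?method=run"/>
    </route>
  </camelContext>

</beans>
```

In this form the Spring application context takes the place of the JNDI registry used in the scripts above, so the bean endpoint resolves systemInfoPoller by its Spring bean id.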

September 10, 2012

by Zemian Deng

· 15,692 Views · 1 Like