Language Resources

The Latest Languages Topics

Why %util Number from Iostat is Meaningless for MySQL Capacity Planning

Earlier this month I wrote about vmstat iowait cpu numbers, and some of the comments I got were advertising the use of util% as reported by the iostat tool instead. I find this number even more useless for MySQL performance tuning and capacity planning. Now let me start by saying this is a really tricky and deceptive number. Many DBAs who report instances of their systems having a very busy IO subsystem said the util% in vmstat was above 99% and therefore they believe this number is a good indicator of an overloaded IO subsystem. Indeed – when your IO subsystem is busy, up to its full capacity, the utilization should be very close to 100%. However, it is perfectly possible for the IO subsystem and MySQL with it to have plenty more capacity than when utilization is showing 100% – as I will show in an example. Before that though lets see what the iostat manual page has to say on this topic – from this main page we can read: %util Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits. Which says right here that the number is useless for pretty much any production database server that is likely to be running RAID, Flash/SSD, SAN or cloud storage (such as EBS) capable of handling multiple requests in parallel. Let’s look at the following illustration. I will run sysbench on a system with a rather slow storage data size larger than buffer pool and uniform distribution to put pressure on the IO subsystem. I will use a read-only benchmark here as it keeps things more simple… sysbench –num-threads=1 –max-requests=0 –max-time=6000000 –report-interval=10 –test=oltp –oltp-read-only=on –db-driver=mysql –oltp-table-size=100000000 –oltp-dist-type=uniform –init-rng=on –mysql-user=root –mysql-password= run I’m seeing some 9 transactions per second, while disk utilization from iostat is at nearly 96%: [ 80s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 171.82ms (95%) [ 90s] threads: 1, tps: 9.20, reads/s: 128.80, writes/s: 0.00 response time: 157.72ms (95%) [ 100s] threads: 1, tps: 9.00, reads/s: 126.00, writes/s: 0.00 response time: 215.38ms (95%) [ 110s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 141.39ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 127.90 0.70 4070.40 28.00 31.87 1.01 7.83 7.52 96.68 This makes a lot of sense – with read single thread read workload the drive should be only used getting data needed by the query, which will not be 100% as there is some extra time needed to process the query on the MySQL side as well as passing the result set back to sysbench. So 96% utilization; 9 transactions per second, this is a close to full-system capacity with less than 5% of device time to spare, right? Let’s run a benchmark with more concurrency – 4 threads at the time; we’ll see… [ 110s] threads: 4, tps: 21.10, reads/s: 295.40, writes/s: 0.00 response time: 312.09ms (95%) [ 120s] threads: 4, tps: 22.00, reads/s: 308.00, writes/s: 0.00 response time: 297.05ms (95%) [ 130s] threads: 4, tps: 22.40, reads/s: 313.60, writes/s: 0.00 response time: 335.34ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 295.40 0.90 9372.80 35.20 31.75 4.06 13.69 3.38 100.01 So we’re seeing 100% utilization now, but what is interesting – we’re able to reclaim much more than less than 5% which was left if we look at utilization – throughput of the system increased about 2.5x Finally let’s do the test with 64 threads – this is more concurrency than exists at storage level which is conventional hard drives in RAID on this system… [ 70s] threads: 64, tps: 42.90, reads/s: 600.60, writes/s: 0.00 response time: 2516.43ms (95%) [ 80s] threads: 64, tps: 42.40, reads/s: 593.60, writes/s: 0.00 response time: 2613.15ms (95%) [ 90s] threads: 64, tps: 44.80, reads/s: 627.20, writes/s: 0.00 response time: 2404.49ms (95%) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util dm-0 0.00 0.00 601.20 0.80 19065.60 33.60 31.73 65.98 108.72 1.66 100.00 In this case we’re getting 4.5x of throughput compared to single thread and 100% utilization. We’re also getting almost double throughput of the run with 4 thread where 100% utilization was reported. This makes sense – there are 4 drives which can work in parallel and with many outstanding requests they are able to optimize their seeks better hence giving a bit more than 4x. So what have we so ? The system which was 96% capacity but which could have driven still to provide 4.5x throughput – so it had plenty of extra IO capacity. More powerful storage might have significantly more ability to run requests in parallel so it is quite possible to see 10x or more room after utilization% starts to be reported close to 100% So if utilization% is not very helpful what can we use to understand our database IO capacity better ? First lets look at the performance reported from those sysbench runs. If we look at 95% response time you can see 1 thread and 4 threads had relatively close 95% time growing just from 150ms to 250-300ms. This is the number I really like to look at- if system is able to respond to the queries with response time not significantly higher than it has with concurrency of 1 it is not overloaded. I like using 3x as multiplier – ie when 95% spikes to be more than 3x of the single concurrency the system might be getting to the overload. With 64 threads the 95% response time is 15-20x of the one we see with single thread so it is surely overloaded. Do we have anything reported by iostat which we can use in a similar way? It turns out we do! Check out the “await” column which tells us how much the requester had to wait for the IO request to be serviced. With single concurrency it is 7.8ms which is what this drives can do for random IO and is as good as it gets. With 4 threads it is 13.7ms – less than double of best possible, so also good enough… with concurrency of 64 it is however 108ms which is over 10x of what this device could produce with no waiting and which is surely sign of overload. A couple words of caution. First, do not look at svctm which is not designed with parallel processing in mind. You can see in our case it actually gets better with high concurrency while really database had to wait a lot longer for requests submitted. Second, iostat mixes together reads and writes in single statistics which specifically for databases and especially on RAID with BBU might be like mixing apples and oranges together – writes should go to writeback cache and be acknowledged essentially instantly while reads only complete when actual data can be delivered. The tool pt-diskstats from Percona Tookit breaks them apart and so can be much more for storage tuning for database workload. Some of the recent operating systems also ship with sysstat/iostat which breaks out await to r_await and w_await which is much more useful. Final note – I used a read-only workload on purpose – when it comes to writes things can be even more complicated – MySQL buffer pool can be more efficient with more intensive writes plus group commit might be able to commit a lot of transactions with single disk write. Still, the same base logic will apply. Summary: The take away is simple – util% only shows if a device has at least one operation to deal with or is completely busy, which does not reflect actual utilization for a majority of modern IO subsystems. So you may have a lot of storage IO capacity left even when utilization is close to 100%.

July 1, 2014

by Peter Zaitsev

· 5,882 Views

Top 10 Hadoop Shell Commands to Manage HDFS

So you already know what Hadoop is? Why it is used? What problems you can solve with it?

June 30, 2014

by Saurabh Chhajed

· 485,731 Views · 8 Likes

The Dangers of SQL JOINs & Aggregate Functions: How to Avoid Fanouts

Every SQL developer uses JOINs and aggregate functions, but some may not be aware that the two interacting can return incorrect results if not handled carefully. Brett Sauve at Looker has covered the issue in this recent post - specifically, he focuses on the issue of "fanouts" - and it's an easy problem to overlook. A fanout occurs when the primary table in a query contains fewer rows than the combined table resulting from a JOIN. Because the new table contains more rows, aggregate functions like COUNT or SUM may include duplicates, throwing off the results. According to Sauve, the answer is careful ordering of JOIN tables: Herein lies a key point: to help avoid fanouts, begin your joins with the most granular table. Sauve also provides some tips to help SQL developers understand how to avoid fanouts and ensure the validity of their data. He points to three types of JOINs: 1-to-1 Many-to-1 1-to-Many Knowing which JOIN you are using can rule out the possibility of a fanout, for example, because it is not possible in a 1-to-1 or Many-to-1 JOIN. If you are using a 1-to-Many JOIN, though, Sauve has some tips there, too. For example, a messy JOIN can be cleaned up by grouping data to create a 1-to-1 relationship. Check out the full article for the rest of Sauve's tips to make sure your aggregate functions are working correctly, and if you're looking for a few more tips to keep your SQL skills sharp, you might try one of these: Yet Another 10 Common Mistakes Java Developers Make When Writing SQL 6 Simple Performance Tips for SQL SELECT Statements

June 30, 2014

by Alec Noller

· 16,640 Views

Making Operations on Volatile Fields Atomic

The expected behaviour for volatile fields is that they should behave in a multi-threaded application the same as they do in a single threaded application. They are not forbidden to behave the same way, but they are not guaranteed to behave the same way. The solution in Java 5.0+ is to use AtomicXxxx classes however these are relatively inefficient in terms of memory (they add a header and padding), performance (they add a references and little control over their relative positions), and syntactically they are not as clear to use. IMHO A simple solution if for volatile fields to act as they might be expected to do, the way JVM must support in AtomicFields which is not forbidden in the current JMM (Java- Memory Model) but not guaranteed. Why make fields volatile? The benefit of volatile fields is that they are visible across threads and some optimisations which avoid re-reading them are disabled so you always check again the current value even if you didn't change them. e.g. without volatile Thread 2: int a = 5; Thread 1: a = 6; (later) Thread 2: System.out.println(a); // prints 5 With volatile Thread 2: volatile int a = 5; Thread 1: a = 6; (later) Thread 2: System.out.println(a); // prints 6 Why not use volatile all the time? Volatile read and write access is substantially slower. When you write to a volatile field it stalls the entire CPU pipeline to ensure the data has been written to cache. Without this, there is a risk the next read of the value sees an old value, even in the same thread (See AtomicLong.lazySet() which avoids stalling the pipeline) The penalty can be in the order of 10x slower which you don't want to be doing on every access. What are the limitations of volatile? A significant limitation is that operations on the field is not atomic, even when you might think it is. Even worse than that is that usually, there is no difference. I.e. it can appear to work for a long time even years and suddenly/randomly break due to an incidental change such as the version of Java used, or even where the object is loaded into memory. e.g. which programs you loaded before running the program. e.g. updating a value Thread 2: volatile int a = 5; Thread 1: a += 1; Thread 2: a += 2; (later) Thread 2: System.out.println(a); // prints 6, 7 or 8 This is an issue because the read of a and the write of a are done separately and you can get a race condition. 99%+ of the time it will behave as expect, but sometimes it won't/ What can you do about it? You need to use AtomicXxxx classes. These wrap volatile fields with operations which behave as expected. Thread 2: AtomicInteger a = new AtomicInteger(5); Thread 1: a.incrementAndGet(); Thread 2: a.addAndGet(2); (later) Thread 2: System.out.println(a); // prints 8 What do I propose? The JVM has a means to behave as expected, the only surprising thing is you need to use a special class to do what the JMM won't guarantee for you. What I propose is that the JMM be changed to support the behaviour currently provided by the concurrency AtomicClasses. In each case the single threaded behaviour is unchanged. A multi-threaded program which does not see a race condition will behave the same. The difference is that a multi-threaded program does not have to see a race condition but changing the underlying behaviour current method suggested syntax notes x.getAndIncrement() x++ or x += 1 x.incrementAndGet() ++x x.getAndDecrment() x-- or x -= 1 x.decrementAndGet() --x x.addAndGet(y) (x += y) x.getAndAdd(y) ((x += y)-y) x.compareAndSet(e, y) (x == e ? x = y, true : false) Need to add the comma syntax used in other languages. These operations could be supported for all the primitive types such as boolean, byte, short, int, long, float and double. Additional assignment operators could be supported such as current method suggested syntax notes Atomic multiplication x *= 2; Atomic subtraction x -= y; Atomic division x /= y; Atomic modulus x %= y; Atomic shift x <<= y; Atomic shift x >>= z; Atomic shift x >>>= w; Atomic and x &= ~y; clears bits Atomic or x |= z; sets bits Atomic xor x ^= w; flips bits What is the risk? This could break code which relies on these operations occasionally failing due to race conditions. It might not be possible to support more complex expressions in a thread safe manner. This could lead to surprising bugs as the code can look like the works, but it doesn't. Never the less it will be no worse than the current state. JEP 193 - Enhanced Volatiles There is a JEP 193 to add this functionality to Java. An example is; class Usage { volatile int count; int incrementCount() { return count.volatile.incrementAndGet(); } } IMHO there is a few limitations in this approach. The syntax is fairly significant change. Changing the JMM might not require many changes the the Java syntax and possibly no changes to the compiler. It is a less general solution. It can be useful to support operations like volume += quantity; where these are double types. It places more burden on the developer to understand why he/she should use this instead of x++; Conclusion Much of the syntactic and performance overhead of using AtomicInteger and AtomicLong could be removed if the JMM guaranteed the equivalent single threaded operations behaved as expected for multi-threaded code. This feature could be added to earlier versions of Java by using byte code instrumentation.

June 27, 2014

by Peter Lawrey

· 11,955 Views

Option.fold() Considered Unreadable

We had a lengthy discussion recently during code review whether scala.Option.fold() is idiomatic and clever or maybe unreadable and tricky? Let's first describe what the problem is. Option.fold does two things: maps a function f overOption's value (if any) or returns an alternative alt if it's absent. Using simple pattern matching we can implement it as follows: val option: Option[T] = //... def alt: R = //... def f(in: T): R = //... val x: R = option match { case Some(v) => f(v) case None => alt } If you prefer one-liner, fold is actually a combination of map and getOrElse val x: R = option map f getOrElse alt Or, if you are a C programmer that still wants to write in C, but using Scala compiler: val x: R = if (option.isDefined) f(option.get) else alt Interestingly this is similar to how fold() is actually implemented, but that's an implementation detail. OK, all of the above can be replaced with single Option.fold(): val x: R = option.fold(alt)(f) Technically you can even use /: and \: operators (alt /: option) - but that would be simply masochistic. I have three problems with option.fold() idiom. First of all - it's anything but readable. We are folding (reducing) over Option - which doesn't really make much sense. Secondly it reverses the ordinary positive-then-negative-case flow by starting with failure (absence, alt) condition followed by presence block (f function; see also: Refactoring map-getOrElse to fold). Interestingly this method would work great for me if it was named mapOrElse: ** * Hypothetical in Option */ def mapOrElse[B](f: A => B, alt: => B): B = this map f getOrElse alt Actually there is already such method in Scalaz, called OptionW.cata. cata. Here is whatMartin Odersky has to say about it: "I personally find methods like cata that take two closures as arguments are often overdoing it. Do you really gain in readability over map + getOrElse? Think of a newcomer to your code[...]"While cata has some theoretical background, Option.fold just sounds like a random name collision that doesn't bring anything to the table, apart from confusion. I know what you'll say, that TraversableOnce has fold and we are sort-of doing the same thing. Why it's a random collision rather than extending the contract described inTraversableOnce? fold() method in Scala collections typically just delegates to one offoldLeft()/foldRight() (the one that works better for given data structure), thus it doesn't guarantee order and folding function has to be associative. But inOption.fold() the contract is different: folding function takes just one parameter rather than two. If you read my previous article about folds you know that reducing function always takes two parameters: current element and accumulated value (initial value during first iteration). But Option.fold() takes just one parameter: current Option value! This breaks the consistency, especially when realizing Option.foldLeft() andOption.foldRight() have correct contract (but it doesn't mean they are more readable). The only way to understand folding over option is to imagine Option as a sequence with0 or 1 elements. Then it sort of makes sense, right? No. def double(x: Int) = x * 2 Some(21).fold(-1)(double) //OK: 42 None.fold(-1)(double) //OK: -1 but: Some(21).toList.fold(-1)(double) : error: type mismatch; found : Int => Int required: (Int, Int) => Int Some(21).toList.fold(-1)(double) ^ If we treat Option[T] as a List[T], awkward Option.fold() breaks because it has different type than TraversableOnce.fold(). This is my biggest concern. I can't understand why folding wasn't defined in terms of the type system (trait?) and implemented strictly. As an example take a look at: Data.Foldable in Haskell (advanced) Data.Foldable typeclass describes various flavours of folding in Haskell. There are familiar foldl/foldr/foldl1/foldr1, in Scala namedfoldLeft/foldRight/reduceLeft/reduceRight accordingly. They have the same type as Scala and behave unsurprisingly with all types that you can fold over, including Maybe, lists, arrays, etc. There is also a function named fold, but it has a completely different meaning: class Foldable t where fold :: Monoid m => t m -> m While other folds are quite complex, this one barely takes a foldable container of ms (which have to be Monoids) and returns the same Monoid type. A quick recap: a type can be aMonoid if there exists a neutral value of that type and an operation that takes two values and produces just one. Applying that function with one of the arguments being neutral value yields the other argument. String ([Char]) is a good example with empty string being neutral value (mempty) and string concatenation being such operation (mappend). Notice that there are two different ways you can construct monoids for numbers: under addition with neutral value being 0 (x + 0 == 0 + x == x for any x) and under multiplication with neutral 1 (x * 1 == 1 * x == x for any x). Let's stick to strings. If I fold empty list of strings, I'll get an empty string. But when a list contains many elements, they are being concatenated: > fold ([] :: [String]) "" > fold [] :: String "" > fold ["foo", "bar"] "foobar" In the first example we have to explicitly say what is the type of empty list []. Otherwise Haskell compiler can't figure out what is the type of elements in a list, thus which monoid instance to choose. In second example we declare that whatever is returned from fold [], it should be a String. From that the compiler infers that [] actually must have a type of [String]. Last fold is the simplest: the program folds over elements in list and concatenates them because concatenation is the operation defined in Monoid Stringtypeclass instance. Back to options (or more precisely Maybe). Folding over Maybe monad having type parameter being Monoid (I can't believe I just said it) has an interesting interpretation: it either returns value inside Maybe or a default Monoid value: > fold (Just "abc") "abc" > fold Nothing :: String "" Just "abc" is same as Some("abc") in Scala. You can see here that if Maybe Stringis Nothing, neutral String monoid value is returned, that is an empty string. Summary Haskell shows that folding (also over Maybe) can be at least consistent. In ScalaOption.fold is unrelated to List.fold, confusing and unreadable. I advise avoiding it and staying with slightly more verbose map/getOrElse transformations or pattern matching. PS: Did I mention there is also Either.fold() (with even different contract) but noTry.fold()?

June 26, 2014

by Tomasz Nurkiewicz

· 9,622 Views

Eclipse Community Survey 2014 Results

We have published the results of the Eclipse Community Survey 2014. Thank you to everyone who participated in the survey this year. The complete results and data are available for anyone to download [xls] [ods]. As in other years, I think the results provide an interesting perspective on what tools software developers are using and the type of applications they are building. Here are some key highlights from the results this year: 1) Git #1 Code Management Tool. Git has finally surpassed Subversion to be the top code management tool used by software developers. A third of developers (33.3%) report they use Git as their primary code management tool compared to 30.7% using Subversion. Subversion continues to show a downward trend from previous years when it was used by more than half the developers. Of note, 9.6% claim GitHub is their primary code management tool so the prevalence of overall Git usage is becoming dominate. 2) Maven and Jenkins Key Tools. For Build and Release tools, Maven and Jenkins continue to be key tools used by developers. Of interest is the growth of Gradle from 2013 (4.5%) to 2014 (11%). 3) Top 3 Application Servers. Tomcat (32.6%), JBoss (11.8%) and Jetty (7.2%) continue to be the top 3 application servers. 4) Java 8 Adoption. Java 8 was released in March 2014 and already 9.2% of Java developers have migrated to Java 8 as their primary version of Java. 59.2% are using Java 7 but close to a quarter are using Java 6 or before. 5) Majority of Developers Use JavaScript. More and more software developers use multiple languages to develop software. Due to the Eclipse biased of the survey, Java is not surprisingly the top primary language. However, when asked what other languages developers might use, JavaScript stands out to be a popular language with a the majority of developers (56.2%) claiming it as a secondary language. 6) Developers Experimenting With Open Hardware. The Internet of Things (IoT) and Open Hardware have become important industry trends in the last couple of years. Over a third (35.7%) of software developers are spending their own personal time learning about devices like the BeagleBone, Arduino and Raspberry Pi. Thanks again to everyone who participated in the survey. I hope everyone finds the results of interest.

June 25, 2014

by Ian Skerrett

· 14,291 Views

Fingerprint CSS in MVC Web Sites

Optimizing for website performance includes setting long expiration dates on our static resources, such s images, stylesheets and JavaScript files. Doing that tells the browser to cache our files so it doesn’t have to request them every time the user loads a page. This is one of the most important things to do when optimizing websites. In ASP.NET on IIS7+ it’s really easy. Just add this chunk of XML to the web.config’s element: The above code tells the browsers to automatically cache all static resources for 365 days. That’s good and you should do this right now. The issue becomes clear the first time you make a change to any static file. How is the browser going to know that you made a change, so it can download the latest version of the file? The answer is that it can’t. It will keep serving the same cached version of the file for the next 365 days regardless of any changes you are making to the files. Fingerprinting The good news is that it is fairly trivial to make a change to our code, that changes the URL pointing to the static files and thereby tricking the browser into believing it’s a brand new resource that needs to be downloaded. Here’s a little class that I use on several websites, that adds a fingerprint, or timestamp, to the URL of the static file. using System; using System.IO; using System.Web; using System.Web.Caching; using System.Web.Hosting; public class Fingerprint { public static string Tag(string rootRelativePath) { if (HttpRuntime.Cache[rootRelativePath] == null) { string absolute = HostingEnvironment.MapPath("~" + rootRelativePath); DateTime date = File.GetLastWriteTime(absolute); int index = rootRelativePath.LastIndexOf('/'); string result = rootRelativePath.Insert(index, "/v-" + date.Ticks); HttpRuntime.Cache.Insert(rootRelativePath, result, new CacheDependency(absolute)); } return HttpRuntime.Cache[rootRelativePath] as string; } } All you need to change in order to use this class, is to modify the references to the static files. Modify references Here’s what it looks like in Razor for the stylesheet reference: …and in WebForms: content/site.css") %>" /> The result of using the FingerPrint.Tag method will in this case be: Since the URL now has a reference to a non-existing folder (v-634933238684083941), we need to make the web server pretend it exist. We do that with URL rewriting. URL rewrite By adding this snippet of XML to the web.config’s section, we instruct IIS 7+ to intercept all URLs with a folder name containing “v=[numbers]” and rewrite the URL to the original file path. You can use this technique for all your JavaScript and image files as well. The beauty is, that every time you change one of the referenced static files, the fingerprint will change as well. This creates a brand new URL every time so the browsers will download the updated files. FYI, you need to run the AppPool in Integrated Pipeline mode for the section to have any effect.

June 20, 2014

by Merrick Chaffer

· 11,738 Views

Java Code Review Checklist

Utilize this checklist to review the quality of your Java code, including security, performance, and static code analysis.

June 20, 2014

by Mahesh Chopker

· 211,680 Views · 22 Likes

How to Install Mono on a Raspberry Pi

This post exists to help with an MSDN Magazine article that I am authoring It provides some of the low-level details for the article How to install Mono and root certificates on a raspberry pi How to create an Azure mobile service How to create a Custom API inside Azure mobile services that the raspberry pi can call into How to create an Azure storage account MONO - HOW TO INSTALL ON A RASPBERRY PI Why Mono? How to install Mono on a raspberry pi Installing trusted root certificates on to the raspberry pi http://www.mono-project.com/Main_Page An open source, cross-platform, implementation of C# and the CLR that is binary compatible with Microsoft.NET Mono is a free and open source project led by Xamarin (formerly by Novell) that provides a .NET Framework-compatible set of tools including, among others, a C# compiler and a Common Language Runtime WHY MONO? Because it lets us write .net code compiled on Windows We can simply copy the binary files from Windows to Linux and run it as is From a raspberry pi device, it is possible to use a .net application to take a photo and upload it to Windows Azure storage HOW TO INSTALL ON A RASPBERRY PI RUNNING LINUX You will issue the following commands: pi@raspberrypi ~ $ sudo apt-get update pi@raspberrypi ~ $ sudo apt-get install mono-complete The first command makes sure all the local package index are up to date with the changes made in repositories. Second command installs the complete Mono tooling and runtime. MAKING SURE THAT YOUR MONO APPLICATIONS CAN MAKE A HTTPS REST-BASED CALLS This command downloads the trusted root certificates from the Mozilla LXR web site into the Mono certificate store. Once complete, the Raspberry PI will be capable of making web requests using HTTPS requests within Mono. pi@raspberrypi ~ $ mozroots --import --ask-remove --machine CREATING A NEW AZURE MOBILE SERVICES ACCOUNT The mobile services account is needed to host a Node.js application that provides shared access signatures to raspberry pi devices The shared access signature is needed by the raspberry pi, so that it can directly and securely upload photos to Azure storage STEPS TO CREATE AN AZURE MOBILE SERVICE The steps below will create an Azure mobile service The service will be used to host a Node.js application interacting with a raspberry pi devices We will provision a SQL database, although it will not be used initially FOLLOW THESE STEPS TO CREATE THE MOBILE SERVICE Login into the Azure Portal Select MOBILE SERVICES from the left menu pane at the Azure Portal. In the lower left corner select "+NEW" to create a new Azure Mobile Service. Make sure you've selected, "COMPUTE / MOBILE SERVICE / CREATE." You will now enter a url. We will call this service raspberrymobileservice. For the DATABASE, we will choose "Create a new SQL database instance." The REGION we chose is "West US." The BACKEND is "JavaScript." Click the "->" arrow to proceed to the next screen. In this screen you will "Specify database settings." The NAME of your database will based on the URL you entered previously. In this case, the database is called "raspberrymobileservice_db." You will need to choose a SERVER. We will choose "New SQL database server" from the drop-down list. You will need to provide a SERVER LOGIN NAME and a SERVER LOGIN PASSWORD. Take note of the login you provided as it will be needed later CREATING A CUSTOM API Azure mobile services allows you to create a custom API written in JavaScript that can be called from a raspberry pi device using REST This custom API is really just a Node.js application running in the server CREATING THE API TO RESPOND TO THE DEVICE TRYING TO UPLOAD PHOTOS Now that the service is established, we will turn our attention to creating an API that the device can call into to upload a photo. Login into the Azure Portal Your mobile service will take a few minutes to complete, and you should see the "Ready" flag as the "Status" for your service. Once it is ready you can drill into your service to customize its behavior. Just to the right of the service name, click the right arrow key "->" to drill into the service details. The top menu bar will offer many options, but we are interested in the one titled "API." The API allows you to create a series of node.JS API calls that a device can call into using rest-based approaches. Click on "API." from there, select "CREATE A CUSTOM API." You will be asked to provide an API name. Type in "photos" for the API name. Below you will see a series of drop-down combo boxes that relate to permission. We will keep the default value of "Anybody with the application key." This might not be the best option for all scenarios. You can read more about this here. http://msdn.microsoft.com/en-us/library/azure/jj193161.aspx. Click the checkmark to complete the process. The name of the AP you just created, "Photos," should be visible on the portal interface. To drill into the photos API click on the right arrow key "->". The right arrow key will be just to the right of the name of the API "Photos". At this point you should see a basic script that has been provided by default. We will overwrite this default script with our own script as described in the MSDN Magazine article. CREATING A STORAGE ACCOUNT TO STORE THE PHOTOS Navigate to the portal and create a storage account Create a container for the photos Obtain the: Storage Account Name (you will provide a name) Storage Account Access key (generated for you) Container Name (you will create) CREATING A STORAGE ACCOUNT We will need a storage account so that we can upload photos to it. The steps are well documented here: http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/ In our case we call the storage account raspberrystorage. This means that the URL that the device will use to upload photos is https://raspberrystorage.blob.core.windows.net/. As you complete these steps make sure that you choose the storage account location to be the same location as was used for your mobile services account. This avoids any unnecessary latency or bandwidth costs between data centers. Once the storage account is created, we will need to create a container within it. Photos or any blob for that matter, are always stored within a container. To create a container drill into your newly created storage account and select CONTAINERS from the top menu. From there, select CREATE A CONTAINER. The new container dialog box will ask for a name for your container. Take note of the name you provide. We are calling our container ?photocontainer.? When the raspberry pi device uploads photos to the storage account, it will target a specific container, such as the one we just created. You will next be asked to indicate ACCESS rights. To keep things simple we will select access rights of Public Blob. ENTERING APP SETTINGS Rather than hard-code storage account information inside your JavaScript/Node.js applications, you should consider using apps settings inside of the Azure mobile services portal This post also discusses it well: http://blogs.msdn.com/b/carlosfigueira/archive/2013/12/09/application-settings-in-azure-mobile-services.aspx ?The idea of application settings is a set of key-value pairs which can be set for the mobile service (either via the portal or via the command-line interface), and those values could be then read in the service runtime.? NAVIGATING TO APP SETTINGS Navigate to the Azure Mobile Services section of the portal. Drill into the specific service by hitting the arrow below Select from the Configure Menu at the top Scroll down to the very bottom to see app settings Note that we need to enter: - We need to get this from Azure Storage - PhotoContainerName - AccountName - AccountKey We get this information from the Azure Storage Section of the Portal. Note that you need to have provisioned a Storage Account to have this information. How to get the AccountKey with Azure Storage Services Now you can get the access keys HOW NODE.JS WILL ACCESS THE APP SETTINGS You will create a Node.js application inside of Azure Mobile Services See previous steps THE NODE.JS APPLICATION READING APP SETTINGS You will starting by going back to Azure Mobile Services and drill down into your newly minted service We called ours raspberrymobileservice Once you click API, you should see: Notice the app settings are being read on lines 12 to 14.

June 19, 2014

by Bruno Terkaly

· 16,799 Views

How to Declare Modules in Node.js

This article was originally written by Edwin Dalorzo at the Informatech CR Blog. One of those aspects of Node.js that took me a while to fully understand initially is how to properly declare modules. At first it looks kind of obvious and intuitively simple, but later, you realize you can expose different types of objects like functions, constructors, properties or entire object instances. So I have decided to write this article with different examples and techniques I learned while writing different type of modules in Node.js. On Modules, Import and Export Let’s start by the most obvious and simple thing. Something probably everyone learns since the first day of work with Node: every code file is considered a module. The variables, properties, functions, constructors that we declared in it are private to the module and other modules cannot gain access to them or use them unless the programmer of the module explicitly expose them to the public; namely everything we declare inside a module is encapsulated and hidden from the outside world by default unless explicitly stated otherwise. To expose something the programmer has access to a special object called module, which has a special property called exports. Everything that you publish in the module.exports object is made publicly available to other modules. For instance, in the code below, the variable pi is inaccessible to any other modules but foo.js, whereas the property named bar is made publicly available to any other modules importing the module foo.js. Note that this is a fundamental difference from JavaScript in Node.js when compared with JavaScript as executed in a browser where objects are publicly exposed in a global object (i.e. window). //module foo.js var pi = 3.14; module.exports.bar = 'Hello World'; Now a second module baz.js can “import” the module foo.js and gain access to the property bar. In Node, we achieve this effect by means of using a global function named require. Somewhat as follows: //module baz.js var foo = require('./foo'); console.log(foo.bar); //yields Hello World Technique 1 – Extending exports Object with Additional Functionality So, one technique to expose the functionality in a module consists in adding functions and properties to the module.exports object. When this is the case, Node provides a direct access to the exports object to make things simpler for us. For instance: //module foo.js exports.serviceOne = function(){ }; exports.serviceTwo = function(){ }; exports.serviceThree = function(){ }; And as you might expect, the users of this module, at importing it, would obtain a reference to the exports object and by this they would gain access to all the functionality exposed in it. //module bar.js var foo = require('./foo'); foo.serviceOne(); foo.serviceTwo(); foo.serviceThree(); Technique 2 – Substitute Default exports Object with Another Object By this point you probably suspect that given the fact that module.exports is just an object that exposes the public part of a module then we could probably define our own object and then replace the default module.exports object with our own. For instance: //module foo.js var service = { serviceOne: function(){ }, serviceTwo: function(){ }, serviceThree = function(){ } }; module.exports = service; The code in this last example would behave exactly as the code in the previous example, it’s just that this time we have explicitly created our exported object instead of using the one provided by default by Node. Technique 3 – Substitute Default exports Object with a Constructor Function In the examples so far we have always used an instance of an object as our exposed target. However there are occasions in which it may seem more convenient to allow the user to create as many instances of a given type as she wants. Nothing prevents us from replacing the module.exports object with other types of objects like a constructor function. In the example below we expose a constructor which the user can use to create many instances of the Foo type. //module Foo.js function Foo(name){ this.name = name; } Foo.prototype.serviceOne = function(){ }; Foo.prototype.serviceTwo = function(){ }; Foo.prototype.serviceThree = function(){ }; module.exports = Foo; And the user of this module can simply do something like this: //module bar.js var Foo = require('./Foo'); var foo = new Foo('Obi-wan'); foo.serviceOne(); foo.serviceTwo(); foo.serviceThree(); Technique 4 – Substitute Default exports Object with Plain Old Function It is easy to imagine now that if we can use a constructor function then we might just as well be able to use any other plain old JavaScript function as the target exposed in module.exports. As in the following example in which our exported function allows the user of this module to gain access to one of several other encapsulated service objects. //foo.js var serviceA = {}; serviceA.serviceOne = function(){ }; serviceA.serviceTwo = function(){ }; serviceA.serviceThree = function(){ }; var serviceB = {}; serviceB.serviceOne = function(){ }; serviceB.serviceTwo = function(){ }; serviceB.serviceThree = function(){ }; module.exports = function(name){ switch(name){ case 'A': return serviceA; case 'B': return serviceB; default: throw new Error('Unknown service name: ' + name); } }; Now the user that imports this module receives a reference to our anonymous function declared above and then she can simply invoke the function to gain access to one of our encapsulated objects. For instance: //module bar.js var foo = require('./foo'); var obj = foo('A'); obj.serviceOne(); obj.serviceTwo(); obj.serviceThree(); Many programmers ordinarily invoke the function immediately returned by require instead of assigning it to a reference first. For instance: //module bar.js var foo = require('./foo')('A'); foo.serviceOne(); foo.serviceTwo(); foo.serviceThree(); So, in summary, it is as simple as follows: everything that we expose in module.exports is what we get when we invoke require. And using different techniques we could expose objects, constructors functions, properties, etc. About Modules and the use Global State An interesting aspect of modules is the way they are evaluated. The module is evaluated the first time it is required and then it is cached. This means that after it has been evaluated no matter how many times we require it, we will always get the same exported object back. This means that, although Node provides a global object, it is probably better to use modules to store shared stated instead of putting it directly into the global object. For instance, the following module exposes the configuration of a Mongo database. //module config.js dbConfig = { url:'mongodb://foo', user: 'anakin', password: '*******' } We can easily share this module with as many other modules as we want, and everyone of them will get the exact same instance of the configuration object since the module is evaluated only once and the exported object is cached from there on. //foo.js var dbConfig1 = require('./config'); var dbConfig2 = require('./config'); var assert = require('assert'); assert(dbConfig1==dbConfi2);

June 18, 2014

by Luis Aguilar

· 28,137 Views · 2 Likes

Higher-order Functions, Functions Composition, and Currying in Java 8

The main concept behind functional programming is that data and behaviors can be treated and manipulated uniformly and in the same way. In practical terms this means that it is possible to pass to a method both values and functions and in the same way the method itself can return either a value or a function. With this regard a function accepting one or more functions as argument and/or returning another function as result is called an higher-order function. The purpose of this article is explaining what functions composition and currying are demonstrating why they can be both seen as special cases of higher-order functions and showing how to use them with a practical example in Java 8. Let's define a Converter as an object with a single method that takes as input a conversion rate and the value to be converted and returns the conversion's result that is obtained by simply multiplying these 2 arguments. Given this description the Converter can be seen as a function of 2 parameters and then defined as a particular implementation of the BiFunction interface taking 2 Doubles, the conversion rate and the value to be converted, as arguments and returning a third Double that is the converted value. public class Converter implements BiFunction { @Override public Double apply(Double conversionRate, Double value) { return conversionRate * value; } } In this way we could for instance convert 10, 20 or 50 miles in kilometers by instancing a new Converter and passing to it the conversion rate between miles and kilometers together with the number of miles to be converted. Converter converter = new Converter(); double tenMilesInKm = converter.apply(1.609, 10.0); double twentyMilesInKm = converter.apply(1.609, 20.0); double fiftyMilesInKm = converter.apply(1.609, 50.0); This works, but there is something not very practical with this approach: we are obliged to repeat the same conversion rate for all the invocation of our converter. It would be better if our converter could offer an easy way to obtain a specialized version of it just converting miles to kilometers. In other words we would like to create a Function out of the original BiFunction having set one of the 2 arguments of the BiFunction to a fixed value, that in our case is the miles/km conversion rate. In functional programming this particular operation is called currying. Currying Unfortunately the native Java 8 BiFunction interface doesn't provide currying out of the box. Nevertheless it is very easy to develop a custom extension of the original BiFunction interface having 2 additional default methods that allow to set one of the 2 arguments of the BiFunction to a fixed value. @FunctionalInterface public interface ExtendedBiFunction extends BiFunction { default Function curry1(T t) { return u -> apply(t, u); } default Function curry2(U u) { return t -> apply(t, u); } } The methods curry1 and curry2 fix respectively the first and the second argument of the BiFunction and return a Function having as single argument the one of the two that is still remained unfixed. If we now change our Converter just making it to implement this ExtendedBiFunction interface public class Converter implements ExtendedBiFunction it's possible to obtain a Function converting miles to kilometers fixing the first argument of the converter to the appropriate conversion rate. Function mi2kmConverter = converter.curry1(1.609); double tenMilesInKm = mi2kmConverter.apply(10.0); double twentyMilesInKm = mi2kmConverter.apply(20.0); double fiftyMilesInKm = mi2kmConverter.apply(50.0); Of course we can repeat this specialization process how many times we want for example to obtain a converter from ounces to grams. Function ou2grConverter = converter.curry1(28.345); double tenOuncesInGr = ou2grConverter.apply(10.0); double twentyOuncesInGr = ou2grConverter.apply(20.0); double fiftyOuncesInGr = ou2grConverter.apply(50.0); More formally currying is a technique where a function f of two arguments (x and y say) is seen instead as a function g of one argument which returns a function also of one argument. The value returned by the latter function is the same as the value of the original function. f(x,y) = (g(x))(y) Of course the same principle can be generalized to transform a function of n arguments in one of n-1 arguments having fixed the remaining one. It's also easy to see that the curry1 and curry2 methods are examples of higher-order function since they returns a Function as result. Function composition In the biggest part of cases a multiplication, as the one performed by our Converter class, is all you need to do to convert a unit measure into another. However there are some situations for which this couldn't be enough. For instance the formula to convert a Celsius temperature in a Farenheit one is: F = C * 9/5 + 32 After having multiplied the Celsius temperature by the 9/5 factor, you still have to add 32 in order to obtain the corresponding Farenheit value. Is there a way to obtain a Function converting Celsius to Farenheit degrees from our generic Converter? This is a case where functions composition can come to the rescue. In fact the native Java 8 Function interface already provides a default method named andThen accepting another Function as argument and returning a third Function as result (another higher-order function) that is the composition of the first 2. The resulting composed function first applies the original function to its input, and then applies the function passed as argument to the result. In this way the Celsius to Farenheit conversion function can be obtained currying the generic converter with the 9/5 factor and then applying a further function that increments the result of the first one of 32. Function celsius2farenheitConverter = converter.curry1(9.0/5).andThen(n -> n + 32); double tenCInF = celsius2farenheitConverter.apply(10.0); double twentyCInF = celsius2farenheitConverter.apply(20.0); double fiftyCInF = celsius2farenheitConverter.apply(50.0); What about the opposite conversion? We have to use the formula: C = (F - 32) * 5/9 meaning that this time the subtraction has to be performed before the multiplication by the conversion rate. In other words we need to compose the original conversion function with another function (the subtraction) that has to be applied before it and not after as we did above. The native Java 8 Function interface already provides a compose method allowing to achieve this, but for some reason the same method is not available also on the BiFunction interface. This is not too bad since we can add it (actually we need 2 of them, one for each argument) to our ExtendedBiFunction interface. @FunctionalInterface public interface ExtendedBiFunction extends BiFunction { default Function curry1(T t) { return u -> apply(t, u); } default Function curry2(U u) { return t -> apply(t, u); } default ExtendedBiFunction compose1(Function before) { return (v, u) -> apply(before.apply(v), u); } default ExtendedBiFunction compose2(Function before) { return (t, v) -> apply(t, before.apply(v)); } } Here the compose1 higher-order function composes the BiFuction with the before Function and returns another BiFunction that, when applied, first transform its first argument with the before Function and then applies the original BiFunction with the transformed first argument and the unchanged second one. The compose2 method implements exactly the same principle to the second argument of the BiFunction leaving unchanged the first one. We can now obtain the Farenheit to Celsius converter composing the generic Converter with a Function that subtract 32 from the value to be converted before before to currying the first argument fixing it to 5/9. Function farenheit2celsiusConverter = converter.compose2((Double n) -> n - 32).curry1(5.0/9); double tenFInC = farenheit2celsiusConverter.apply(10.0); double twentyFInC = farenheit2celsiusConverter.apply(20.0); double fiftyFInC = farenheit2celsiusConverter.apply(50.0); If you want you can both deepen your knowledge of the new API introduced into Java in its 8th major release and learn how to leverage the new functional features of the language reading the Java 8 in Action book that I just finished to write together with Raoul-Gabriel Urma and Alan Mycroft.

June 17, 2014

by Mario Fusco

· 55,002 Views · 11 Likes

Java 8 Friday: 10 Subtle Mistakes When Using the Streams API

at data geekery , we love java. and as we’re really into jooq’s fluent api and query dsl , we’re absolutely thrilled about what java 8 will bring to our ecosystem. java 8 friday every friday, we’re showing you a couple of nice new tutorial-style java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. you’ll find the source code on github . 10 subtle mistakes when using the streams api we’ve done all the sql mistakes lists: 10 common mistakes java developers make when writing sql 10 more common mistakes java developers make when writing sql yet another 10 common mistakes java developers make when writing sql (you won’t believe the last one) but we haven’t done a top 10 mistakes list with java 8 yet! for today’s occasion ( it’s friday the 13th ), we’ll catch up with what will go wrong in your application when you’re working with java 8. (it won’t happen to us, as we’re stuck with java 6 for another while) 1. accidentally reusing streams wanna bet, this will happen to everyone at least once. like the existing “streams” (e.g. inputstream ), you can consume streams only once. the following code won’t work: intstream stream = intstream.of(1, 2); stream.foreach(system.out::println); // that was fun! let's do it again! stream.foreach(system.out::println); you’ll get a java.lang.illegalstateexception: stream has already been operated upon or closed so be careful when consuming your stream. it can be done only once 2. accidentally creating “infinite” streams you can create infinite streams quite easily without noticing. take the following example: // will run indefinitely intstream.iterate(0, i -> i + 1) .foreach(system.out::println); the whole point of streams is the fact that they can be infinite, if you design them to be. the only problem is, that you might not have wanted that. so, be sure to always put proper limits: // that's better intstream.iterate(0, i -> i + 1) .limit(10) .foreach(system.out::println); 3. accidentally creating “subtle” infinite streams we can’t say this enough. you will eventually create an infinite stream, accidentally. take the following stream, for instance: intstream.iterate(0, i -> ( i + 1) % 2) .distinct() .limit(10) .foreach(system.out::println); so… we generate alternating 0′s and 1′s then we keep only distinct values, i.e. a single 0 and a single 1 then we limit the stream to a size of 10 then we consume it well… the distinct() operation doesn’t know that the function supplied to the iterate() method will produce only two distinct values. it might expect more than that. so it’ll forever consume new values from the stream, and the limit(10) will never be reached. tough luck, your application stalls. 4. accidentally creating “subtle” parallel infinite streams we really need to insist that you might accidentally try to consume an infinite stream. let’s assume you believe that the distinct() operation should be performed in parallel. you might be writing this: intstream.iterate(0, i -> ( i + 1) % 2) .parallel() .distinct() .limit(10) .foreach(system.out::println); now, we’ve already seen that this will turn forever. but previously, at least, you only consumed one cpu on your machine. now, you’ll probably consume four of them, potentially occupying pretty much all of your system with an accidental infinite stream consumption. that’s pretty bad. you can probably hard-reboot your server / development machine after that. have a last look at what my laptop looked like prior to exploding: if i were a laptop, this is how i’d like to go. 5. mixing up the order of operations so, why did we insist on your definitely accidentally creating infinite streams? it’s simple. because you may just accidentally do it. the above stream can be perfectly consumed if you switch the order of limit() and distinct() : intstream.iterate(0, i -> ( i + 1) % 2) .limit(10) .distinct() .foreach(system.out::println); this now yields: 0 1 why? because we first limit the infinite stream to 10 values (0 1 0 1 0 1 0 1 0 1), before we reduce the limited stream to the distinct values contained in it (0 1). of course, this may no longer be semantically correct, because you really wanted the first 10 distinct values from a set of data (you just happened to have “forgotten” that the data is infinite). no one really wants 10 random values, and only then reduce them to be distinct. if you’re coming from a sql background, you might not expect such differences. take sql server 2012, for instance. the following two sql statements are the same: -- using top selectdistincttop10 * fromi orderby.. -- using fetch select* fromi orderby.. offset 0 rows fetchnext10 rowsonly so, as a sql person, you might not be as aware of the importance of the order of streams operations. 6. mixing up the order of operations (again) speaking of sql, if you’re a mysql or postgresql person, you might be used to the limit .. offset clause. sql is full of subtle quirks, and this is one of them. the offset clause is applied first , as suggested in sql server 2012′s (i.e. the sql:2008 standard’s) syntax. if you translate mysql / postgresql’s dialect directly to streams, you’ll probably get it wrong: intstream.iterate(0, i -> i + 1) .limit(10) // limit .skip(5) // offset .foreach(system.out::println); the above yields 5 6 7 8 9 yes. it doesn’t continue after 9 , because the limit() is now applied first , producing (0 1 2 3 4 5 6 7 8 9). skip() is applied after, reducing the stream to (5 6 7 8 9). not what you may have intended. beware of the limit .. offset vs. "offset .. limit" trap! 7. walking the file system with filters we’ve blogged about this before . what appears to be a good idea is to walk the file system using filters: files.walk(paths.get(".")) .filter(p -> !p.tofile().getname().startswith(".")) .foreach(system.out::println); the above stream appears to be walking only through non-hidden directories, i.e. directories that do not start with a dot. unfortunately, you’ve again made mistake #5 and #6. walk() has already produced the whole stream of subdirectories of the current directory. lazily, though, but logically containing all sub-paths. now, the filter will correctly filter out paths whose names start with a dot “.”. e.g. .git or .idea will not be part of the resulting stream. but these paths will be: .\.git\refs , or .\.idea\libraries . not what you intended. now, don’t fix this by writing the following: files.walk(paths.get(".")) .filter(p -> !p.tostring().contains(file.separator + ".")) .foreach(system.out::println); while that will produce the correct output, it will still do so by traversing the complete directory subtree, recursing into all subdirectories of “hidden” directories. i guess you’ll have to resort to good old jdk 1.0 file.list() again. the good news is, filenamefilter and filefilter are both functional interfaces. 8. modifying the backing collection of a stream while you’re iterating a list , you must not modify that same list in the iteration body. that was true before java 8, but it might become more tricky with java 8 streams. consider the following list from 0..9: // of course, we create this list using streams: list list = intstream.range(0, 10) .boxed() .collect(tocollection(arraylist::new)); now, let’s assume that we want to remove each element while consuming it: list.stream() // remove(object), not remove(int)! .peek(list::remove) .foreach(system.out::println); interestingly enough, this will work for some of the elements! the output you might get is this one: 0 2 4 6 8 null null null null null java.util.concurrentmodificationexception if we introspect the list after catching that exception, there’s a funny finding. we’ll get: [1, 3, 5, 7, 9] heh, it “worked” for all the odd numbers. is this a bug? no, it looks like a feature. if you’re delving into the jdk code, you’ll find this comment in arraylist.arralistspliterator : /* * if arraylists were immutable, or structurally immutable (no * adds, removes, etc), we could implement their spliterators * with arrays.spliterator. instead we detect as much * interference during traversal as practical without * sacrificing much performance. we rely primarily on * modcounts. these are not guaranteed to detect concurrency * violations, and are sometimes overly conservative about * within-thread interference, but detect enough problems to * be worthwhile in practice. to carry this out, we (1) lazily * initialize fence and expectedmodcount until the latest * point that we need to commit to the state we are checking * against; thus improving precision. (this doesn't apply to * sublists, that create spliterators with current non-lazy * values). (2) we perform only a single * concurrentmodificationexception check at the end of foreach * (the most performance-sensitive method). when using foreach * (as opposed to iterators), we can normally only detect * interference after actions, not before. further * cme-triggering checks apply to all other possible * violations of assumptions for example null or too-small * elementdata array given its size(), that could only have * occurred due to interference. this allows the inner loop * of foreach to run without any further checks, and * simplifies lambda-resolution. while this does entail a * number of checks, note that in the common case of * list.stream().foreach(a), no checks or other computation * occur anywhere other than inside foreach itself. the other * less-often-used methods cannot take advantage of most of * these streamlinings. */ now, check out what happens when we tell the stream to produce sorted() results: list.stream() .sorted() .peek(list::remove) .foreach(system.out::println); this will now produce the following, “expected” output 0 1 2 3 4 5 6 7 8 9 and the list after stream consumption? it is empty: [] so, all elements are consumed, and removed correctly. the sorted() operation is a “stateful intermediate operation” , which means that subsequent operations no longer operate on the backing collection, but on an internal state. it is now “safe” to remove elements from the list! well… can we really? let’s proceed with parallel() , sorted() removal: list.stream() .sorted() .parallel() .peek(list::remove) .foreach(system.out::println); this now yields: 7 6 2 5 8 4 1 0 9 3 and the list contains [8] eek. we didn’t remove all elements!? free beers ( and jooq stickers ) go to anyone who solves this streams puzzler! this all appears quite random and subtle, we can only suggest that you never actually do modify a backing collection while consuming a stream. it just doesn’t work. 9. forgetting to actually consume the stream what do you think the following stream does? intstream.range(1, 5) .peek(system.out::println) .peek(i -> { if(i == 5) thrownewruntimeexception("bang"); }); when you read this, you might think that it will print (1 2 3 4 5) and then throw an exception. but that’s not correct. it won’t do anything. the stream just sits there, never having been consumed. as with any fluent api or dsl, you might actually forget to call the “terminal” operation. this might be particularly true when you use peek() , as peek() is an aweful lot similar to foreach() . this can happen with jooq just the same, when you forget to call execute() or fetch() : dsl.using(configuration) .update(table) .set(table.col1, 1) .set(table.col2, "abc") .where(table.id.eq(3)); oops. no execute() yes, the “best” way – with 1-2 caveats ;-) 10. parallel stream deadlock this is now a real goodie for the end! all concurrent systems can run into deadlocks, if you don’t properly synchronise things. while finding a real-world example isn’t obvious, finding a forced example is. the following parallel() stream is guaranteed to run into a deadlock: object[] locks = { newobject(), newobject() }; intstream .range(1, 5) .parallel() .peek(unchecked.intconsumer(i -> { synchronized(locks[i % locks.length]) { thread.sleep(100); synchronized(locks[(i + 1) % locks.length]) { thread.sleep(50); } } })) .foreach(system.out::println); note the use of unchecked.intconsumer() , which transforms the functional intconsumer interface into a org.jooq.lambda.fi.util.function.checkedintconsumer , which is allowed to throw checked exceptions. well. tough luck for your machine. those threads will be blocked forever :-) the good news is, it has never been easier to produce a schoolbook example of a deadlock in java! for more details, see also brian goetz’s answer to this question on stack overflow . conclusion with streams and functional thinking, we’ll run into a massive amount of new, subtle bugs. few of these bugs can be prevented, except through practice and staying focused. you have to think about how to order your operations. you have to think about whether your streams may be infinite. streams (and lambdas) are a very powerful tool. but a tool which we need to get a hang of, first.

June 16, 2014

by Lukas Eder

· 10,347 Views · 2 Likes

Converting ListenableFutures to CompletableFutures and back

Java 8 introduced CompletableFutures. They build on standard Futures and add completion callbacks, chaining and other useful stuff. But the world did not wait for Java 8 and lot of libraries added different variants of ListenableFutures which serve the same purpose. Some library authors are reluctant to add support for CompletableFutures even today. It makes sense, Java 8 is quite new and it's not easy to add support for CompletableFutures and be compatible with Java 7 at the same time. Luckily it's easy to convert to CompletableFutures and back. Let's take Spring 4 ListenableFutures as an example. How to convert it to CompletableFuture? static CompletableFuture buildCompletableFuture( final ListenableFuture listenableFuture ) { //create an instance of CompletableFuture CompletableFuture completable = new CompletableFuture() { @Override public boolean cancel(boolean mayInterruptIfRunning) { // propagate cancel to the listenable future boolean result = listenableFuture.cancel(mayInterruptIfRunning); super.cancel(mayInterruptIfRunning); return result; } }; // add callback listenableFuture.addCallback(new ListenableFutureCallback() { @Override public void onSuccess(T result) { completable.complete(result); } @Override public void onFailure(Throwable t) { completable.completeExceptionally(t); } }); return completable; } We just create a CompletableFuture instance and add a callback to the ListenableFuture. In the callback method we just notify the CompletableFuture that the underlying task has finished. We can even propagate call to cancel method if we want to. That's all you need to convert to CompletableFuture. What about the opposite direction? The approach is a bit different, but it's more or less straightforward as well class ListenableCompletableFutureWrapper implements ListenableFuture { private final ListenableFutureCallbackRegistry callbackRegistry = new ListenableFutureCallbackRegistry<>(); private final Future wrappedFuture; ListenableCompletableFutureWrapper( CompletableFuture wrappedFuture ) { this.wrappedFuture = wrappedFuture; wrappedFuture.whenComplete((result, ex) -> { if (ex != null) { if (ex instanceof CompletionException && ex.getCause() != null ) { callbackRegistry.failure(ex.getCause()); } else { callbackRegistry.failure(ex); } } else { callbackRegistry.success(result); } }); } @Override public void addCallback( ListenableFutureCallback callback ) { callbackRegistry.addCallback(callback); } @Override public boolean cancel(boolean mayInterruptIfRunning) { return wrappedFuture.cancel(mayInterruptIfRunning); } @Override public boolean isCancelled() { return wrappedFuture.isCancelled(); } @Override public boolean isDone() { return wrappedFuture.isDone(); } @Override public T get() throws InterruptedException, ExecutionException { return wrappedFuture.get(); } @Override public T get(long timeout, TimeUnit unit) throws InterruptedException, ExecutionException, TimeoutException { return wrappedFuture.get(timeout, unit); } } We just wrap the CompletableFuture and again register a callback. The only non-obvious part is the use of ListenableFutureCallbackRegistry which keeps track of registered ListenableFutureCallbacks. We also have to do some exception processing, but that's all. If you need to do something like this, I have good news. I have wrapped the code to a reusable library, so you do not have to copy and paste the code, you can just use it as described in the library documentation.

June 16, 2014

by Lukas Krecan

· 29,821 Views · 23 Likes

The Performance Price of Foreign Keys in PostgreSQL

If you want to optimize the performance of your PostgreSQL tables, you might benefit from looking in places you wouldn't ordinarily expect. For example, your foreign keys. According to Shaun Thomas in this recent post, foreign keys can have a considerable impact on performance. Ordinarily they're a good thing - Thomas acknowledges that - but there are certain design choices in PostgreSQL that can complicate them: In PostgreSQL, every foreign key is maintained with an invisible system-level trigger added to the source table in the reference. At least one trigger must go here, as operations that modify the source data must be checked that they do not violate the constraint. Each trigger, Thomas says, requires some overhead, and as the number of triggers increases, so does the cost. Thomas offers an example query to demonstrate where this kind of thing becomes an issue - not really a practical or real-world example, though - but the performance hit is substantial: . . . after merely five foreign keys, performance of our updates drops by 28.5%. By the time we have 20 foreign keys, the updates are 95% slower! This is not to say that you shouldn't use foreign keys, Thomas says, but that you should be careful with them and avoid abusing them. In other words, if you don't need a foreign key, you should probably just leave it out. Check out the Thomas' post for all the details.

June 12, 2014

by Alec Noller

· 17,677 Views

Querying XML CLOB Data Directly in Oracle

Oracle 9i introduced a new datatype - XMLType - specifically designed for handling XML naively in the database. However, we may find we have a database that stores XML directly in a CLOB/NCLOB datatype (either because of legacy data, or because we are supporting multiple database platforms). It can be useful to query this data directly using the XMLType functions available. This post just shows a very simple example of a query to do this. (Note: there may be better ways to do this - but this worked for me when I needed an ad-hoc query quickly!) Our Table Assume we have a table called USER, which has the following columns Our embedded XML (example for Ringo) Our Query Say we want to look at account details in the XML - e.g. find which users all have the same sort code. A query like the following: SELECT ID, NAME XMLTYPE(u.accountxml).EXTRACT('/person/account/sortCode/text()') as SORTCODE FROM USER u; Will produce output like this. Obviously from here you can use all the standard SQL operators to filter, join etc further to get what you need:

June 11, 2014

by Adrian Milne

· 45,621 Views · 1 Like

What's Wrong in Java 8, Part V: Tuples

In part 5 of our "What's Wrong in Java 8" series, we turn to Tuples.

June 10, 2014

by Pierre-Yves Saumont

· 132,818 Views · 4 Likes

Building a Simple RESTful API with Java Spark

Disclaimer: This post is about the Java micro web framework named Spark and not about the data processing engine Apache Spark. In this blog post we will see how Spark can be used to build a simple web service. As mentioned in the disclaimer, Spark is a micro web framework for Java inspired by the Ruby framework Sinatra. Spark aims for simplicity and provides only a minimal set of features. However, it provides everything needed to build a web application in a few lines of Java code. Getting Started Let's assume we have a simple domain class with a few properties and a service that provides some basic CRUDfunctionality: public class User { private String id; private String name; private String email; // getter/setter } public class UserService { // returns a list of all users public List getAllUsers() { .. } // returns a single user by id public User getUser(String id) { .. } // creates a new user public User createUser(String name, String email) { .. } // updates an existing user public User updateUser(String id, String name, String email) { .. } } We now want to expose the functionality of UserService as a RESTful API (For simplicity we will skip the hypermedia part of REST ;-)). For accessing, creating and updating user objects we want to use following URL patterns: GET /users Get a list of all users GET /users/ Get a specific user POST /users Create a new user PUT /users/ Update a user The returned data should be in JSON format. To get started with Spark we need the following Maven dependencies: com.sparkjava spark-core 2.0.0 org.slf4j slf4j-simple 1.7.7 Spark uses SLF4J for logging, so we need to a SLF4J binder to see log and error messages. In this example we use the slf4j-simple dependency for this purpose. However, you can also use Log4j or any other binder you like. Having slf4j-simple in the classpath is enough to see log output in the console. We will also use GSON for generating JSON output and JUnit to write a simple integration tests. You can find these dependencies in the complete pom.xml. Returning All Users Now it is time to create a class that is responsible for handling incoming requests. We start by implementing the GET /users request that should return a list of all users. import static spark.Spark.*; public class UserController { public UserController(final UserService userService) { get("/users", new Route() { @Override public Object handle(Request request, Response response) { // process request return userService.getAllUsers(); } }); // more routes } } Note the static import of spark.Spark.* in the first line. This gives us access to various static methods including get(), post(), put() and more. Within the constructor the get() method is used to register aRoute that listens for GET requests on /users. A Route is responsible for processing requests. Whenever aGET /users request is made, the handle() method will be called. Inside handle() we return an object that should be sent to the client (in this case a list of all users). Spark highly benefits from Java 8 Lambda expressions. Route is a functional interface (it contains only one method), so we can implement it using a Java 8 Lambda expression. Using a Lambda expression the Routedefinition from above looks like this: get("/users", (req, res) -> userService.getAllUsers()); To start the application we have to create a simple main() method. Inside main() we create an instance of our service and pass it to our newly created UserController: public class Main { public static void main(String[] args) { new UserController(new UserService()); } } If we now run main(), Spark will start an embedded Jetty server that listens on Port 4567. We can test our first route by initiating a GET http://localhost:4567/users request. In case the service returns a list with two user objects the response body might look like this: [com.mscharhag.sparkdemo.User@449c23fd, com.mscharhag.sparkdemo.User@437b26fe] Obviously this is not the response we want. Spark uses an interface called ResponseTransformer to convert objects returned by routes to an actual HTTP response. ReponseTransformer looks like this: public interface ResponseTransformer { String render(Object model) throws Exception; } ResponseTransformer has a single method that takes an object and returns a String representation of this object. The default implementation of ResponseTransformer simply calls toString() on the passed object (which creates output like shown above). Since we want to return JSON we have to create a ResponseTransformer that converts the passed objects to JSON. We use a small JsonUtil class with two static methods for this: public class JsonUtil { public static String toJson(Object object) { return new Gson().toJson(object); } public static ResponseTransformer json() { return JsonUtil::toJson; } } toJson() is an universal method that converts an object to JSON using GSON. The second method makes use of Java 8 method references to return a ResponseTransformer instance. ResponseTransformer is again a functional interface, so it can be satisfied by providing an appropriate method implementation (toJson()). So whenever we call json() we get a new ResponseTransformer that makes use of our toJson()method. In our UserController we can pass a ResponseTransformer as a third argument to Spark's get()method: import static com.mscharhag.sparkdemo.JsonUtil.*; public class UserController { public UserController(final UserService userService) { get("/users", (req, res) -> userService.getAllUsers(), json()); ... } } Note again the static import of JsonUtil.* in the first line. This gives us the option to create a newResponseTransformer by simply calling json(). Our response looks now like this: [{ "id": "1866d959-4a52-4409-afc8-4f09896f38b2", "name": "john", "email": "[email protected]" },{ "id": "90d965ad-5bdf-455d-9808-c38b72a5181a", "name": "anna", "email": "[email protected]" }] We still have a small problem. The response is returned with the wrong Content-Type. To fix this, we can register a Filter that sets the JSON Content-Type: after((req, res) -> { res.type("application/json"); }); Filter is again a functional interface and can therefore be implemented by a short Lambda expression. After a request is handled by our Route, the filter changes the Content-Type of every response toapplication/json. We can also use before() instead of after() to register a filter. Then, the Filterwould be called before the request is processed by the Route. The GET /users request should be working now :-) Returning a Specific User To return a specific user we simply create a new route in our UserController: get("/users/:id", (req, res) -> { String id = req.params(":id"); User user = userService.getUser(id); if (user != null) { return user; } res.status(400); return new ResponseError("No user with id '%s' found", id); }, json()); With req.params(":id") we can obtain the :id path parameter from the URL. We pass this parameter to our service to get the corresponding user object. We assume the service returns null if no user with the passed id is found. In this case, we change the HTTP status code to 400 (Bad Request) and return an error object. ResponseError is a small helper class we use to convert error messages and exceptions to JSON. It looks like this: public class ResponseError { private String message; public ResponseError(String message, String... args) { this.message = String.format(message, args); } public ResponseError(Exception e) { this.message = e.getMessage(); } public String getMessage() { return this.message; } } We are now able to query for a single user with a request like this: GET /users/5f45a4ff-35a7-47e8-b731-4339c84962be If an user with this id exists we will get a response that looks somehow like this: { "id": "5f45a4ff-35a7-47e8-b731-4339c84962be", "name": "john", "email": "[email protected]" } If we use an invalid user id, a ResponseError object will be created and converted to JSON. In this case the response looks like this: { "message": "No user with id 'foo' found" } Creating and Updating Users Creating and updating users is again very easy. Like returning the list of all users it is done using a single service call: post("/users", (req, res) -> userService.createUser( req.queryParams("name"), req.queryParams("email") ), json()); put("/users/:id", (req, res) -> userService.updateUser( req.params(":id"), req.queryParams("name"), req.queryParams("email") ), json()); To register a route for HTTP POST or PUT requests we simply use the static post() and put() methods of Spark. Inside a Route we can access HTTP POST parameters using req.queryParams(). For simplicity reasons (and to show another Spark feature) we do not do any validation inside the routes. Instead we assume that the service will throw an IllegalArgumentException if we pass in invalid values. Spark gives us the option to register ExceptionHandlers. An ExceptionHandler will be called if anException is thrown while processing a route. ExceptionHandler is another single method interface we can implement using a Java 8 Lambda expression: exception(IllegalArgumentException.class, (e, req, res) -> { res.status(400); res.body(toJson(new ResponseError(e))); }); Here we create an ExceptionHandler that is called if an IllegalArgumentException is thrown. The caught Exception object is passed as the first parameter. We set the response code to 400 and add an error message to the response body. If the service throws an IllegalArgumentException when the email parameter is empty, we might get a response like this: { "message": "Parameter 'email' cannot be empty" } The complete source the controller can be found here. Testing Because of Spark's simple nature it is very easy to write integration tests for our sample application. Let's start with this basic JUnit test setup: public class UserControllerIntegrationTest { @BeforeClass public static void beforeClass() { Main.main(null); } @AfterClass public static void afterClass() { Spark.stop(); } ... } In beforeClass() we start our application by simply running the main() method. After all tests finished we call Spark.stop(). This stops the embedded server that runs our application. After that we can send HTTP requests within test methods and validate that our application returns the correct response. A simple test that sends a request to create a new user can look like this: @Test public void aNewUserShouldBeCreated() { TestResponse res = request("POST", "/users?name=john&[email protected]"); Map json = res.json(); assertEquals(200, res.status); assertEquals("john", json.get("name")); assertEquals("[email protected]", json.get("email")); assertNotNull(json.get("id")); } request() and TestResponse are two small self made test utilities. request() sends a HTTP request to the passed URL and returns a TestResponse instance. TestResponse is just a small wrapper around some HTTP response data. The source of request() and TestResponse is included in the complete test classfound on GitHub. Conclusion Compared to other web frameworks Spark provides only a small amount of features. However, it is so simple you can build small web applications within a few minutes (even if you have not used Spark before). If you want to look into Spark you should clearly use Java 8, which reduces the amount of code you have to write a lot. You can find the complete source of the sample project on GitHub.

June 9, 2014

by Michael Scharhag

· 111,681 Views · 3 Likes

Working with ZeroMQ, Java, and JZMQ on a CentOS Platform

Recently I decided to port some of my development using ZeroMQ onto my CentOS development machine and I ran into some challenges. I’m documenting those challenges so that if someone else runs into the same pitfalls I did, they can avoid it. In this example today, we will work with the first “HelloWorld” examples in the ZeroMQ guide found here. I added a few modifications to the sample such as a package name and a try-catch around the Thread and an exception.tostring() to display any stack-trace. Source code for src/zmq/hwserver.java package zmq; import java.io.PrintWriter; import java.io.StringWriter; import org.zeromq.ZMQ; // // Hello World server in Java // Binds REP socket to tcp://*:5555 // Expects "Hello" from client, replies with "World" // public class hwserver { /** * @param args */ public static void main(String[] args) { ZMQ.Context context = ZMQ.context(1); // Socket to talk to clients ZMQ.Socket socket = context.socket(ZMQ.REP); socket.bind ("tcp://*:5555"); try { while (!Thread.currentThread ().isInterrupted ()) { byte[] reply = socket.recv(0); System.out.println("Received Hello"); String request = "World" ; socket.send(request.getBytes (), 0); Thread.sleep(1000); // Do some 'work' } } catch(Exception e) { StringWriter sw = new StringWriter(); PrintWriter pw = new PrintWriter(sw); e.printStackTrace(pw); System.out.println(sw.toString()); } socket.close(); context.term(); } } Similarly, source code for the client, src/zmq/hwclient.java package zmq; import org.zeromq.ZMQ; public class hwclient { /** * @param args */ public static void main(String[] args) { ZMQ.Context context = ZMQ.context(1); // Socket to talk to server System.out.println("Connecting to hello world server"); ZMQ.Socket socket = context.socket(ZMQ.REQ); socket.connect ("tcp://localhost:5555"); for(int requestNbr = 0; requestNbr != 10; requestNbr++) { String request = "Hello" ; System.out.println("Sending Hello " + requestNbr ); socket.send(request.getBytes (), 0); byte[] reply = socket.recv(0); System.out.println("Received " + new String (reply) + " " + requestNbr); } socket.close(); context.term(); } } Now that you have the sample code, how do you compile using the ZeroMQ? Assumption: You have installed Java (1.7 or above) Step-1: Installing ZeroMQ onto CentOS [Following steps are performed under root account] Install “Development Tools” if it’s not already installed on your CentOS as root: yum groupinstall “Development Tools” Download the “POSIX tarball” ZeroMQ source code onto your CentOS development machine from here. At the time of writing this article, ZeroMQ version 3.2.3 was the stable release. You might want to download the latest stable release. Unpack the .tar.gz source archive. Run ./configure, followed by “make” then “make install“. Run ldconfig after installation. Step-2: Installing a Language Binding for Java. In this case, we will use JZMQ from https://github.com/zeromq/jzmq Download the latest stable release from GITHub link above. (git clone git://github.com/zeromq/jzmq.git) Change directory, cd jzmq Compile and Install: 1 2 3 4 ./autogen.sh ./configure make make install Where did it install? 1 2 # JAR is located here: /usr/local/share/java/zmq.jar # .so link files are located here: /usr/local/lib Important Step: Add /usr/local/lib to a line in /etc/ld.so.conf (here is my copy after editing) 1 2 include ld.so.conf.d/*.conf /usr/local/lib Reload “ldconfig“. This clears the cache. Step-3: Compile and run the Java examples above. cd ~/dev/zeromq/example/ # Compile hwserver.java javac -classpath /usr/local/share/java/zmq.jar ./zmq/hwserver.java # Compile hwclient.java javac -classpath /usr/local/share/java/zmq.jar ./zmq/hwclient.java # Run hwserver in a separate prompt java -classpath .: /usr/local/share/java/zmq.jar -Djava.library.path=/usr/local/lib zmq.hwserver # Run hwclient in a seperate prompt java -classpath .:/usr/local/share/java/zmq.jar -Djava.library.path=/usr/local/lib zmq.hwclient Output on the hwserver console: Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello Received Hello output on the hwclient console: Connecting to hello world server Sending Hello 0 Received World 0 Sending Hello 1 Received World 1 Sending Hello 2 Received World 2 Sending Hello 3 Received World 3 Sending Hello 4 Received World 4 Sending Hello 5 Received World 5 Sending Hello 6 Received World 6 Sending Hello 7 Received World 7 Sending Hello 8 Received World 8 Sending Hello 9 Received World 9 Few interesting points to note are as follows: What happens if you started the client first and then the server? Well, the client waits until the server becomes available (or in other words, until some process connects to socket port 5555) and then sends the message. When you say socket.send(…), ZeroMQ actually enqueues a message to be sent later by a dedicated communication thread and this thread waits until a bind on port 5555 happens by “server”. Also observe that the “server” is doing the connecting, and the “client” is doing the binding. What is ZeroMQ (ØMQ)? (Excerpt from the ZeroMQ website!) ØMQ (also seen as ZeroMQ, 0MQ, zmq) looks like an embeddable networking library but acts like a concurrency framework. It gives you sockets that carry atomic messages across various transports like in-process, inter-process, TCP, and multicast. You can connect sockets N-to-N with patterns like fanout, pub-sub, task distribution, and request-reply. It’s fast enough to be the fabric for clustered products. Its asynchronous I/O model gives you scalable multicore applications, built as asynchronous message-processing tasks. It has a score of language APIs and runs on most operating systems. ØMQ is from iMatix and is LGPLv3 open source.

June 6, 2014

by Venkatt Guhesan

· 25,125 Views

MapDB: The Agile Java Data Engine

MapDB is a pure Java database, specifically designed for the Java developer. The fundamental concept for MapDB is very clever yet natural to use: provide a reliable, full-featured and “tune-able” database engine using the Java Collections API. MapDB 1.0 has just been released, this is the culmination of years of research and development to get the project to this point. Jan Kotek, the primary developer for MapDB, also worked on predecessor projects (JDBM), starting MapDB as an entire from-scratch rewrite. Jan’s expertise and dedication to low-level debugging has yielded excellent results, producting an easy-to-use database for Java with comparable performance to many C-based engines. What sets MapDB apart is the “map” concept. The idea is to leverage the totally natural Java Collections API – so familiar to Java developers that most of them literally use it daily in their work. For most database interactions with a Java application, some sort of translator is required. There are many Object-Relational Mapping (ORM) tools to name just one category of such components. The goal has always been in the direction of making it natural to code in objects in the Java language, and translate them to a specific database syntax (such as SQL). However, such efforts have always come up short, adding complexity for both the application developer and the data architect. When using MapDB there is no object “translation layer” – developers just access data in familiar structures like Maps, Sets, Queues, etc. There is no change in syntax from typical Java coding, other than a brief initialization syntax and transaction management. A developer can literally transform memory-limited maps into a high-speed persistent store in seconds (typically changing just one line of code). A MapDB Example Here is a simple MapDB example, showing how easy and intuitive it is to use in a Java application: // Initialize a MapDB database DB db = DBMaker.newFileDB(new File("testdb")) .closeOnJvmShutdown() .make(); // Create a Map: Map myMap = db.getTreeMap(“testmap”); // Work with the Map using the normal Map API. myMap.put(“key1”, “value1”); myMap.put(“key2”, “value2”); String value = myMap.get(“key1”); ... That’s all you need to do, now you have a file-backed Map of virtually any size. Note the “builder-style” initialization syntax, enabling MapDB as the agile database choice for Java. There are many builder options that let you tune your database for the specific requirements at hand. Just a small subset of options include: In-memory implementation Enable transactions Configurable caching This means that you can configure your database just for what you need, effectively making MapDB serve the job of many other databases. MapDB comes with a set of powerful configuration options, and you can even extend the product to make your own data implementations if necessary. Another very powerful feature is that MapDB utilizes some of the advanced Java Collections variants, such as ConcurrentNavigableMap. With this type of Map you can go beyond simple key-value semantics, as it is also a sorted Map allowing you to access data in order, and find values near a key. Not many people are aware of this extension to the Collections API, but it is extremely powerful and allows you to do a lot with your MapDB database (I will cover more of these capabilities in a future article). The Agile Aspect of MapDB When I first met Jan and started talking with him about MapDB he said something that made a very important impression: If you know what data structure you want, MapDB allows you to tailor the structure and database characteristics to your exact application needs. In other words, the schema and ways you can structure your data is very flexible. The configuration of the physical data store is just as flexible, making a perfect combination for meeting almost any database need. They key to this capability is inherent in MapDB’s architecture, and how it translates to the MapDB API itself. Here is a simple diagram of the MapDB architecture: As you can see from the diagram, there are 3 tiers in MapDB: Collections API: This is the familiar Java Collections API that every Java developer uses for maintaining application state. It has a simple builder-style extension to allow you to control the exact characteristics of a given database (including its internal format or record structure). Engine: The Engine is the real key to MapDB, this is where the records for a database – including their internal structure, concurrency control, transactional semantics – are controlled. MapDB ships with several engines already, and it is straightforward to add your own Engine if needed for specialized data handling. Volume: This is the physical storage layer (e.g., on-disk or in-memory). MapDB has a few standard Volume implementations, and they should suffice for most projects. The main point is that the development API is completely distinct from the Engine implementation (the heart of MapDB), and both are separate from the actual physical storage layer. This offers a very agile approach, allowing developers to exactly control what type of internal structure is needed for a given database, and what the actual data structure looks like from the top-level Collections API. To make things even more extensible and agile, MapDB uses a concept of Engine Wrappers. An Engine Wrapper allows adding additional features and options on top of a specific engine layer. For example, if the standard Map engine is utilized for creating a B-Tree backed Map, it is feasible to enable (or disable) caching support. This caching feature is done through an Engine Wrapper, and that is what shows up in the builder-style API used to configure a given database. While a whole article could be written just about this, the point here is that this adds to MapDB’s inherent agile nature. By way of example, here is how you configure a pure in-memory database, without transactional capabilities: // Initialize an in-memory MapDB database // without transactions DB db = DBMaker.newMemoryDB() .transactionDisable() .closeOnJvmShutdown() .make(); // Create a Map: Map myMap = db.getTreeMap(“testmap”); // Work with the Map using the normal Map API. myMap.put(“key1”, “value1”); myMap.put(“key2”, “value2”); String value = myMap.get(“key1”); ... That’s it! All that was needed was to change the DBMaker call to add the new options, everything else works exactly the same as in the example shown earlier. Agile Data Model In addition to customizing the features and performance characteristics of a given database instance, MapDB allows you to create an agile data model, with a schema exactly matching your application requirements. This is probably similar to how you write your code when creating standard Java in-memory structures. For example, let’s say you need to lookup a Person object by username, or by personID. Simply create a Person object and two Maps to meet your needs: public class Person { private Integer personID; private String username; ... // Setters and getters go here ... } // Create a Map of Person by username. Map personByUsernameMap = ... // Create a Map of Person by personID. Map personByPersonIDMap = ... This is a very trivial example, but now you can easily write to both maps for each new Person instance, and subsequently retrieve a Person by either key. Another interesting concept with MapDB data structures are some key extensions to the normal Java Collections API. A common requirement in applications is to have a Map with a key/value, and in addition to finding the value for a key to be able to perform the inverse: lookup the key for a given value. We can easily do this using the MapDB extension for bi-directional maps: // Create a primary map HTreeMap map = DBMaker.newTempHashMap(); // Create the inverse mapping for primary map NavigableSet> inverseMapping = new TreeSet>(); // Bind the inverse mapping to primary map, so it is auto-updated each time the primary map gets a new key/value Bind.mapInverse(map, inverseMapping); map.put(10L,"value2"); map.put(1111L,"value"); map.put(1112L,"value"); map.put(11L,"val"); // Now find a key by a given value. Long keyValue = Fun.filter(inverseMapping.get(“value2”); MapDB supports many constructs for the interaction of Maps or other collections, allowing you to create a schema of related structures that can automatically be kept in sync. This avoids a lot of scanning of structures, makes coding fast and convenient, and can keep things very fast. Wrapping it up I have shown a very brief introduction on MapDB and how the product works. As you can see its strengths are its use of the natural Java Collections API, the agile nature of the engine itself, and the support for virtually any type of data model or schema that your application needs. MapDB is freely available for any use under the Apache 2.0 license. To learn more, check out: www.mapdb.org.

June 5, 2014

by Cory Isaacson

· 28,643 Views · 3 Likes

Spring Integration Java DSL sample

A new Java based DSL has now been introduced for Spring Integration which makes it possible to define the Spring Integration message flows using pure java based configuration instead of using the Spring XML based configuration. I tried the DSL for a sample Integration flow that I have - I call it the Rube Goldberg flow, for it follows a convoluted path in trying to capitalize a string passed in as input. The flow looks like this and does some crazy things to perform a simple task: It takes in a message of this type - "hello from spring integ" splits it up into individual words(hello, from, spring, integ) sends each word to a ActiveMQ queue from the queue the word fragments are picked up by a enricher to capitalize each word placing the response back into a response queue It is picked up, resequenced based on the original sequence of the words aggregated back into a sentence("HELLO FROM SPRING INTEG") and returned back to the application. To start with Spring Integration Java DSL, a simple Xml based configuration to capitalize a String would look like this: There is nothing much going on here, a messaging gateway takes in the message passed in from the application, capitalizes it in a transformer and this is returned back to the application. Expressing this in Spring Integration Java DSL: @Configuration @EnableIntegration @IntegrationComponentScan @ComponentScan public class EchoFlow { @Bean public DirectChannel requestChannel() { return new DirectChannel(); } @Bean public IntegrationFlow simpleEchoFlow() { return IntegrationFlows.from(requestChannel()) .transform((String s) -> s.toUpperCase()) .get(); } } @MessagingGateway public interface EchoGateway { @Gateway(requestChannel = "requestChannel") String echo(String message); } Do note that @MessagingGateway annotation is not a part of Spring Integration Java DSL, it is an existing component in Spring Integration and serves the same purpose as the gateway component in XML based configuration. I like the fact that the transformation can be expressed using typesafe Java 8 lambda expressions rather than the Spring-EL expression. Note that the transformation expression could have coded in quite few alternate ways: ??.transform((String s) -> s.toUpperCase()) Or: ??.transform(s -> s.toUpperCase()) Or using method references: ??.transform(String::toUpperCase) Moving onto the more complicated Rube Goldberg flow to accomplish the same task, again starting with XML based configuration. There are two configurations to express this flow: rube-1.xml: This configuration takes care of steps 1, 2, 3, 6, 7, 8 : It takes in a message of this type - "hello from spring integ" splits it up into individual words(hello, from, spring, integ) sends each word to a ActiveMQ queue from the queue the word fragments are picked up by a enricher to capitalize each word placing the response back into a response queue It is picked up, resequenced based on the original sequence of the words aggregated back into a sentence("HELLO FROM SPRING INTEG") and returned back to the application. and rube-2.xml for steps 4, 5: It takes in a message of this type - "hello from spring integ" splits it up into individual words(hello, from, spring, integ) sends each word to a ActiveMQ queue from the queue the word fragments are picked up by a enricher to capitalize each word placing the response back into a response queue It is picked up, resequenced based on the original sequence of the words aggregated back into a sentence("HELLO FROM SPRING INTEG") and returned back to the application. Now, expressing this Rube Goldberg flow using Spring Integration Java DSL, the configuration looks like this, again in two parts: EchoFlowOutbound.java: @Bean public DirectChannel sequenceChannel() { return new DirectChannel(); } @Bean public DirectChannel requestChannel() { return new DirectChannel(); } @Bean public IntegrationFlow toOutboundQueueFlow() { return IntegrationFlows.from(requestChannel()) .split(s -> s.applySequence(true).get().getT2().setDelimiters("\\s")) .handle(jmsOutboundGateway()) .get(); } @Bean public IntegrationFlow flowOnReturnOfMessage() { return IntegrationFlows.from(sequenceChannel()) .resequence() .aggregate(aggregate -> aggregate.outputProcessor(g -> Joiner.on(" ").join(g.getMessages() .stream() .map(m -> (String) m.getPayload()).collect(toList()))) , null) .get(); } and EchoFlowInbound.java: @Bean public JmsMessageDrivenEndpoint jmsInbound() { return new JmsMessageDrivenEndpoint(listenerContainer(), messageListener()); } @Bean public IntegrationFlow inboundFlow() { return IntegrationFlows.from(enhanceMessageChannel()) .transform((String s) -> s.toUpperCase()) .get(); } Again here the code is completely typesafe and is checked for any errors at development time rather than at runtime as with the XML based configuration. Again I like the fact that transformation, aggregation statements can be expressed concisely using Java 8 lamda expressions as opposed to Spring-EL expressions. What I have not displayed here is some of the support code, to set up the activemq test infrastructure, this configuration continues to remain as xml and I have included this code in a sample github project. All in all, I am very excited to see this new way of expressing the Spring Integration messaging flow using pure Java and I am looking forward to seeing its continuing evolution and may be even try and participate in its evolution in small ways. Here is the entire working code in a github repo: https://github.com/bijukunjummen/rg-si References and Acknowledgement: Spring Integration Java DSL introduction blog article by Artem Bilan: https://spring.io/blog/2014/05/08/spring-integration-java-dsl-milestone-1-released Spring Integration Java DSL website and wiki: https://github.com/spring-projects/spring-integration-extensions/wiki/Spring-Integration-Java-DSL-Reference. A lot of code has been shamelessly copied over from this wiki by me :-). Also, a big thanks to Artem for guidance on a question that I had Webinar by Gary Russell on Spring Integration 4.0 in which Spring Integration Java DSL is covered in great detail.

June 3, 2014

by Biju Kunjummen

· 43,960 Views