The Latest and Popular SDLC Topics

The Latest Popular Topics

Introduction to Default Methods (Defender Methods) in Java 8

Before Java 8, interfaces in Java could contain only method declarations and no implementations, and any non-abstract class implementing the interface had to provide the implementation. Let's look at an example:

    public interface SimpleInterface {
        public void doSomeWork();
    }

    class SimpleInterfaceImpl implements SimpleInterface {
        @Override
        public void doSomeWork() {
            System.out.println("Do Some Work implementation in the class");
        }

        public static void main(String[] args) {
            SimpleInterfaceImpl simpObj = new SimpleInterfaceImpl();
            simpObj.doSomeWork();
        }
    }

Now what if I add a new method to SimpleInterface?

    public interface SimpleInterface {
        public void doSomeWork();
        public void doSomeOtherWork();
    }

If we try to compile the code, we end up with:

    $javac .\SimpleInterface.java
    .\SimpleInterface.java:18: error: SimpleInterfaceImpl is not abstract and does not override abstract method doSomeOtherWork() in SimpleInterface
    class SimpleInterfaceImpl implements SimpleInterface{
    ^
    1 error

This limitation makes it almost impossible to extend or improve existing interfaces and APIs without breaking every class that implements them. The same challenge was faced while enhancing the Collections API in Java 8 to support lambda expressions. To overcome this limitation, Java 8 introduces a new concept called default methods, also referred to as defender methods or virtual extension methods. Default methods are interface methods that carry a default implementation, which helps in evolving interfaces without breaking existing code.
Let's look at an example:

    public interface SimpleInterface {
        public void doSomeWork();

        // A default method in the interface, created using the "default" keyword
        default public void doSomeOtherWork() {
            System.out.println("DoSomeOtherWork implementation in the interface");
        }
    }

    class SimpleInterfaceImpl implements SimpleInterface {
        @Override
        public void doSomeWork() {
            System.out.println("Do Some Work implementation in the class");
        }

        /*
         * Not required to override to provide an implementation
         * for doSomeOtherWork.
         */
        public static void main(String[] args) {
            SimpleInterfaceImpl simpObj = new SimpleInterfaceImpl();
            simpObj.doSomeWork();
            simpObj.doSomeOtherWork();
        }
    }

and the output is:

    Do Some Work implementation in the class
    DoSomeOtherWork implementation in the interface

This is a very brief introduction to default methods. One can read in depth about default methods here.
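Two situations the introduction above does not cover are overriding a default method and inheriting the same default from two interfaces. The following is a minimal sketch of both, not part of the original article; the names Greeter, Shouter and the three implementing classes are made up for illustration.

```java
// Hypothetical interfaces, for illustration only.
interface Greeter {
    String name();

    // Default implementation: implementors inherit it unless they override it.
    default String greet() {
        return "Hello, " + name();
    }
}

interface Shouter {
    String name();

    default String greet() {
        return "HELLO, " + name().toUpperCase();
    }
}

// Inherits greet() from Greeter unchanged.
class PlainGreeter implements Greeter {
    public String name() { return "Java"; }
}

// Overrides the default with its own implementation.
class PoliteGreeter implements Greeter {
    public String name() { return "Java"; }

    @Override
    public String greet() { return "Good day, " + name(); }
}

// Inherits two conflicting defaults, so the compiler forces an override.
// Interface.super.method() selects one of the inherited defaults explicitly.
class LoudGreeter implements Greeter, Shouter {
    public String name() { return "Java"; }

    @Override
    public String greet() { return Shouter.super.greet(); }
}

class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new PlainGreeter().greet());
        System.out.println(new PoliteGreeter().greet());
        System.out.println(new LoudGreeter().greet());
    }
}
```

If LoudGreeter did not override greet(), the class would not compile, because the compiler cannot choose between the two inherited defaults on its own.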

March 28, 2013

by Mohamed Sanaulla

· 28,109 Views

Why We Need Lambda Expressions in Java - Part 1

Lambda expressions are coming to Java 8 and together with Raoul-Gabriel Urma and Alan Mycroft I started writing a book on this topic.

March 27, 2013

by Mario Fusco

· 180,916 Views · 11 Likes

Extracting the Elements of the Java Collection- The Java 8 Way

We have all used Collection classes like List, Map and their derived versions extensively. And each time we used them we had to iterate through them to find some element, update the elements, or find the elements matching some condition. Consider a List of Person as shown below:

    List<Person> personList = new ArrayList<>();
    personList.add(new Person("Virat", "Kohli", 22));
    personList.add(new Person("Arun", "Kumar", 25));
    personList.add(new Person("Rajesh", "Mohan", 32));
    personList.add(new Person("Rahul", "Dravid", 35));

To find all the Person instances aged 30 or older, we would do:

    List<Person> olderThan30OldWay = new ArrayList<>();
    for (Person p : personList) {
        if (p.age >= 30) {
            olderThan30OldWay.add(p);
        }
    }
    System.out.println(olderThan30OldWay);

and this gives me the output:

    [Rajesh Mohan, 32, Rahul Dravid, 35]

The code is easy to write, but is it not a bit verbose, especially the iteration part? Why should we have to iterate? Would it not be cool if there was an API which would iterate over the contents and give us the end result, i.e. we give the source List and use a series of method calls to get the result List we are looking for? Yes, this is possible in other languages like Scala and Groovy, which support passing closures and also support internal iteration. But is there a solution for Java developers? Yes, this exact problem is being solved by introducing support for lambda expressions (closures) and an enhanced Collection API that leverages them. The sad news is that it's going to be part of Java 8 and will take some time to reach mainstream development.

Leveraging the Java 8 enhancements to the above scenario

As I said before, the Collections API is being enhanced to support the use of lambda expressions, and more about it can be read here. Instead of adding all the new APIs to the Collection class, the JDK team created a new concept called “Stream” and added most of the APIs to that class.
A “Stream” is a sequence of elements obtained from the Collection from which it is created. To read more about the origin of the Stream class please refer to this document. To implement the example I started with using the enhancements in Java 8, we will use a few new APIs, namely stream(), filter(), collect() and Collectors.toCollection().

stream(): Uses the collection on which this API is called to create an instance of Stream.
filter(): Accepts a lambda expression which takes one parameter and returns a boolean value. The lambda expression is written in place of an implementation of the Predicate interface.
collect(): There are 2 overloaded versions of this method. The one I am using here takes an instance of Collector. This method takes the contents of the stream and constructs another collection. The construction logic is defined by the Collector.
Collectors.toCollection(): Collectors is a factory for Collector. toCollection() takes a lambda expression or method reference which should return a new instance of any class derived from Collection.

With this brief introduction to the APIs used, let me show the code which is equivalent to the first code sample:

    List<Person> olderThan30 =
        // Create a Stream from the personList
        personList.stream()
            // filter the elements to select only those with age >= 30
            .filter(p -> p.age >= 30)
            // put those filtered elements into a new List
            .collect(Collectors.toCollection(() -> new ArrayList<>()));
    System.out.println(olderThan30);

The above code uses both internal iteration and lambda expressions to make it intuitive, concise and soothing to the eye. If you are not familiar with the idea of lambda expressions, check out my previous entry, which covers them in brief.
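For readers who want to run the comparison end to end, here is a self-contained sketch of the stream(), filter() and collect() pipeline described above. The Person class, its toString() format and the helper names atLeast and lastNamesAtLeast are assumptions reconstructed from the printed output; they are not taken from the article's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Assumed shape of Person, inferred from the output shown in the article.
class Person {
    final String firstName;
    final String lastName;
    final int age;

    Person(String firstName, String lastName, int age) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.age = age;
    }

    @Override
    public String toString() {
        return firstName + " " + lastName + ", " + age;
    }
}

class StreamFilterDemo {
    static List<Person> sample() {
        return Arrays.asList(
            new Person("Virat", "Kohli", 22),
            new Person("Arun", "Kumar", 25),
            new Person("Rajesh", "Mohan", 32),
            new Person("Rahul", "Dravid", 35));
    }

    // Same pipeline as in the article, with a constructor reference
    // (ArrayList::new) in place of the lambda () -> new ArrayList<>().
    static List<Person> atLeast(List<Person> people, int minAge) {
        return people.stream()
            .filter(p -> p.age >= minAge)
            .collect(Collectors.toCollection(ArrayList::new));
    }

    // A small variation: map() transforms each element before collecting.
    static List<String> lastNamesAtLeast(List<Person> people, int minAge) {
        return people.stream()
            .filter(p -> p.age >= minAge)
            .map(p -> p.lastName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(atLeast(sample(), 30));
        System.out.println(lastNamesAtLeast(sample(), 30));
    }
}
```

Collectors.toList() is the shorter choice when any List implementation will do; Collectors.toCollection() is useful when a specific collection type is required.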

March 23, 2013

by Mohamed Sanaulla

· 78,352 Views · 2 Likes

OpenJPA: Memory Leak Case Study

This article provides the complete root cause analysis and resolution of a Java heap memory leak (Apache OpenJPA leak) affecting an Oracle Weblogic server 10.0 production environment. It also demonstrates the importance of following the Java Persistence API best practices when managing the javax.persistence.EntityManagerFactory lifecycle.

Environment specifications

Java EE server: Oracle Weblogic Portal 10.0
OS: Solaris 10
JDK: Oracle/Sun HotSpot JVM 1.5 32-bit @2 GB capacity
Java Persistence API: Apache OpenJPA 1.0.x (JPA 1.0 specifications)
RDBMS: Oracle 10g
Platform type: Web Portal

Troubleshooting tools

Quest Foglight for Java (Java heap monitoring)
MAT (Java heap dump analysis)

Problem description & observations

The problem was initially reported by our Weblogic production support team following production outages. An initial root cause analysis exercise revealed the following facts and observations:

Production outages were observed on a regular basis after ~2 weeks of traffic.
The failures were due to Java heap (OldGen) depletion, e.g. an OutOfMemoryError: Java heap space error found in the Weblogic logs.
A Java heap memory leak was confirmed after reviewing the Java heap OldGen space utilization over time from the Foglight monitoring tool along with the Java verbose GC historical data.

Following the discovery of the above problems, the decision was taken to move to the next phase of the RCA and perform a JVM heap dump analysis of the affected Weblogic (JVM) instances.

JVM heap dump analysis

** A video explaining the following JVM Heap Dump analysis is now available here.

In order to generate a JVM heap dump, the support team used the HotSpot 1.5 jmap utility, which generated a heap dump file (heap.bin) of about 1.5 GB. The heap dump file was then analyzed using the Eclipse Memory Analyzer Tool. Now let’s review the heap dump analysis so we can understand the source of the OldGen memory leak.
MAT provides an initial Leak Suspects report, which can be very useful for highlighting your high memory contributors. For our problem case, MAT was able to identify a leak suspect contributing almost 600 MB, or 40% of the total OldGen space capacity. At this point we had found one instance of java.util.LinkedList using almost 600 MB of memory and loaded by one of our application's parent class loaders (@ 0x7e12b708). The next step was to understand the leaking objects along with the source of retention. MAT allows you to inspect any class loader instance of your application, providing you with capabilities to inspect the loaded classes & instances. Simply search for the desired object by providing the address, e.g. 0x7e12b708, and then inspect the loaded classes & instances by selecting List Objects > with outgoing references. As you can see from the above snapshot, the analysis was quite revealing. What we found was one instance of org.apache.openjpa.enhance.PCRegistry at the source of the memory retention; more precisely, the culprit was the _listeners field, implemented as a LinkedList. For your reference, the Apache OpenJPA PCRegistry is used internally to track the registered persistence-capable classes. Find below a snippet of the PCRegistry source code from Apache OpenJPA version 1.0.4 exposing the _listeners field.

    /**
     * Tracks registered persistence-capable classes.
     *
     * @since 0.4.0
     * @author Abe White
     */
    public class PCRegistry {
        // DO NOT ADD ADDITIONAL DEPENDENCIES TO THIS CLASS

        private static final Localizer _loc = Localizer.forPackage
            (PCRegistry.class);

        // map of pc classes to meta structs; weak so the VM can GC classes
        private static final Map _metas = new ConcurrentReferenceHashMap
            (ReferenceMap.WEAK, ReferenceMap.HARD);

        // register class listeners
        private static final Collection _listeners = new LinkedList();

        ……………………………………………………………………………………

Now the question is: why is the memory footprint of this internal data structure so big and potentially leaking over time?
The next step was to deep dive into the _listeners LinkedList instance in order to review the leaking objects. We finally found that the leaking objects were actually the JDBC & SQL mapping definitions (metadata) used by our application in order to execute various queries against our Oracle database. A review of the JPA specifications, OpenJPA documentation and source confirmed that the root cause was associated with incorrect usage of the javax.persistence.EntityManagerFactory, namely the failure to close a newly created EntityManagerFactory instance. If you look closely at the above code snapshot, you will realize that the close() method is indeed responsible for cleaning up any recently used metadata repository instance. It also raised another concern: why are we creating such factory instances over and over?

The next step of the investigation was to perform a code walkthrough of our application code, especially around the life cycle management of the JPA EntityManagerFactory and EntityManager objects.

Root cause and solution

A code walkthrough of the application code revealed that the application was creating a new instance of EntityManagerFactory on every single request and not closing it properly.

    public class Application {

        @Resource
        private UserTransaction utx = null;

        // Initialized on each application request and not closed!
        @PersistenceUnit(unitName = "UnitName")
        private EntityManagerFactory emf = Persistence.createEntityManagerFactory("PersistenceUnit");

        public EntityManager getEntityManager() {
            return this.emf.createEntityManager();
        }

        public void businessMethod() {
            // Create a new EntityManager instance from the newly created EntityManagerFactory instance
            // Do something...
            // Close the EntityManager instance
        }
    }

This code defect and improper use of the JPA EntityManagerFactory was causing a leak, or accumulation, of metadata repository instances within the OpenJPA _listeners data structure, as demonstrated by the earlier JVM heap dump analysis.
The solution to the problem was to centralize the management & life cycle of the thread-safe javax.persistence.EntityManagerFactory via the Singleton pattern. The final solution was implemented as follows:

Create and maintain only one static instance of javax.persistence.EntityManagerFactory per application class loader, implemented via the Singleton pattern.
Create and dispose of a new instance of EntityManager for each application request.

Please review this discussion from Stackoverflow, as the solution we implemented is quite similar. Since the solution was deployed to our production environment, no further Java heap OldGen memory leak has been observed. Please feel free to provide your comments and share your experience on the same.
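The two rules above (one shared factory, one short-lived object per request) can be sketched as follows. This is an illustration under stated assumptions, not the code deployed in the article: ExpensiveFactory is a hypothetical stand-in for javax.persistence.EntityManagerFactory so that the example runs without a JPA provider, and createSession() stands in for createEntityManager(). In a real application the holder would wrap the result of Persistence.createEntityManagerFactory(...).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for EntityManagerFactory: costly to create, thread safe,
// and it must be closed exactly once when the application shuts down.
class ExpensiveFactory {
    static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean open = true;

    ExpensiveFactory() {
        CREATED.incrementAndGet();
    }

    // Stand-in for createEntityManager(): cheap, one per request.
    Object createSession() {
        if (!open) {
            throw new IllegalStateException("factory is closed");
        }
        return new Object();
    }

    void close() {
        open = false;
    }
}

// Singleton via the initialization-on-demand holder idiom: the JVM initializes
// the nested class once, thread safely, the first time get() is called.
final class FactoryHolder {
    private FactoryHolder() { }

    private static final class Lazy {
        static final ExpensiveFactory INSTANCE = new ExpensiveFactory();
    }

    static ExpensiveFactory get() {
        return Lazy.INSTANCE;
    }

    // Call once, for example from an application shutdown hook.
    static void shutdown() {
        Lazy.INSTANCE.close();
    }
}

class SingletonFactoryDemo {
    // Each request borrows the shared factory and creates its own short-lived session.
    static void handleRequest() {
        Object session = FactoryHolder.get().createSession();
        // ... use the session, then dispose of it before the request ends
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        System.out.println("factories created: " + ExpensiveFactory.CREATED.get());
    }
}
```

However many requests are served, the factory constructor runs once, which is the property that stops the metadata from accumulating.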

March 21, 2013

by Pierre-Hugues Charbonneau

· 9,227 Views

Using Lambda Expression to Sort a List in Java 8 using NetBeans Lambda Support

As part of JSR 335, lambda expressions are being introduced to the Java language from Java 8 onwards, and this is a major change in the Java language.

March 21, 2013

by Mohamed Sanaulla

· 346,098 Views · 6 Likes

Algorithm of the Week: Aho-Corasick String Matching Algorithm in Haskell

Let’s say you have a large piece of text and a dictionary of keywords. How do you quickly locate all the keywords? Well, there are many ways really; you could even iterate through the whole thing and compare words to keywords. But it turns out that’s going to be very slow: at least O(n_keywords * n_words) complexity. Essentially you’re making as many passes over the text as your dictionary is big.

In 1975 a couple of Bell Labs researchers, Alfred Aho and Margaret Corasick, discovered an algorithm that can do this in a single pass: the Aho-Corasick string matching algorithm. I implemented it in Haskell and it takes 0.005s to find 8 different keywords in Oscar Wilde’s The Nightingale and the Rose, a 12 KB text. A quick naive keyword search implemented in Python takes 0.023s. Not a big difference practically speaking, but imagine a situation with megabytes of text and thousands of words in the dictionary. The authors mention printing out the result as a major bottleneck in their assessment of the algorithm. Yep, printing.

The Aho-Corasick algorithm

At the core of this algorithm are three functions:

a parser based on a state machine, which maps (state, char) pairs to states and occasionally emits an output. This is called the goto function
a failure function, which tells the goto function which state to jump into when the character it just read doesn’t match anything
an output function, which maps states to outputs, potentially more than one per state

The algorithm works in two stages. It first constructs the goto, failure and output functions. The complexity of this operation hinges solely on the size of our dictionary. Then it iterates over the input text to produce all the matches. Using state machines for parsing text is a well known trick; the real genius of this algorithm rests in that failure function, if you ask me.
It makes lateral transitions between states when the algorithm climbs itself into a wall. Say you have she and hers in the dictionary. The goto machine eats your input string one character at a time. Let’s say it’s already read s h. The next input is an e, so it outputs she and reaches a final state. Next it reads an r, but the state didn’t expect any more inputs, so the failure function puts us on the path towards hers. This is a bit tricky to explain in text; I suggest you look at the picture from the original article and look at what’s happening.

My Haskell implementation

The first implementation I tried relied on manually mapping inputs to outputs for the goto, failure and output functions by using pattern matching. Not very pretty, extremely hardcoded, but it worked and was easy to make. Building the functions dynamically proved a bit trickier.

    type Goto = Map (Int, Char) Int
    type Failure = Map Int Int
    type Output = Map Int [String]

First off, we build the goto function.

    -- builds the goto function
    build_goto::Goto -> String -> (Goto, String)
    build_goto m s = (add_one 0 m s, s)

    -- adds one string to goto function
    add_one::Int -> Goto -> [Char] -> Goto
    add_one _ m [] = m
    add_one state m (c:rest)
      | member key m = add_one (fromMaybe 0 $ Map.lookup key m) m rest
      | otherwise = add_one max (Map.insert key max m) rest
      where key = (state, c)
            max = (size m)+1

Essentially this builds a flattened prefix tree in a hashmap of (state, char) pairs mapping to the next state. It makes sure to avoid adding new edges to the tree as much as possible. The reason it’s not simply a prefix tree is those lateral transitions; doing them in a tree would require backtracking and repeating of steps, so we wouldn’t have achieved anything. Once we have the goto function, building the output is trivial.
    -- builds the output function
    build_output::(?m::Goto) => [String] -> Output
    build_output [] = empty
    build_output (s:rest) = Map.insert (fin 0 s) (List.filter (\x -> elem x dictionary) $ List.tails s) $ build_output rest

    -- returns the state in which an input string ends without using failures
    fin::(?m::Goto) => Int -> [Char] -> Int
    fin state [] = state
    fin state (c:rest) = fin next rest
      where next = fromMaybe 0 $ Map.lookup (state, c) ?m

We are essentially going over the dictionary, finding the final state for each word and building a hash table mapping final states to their outputs. Building the failure function was trickiest, because we need a way to iterate over the depths at which nodes are positioned in the goto state machine. But we threw that info away by using a hashmap.

    -- tells us which nodes in the goto state machine are at which traversal depth
    nodes_at_depths::(?m::Goto) => [[Int]]
    nodes_at_depths = List.map (\i -> List.filter (>0) $ List.map (\l -> if i < length l then l!!i else -1) paths) [0..(maximum $ List.map length paths)-1]
      where paths = List.map (path 0) dictionary

We now have a list of lists that tells us at which depth certain nodes are.
    -- builds the failure function
    build_fail::(?m::Goto) => [[Int]] -> Int -> Failure
    build_fail nodes 0 = fst $ mapAccumL (\f state -> (Map.insert state 0 f, state)) empty (nodes!!0)
    build_fail nodes d = fst $ mapAccumL (\f state -> (Map.insert state (decide_fail state lower) f, state)) lower (nodes!!d)
      where lower = build_fail nodes (d-1)

    -- inner step of building the failure function
    decide_fail::(?m::Goto) => Int -> Failure -> Int
    decide_fail state lower = findWithDefault 0 (s, c) ?m
      where (s', c) = key' state $ assocs ?m
            s = findWithDefault 0 s' lower

    -- gives us the key associated with a certain state (how to get there)
    key'::Int -> [((Int, Char), Int)] -> (Int, Char)
    key' _ [] = (-1, '_') -- this is ugly, being of Maybe type would be better
    key' state ((k, v):rest)
      | state == v = k
      | otherwise = key' state rest

Here we are going over the list of nodes at depths and deciding what the failure should be for each depth based on the failures of depth-1. At depth zero, all failures go to the zeroth state. An important part of this process was inverting the goto hashmap so values point to keys, which is essentially what the key’ function does. Finally, we can use the whole algorithm like this:

    main = do
      let ?m = fst $ mapAccumL build_goto empty dictionary
      let ?f = build_fail nodes_at_depths $ (length $ nodes_at_depths)-1
          ?out = build_output dictionary
      print $ ahocorasick text

A bit more involved than the usual example of Haskell found online, but it’s still pretty cool. You can see the whole code on GitHub here.
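For comparison with the Haskell above, the goto, failure and output functions can also be sketched in Java. This is a textbook-style construction that builds the failure links breadth-first, one depth at a time; it is not a port of the author's code, and the class name, method names and the "keyword@index" result format are choices made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class AhoCorasick {
    // goto function: for each state, a map from character to next state (state 0 is the root)
    private final List<Map<Character, Integer>> gotoFn = new ArrayList<>();
    // failure function: state to fall back to when no goto edge matches
    private final List<Integer> fail = new ArrayList<>();
    // output function: keywords recognized on entering each state
    private final List<List<String>> output = new ArrayList<>();

    AhoCorasick(List<String> keywords) {
        newState(); // root
        for (String keyword : keywords) {
            addKeyword(keyword);
        }
        buildFailure();
    }

    private int newState() {
        gotoFn.add(new HashMap<>());
        fail.add(0);
        output.add(new ArrayList<>());
        return gotoFn.size() - 1;
    }

    // Stage 1a: extend the prefix tree, reusing existing edges where possible.
    private void addKeyword(String keyword) {
        int state = 0;
        for (char c : keyword.toCharArray()) {
            Integer next = gotoFn.get(state).get(c);
            if (next == null) {
                next = newState();
                gotoFn.get(state).put(c, next);
            }
            state = next;
        }
        output.get(state).add(keyword);
    }

    // Stage 1b: breadth-first pass, so the failures of depth d-1 are known
    // before depth d is processed. Depth-1 states keep failure 0.
    private void buildFailure() {
        Queue<Integer> queue = new ArrayDeque<>(gotoFn.get(0).values());
        while (!queue.isEmpty()) {
            int state = queue.remove();
            for (Map.Entry<Character, Integer> edge : gotoFn.get(state).entrySet()) {
                char c = edge.getKey();
                int child = edge.getValue();
                int f = fail.get(state);
                while (f != 0 && !gotoFn.get(f).containsKey(c)) {
                    f = fail.get(f);
                }
                int target = gotoFn.get(f).getOrDefault(c, 0);
                fail.set(child, target);
                // a state also reports every keyword its failure state reports
                output.get(child).addAll(output.get(target));
                queue.add(child);
            }
        }
    }

    // Stage 2: a single pass over the text; each hit is "keyword@startIndex".
    List<String> search(String text) {
        List<String> matches = new ArrayList<>();
        int state = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (state != 0 && !gotoFn.get(state).containsKey(c)) {
                state = fail.get(state);
            }
            state = gotoFn.get(state).getOrDefault(c, 0);
            for (String keyword : output.get(state)) {
                matches.add(keyword + "@" + (i - keyword.length() + 1));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        AhoCorasick ac = new AhoCorasick(Arrays.asList("he", "she", "his", "hers"));
        System.out.println(ac.search("ushers")); // [she@1, he@2, hers@2]
    }
}
```

In the "ushers" run, the state reached after s h e has no edge for r, so the failure link moves the machine to the state for h e, which does have one. That is the lateral transition from she towards hers described earlier.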

March 19, 2013

by Swizec Teller

· 21,949 Views

Client For ActiveMQ

This post explains topics in ActiveMQ (a message broker), with subscribing and publishing. For this we will write two Java clients, as we did for the WSO2 Message Broker:

TopicSubscriber.java to subscribe for messages
TopicPublisher.java to publish the messages

Let's start.

[1] Get ActiveMQ from http://activemq.apache.org/download.html
[1.1] Start ActiveMQ from \bin\activemq.bat. You can see the started server at http://localhost:8161/admin/
[2] Create a project "Client" in the IDE of your choice
[3] Add activemq-all-5.7.0.jar to the lib directory of the project (activemq-all-5.7.0.jar can be found in the root folder)
[4] Create the class "TopicSubscriber.java" to subscribe for messages

    package simple;

    import java.util.Properties;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import javax.jms.TopicConnection;
    import javax.jms.TopicConnectionFactory;
    import javax.jms.TopicSession;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    public class TopicSubscriber {
        private String topicName = "news.sport";
        private String initialContextFactory = "org.apache.activemq.jndi.ActiveMQInitialContextFactory";
        private String connectionString = "tcp://localhost:61616";
        private boolean messageReceived = false;

        public static void main(String[] args) {
            TopicSubscriber subscriber = new TopicSubscriber();
            subscriber.subscribeWithTopicLookup();
        }

        public void subscribeWithTopicLookup() {
            Properties properties = new Properties();
            TopicConnection topicConnection = null;
            properties.put("java.naming.factory.initial", initialContextFactory);
            properties.put("connectionfactory.QueueConnectionFactory", connectionString);
            properties.put("topic." + topicName, topicName);
            try {
                InitialContext ctx = new InitialContext(properties);
                TopicConnectionFactory topicConnectionFactory =
                        (TopicConnectionFactory) ctx.lookup("QueueConnectionFactory");
                topicConnection = topicConnectionFactory.createTopicConnection();
                System.out.println("Create Topic Connection for Topic " + topicName);

                // keep re-subscribing in 5 second windows until a message arrives
                while (!messageReceived) {
                    try {
                        TopicSession topicSession =
                                topicConnection.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
                        Topic topic = (Topic) ctx.lookup(topicName);
                        // start the connection
                        topicConnection.start();
                        // create a topic subscriber
                        javax.jms.TopicSubscriber topicSubscriber = topicSession.createSubscriber(topic);
                        TestMessageListener messageListener = new TestMessageListener();
                        topicSubscriber.setMessageListener(messageListener);
                        Thread.sleep(5000);
                        topicSubscriber.close();
                        topicSession.close();
                    } catch (JMSException e) {
                        e.printStackTrace();
                    } catch (NamingException e) {
                        e.printStackTrace();
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            } catch (NamingException e) {
                throw new RuntimeException("Error in initial context lookup", e);
            } catch (JMSException e) {
                throw new RuntimeException("Error in JMS operations", e);
            } finally {
                if (topicConnection != null) {
                    try {
                        topicConnection.close();
                    } catch (JMSException e) {
                        throw new RuntimeException("Error in closing topic connection", e);
                    }
                }
            }
        }

        public class TestMessageListener implements MessageListener {
            public void onMessage(Message message) {
                try {
                    System.out.println("Got the Message : " + ((TextMessage) message).getText());
                    messageReceived = true;
                } catch (JMSException e) {
                    e.printStackTrace();
                }
            }
        }
    }

[5] Create the class "TopicPublisher.java" to publish the messages

    package simple;

    import javax.jms.*;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import java.util.Properties;

    public class TopicPublisher {
        private String topicName = "news.sport";
        private String initialContextFactory = "org.apache.activemq.jndi.ActiveMQInitialContextFactory";
        private String connectionString = "tcp://localhost:61616";

        public static void main(String[] args) {
            TopicPublisher publisher = new TopicPublisher();
            publisher.publishWithTopicLookup();
        }

        public void publishWithTopicLookup() {
            Properties properties = new Properties();
            TopicConnection topicConnection = null;
            properties.put("java.naming.factory.initial", initialContextFactory);
            properties.put("connectionfactory.QueueConnectionFactory", connectionString);
            properties.put("topic." + topicName, topicName);
            try {
                // initialize the required connection factories
                InitialContext ctx = new InitialContext(properties);
                TopicConnectionFactory topicConnectionFactory =
                        (TopicConnectionFactory) ctx.lookup("QueueConnectionFactory");
                topicConnection = topicConnectionFactory.createTopicConnection();
                try {
                    TopicSession topicSession =
                            topicConnection.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
                    // create or use the topic
                    System.out.println("Use the Topic " + topicName);
                    Topic topic = (Topic) ctx.lookup(topicName);
                    javax.jms.TopicPublisher topicPublisher = topicSession.createPublisher(topic);
                    String msg = "Hi, I am Test Message";
                    TextMessage textMessage = topicSession.createTextMessage(msg);
                    topicPublisher.publish(textMessage);
                    System.out.println("Publishing message " + textMessage);
                    topicPublisher.close();
                    topicSession.close();
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            } catch (JMSException e) {
                throw new RuntimeException("Error in JMS operations", e);
            } catch (NamingException e) {
                throw new RuntimeException("Error in initial context lookup", e);
            } finally {
                // close the connection, as the subscriber does, so it is not leaked
                if (topicConnection != null) {
                    try {
                        topicConnection.close();
                    } catch (JMSException e) {
                        throw new RuntimeException("Error in closing topic connection", e);
                    }
                }
            }
        }
    }

[6] First run "TopicSubscriber.java" and then run "TopicPublisher.java"

Here is the output from both.

TopicSubscriber::

    Create Topic Connection for Topic news.sport
    Got the Message : Hi, I am Test Message

TopicPublisher::

    Use the Topic news.sport
    Publishing message ActiveMQTextMessage {commandId = 0, responseRequired = false, messageId = ID:Madhuka-THINK-51683-1359787878456-1:1:1:1:1, originalDestination = null, originalTransactionId = null, producerId = null, destination = topic://news.sport, transactionId = null, expiration = 0, timestamp = 1359787878729, arrival = 0, brokerInTime = 0, brokerOutTime = 0, correlationId = null, replyTo = null, persistent = true, type = null, priority = 4, groupID = null, groupSequence = 0, targetConsumerId = null, compressed = false, userID = null, content = null, marshalledProperties = null, dataStructure = null, redeliveryCounter = 0, size = 0, properties = null, readOnlyProperties = false, readOnlyBody = false, droppable = false, text = Hi, I am Test Message}

This is the full message that we sent to TopicSubscriber. Any of the parameters above can be read from the message. Here is a sample that gets the timestamp and ID from the JMS message.

    public class TestMessageListener implements MessageListener {
        public void onMessage(Message message) {
            try {
                System.out.println("Got the Message TimeStamp: " + message.getJMSTimestamp());
                System.out.println("Got the Message JMS ID : " + message.getJMSMessageID());
                messageReceived = true;
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }
    }

Now go to the ActiveMQ server at http://localhost:8161/admin/ and check the topic and the message count for that topic. Now it is your turn to explore more of ActiveMQ.

March 16, 2013

by Madhuka Udantha

· 18,325 Views

Bring Ruby VCR to Javascript testing with Capybara and puffing-billy

Let’s say you are writing an application in Ruby. You are probably talking to every API under the sun and are happily writing tests to make sure your code isn’t failing. Because you don’t want to rely on 3rd parties or an internet connection to make your tests pass or fail you mock everything with let’s say, Webmock. This also makes your tests much much faster. After all even the fastest internet is much slower than the processor talking to its memory. If you’re too lazy to mock out every API under the sun, you might use VCR to record requests and play them back later. The main advantage being, you don’t have to worry about meticulously reimplementing everything, and you can nuke the recordings at any time to make sure your code still works against the real API. Life is good.

Enter Javascript, stage left

Then Javascript becomes more and more prominent. Suddenly your application’s logic is shifting from backend to browser and before you know it, most of your tests are pretty irrelevant. You’re fine for a while with Capybara or Cucumber. Launch a headless browser, click around the site from the comfort of RSpec, make sure users see what they’re supposed to. Balance restored. Then you add a payment form. Or something. Suddenly your frontend is talking to an API. In case of Stripe or Balanced it’s even a feature. A great benefit for the user.

    jQuery(function($) {
      $('#payment-form').submit(function(event) {
        var $form = $(this);

        // Disable the submit button to prevent repeated clicks
        $form.find('button').prop('disabled', true);

        Stripe.createToken($form, stripeResponseHandler);

        // Prevent the form from submitting with the default action
        return false;
      });
    });

Well that sucks, you’re suddenly back to square one. Your tests take minutes to execute. Your tests fail without an internet connection. Your tests rely on some 3rd party service being up. Your tests suck. Who wants to code when running ~5 tests takes 3 minutes? Nobody.
Enter puffing-billy, stage right

The problem is that neither Webmock nor VCR can handle requests originating in a browser because they happen in a different thread and they can’t mess around with those. Luckily, a year ago Olly Smith created puffing-billy. The idea was great: spin up a web proxy, tell your headless browser to use it, and when your code makes a request it will go through the proxy, which will try to use a Webmock to handle it, otherwise pass it on to the vast internet. But who wants to mock everything out manually? Over the past few weeks I set upon the task of fixing this problem and restoring sanity to my life. Good tests are transparent to the application and I’ll be damned if I use any of the suggested solutions on the internet like “Well you just put a switch in your code that knows if you’re in a test and then doesn’t talk to Stripe”. Screw that. This morning I submitted a pull request to puffing-billy. I added the ability for puffing-billy to behave like it was VCR, but for your browser. When a request is made, it gets cached. The cache is then persisted between sessions, and requests are played back to the browser as needed. It’s not as sophisticated as VCR just yet, but it gets the job done and my test runtime has gone from 3 minutes to just under a minute. That’s a big deal in my book! The caching even understands that some URLs are needlessly different on every request (social buttons, analytics etc.) so you can configure it to normalize those requests to a single recording that is played back every time. Your tests don’t really rely on gAnalytics working, right? And the best thing is, you don’t even have to change your tests.
You add something like this in your spec_helper.rb:

    Billy.configure do |c|
      c.cache = true
      c.ignore_params = ["http://www.google-analytics.com/__utm.gif",
                         "http://b.siftscience.com/i.gif",
                         "https://r.twimg.com/jot",
                         "http://p.twitter.com/t.gif",
                         "http://p.twitter.com/f.gif",
                         "http://www.facebook.com/plugins/like.php",
                         "https://www.facebook.com/dialog/oauth",
                         "http://cdn.api.twitter.com/1/urls/count.json"]
      c.persist_cache = true
      c.cache_path = 'spec/req_cache/'
    end

    # need to call this because of a race condition between persist_cache
    # being set and the proxy being loaded for the first time
    Billy.proxy.restore_cache

    Capybara.javascript_driver = :poltergeist_billy

A test for the payment form looks the same as usual:

    scenario "physical product" do
      product = start_buying build(:product, :physical, user: @seller, active: true)

      VCR.use_cassette('Balanced/purchase_with_cc') do
        within '#new_order' do
          fill_in 'order_email', with: Faker::Internet.safe_email
          fill_in_address
          fill_in_card
          click_on 'Buy Now'
        end

        page.should have_css('#receipt', :visible => true)
      end

      validate_receipt product, @seller
    end

Puffing-billy will transparently cache every request the browser makes, and VCR records any requests made by your backend logic. It’s pretty sweet. What do you guys think? I only have 20 days of Ruby experience and the internet has told me it really wants something like this, but I couldn’t find anyone who’s already made it.
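The record-and-replay idea, including the normalization of URLs that differ only in their query string, can be illustrated with a small Java sketch. It shows the general technique only and is not how puffing-billy is implemented; the RecordingCache class, its methods and the api.example.test URL are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical record-and-replay cache, for illustration only.
class RecordingCache {
    // URLs whose query parameters are ignored when building the cache key
    private final List<String> ignoreParams;
    private final Map<String, String> recordings = new HashMap<>();
    int upstreamCalls = 0;

    RecordingCache(List<String> ignoreParams) {
        this.ignoreParams = ignoreParams;
    }

    // Requests to an ignored URL collapse onto one key, whatever their query string.
    String cacheKey(String method, String url) {
        int q = url.indexOf('?');
        String base = q < 0 ? url : url.substring(0, q);
        return method + " " + (ignoreParams.contains(base) ? base : url);
    }

    // Replay a recording if one exists; otherwise call upstream once and record the result.
    String fetch(String method, String url, Function<String, String> upstream) {
        String key = cacheKey(method, url);
        String cached = recordings.get(key);
        if (cached != null) {
            return cached;
        }
        upstreamCalls++;
        String body = upstream.apply(url);
        recordings.put(key, body);
        return body;
    }
}

class RecordingCacheDemo {
    public static void main(String[] args) {
        RecordingCache cache = new RecordingCache(
            Arrays.asList("http://www.google-analytics.com/__utm.gif"));
        Function<String, String> network = url -> "response for " + url;

        // Differ only in query string, URL is ignored: one upstream call.
        cache.fetch("GET", "http://www.google-analytics.com/__utm.gif?utmn=1", network);
        cache.fetch("GET", "http://www.google-analytics.com/__utm.gif?utmn=2", network);

        // Not in the ignore list: each distinct URL is recorded separately.
        cache.fetch("GET", "http://api.example.test/pay?x=1", network);
        cache.fetch("GET", "http://api.example.test/pay?x=2", network);

        System.out.println("upstream calls: " + cache.upstreamCalls);
    }
}
```

Persisting the recordings map to disk between runs would correspond to the persist_cache and cache_path settings in the configuration above.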

March 15, 2013

by Swizec Teller

· 7,990 Views

From Java to PHP

We are welcoming some new colleagues with a Java background to the Onebip team, from both development and operations. Here's a primer on learning PHP in this situation, which you may find useful when introducing similar people to your PHP-based projects.

The absolute basics

Before being able to discuss meaningful matters over PHP code, these pages from the manual should be sent to each interested developer.

- The basic syntax and the available data types of PHP. Primitive types are widespread, as in Java, but there's no autoboxing as there is no object equivalent for them.
- Strings have value semantics (assigning one copies it) and are one of the most important types. They can be defined with single or double quotes, and their manipulation functions follow the C API.
- Arrays are the glue of the language, working as lists, sets and maps while wrapped into objects. They can always be traversed with foreach(), and keep an eye on the available functions to avoid rewriting array_search() or sort() by accident.
- Operators work differently on primitive values and on objects: == differs from === in both contexts, but in a different way in each. Other everyday operators are `.` and `+=`.
- PHP's object paradigm is borrowed from Java: it supports concrete and abstract classes, interfaces, and the private, protected and public scopes.
- Type hints (which are actually not hints but strong preconditions) are the closest thing to Java's static type safety mechanisms. I personally strongly favor their usage.
- Namespaces are packages, use statements are imports. However, you don't always have to write them.
- The deployment model of PHP consists of an interpreter instantiated N times, running as many shared-nothing processes.

Don't care about:

- The installation process of Apache and PHP: if you work in a team, you will get good help on that, and you're probably going to do this only once per project. For example, we provide a virtual machine ready for development.
- The function syntax is also not very useful by itself in 2013; skip directly to objects and their methods. Take a look at anonymous functions, however.
- Exceptions work mostly as in Java, so don't bother reading about them before coding.
- Sessions are also evil in web services nowadays, so ignore anything that starts with $_SESSION or session_*(): write shared-nothing services (though that advice may be specific to our team and projects).
- In general, also ignore low-level APIs like setcookie() or exec() if you already have a higher-level abstraction in the application you're working on, whether that abstraction is a library or your own code. It's important to know how cookies are transmitted according to HTTP, but coming from another language you already know the protocol.

So you write objects that go into a graph

At least that's how I see programming these days. However, you have to exit the process sometimes, and PHP provides several APIs for that.

- The database APIs such as the mongo and PDO extensions, working respectively with MongoDB and relational databases. However, if you already use a database abstraction layer, learn just that, as there is nothing interesting to see at this lower level.
- The DateTime object and its cousins: at least know how to call its constructor and the format() method.
- You should know these APIs exist in order to look them up on demand: json_*() functions, the SimpleXMLElement class, the mcrypt extension and hash_hmac(), mail(). They're just a click away at php.net/function_or_class_name, and I think these names are self-explanatory enough to tell you when you should read about them.

Watch out for:

Some topics are always controversial, and get confusing for PHP beginners. Raise your alert level when you feel you're encountering some strange behavior.
- The difference between == and === comes up often, and it cannot be reduced to "just use === by default", since all primitive values coming from HTTP requests in the classic x-www-form-urlencoded Content-Type are usually strings.
- php.ini settings may change the output of your application and the actual flow of execution, depending on error reporting levels and display options. One never ceases to learn, so when you encounter some problematic directive, take a quick read of the rest of the directives for that extension.
- Since some people believe web development is the concatenation of strings (it is not), it is tempting to reinvent the wheel; for example, writing functions for composing URLs from paths and query strings. http_build_query() solves that problem, as other Composer-ready packages do, so take a look around before starting to implement commodity algorithms yourself. Stack Overflow and the PHP manual itself are good places to check whether your problem has already been solved by someone else in the last 10 years.

Perform a kata

To test your level of proficiency with PHP, execute a small kata with TDD, which incidentally will also check your usage of PHPUnit. For example, classic katas are:

- The Fizz Buzz, maybe with a web interface.
- The Roman numbers kata: write a function that transforms 42 into XLII.
- The Game of Life, although it is probably too complex for your first PHP project if you didn't already solve it in another language.
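For colleagues coming from Java, it may help to have a reference solution of the Roman numbers kata in the language they already know, to port to PHP and PHPUnit as the exercise. The following is only a sketch of one common approach (greedy subtraction over a value table); the class name RomanNumerals and the 1..3999 range limit are assumptions made for this example, not part of the kata as described above.

```java
// Reference sketch of the Roman numerals kata in Java, meant to be ported to PHP.
// The class name and the supported range (1..3999) are illustrative assumptions.
final class RomanNumerals {
    // Values in descending order, including the subtractive pairs (900, 400, 90, ...).
    private static final int[] VALUES =
            {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
            {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    static String toRoman(int number) {
        if (number < 1 || number > 3999) {
            throw new IllegalArgumentException("expected 1..3999, got " + number);
        }
        StringBuilder result = new StringBuilder();
        int remaining = number;
        // Greedily take the largest value that still fits, appending its symbol.
        for (int i = 0; i < VALUES.length; i++) {
            while (remaining >= VALUES[i]) {
                result.append(SYMBOLS[i]);
                remaining -= VALUES[i];
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRoman(42)); // prints XLII
    }
}
```

When porting, the two parallel arrays collapse naturally into a single PHP associative array traversed with foreach(), which is a small illustration of arrays being the glue of the language.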

March 13, 2013

by Giorgio Sironi

· 26,557 Views

In-Memory Data Grids

Introduction

The IT buzzword of 2012 is without a doubt Big Data. It’s new and here to stay, and for a good reason. Big data is data that exceeds the processing capacity of conventional database systems. Great examples are CERN with the Large Hadron Collider, whose experiments generate 25 petabytes of data annually, or Walmart, which handles more than one million customer transactions every hour.

Problems

These vast amounts of data leave us with two problems.

Problem 1: To gain value from this data, one must choose an alternative way to process it. The value of big data to an organization falls into two categories: analytical use, and enabling new products. Big data analytics can reveal insights previously hidden by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports.

Problem 2: The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. Remember the CERN case where the LHC produces over 25 petabytes of data annually? No “classic” database architecture or setup is capable of holding these amounts of data.

Solutions

Fortunately, both problems can be solved by implementing the correct infrastructure and rethinking data storage. There are two critical factors in Big Data environments: size and speed. We already discussed the vast amounts of data and the desire to be able to access and process the data fast. The latter is the main differentiator from more traditional data warehouses. Just imagine what you can do when you can access all your data in real time. Enter big data.
A common Big Data implementation is an in-memory data grid that lives in a distributed cluster, ensuring both speed, by storing data in memory, and capacity, by using the scalability features provided by a cluster. As a bonus, availability is ensured by using a distributed cluster. As for the data storage, there are typically two kinds: in-memory databases and in-memory data grids.

But first some background. Using main memory as a storage area instead of a disk is not a new idea. In our daily lives there are numerous examples of main memory databases (MMDB), as they perform much faster than disk-based databases. An everyday example is a mobile phone. When you text or call someone, most mobile service providers use an MMDB to get the information on your contact as quickly as possible. The same applies to your phone. When someone calls you, the caller details are looked up in the contacts application, usually providing a name and sometimes a picture.

In-memory data grids

An In-Memory Data Grid (IMDG) is the same as an MMDB in that it stores data in main memory, but it has a totally different architecture. The features of an IMDG can be summarized as follows:

- Data is distributed and stored on multiple servers.
- Each server operates in the active mode.
- The data model is usually object-oriented (serialized) and non-relational.
- Servers are often added or removed as needed.
- There are no traditional database features such as tables.

In other words, an IMDG is designed to store data in main memory, ensure scalability and store an object itself. These days, there are many IMDG products, both commercial and open source.
Some of the most commonly used products are:

- Hazelcast (http://www.hazelcast.com)
- JBoss Infinispan (http://www.jboss.org/infinispan)
- GridGain DataGrid (http://www.gridgain.com/features/in-memory-data-grid/)
- VMware Gemfire (http://www.vmware.com/nl/products/application-platform/vfabric-gemfire/overview.html)
- Oracle Coherence (http://www.oracle.com/technetwork/middleware/coherence/overview/index.html)
- Gigaspaces XAP (http://www.gigaspaces.com/datagrid)
- Terracotta Enterprise Suite (http://terracotta.org/products/enterprise-suite)

Why Memory?

The main reasons for using main memory for data storage are once again the two main themes of Big Data: speed and capacity. The processing performance of main memory is 800 times faster than an HDD and up to 40 times faster than an. Moreover, the latest x86 servers support hundreds of GB of main memory per server. It is said that the limit of a traditional transaction processing (OLTP) database’s data capacity is approximately 1 TB and that OLTP data capacity will not grow much beyond that. If servers with 1 TB or more of main memory become more common, you will be able to conduct operations with the entire data set placed in main memory, at least in the field of OLTP.

IMDG Architecture

To use main memory as a storage area, two weak points should be overcome:

- Limited capacity: data may exceed the maximum capacity of the main memory of the server.
- Reliability: data may be lost in case of a (system) failure.

An IMDG overcomes the limit of capacity by ensuring horizontal scalability using a distributed architecture, and resolves the issue of reliability through a replication system as part of the grid (or a distributed cluster).

Now let’s discuss how an IMDG actually works. First of all, it is important to understand that an IMDG is not the same as an in-memory database, also referred to as an MMDB (main memory database). Typical examples of MMDBs are Oracle TimesTen or SAP HANA.
MMDBs are full database products that simply reside in memory. As a result of being a full-blown database, they also carry the weight and overhead of database management features. An IMDG is different. No tables, indexes, triggers, stored procedures, process managers etc. Just plain storage.

The data model used in an IMDG is key-value pairs. A key-value pair is a list with only two parts: a key and a value. The key can be used for storing and retrieving the values in the list. A key can be compared to the index or primary key of a table in a database. Note that IMDGs are closely tied to development environments such as Java, as the key-value pairs are represented by the structures provided by such a programming environment. Most IMDGs are written in Java, and can only be used within other Java applications. Therefore, the values of key-value pairs can be anything supported by Java, ranging from simple data types such as a string or number, to complex objects. This overcomes two important hurdles: as you can store complex Java objects as values, there’s no need to translate these objects into a relational data model (which is the case in more traditional applications using a database for storage). Furthermore, the seeming limitation of being able to store only one value per key is actually no limitation at all, since that one value can itself be a complex object.

Large memory sizes

Most of the products introduced above use Java as an implementation language. Java reserves and uses a part of the RAM (internal memory) for dynamic memory allocation. This reserved memory space is called the Java heap. All runtime objects created by a Java application are stored in the heap. Using large amounts of data causes two problems.

- Size limitation: By default, the heap size is 128 MB (the exact default depends on the JVM version and machine), but for current business applications, this limit is reached easily. Once the heap is “full”, no new objects can be created and the Java application will show some nasty errors.
- Performance: It is possible to increase the size of the heap, but this introduces some new problems. When a heap reaches a size of more than 4 gigabytes, Java will have serious issues with memory management, causing your application to slow down or even freeze. Java has a feature called the Garbage Collector, which periodically scans the heap and checks whether each object is still valid and being used. If not, the garbage collector removes the object and defragments the newly available space. The problem is that the larger the heap, the more work there is for the garbage collector, resulting in performance degradation.

Imagine a large bank has a Java application that manages customers, accounts and transactions. We have seen that an IMDG allows the application to store and access all data very quickly by caching it in memory, instead of storing the data in relatively slow databases. Let’s assume the combined data has a size of 40 gigabytes. Storing it in the heap is simply not possible, considering the performance penalties of Java’s memory management. The graph below illustrates the garbage collection pause time when placing cached data in the heap:

Terracotta’s BigMemory product has a method to overcome these limitations. The method is to use off-heap memory (direct buffers). Data will not be stored in Java’s heap, but directly in the available internal memory (RAM). Since this is not subject to Java’s garbage collector, there are no performance penalties. The differences in performance are significant, as can be seen in the graph below:

Using off-heap storage has some major benefits:

- You can use all the available memory on your machine, not just the memory that is allocated to the heap (usually less than 512 MB). This allows you to store more data in an in-memory data grid, greatly speeding up your application.
- The heap can be relieved by storing data in native memory, speeding up Java applications as less heap space has to be garbage collected.
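To make the off-heap idea concrete, here is a minimal sketch that keeps key-value data in a direct buffer using only the JDK (ByteBuffer.allocateDirect). It is a toy under stated assumptions, not how BigMemory or any of the products above is implemented: the class name OffHeapStore is hypothetical, values are plain strings, the buffer has a fixed size, and space used by overwritten values is never reclaimed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy store: value bytes live in a direct (off-heap) buffer,
// only the small key -> {offset, length} index stays on the Java heap.
final class OffHeapStore {
    private final ByteBuffer memory;                              // allocated outside the heap
    private final Map<String, int[]> index = new HashMap<>();    // key -> {offset, length}

    OffHeapStore(int capacityBytes) {
        this.memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Appends the value's bytes to the buffer and records where they are.
    // Overwriting a key simply appends again; the old bytes are not reclaimed.
    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > memory.remaining()) {
            throw new IllegalStateException("off-heap buffer is full");
        }
        int offset = memory.position();
        memory.put(bytes);
        index.put(key, new int[] {offset, bytes.length});
    }

    // Copies the bytes back onto the heap and decodes them; returns null if absent.
    String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = memory.duplicate();   // same memory, independent position
        view.position(location[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    boolean isOffHeap() {
        return memory.isDirect();
    }
}
```

The trade-off this shows is the one described above: the garbage collector only has to track the small index, while the bulk of the data sits in native memory, at the price of serializing values on the way in and copying them back on the way out.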
Clustering, failover and high availability

So far, we have seen IMDG features that are applicable to a single server. However, the real power of an IMDG lies in its networking and clustering capabilities, providing features such as data replication, data synchronization between clients, failover and high availability. To achieve this, a cluster of servers (or server array) acts as the backbone of the infrastructure. Applications connected to the cluster (which can still have their own IMDG or off-heap cache) can share, replicate and back up their data with either the cluster or other applications. The graph below depicts a typical setup using Terracotta's BigMemory:

The caches on the application servers are usually referred to as “level 1” cache, while the data cache on the server array is referred to as “level 2” cache. There are many different scenarios possible for storing, clustering, synchronizing and replicating data. Covering all these topics goes far beyond the scope of this article. For more information, consult the technical documentation of the product of your choice.

Conclusion

Big Data brings us some new challenges. First of all, storing and accessing vast amounts of data makes us rethink traditional methods and technologies. Next, there’s the question of what to do with all the available data. The potential value for marketing, financial and other businesses is huge. In order to facilitate Big Data, in-memory data grids are considered the best option. IMDGs with off-heap storage are even more powerful, allowing data-centric enterprise applications to overcome certain limits of the Java platform, such as memory and performance constraints. As the amount of data that (large) companies produce and store grows exponentially, databases will hit a limit. Accessing your data without a performance penalty simply will not be possible. The answer to this is using an IMDG.

March 13, 2013

by Roy Prins

· 32,627 Views · 5 Likes

Database Concepts for a Java Dev: Database Normalization

In this part, I will briefly cover the different types of database normalization using a sample data model.

What is Database Normalization?

Normalization is the process of efficiently organizing data in the database.

Primary Goal of Normalization?

Eliminating redundant data and ensuring meaningful data dependencies.

Types of Normalization

The following are the three most common normal forms in the database normalization process:

- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)

Sample Data Model for Demonstration

The following data model will be used to demonstrate all three normal forms.

First Normal Form (1NF)

First Normal Form (1NF) sets the very basic rules for an organized database:

- Create a separate set of tables for each group of related data and identify each row with a unique column [primary key] or set of columns [composite key].
- Eliminate duplicate columns from the table.

The following data model depicts the tables after 1NF rules are applied.

Second Normal Form (2NF)

Second Normal Form (2NF) further addresses the concept of removing duplicate data:

- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.

So basically the objective of the Second Normal Form is to take data that is only partly dependent on the primary key and move that data into another table.

The following data model depicts the tables after 2NF rules are applied. Data from EMPLOYEE_TABLE is split into 2 tables: EMPLOYEE_TABLE and EMPLOYEE_HR_TABLE. Similarly, data from CUSTOMER_TABLE is split into CUSTOMER_TABLE and the CUSTOMER_ORDER table.

Third Normal Form (3NF)

Third Normal Form (3NF) goes one large step further:

- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.
The following data model depicts the tables after 3NF rules are applied. State and country details are further moved to their own tables because they are not dependent on the primary key.

Advantages of Normalizing the Database

There are several advantages of normalization:

- Data can be stored as small atomic pieces
- Saves space
- Increases speed
- Reduces data anomalies
- Easy maintenance

Other parts of this series include:

- Part 1 - ACID Properties
- Part 2 - Keys
- Part 4 - Database Transactions [coming soon]
- Part 5 - Indexes [coming soon]

March 13, 2013

by Jagadeesh Motamarri

· 10,907 Views · 1 Like

Maven's Non-Resolvable Parent POM Problem

Need help dealing with Maven's non-resolvable parent problem? Check out this post to learn how.

March 12, 2013

by Roger Hughes

· 462,936 Views · 8 Likes

Compare RESTful vs. SOAP Web Services

There are currently two schools of thought in developing Web Services: one being the standards-based traditional approach [SOAP] and the other, simpler school of thought [REST]. This article quickly compares one with the other.

REST: Assumes a point-to-point communication model; not usable for distributed computing environments where a message may go through one or more intermediaries.
SOAP: Designed to handle distributed computing environments.

REST: Minimal tooling/middleware is necessary. Only HTTP support is required.
SOAP: Requires significant tooling/middleware support.

REST: The URL typically references the resource being accessed/deleted/updated.
SOAP: The content of the message typically decides the operation, e.g. doc-literal services.

REST: Not reliable: HTTP DELETE can return an OK status even if a resource is not deleted.
SOAP: Reliable.

REST: Formal description standards are not in widespread use. WSDL 1.2 and WADL are candidates.
SOAP: Well-defined mechanism for describing the interface, e.g. WSDL+XSD, WS-Policy.

REST: Better suited for point-to-point communication or where the intermediary does not play a significant role.
SOAP: Well suited for intermediated services.

REST: No constraints on the payload.
SOAP: Payload must comply with the SOAP schema.

REST: Only the most well-established standards apply, e.g. HTTP, SSL. No established standards for other aspects. DELETE and PUT methods are often disabled by firewalls, which leads to security complexity.
SOAP: A large number of supporting standards for security, reliability and transactions.

REST: No error handling.
SOAP: Built-in error handling (faults).

REST: Tied to the HTTP transport model.
SOAP: Both SMTP and HTTP are valid application layer protocols used as transport for SOAP.

REST: Less verbose.
SOAP: More verbose.

March 8, 2013

by Jagadeesh Motamarri

· 167,193 Views · 2 Likes

SLF4J with Logback

It is common for developers to each use their own logging framework during development, which forces organizations to maintain configuration for each one. Switching the logging level from DEBUG to INFO also sometimes requires an application restart in production. SLF4J is a logging facade that lets you plug in the desired logging framework at deployment time. This article covers the usage of SLF4J with Logback.

- SLF4J (Simple Logging Facade for Java): an API that lets you plug in the desired logging implementation at deployment time.
- Logback: lets you change the logging configuration through JMX at runtime, without restarting your applications in production.

I hope this article gives a high-level overview of SLF4J/Logback and helps with migrating your existing apps to a common logging approach.

SLF4J

This is a simple logging façade that abstracts the various logging frameworks such as logback, log4j, commons-logging and the default Java logging implementation (java.util.logging). This primarily enables the user to include the desired logging framework at deployment time. It is lightweight and adds nearly zero overhead on performance. Note that SLF4J doesn’t replace any logging framework; it is just a façade around any standard logging framework. If SLF4J doesn’t find any logging framework on the classpath, it prints a warning to the console and defaults to a no-operation logger that discards log output.

Logback

This is an improved version of log4j that natively supports SLF4J, hence migrating from other logging frameworks such as log4j and java.util.logging is quite possible. Since Logback natively supports SLF4J, the combination of SLF4J with this framework is relatively faster than SLF4J with other logging frameworks. Logging configuration can be done either in XML or Groovy. One important feature is that it exposes the configuration through JMX, hence the configuration (debug to info etc.) can be changed via a JMX console without restarting the application.
Also, it prints the artifact version as part of the exception stack trace, which may be helpful for debugging.

    java.lang.NullPointerException: null
        at com.fimt.poc.LoggingSample.<init>(LoggingSample.java:16) [classes/:na]
        at com.fimt.poc.LoggingSample.main(LoggingSample.java:23) [fimt-logging-poc-1.0.jar/:1.0]

The reasons to prefer Logback over log4j are well explained here.

SLF4J API usage in Java classes

(1) Import Logger and LoggerFactory from the org.slf4j package:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

(2) Declare the logger as:

    private final Logger logger = LoggerFactory.getLogger(LoggingSample.class);

(3) Use debug, warn, info, error and trace with appropriate parameters. All methods take a string as input by default.

    logger.info("This is sample info statement");

SLF4J with Logback

Include the following dependency in pom.xml. It pulls in its dependencies logback-core and slf4j-api in addition to logback-classic.

    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>1.0.7</version>
    </dependency>

SLF4J can also be used with the existing logging frameworks log4j, commons-logging and java.util.logging (JUL). The required dependencies are mentioned below.

SLF4J with Log4j

Include the following dependency in pom.xml. It pulls in its dependencies log4j and slf4j-api in addition to the slf4j-log4j12 artifact.

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.2</version>
    </dependency>

SLF4J with JUL (java.util.logging)

Include the following dependency in pom.xml. It pulls in its dependency slf4j-api in addition to the slf4j-jdk14 artifact.

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-jdk14</artifactId>
        <version>1.7.2</version>
    </dependency>

Migrating existing projects' logging to the Logback framework

Step 1: Update the existing project pom.xml

Add the right dependency mentioned above. Also remove the unused log4j/commons-logging dependencies.

Step 2: Update Java files with the SLF4J API

Scan all Java files and replace log4j or java.util.logging classes with SLF4J API classes. This can be done using the migrator tool:

    java -jar slf4j-migrator-1.7.2.jar
Detailed documentation and limitations are mentioned here. This tool replaces the logging APIs of log4j, commons-logging and java.util.logging with SLF4J API classes.

Step 3: Convert log4j.properties to logback.xml

Logback provides an online translator which converts log4j properties into logback.xml.

File Appenders

Like other logging frameworks, the Logback implementation also supports various file appenders.

Daily file rolling:

    <fileNamePattern>logFile.%d{yyyy-MM-dd}.log</fileNamePattern>
    <maxHistory>30</maxHistory>

Roll log files based on size:

    <fileNamePattern>tests.%i.log.zip</fileNamePattern>
    <minIndex>1</minIndex>
    <maxIndex>10</maxIndex>
    <maxFileSize>5MB</maxFileSize>

Layout:

    <pattern>%-4relative [%thread] %-5level %logger{35} - %msg%n</pattern>

March 8, 2013

by Sam Kyatham

· 15,779 Views · 2 Likes

Advanced ListenableFuture Capabilities

Last time we familiarized ourselves with ListenableFuture. I promised to introduce more advanced techniques, namely transformations and chaining. Let's start with something straightforward. Say we have our ListenableFuture<String> which we got from some asynchronous service. We also have a simple method:

    Document parse(String xml) {//...

We don't need String, we need Document. One way would be to simply resolve the Future (wait for it) and do the processing on String. But a much more elegant solution is to apply the transformation once the results are available and treat our method as if it was always returning ListenableFuture<Document>. This is pretty straightforward:

    final ListenableFuture<String> future = //...

    final ListenableFuture<Document> documentFuture = Futures.transform(future, new Function<String, Document>() {
        @Override
        public Document apply(String contents) {
            return parse(contents);
        }
    });

or more readable:

    final Function<String, Document> parseFun = new Function<String, Document>() {
        @Override
        public Document apply(String contents) {
            return parse(contents);
        }
    };

    final ListenableFuture<String> future = //...

    final ListenableFuture<Document> documentFuture = Futures.transform(future, parseFun);

Java syntax is a bit limiting, but please focus on what we just did. Futures.transform() doesn't wait for the underlying ListenableFuture<String> to apply the parse() transformation. Instead, under the hood, it registers a callback, wishing to be notified whenever the given future finishes. This transformation is applied dynamically and transparently for us at the right moment. We still have a Future, but this time wrapping Document.

So let's go one step further. We also have an asynchronous, possibly long-running method that calculates the relevance (whatever that is in this context) of a given Document:

    ListenableFuture<Double> calculateRelevance(Document pageContents) {//...

Can we somehow chain it with the ListenableFuture<Document> we already have?
First attempt:

    final Function<Document, ListenableFuture<Double>> relevanceFun = new Function<Document, ListenableFuture<Double>>() {
        @Override
        public ListenableFuture<Double> apply(Document input) {
            return calculateRelevance(input);
        }
    };

    final ListenableFuture<String> future = //...
    final ListenableFuture<Document> documentFuture = Futures.transform(future, parseFun);
    final ListenableFuture<ListenableFuture<Double>> relevanceFuture = Futures.transform(documentFuture, relevanceFun);

Ouch! Future of future of Double, that doesn't look good. Once we resolve the outer future we need to wait for the inner one as well. Definitely not elegant. Can we do better?

    final AsyncFunction<Document, Double> relevanceAsyncFun = new AsyncFunction<Document, Double>() {
        @Override
        public ListenableFuture<Double> apply(Document pageContents) throws Exception {
            return calculateRelevance(pageContents);
        }
    };

    final ListenableFuture<String> future = //comes from ListeningExecutorService
    final ListenableFuture<Document> documentFuture = Futures.transform(future, parseFun);
    final ListenableFuture<Double> relevanceFuture = Futures.transform(documentFuture, relevanceAsyncFun);

Please look very carefully at all types and results. Notice the difference between Function and AsyncFunction. Initially we got an asynchronous method returning a future of String. Later on we transformed it to seamlessly turn String into an XML Document. This transformation happens asynchronously, when the inner future completes. Having a future of Document, we would like to call a method that requires Document and returns a future of Double. If we call relevanceFuture.get(), our Future object will first wait for the inner task to complete and, having its result (String -> Document), will wait for the outer task and return Double. We can also register callbacks on relevanceFuture which will fire when the outer task (calculateRelevance()) finishes.

If you are still here, there are even more crazy transformations. Remember that all this happens in a loop. For each web site we got a ListenableFuture<String> which we asynchronously transformed to ListenableFuture<Double>. So in the end we work with a List<ListenableFuture<Double>>.
This also means that in order to extract all the results we either have to register a listener for each and every ListenableFuture or wait for each of them. Which doesn't get us anywhere. But what if we could easily transform from List<ListenableFuture<Double>> to ListenableFuture<List<Double>>? Read carefully: from a list of futures to a future of a list. In other words, rather than having a bunch of small futures we have one future that will complete when all child futures complete, and the results are mapped one-to-one to the target list. Guess what, Guava can do this!

    final List<ListenableFuture<Double>> relevanceFutures = //...;
    final ListenableFuture<List<Double>> futureOfRelevance = Futures.allAsList(relevanceFutures);

Of course there is no waiting here either. The wrapper ListenableFuture<List<Double>> will be notified every time one of its child futures completes. The moment the last child ListenableFuture<Double> completes, the outer future completes as well. Everything is event-driven and completely hidden from you.

Do you think that's it? Say we would like to compute the biggest relevance in the whole set. As you probably know by now, we won't wait for a List<Double>. Instead we will register a transformation from List<Double> to Double!

    final ListenableFuture<Double> maxRelevanceFuture = Futures.transform(futureOfRelevance, new Function<List<Double>, Double>() {
        @Override
        public Double apply(List<Double> relevanceList) {
            return Collections.max(relevanceList);
        }
    });

Finally, we can listen for the completion event of maxRelevanceFuture and e.g. send results (asynchronously!) using JMS. Here is the complete code if you lost track:

    private Document parse(String xml) {
        return //...
    }

    private final Function<String, Document> parseFun = new Function<String, Document>() {
        @Override
        public Document apply(String contents) {
            return parse(contents);
        }
    };

    private ListenableFuture<Double> calculateRelevance(Document pageContents) {
        return //...
    }

    final AsyncFunction<Document, Double> relevanceAsyncFun = new AsyncFunction<Document, Double>() {
        @Override
        public ListenableFuture<Double> apply(Document pageContents) throws Exception {
            return calculateRelevance(pageContents);
        }
    };

    //...
    final ListeningExecutorService pool = MoreExecutors.listeningDecorator(
        Executors.newFixedThreadPool(10)
    );

    final List<ListenableFuture<Double>> relevanceFutures = new ArrayList<>(topSites.size());
    for (final URL siteUrl : topSites) {
        final ListenableFuture<String> future = pool.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return IOUtils.toString(siteUrl, StandardCharsets.UTF_8);
            }
        });
        final ListenableFuture<Document> documentFuture = Futures.transform(future, parseFun);
        final ListenableFuture<Double> relevanceFuture = Futures.transform(documentFuture, relevanceAsyncFun);
        relevanceFutures.add(relevanceFuture);
    }

    final ListenableFuture<List<Double>> futureOfRelevance = Futures.allAsList(relevanceFutures);
    final ListenableFuture<Double> maxRelevanceFuture = Futures.transform(futureOfRelevance, new Function<List<Double>, Double>() {
        @Override
        public Double apply(List<Double> relevanceList) {
            return Collections.max(relevanceList);
        }
    });

    Futures.addCallback(maxRelevanceFuture, new FutureCallback<Double>() {
        @Override
        public void onSuccess(Double result) {
            log.debug("Result: {}", result);
        }

        @Override
        public void onFailure(Throwable t) {
            log.error("Error :-(", t);
        }
    });

Was it worth it? Yes and no. Yes, because we learned some really important constructs and primitives used together with futures/promises: chaining, mapping (transforming) and reducing. The solution is beautiful in terms of CPU utilization: no waiting, blocking, etc. Remember that the biggest strength of Node.js is its "no-blocking" policy. Also in Netty futures are ubiquitous. Last but not least, it feels very functional.

On the other hand, mainly due to Java syntax verbosity and lack of type inference (yes, we will jump into Scala soon), the code seems very unreadable, hard to follow and maintain. Well, to some degree this holds true for all message-driven systems. But as long as we don't invent better APIs and primitives, we must learn to live with and take advantage of asynchronous, highly parallel computations.
If you want to experiment with ListenableFuture even more, don't forget to read the official documentation.
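The "list of futures to future of list" idea is not tied to Guava. As a rough sketch only, assuming Java 8's CompletableFuture instead of ListenableFuture, the same combine-then-reduce steps could look like the code below. The class name FutureOfList, the helpers allAsList and maxOf, and the sample relevance values are all hypothetical and chosen for illustration; this is a JDK analogue, not Guava's Futures.allAsList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FutureOfList {

    // Hypothetical helper: turns a list of futures into one future of a list.
    // Results keep the order of the input futures.
    static <T> CompletableFuture<List<T>> allAsList(List<CompletableFuture<T>> futures) {
        CompletableFuture<Void> allDone =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        return allDone.thenApply(ignored -> {
            List<T> results = new ArrayList<>(futures.size());
            for (CompletableFuture<T> future : futures) {
                // join() returns immediately here: every child is already complete
                results.add(future.join());
            }
            return results;
        });
    }

    // Reduce step: future of a list -> future of its maximum.
    static CompletableFuture<Double> maxOf(CompletableFuture<List<Double>> futureOfList) {
        return futureOfList.thenApply(list -> Collections.max(list));
    }

    public static void main(String[] args) {
        // Made-up relevance values standing in for real computations.
        List<CompletableFuture<Double>> relevanceFutures = Arrays.asList(
                CompletableFuture.supplyAsync(() -> 0.25),
                CompletableFuture.supplyAsync(() -> 0.75),
                CompletableFuture.supplyAsync(() -> 0.5));

        maxOf(allAsList(relevanceFutures))
                .thenAccept(max -> System.out.println("Max relevance: " + max))
                .join(); // only here so the demo JVM does not exit early
    }
}
```

Nothing in allAsList or maxOf blocks a thread while the child futures are still running; the only blocking call is the final join() in main, which exists purely to keep the demo alive.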

March 7, 2013

by Tomasz Nurkiewicz


Sequelize, the JavaScript ORM, in practice

Node.js is well known for its good connectivity with NoSQL databases. A less known fact is that it's also very efficient with relational databases. Among the dozens of ORMs out there in JavaScript, one stands out for relational databases: Sequelize. It's quite easy to learn, but there are not many pointers about how to organize model code with this module. Here are a few tips we learned by using Sequelize in a medium-size project.

Sequelize 101

Sequelize claims to support MySQL, PostgreSQL and SQLite. The Sequelize docs explain the first steps with the JavaScript ORM. First, initialize a database connection, then a few models, without worrying about primary or foreign keys:

    var sequelize = new Sequelize('database', 'username'[, 'password'])

    var Project = sequelize.define('Project', {
      title: Sequelize.STRING,
      description: Sequelize.TEXT
    });

    var Task = sequelize.define('Task', {
      title: Sequelize.STRING,
      description: Sequelize.TEXT,
      deadline: Sequelize.DATE
    });

    Project.hasMany(Task);

Next, create new instances and persist them, query the database, etc.

    // create an instance
    var task = Task.build({title: 'very important task'})
    task.title // ==> 'very important task'

    // persist an instance
    task.save()
      .error(function(err) {
        // error callback
      })
      .success(function() {
        // success callback
      });

    // query persistence for instances
    var tasks = Task.all({ where: ['deadline < ?', new Date()] })
      .error(function(err) {
        // error callback
      })
      .success(function() {
        // success callback
      });

Sequelize uses promises so you can chain error and success callbacks, and it all plays well with unit tests. All that is pretty well documented, but the Sequelize documentation only covers the basic usage. Once you start using Sequelize in real world projects, finding the right way to implement a feature gets trickier.

Model file structure

All the examples in the Sequelize documentation show all model declarations grouped in a single file.
Once a project reaches production size, this is not a viable approach. The alternative is to import models from a module using sequelize.import(). The problem is that relationships rely on several models. When you declare a relationship, models from both sides of the relationship must already be imported. You should not import model files from other model files because of the Node.js module caching policy (more on that later on); instead, you can define relationships in a standalone file. Here is the file structure we've been working with:

    models/
      index.js         # imports all models and creates relationships
      PhoneNumber.js
      Task.js
      User.js
      ...

And here is how the main models/index.js initializes the entire model:

    var Sequelize = require('sequelize');
    var config = require('config').database;  // we use node-config to handle environments

    // initialize database connection
    var sequelize = new Sequelize(
      config.name,
      config.username,
      config.password,
      config.options
    );

    // load models
    var models = [
      'PhoneNumber',
      'Task',
      'User'
    ];
    models.forEach(function(model) {
      module.exports[model] = sequelize.import(__dirname + '/' + model);
    });

    // describe relationships
    (function(m) {
      m.PhoneNumber.belongsTo(m.User);
      m.Task.belongsTo(m.User);
      m.User.hasMany(m.Task);
      m.User.hasMany(m.PhoneNumber);
    })(module.exports);

    // export connection
    module.exports.sequelize = sequelize;

Using models in code

From other parts of the application, if you need a model class, require models/index.js instead of the standalone model file. That way, you don't have to repeat the Sequelize initialization.

    var models = require('./models');
    var User = models.User;
    var user = User.build({ first_name: "John", last_name: "Doe" });

The problem is, when you require the models/index.js file, Node may use a cached version of the module... or not. From nodejs.org:

    Multiple calls to require('foo') may not cause the module code to be executed multiple times. (...) Modules are cached based on their resolved filename. Since modules may resolve to a different filename based on the location of the calling module (loading from node_modules folders), it is not a guarantee that require('foo') will always return the exact same object, if it would resolve to different files.

That means that using require('./models') to get the models may create more than one connection to the database. To avoid that, the models variable must be somehow singleton-esque. If you're using a framework like Express.js, this can be easily achieved by attaching the models module to the application:

    app.set('models', require('./models'));

And when you need to require a class of the model in a controller, use this application setting rather than a direct import:

    var User = app.get('models').User;

Accessing other models

Sequelize models can be extended with class and instance methods. You can add abilities to model classes, much like in a true ActiveRecord implementation. But a problem arises when adding a method that depends on another model: how can a model access another one?

    // in models/User.js
    module.exports = function(sequelize, DataTypes) {
      return sequelize.define('User', {
        first_name: DataTypes.STRING,
        last_name: DataTypes.STRING,
      }, {
        instanceMethods: {
          countTasks: function() {
            // how to implement this method ?
          }
        }
      });
    };

If the two models share a relationship, there is a way. Here, one User has many Tasks, which makes the Task model accessible through User.associations['Tasks'].target. And here is yet another problem: since Sequelize doesn't use prototype-based inheritance, how can a User instance gain access to the User class? Digging into the Sequelize source brings the protected __factory to light.
With all this undocumented knowledge, it is now possible to write the countTasks() instance method:

    countTasks: function() {
      return this.__factory.associations['Tasks'].target.count({ where: { user_id: this.id } });
    }

Note that Task.count() returns a promise, so countTasks() also returns a promise:

    user.countTasks().success(function(nbTasks) {
      // do something with the user task count
    });

Extending models (a.k.a. behaviors)

What if you need to reuse several methods across several models? Sequelize doesn't have a behavior system per se (or "concerns" in the Ruby on Rails terminology), although it's quite easy to implement. For now, you're condemned to import common methods before the call to sequelize.define() and use Sequelize.Utils._.extend() to add them to the instanceMethods or classMethods object:

    // in models/FriendlyUrl.js
    module.exports = function(keys) {
      return {
        getUrl: function() {
          var self = this; // keep a reference to the instance for the forEach callback
          var ret = '';
          keys.forEach(function(key) {
            ret += self[key];
          });
          return ret
            .toLowerCase()
            .replace(/^\s+|\s+$/g, "") // trim whitespace
            .replace(/[_|\s]+/g, "-")
            .replace(/[^a-z0-9-]+/g, "")
            .replace(/[-]+/g, "-")
            .replace(/^-+|-+$/g, "");
        }
      };
    }

    // in models/User.js
    var Sequelize = require('sequelize');
    var friendlyUrlMethods = require('./FriendlyUrl')(['first_name', 'last_name']);
    module.exports = function(sequelize, DataTypes) {
      return sequelize.define('User', {
        first_name: DataTypes.STRING,
        last_name: DataTypes.STRING,
      }, {
        instanceMethods: Sequelize.Utils._.extend({}, friendlyUrlMethods, {
          countTasks: function() {
            return this.__factory.associations['Tasks'].target.count({ where: { user_id: this.id } });
          }
        })
      });
    };

Now the User model instances gain access to a getUrl() method:

    var user = User.build({ first_name: 'John', last_name: 'Doe' });
    user.getUrl(); // 'johndoe'

A limitation of this trick is that you must define behaviors before the actual model. This forbids behaviors accessing other models.

Query series

Sequelize provides a tool called the QueryChainer to ease the resynchronization of queries.

    new Sequelize.Utils.QueryChainer()
      .add(User, 'find', [id])
      .add(Task, 'findAll')
      .error(function(err) { /* hmm not good :> */ })
      .success(function(results) {
        var user = results[0];
        var tasks = results[1];
        // do things with the results
      });

After using it a little, this utility turns out to be very limited. Most notably, QueryChainer executes all queries in parallel by default. And you only get access to the results of the queries in the final callback - no way to pass values from one query to the other. We've found it much more convenient to use a generic resynchronizing module like async, which provides the wonderful async.auto() utility. This method lets you list tasks together with their dependencies, and determines which tasks can be run in parallel, and which must be run in series.

    async.auto({
      user: function(next) {
        User.find(id).complete(next);
      },
      // runs only after 'user' has completed; its result is available in results.user
      tasks: ['user', function(next, results) {
        Task.findAll({ where: { user_id: results.user.id } }).complete(next);
      }]
    }, function(err, results) {
      var user = results.user;
      var tasks = results.tasks;
      // do things with the results
    });

Notice the complete() method, which is an alternative to the two success() and error() callbacks. complete() accepts a callback with the signature (err, res), which is more natural in the Node.js world, and compatible with async.

Prefetching

One thing ORMs are usually good at is minimizing queries. Sequelize offers a prefetching feature, which groups two queries into a single one using a JOIN. For instance, if you want to retrieve a task together with the related user, write the query as follows:

    Task.find({ where: { id: id }, include: ['User'] })
      .error(function(err) {
        // error callback
      })
      .success(function(task) {
        task.getUser(); // does not trigger a new query
      });

This is another undocumented feature, although the documentation should be updated soon.

Migrations

Sequelize provides a migration command line utility.
But because it only allows modifying the model using Sequelize commands (and not calling any asynchronous method of your own), this migration command falls short. As of now, we've been handling migrations manually using numbered SQL files and a custom utility to run them in order.

Custom SQL queries

Sequelize is built over database adapters, and as such provides a way to execute arbitrary SQL queries against the database. Here is an example:

    var util = require('util');
    var query = 'select * from `task` ' +
      'left join `user` on `task`.`userid` = `user`.`id` ' +
      'where `user`.`last_name` = %s';
    var escapedName = sequelize.constructor.Utils.escape(last_name);
    var queryWithParams = util.format(query, escapedName);
    sequelize.query(queryWithParams, Task)
      .error(function(err) {
        // error callback
      })
      .success(function(tasks) {
        // tasks contains Task instances
      });

sequelize.query() returns a promise just like other query functions. If you provide the model to use for hydration (Task in this case), the query() method returns model instances rather than simple JSON. Note that you must escape values by hand before concatenating them into the SQL query. For strings, sequelize.constructor.Utils.escape() is your friend. For integers, util.format('%d') should do the trick.

Conclusion

Is Sequelize ready for prime time? Almost. The learning curve is made longer by incomplete documentation, but most of the features required by a production-level ORM are there. However, I wouldn't recommend it for production just yet if you're not ready to run on your own fork, since the rate at which PRs are merged in the Sequelize GitHub repository is low.

March 5, 2013

by Francois Zaninotto


ListenableFuture in Guava

ListenableFuture in Guava is an attempt to define a consistent API for registering completion callbacks on Future objects. With the ability to add a callback when a Future completes, we can respond to incoming events asynchronously and effectively. If your application is highly concurrent with lots of future objects, I strongly recommend using ListenableFuture whenever you can.

Technically ListenableFuture extends the Future interface by adding a simple void addListener(Runnable listener, Executor executor) method. That's it. If you get hold of a ListenableFuture you can register a Runnable to be executed immediately when the future in question completes. You must also supply an Executor (ExecutorService extends it) that will be used to execute your listener - so that long-running listeners do not occupy your worker threads.

Let's put that into action. We will start by refactoring our first web crawler example to use ListenableFuture. Fortunately, in the case of thread pools it's just a matter of wrapping them using MoreExecutors.listeningDecorator():

    ListeningExecutorService pool = MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(10));

    for (final URL siteUrl : topSites) {
        final ListenableFuture<String> future = pool.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return IOUtils.toString(siteUrl, StandardCharsets.UTF_8);
            }
        });

        future.addListener(new Runnable() {
            @Override
            public void run() {
                try {
                    final String contents = future.get();
                    //...process web site contents
                } catch (InterruptedException e) {
                    log.error("Interrupted", e);
                } catch (ExecutionException e) {
                    log.error("Exception in task", e.getCause());
                }
            }
        }, MoreExecutors.sameThreadExecutor());
    }

There are several interesting observations to make. First of all, notice how ListeningExecutorService wraps an existing Executor. This is similar to the ExecutorCompletionService approach. Later on we register a custom Runnable to be notified when each and every task finishes.

Secondly, notice how ugly error handling becomes: we have to handle InterruptedException (which should technically never happen as the Future is already resolved and get() will never throw it) and ExecutionException. We haven't covered that yet, but a Future must somehow handle exceptions occurring during asynchronous computation. Such exceptions are wrapped in an ExecutionException (thus the getCause() invocation during logging) thrown from get().

Finally, notice MoreExecutors.sameThreadExecutor() being used. It's a handy abstraction which you can use every time some API wants an Executor/ExecutorService (presumably a thread pool) while you are fine with using the current thread. This is especially useful during unit testing - even if your production code uses asynchronous tasks, during tests you can run everything from the same thread.

No matter how handy it is, the whole code seems a bit cluttered. Fortunately there is a simple utility method in the fantastic Futures class:

    Futures.addCallback(future, new FutureCallback<String>() {
        @Override
        public void onSuccess(String contents) {
            //...process web site contents
        }

        @Override
        public void onFailure(Throwable throwable) {
            log.error("Exception in task", throwable);
        }
    });

FutureCallback is a much simpler abstraction to work with: it resolves the future and does exception handling for you. You can still supply a custom thread pool for listeners if you want. If you are stuck with some legacy API that still returns a plain Future, you may try JdkFutureAdapters.listenInPoolThread(), which is an adapter converting a plain Future to a ListenableFuture. But keep in mind that once you start using addListener(), each such adapter will require one thread exclusively to work, so this solution doesn't scale at all and you should avoid it.

    Future<String> future = //...
    ListenableFuture<String> listenableFuture = JdkFutureAdapters.listenInPoolThread(future);

Once we understand the basics we can dive deeply into the biggest strength of listening futures: transformations and chaining.
This is advanced stuff, you have been warned.
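To make the callback contract described above more concrete without requiring Guava on the classpath, here is a small sketch that mimics the shape of Futures.addCallback using only the JDK's CompletableFuture. Everything named here (the class CallbackSketch, the helper addCallback, the DIRECT executor) is an assumption made for illustration and is not part of Guava; DIRECT merely plays a role similar to the same-thread executor mentioned earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

public class CallbackSketch {

    // Runs each listener in whichever thread triggers it.
    static final Executor DIRECT = Runnable::run;

    // Hypothetical helper: registers a success handler and a failure handler,
    // both executed by the supplied executor once the future completes.
    static <T> void addCallback(CompletableFuture<T> future,
                                Consumer<T> onSuccess,
                                Consumer<Throwable> onFailure,
                                Executor executor) {
        future.whenCompleteAsync((result, error) -> {
            if (error == null) {
                onSuccess.accept(result);
            } else {
                onFailure.accept(error);
            }
        }, executor);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();

        CompletableFuture<String> page = new CompletableFuture<>();
        addCallback(page,
                contents -> events.add("success: " + contents),
                error -> events.add("failure: " + error.getMessage()),
                DIRECT);
        page.complete("<html/>"); // the callback fires here, in this thread

        CompletableFuture<String> broken = new CompletableFuture<>();
        addCallback(broken,
                contents -> events.add("success: " + contents),
                error -> events.add("failure: " + error.getMessage()),
                DIRECT);
        broken.completeExceptionally(new IllegalStateException("boom"));

        System.out.println(events);
    }
}
```

As with FutureCallback, the calling code never touches get() or its checked exceptions; swapping DIRECT for a real thread pool moves the listeners off the completing thread.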

March 4, 2013

by Tomasz Nurkiewicz


8 Lessons in Deployment Tooling Lessons Learned

It didn’t take long. A few months after we released an open source continuous integration tool (Anthill) in 2001, we were asked, “It’s great that I have the build setup, now how do I deploy to the test lab?” That email started a clear transformation in our thinking.

Lesson 1: Builds generally exist to be tested or released to customers. The corollary is: continuous integration is not about the build, it is about quality, and checking quality generally requires a deployment or six.

In 2005, we updated our AnthillPro 2.5 with a shiny new deployment capability. It worked OK, but with that generation of tool being so build oriented, it was never a clean fit.

Lesson 2: Deployments are a serious challenge, and cannot be bolted on to a build tool. This lesson has been reinforced over the years as we’ve watched the results of various tools tacking on deployments.

In 2006, we released AnthillPro 3. Oh boy, what a change. Now deployments and other stuff you do to builds days and weeks later were their own thing. Demonstrating a stunning lack of marketing savvy, we called them “non-originating workflows” (processes that don’t make builds) and declared a new type of tool, an “Application Lifecycle Automation Server”. Today, you’d call it a Continuous Delivery tool (thank you Jez and Dave for the better name). To my knowledge, AnthillPro was the first tool to really take a build lifecycle or build pipeline seriously and had the ambition to manage from continuous integration build through to production release quickly and with a solid audit trail. That brings us to Lesson 3: When you come from the development side of the house as either an engineer or a vendor, you have a lot to learn about audit and security.

Around the same time we also learned a great deal about the Dev / Ops gap.
While some of our customers implemented the tool as intended, most of the time it was used either by a dev organization doing continuous integration and deployment to the early test environments, or by a production support / release management group that only did the “official” builds and focused on the production release. The failure of the developer-initiated efforts to get to production provided a key insight.

Lesson 4: When it comes to deployment automation, start with production, and work your way to the lower environments, treating them as a simpler case of the hard problem.

We also learned about the limitations of a build pipeline approach in complex applications. Lesson 5: Deployments of complex systems often require a coordinated release of many builds.

The need to start at production-style deployments (and not care as much about the build) and the emphasis on deployment-time dependencies focused our efforts on a deployment-centric tool, uDeploy, to complement the pipeline approach in AnthillPro. Production and scaled usage come together to make the deployment tool a critical piece of infrastructure. If you lose a datacenter, can you recover the tool and use it to recover other applications into a backup datacenter?

Lesson 6: Your release and deployment systems need to be configured with high availability and recoverability in mind.

For the better part of the last decade, we’ve been trying to streamline deployments and kill off the release weekend. Many of our customers have eliminated those big events or shrunk them down to a more manageable size. Big releases, though, can still require the coordination of dozens of people across various teams. One-time events like configuring the deployment tool with the password for a new database in each environment, or routine manual steps like putting eyeballs on the new production system before putting it out to the public, still need to be managed.

Lesson 7: Deployment is still just one part of release.
With that in mind, we’ve added uRelease to the portfolio. Different people in the application delivery value chain need different tools. Developers need continuous integration and great testing. Change management needs great auditing around production deployments. Testers need their environments to be not broken, updated regularly and close enough to production to make their testing worthwhile. The patterns and needs overlap. Everyone needs deployments that work. Everyone benefits from understanding the lifecycle of a build (or other version) as well as what is in a given environment. Everyone benefits when release planning meetings are shorter.

Lesson 8: No single tool is enough. An integrated toolchain that exchanges information is required.

So there you go. Twelve years of tooling, eight with deployments. Summed up in eight over-simplified lessons. I’m sure we’ll be updating this next year.

Lesson 1: Builds generally exist to be tested or released to customers.
Lesson 2: Deployments are a serious challenge, and cannot be bolted on to a build tool.
Lesson 3: When you come from the development side of the house as either an engineer or a vendor, you have a lot to learn about audit and security.
Lesson 4: When it comes to deployment automation, start with production, and work your way to the lower environments, treating them as a simpler case of the hard problem.
Lesson 5: Deployments of complex systems often require a coordinated release of many builds.
Lesson 6: Release and deployment systems need to be configured with high availability and recoverability in mind.
Lesson 7: Deployment is still just one part of release.
Lesson 8: No single tool does “DevOps”. An integrated toolchain that exchanges information is required.

What are the biggest lessons you’ve learned about deployments in the last few years? Share in the comments below!

March 4, 2013

by Eric Minick


SAP Integration with Talend Components / Connectors (BAPI, RFC, IDoc, BW, SOAP)

Talend has several connectors to integrate SAP systems. However, this guide is not an introduction to Talend’s SAP components. Instead, it helps you to:
- understand different alternatives to integrate SAP systems with Talend
- set up a local SAP system
- configure Talend Studio for using SAP components
- use Talend’s SAP wizard
- run a first Talend job which connects to SAP

All further required information and example use cases for Talend’s SAP components should be available in the Talend component guide at www.help.talend.com. If that’s not the case, please create a JIRA documentation ticket (https://jira.talendforge.org/browse/doct)! Now let’s take a look at different alternatives for integration of SAP systems with Talend.

Alternatives for SAP integration

Three protocols exist for communication between SAP and external programs:
- Dynamic Information and Action Gateway (DIAG): e.g. used by SAP GUI
- Remote Function Call (RFC): a function call with input and output parameters (like a Java interface)
- Hypertext Transfer Protocol (HTTP): internet standard

The following alternatives are available for integrating SAP systems using some of these protocols.

File

SAP supports the direct import of files (call-transaction-program, batch-input, direct input). Files have to be in a specific format to be imported. Transformation and integration can be realized with Talend’s various file components such as tFileInputDelimited.

RFC

Remote Function Call is the proprietary SAP AG interface for communication between an SAP system and other SAP or third-party compatible systems over TCP/IP or CPI-C connections. Remote function calls may be associated with SAP software and ABAP programming, and provide a way for an external program (written in languages such as PHP, ASP, Java, C, or C++) to use data returned from the server. Data transactions are not limited to getting data from the server, but can insert data into server records as well. SAP can act as the client or server in an RFC call.
A remote function call (RFC) is the call or remote execution of a remote function module in an external system. In the SAP system, these functions are provided by the RFC interface system. The RFC interface system enables function calls between two SAP systems, or between an SAP system and an external system. tSAPInput and tSAPOutput are Talend’s components to use RFCs.

Business Application Programming Interface (BAPI)

A BAPI is an object-oriented view on most data and transactions of an SAP system (called “business objects”). Object types of the business objects are stored in the Business Object Repository (BOR). BAPIs are always implemented as RFCs and therefore can be called the same way. Additionally, they have the following characteristics (compared to RFCs):
- stable interface
- no view layer
- no exceptions, instead export parameter: “return”

Most business objects offer the following standard BAPIs:
- GetList
- GetDetail
- Change
- CreationFromData

tSAPInput and tSAPOutput are Talend’s components to use BAPIs.

Application Link Enabling (ALE)

Application Link Enabling (ALE) is used for asynchronous messaging between different systems via “Intermediate Documents (IDoc)”. IDoc is an SAP document format for business transaction data transfers. It is used to realize distributed business processes. IDoc is similar to XML in purpose, but differs in syntax. Both serve the purpose of data exchange and automation in computer systems, but the IDoc technology takes a different approach. While XML allows having some metadata about the document itself, an IDoc is obligated to have information in its header such as its creator, creation time, etc. While XML has a tag-like tree structure containing data and metadata, IDocs use a table with the data and metadata. IDocs also have a session that explains all the processes which the document passed or will pass, allowing one to debug and trace the status of the document.
An IDoc consists of:
- a control record (it contains the type of IDoc, port of the partner, release of SAP R/3 which produced the IDoc, etc.)
- data records of different types. The number and type of segments is mostly fixed for each IDoc type, but there is some flexibility (for example an SD order can have any number of items).
- status records containing messages such as 'IDoc created', 'The recipient exists', 'IDoc was successfully passed to the port', 'Could not book the invoice because...'

Different IDoc types are available to handle different types of messages. For example, the IDoc format ORDERS01 may be used for both purchase orders and order confirmations. tSAPIDocInput and tSAPIDocOutput are Talend’s components to use ALE / IDoc. BAPIs can also be called asynchronously via ALE. All new IDocs are even based on BAPIs.

SOAP web services

SAP supports SOAP web services. Not just SAP AS Java, but also SAP AS ABAP! Integration can be realized with Talend’s ESB / web service components such as tESBRequest, tESBResponse, or tESBConsumer.

Installation of SAP server and client

Installation can take about 6 to 8 hours, but it is an “all in one installation”, i.e. you can install it overnight. Steps for installation:
- Get yourself a Windows 7 64 bit laptop or VM with 8+ GB RAM and 50+ GB free disc space
- Get an SAP community account (for free, just register): http://scn.sap.com/welcome
- Download SAP NetWeaver (Software Downloads --> SAP NetWeaver Main Releases): http://www.sdn.sap.com/irj/scn/nw-downloads
- Download the current version of SAP NetWeaver Application Server ABAP 64-bit Trial
- Install the SAP server: follow the installation guide, an HTML website included in the root of the extracted download folder (start.htm --> there click on the “Installation” link)
- Install SAP GUI (rich client frontend): start.htm --> there click on the “Install SAP GUI” link and follow the instructions
- Download the SAP JCo for the operating system on which your connector is running.
The SAP JCo is available for download from SAP's website at http://service.sap.com/connectors. You must have an SAPNet account to access the SAP JCo (if you do not already have one, contact your local SAP Basis administrator).

Usage of SAP server

Hint: You have to use a Windows user which has a password (as you need to enter Windows credentials when stopping SAP). If you have a Windows user without a password (for instance if you use Windows within a VM on your Mac), SAP cannot process these credentials (i.e. it cannot process an empty password field) --> change your Windows password before starting SAP.

- Start the management console (Windows start menu --> Programs --> SAP Management Console)
- Start and stop the SAP server (right click on “NSP” --> Start / Stop)
- Default user: SAP* (SAP system super user)
- Password: the one which you entered at installation of SAP NetWeaver, e.g. admin123

Usage of SAP client

An SAP client should be used to get information about the SAP system (functions, data, etc.), similarly to using e.g. MySQL Workbench to get information from a MySQL database. SAP GUI (view layer) communicates with SAP AS ABAP (business logic layer). The application server communicates with the relational database (DB layer). Different clients are available for SAP:
- SAP GUI Windows
- SAP GUI Java
- Web browser
- External RFC program

For local development demos, SAP GUI Windows is probably the best alternative. Start SAP GUI Windows by:
- clicking the shortcut “Windows start menu --> SAP Frontend --> SAP Logon”
- entering username and password
- clicking Logon

SAP transactions

In SAP, you call SAP programs via SAP transaction codes. Important transaction codes are for example:
- BAPI: BAPI Explorer, view all SAP BAPIs
- SE16: Data Browser, view/add table data
- SE38: Program editor

Here is a list of several other important transaction codes: http://www.sapdev.co.uk/tcodes/tcodes.htm

Installation of demo data

The SAP installation includes some demo data.
As most people do not want to install “real” SAP modules such as SAP FI, SAP CRM or SAP BI on their local system, this demo data is perfect for demos using Talend’s SAP connectors. To install the flight demo on a local SAP system, you just have to open the ABAP editor (transaction: SE38) and execute the program SAPBC_DATA_GENERATOR. This program generates example data within the flight tables and does some further initializations. Here is a good tutorial with more information and how to test the flight application: http://help.sap.com/saphelp_erp60_sp/helpdata/de/db/7c623cf568896be10000000a11405a/content.htm

Configuration of Talend Studio to use SAP components

Talend’s SAP components are already included in the Studio. However, two further steps are required to be able to use them:
- Copy sapjco3.dll to the directory C:/Windows/System32
- Add the SAP Java Connector JAR: copy sapjco3.jar to the directory “talend/studio/lib/java”, then (re-)start Talend Studio

Check if the SAP library was added successfully:
- Open the view “Talend Modules” (Eclipse --> Windows --> Show View --> Talend --> Modules)
- Sort by column “Context”
- Look for “tSAP*” contexts and check if sapjco3.jar has status “installed”

Usage of SAP components with Talend Studio

This section describes how to use Talend’s SAP components and the SAP wizard in general (using one specific example for calling a BAPI). Detailed descriptions of all SAP components (for using BAPIs, RFCs, IDocs, BW, etc.) are available in the documentation talend_components_rg_x.y.z.pdf at www.help.talend.com.

Connection to an SAP system

A connection to an SAP system can be done “built-in” or via “Metadata --> SAP Connections” (the latter only in the enterprise version).
Using the latter has several advantages:
- reuse of the connection configuration
- quick check if the connection to SAP works
- wizards for retrieving functions from SAP (instead of handwriting without the wizard)
- quick test with test parameters if a function works before finishing development

Development lifecycle for an SAP job

1. Create connection (if not existing yet)
   - Right click on Metadata --> SAP Connection
   - Create SAP connection
   - Follow the wizard:
     SAP JCo version: 3
     Client: “001”
     User ID: “SAP*”
     Password: “admin123” --> as you defined it during installation
     Language: “EN”
     Hostname: “localhost”
     System number: “00”

2. Retrieve function (BAPI / RFC)
   - Right click on the created connection
   - Click on “Retrieve SAP Function”
   - Enter a search filter (e.g. BAPI_FL*)
   - Click on “Search”
   - Select and double click on your function (e.g. BAPI_FLCUST_GETLIST)
   - You see all input, output and table parameters for this SAP function
   - Click on “Test in” --> here you see the parameters in more detail
   - You now have to define which input and output parameters you want to use --> remove all others by selecting them and clicking the “Remove” button. Hint: If you do not remove an input parameter, you usually have to enter a value for it!
   - Select the output type: it can be a single (single record), a table (list of records), or a structure output. Hint: difference between table and structure in SAP: http://www.sapfans.com/forums/viewtopic.php?f=12&t=119794
   - If you want to do a quick test: enter values for input parameters (if there are any for your function call), then click the “Launch” button. In this example, there is only an optional input parameter MAX_ROWS. You should see data in the output fields. In this example, you see the record with CUSTNAME “SAP AG” and STREET “Neurottstr. 16”
   - Click the “Finish” button
   - Under “Metadata --> SAP Connections --> “your connection” --> SAP Functions” you can now see your function (in this example: BAPI_FLCUST_GETLIST)

3. Create SAP job
   - Drag&drop the created function into a job (without the wizard, you can also enter all data by hand)
   - The tSAPInput component is proposed automatically. Click OK to add it to your job
   - Go to “Initialize input” and add parameter values. In this example, there is just the parameter “MAX_ROWS”. Hint: The parameter value can be changed from a hardcoded value to a variable, of course (just press Ctrl+Space on your keyboard to get access to all available variables via code completion in your Studio)
   - Go to the tSAPInput component and add the desired output mapping (i.e. which values you want to process further with other components):
     Scroll to the bottom to “Outputs”
     Add the correct table / structure name (in this example: "CUSTOMER_LIST")
     Click on “Mapping” (which is empty and has to be filled), then click on “…”
     Add the wanted output columns of your SAP function
     Add the same names at the column “Schema XPathQueries” (do not forget the double quotes here!)
     Click the “OK” button
   - Connect the tSAPInput component to a tLogRow component and synchronize the schema. Hint: Always try out if this works before adding further logic to your job!
   - Run and test your job (you will see five rows logged, as you have configured MAX_ROWS = 5)

That's it. Now enjoy Talend's SAP components :-)

Best regards, Kai Wähner (Twitter: @kaiwaehner)

Content from my blog: http://www.kai-waehner.de/blog/2013/03/03/sap-integration-with-talend-components-connectors-bapi-rfc-idoc-bw-soap/

March 4, 2013

by Kai Wähner


JUnit testing of Spring MVC application: Testing the Service Layer

In continuation of my earlier blogs on Introduction to Spring MVC and Testing DAO layer in Spring MVC, in this blog I will demonstrate how to test the Service layer in Spring MVC. The objective of this demo is twofold: to build the Service layer using TDD, and to increase the code coverage during JUnit testing of the Service layer. For people in a hurry, get the latest code from Github and run the command below:

    mvn clean test -Dtest=com.example.bookstore.service.AccountServiceTest

Since we already tested the DAO layer in my earlier blog, here we only need to focus on testing the Service layer. We need to mock the DAO layer so that we can control its behavior from the Service layer tests and cover various scenarios. Mockito is a good framework for this: it lets us stub a method to return known data and then assert on that data in the JUnit test. As a first step we define the AccountServiceTestContextConfiguration class inside the AccountServiceTest class. Notice that two beans are defined in that class and that it is marked with @Configuration, which shows that it is a Spring context class. In the JUnit test we @Autowired the AccountService, and AccountServiceImpl in turn @Autowired the AccountRepository.
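To make the idea of stubbing the DAO concrete before bringing in Spring and Mockito, here is a minimal, framework-free sketch. It is not the code from the bookstore project: the Account fields, the StubAccountRepository class, the single-argument AuthenticationException and the plain-text password comparison are all simplifying assumptions made for illustration. It only shows the mechanics a mock provides: return canned data and record how often the DAO method was called.

```java
import java.util.HashMap;
import java.util.Map;

public class ServiceStubSketch {

    /** Simplified stand-in for the Account entity (hypothetical shape). */
    static final class Account {
        final String username;
        final String password;
        final String firstName;

        Account(String username, String password, String firstName) {
            this.username = username;
            this.password = password;
            this.firstName = firstName;
        }
    }

    /** Simplified stand-in for the checked exception thrown on a failed login. */
    static final class AuthenticationException extends Exception {
        AuthenticationException(String message) {
            super(message);
        }
    }

    /** The DAO contract the service depends on. */
    interface AccountRepository {
        Account findByUsername(String username);
    }

    /** Hand-written stub: returns canned accounts and counts lookups. */
    static final class StubAccountRepository implements AccountRepository {
        private final Map<String, Account> accounts = new HashMap<>();
        int findByUsernameCalls = 0;

        void willReturn(Account account) {
            accounts.put(account.username, account);
        }

        @Override
        public Account findByUsername(String username) {
            findByUsernameCalls++;
            return accounts.get(username); // null when nothing was stubbed
        }
    }

    /** Service under test; comparing plain-text passwords is an assumption. */
    static final class AccountService {
        private final AccountRepository repository;

        AccountService(AccountRepository repository) {
            this.repository = repository;
        }

        Account login(String username, String password) throws AuthenticationException {
            Account account = repository.findByUsername(username);
            if (account == null || !account.password.equals(password)) {
                throw new AuthenticationException("Wrong username/password");
            }
            return account;
        }
    }

    /** Returns true when the login attempt is rejected. */
    static boolean loginFails(AccountService service, String username, String password) {
        try {
            service.login(username, password);
            return false;
        } catch (AuthenticationException e) {
            return true;
        }
    }

    public static void main(String[] args) throws AuthenticationException {
        StubAccountRepository repository = new StubAccountRepository();
        repository.willReturn(new Account("john", "secret", "John"));
        AccountService service = new AccountService(repository);

        System.out.println(service.login("john", "secret").firstName); // positive scenario
        System.out.println(loginFails(service, "john", "fail"));       // negative scenario
        System.out.println(repository.findByUsernameCalls);            // lookups recorded so far
    }
}
```

Mockito removes the need to write a class like StubAccountRepository by hand: Mockito.mock creates the stand-in, Mockito.when(...).thenReturn(...) supplies the canned data, and Mockito.verify checks the call count.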
When creating the bean in the configuration class we also stub the AccountRepository using Mockito:

    import static org.junit.Assert.assertEquals;

    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.mockito.Mockito;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.test.context.ContextConfiguration;
    import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
    // Project imports (Account, AccountBuilder, AccountRepository, AuthenticationException) omitted

    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration
    public class AccountServiceTest {

        @Configuration
        static class AccountServiceTestContextConfiguration {

            @Bean
            public AccountService accountService() {
                return new AccountServiceImpl();
            }

            @Bean
            public AccountRepository accountRepository() {
                // The repository is a Mockito mock, not the real DAO
                return Mockito.mock(AccountRepository.class);
            }
        }

        // We autowire the AccountService bean so that it is injected from the configuration
        @Autowired
        private AccountService accountService;

        @Autowired
        private AccountRepository accountRepository;

During the setup of the JUnit test we use Mockito to stub the findByUsername method so that it returns a predefined account object:

        @Before
        public void setup() {
            Account account = new AccountBuilder() {
                {
                    address("Herve", "4650", "Rue de la gare", "1", null, "Belgium");
                    credentials("john", "secret");
                    name("John", "Doe");
                }
            }.build(true);
            Mockito.when(accountRepository.findByUsername("john")).thenReturn(account);
        }

Now we write the tests, covering both the positive and the negative scenario:

        @Test(expected = AuthenticationException.class)
        public void testLoginFailure() throws AuthenticationException {
            accountService.login("john", "fail");
        }

        @Test
        public void testLoginSuccess() throws AuthenticationException {
            Account account = accountService.login("john", "secret");
            assertEquals("John", account.getFirstName());
            assertEquals("Doe", account.getLastName());
        }

Finally, in the teardown we verify that the findByUsername method was called exactly once:

        @After
        public void verify() {
            Mockito.verify(accountRepository, Mockito.times(1)).findByUsername(Mockito.anyString());
            // This is allowed here: using container injected mocks
            Mockito.reset(accountRepository);
        }
    }

The AccountServiceImpl class looks as below:

    @Service
    @Transactional(readOnly = true)
    public class AccountServiceImpl implements AccountService {

        @Autowired
        private AccountRepository accountRepository;

        @Override
        public Account login(String username, String password) throws AuthenticationException {
            // Look the account up by username only, matching the stubbed findByUsername("john")
            Account account = this.accountRepository.findByUsername(username);
            if (account != null) {
                // Verify the password here and throw AuthenticationException on a mismatch
                // (the comparison itself is not shown in this listing)
            } else {
                throw new AuthenticationException("Wrong username/password", "invalid.username");
            }
            return account;
        }
    }

I hope this blog helped you. In my next blog, I will demo how to build a controller JUnit test.

Reference: Pro Spring MVC: With Web Flow by Marten Deinum, Koen Serneels

March 3, 2013

by Krishna Prasad
