DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Testing, Tools, and Frameworks Topics

article thumbnail
Long-Term Log Analysis with AWS Redshift
You will aggregate a lot of logs over the lifetime of your product and codebase, so it’s important to be able to search through them. In the rare case of a security issue, not having that capability is incredibly painful. You might be able to use services that allow you to search through the logs of the last two weeks quickly. But what if you want to search through the last six months, a year, or even further? That availability can be rather expensive or not even an option at all with existing services. Many hosted log services provide S3 archival support which we can use to build a long-term log analysis infrastructure with AWS Redshift. Recently I’ve set up scripts to be able to create that infrastructure whenever we need it at Codeship. AWS Redshift AWS Redshift is a data warehousing solution by AWS. It has an easy clustering and ingestion mechanism ideal for loading large log files and then searching through them with SQL. As it automatically balances your log files across several machines, you can easily scale up if you need more speed. As I said earlier, looking through large amounts of log files is a relatively rare occasion; you don’t need this infrastructure to be around all the time, which makes it a perfect use case for AWS. Setting Up Your Log Analysis Let’s walk through the scripts that drive our long-term log analysis infrastructure. You can check them out in the flomotlik/redshift-logging GitHub repository. I’ll take you step by step through configuring the whole setup of the environment variables needed, as well as starting the creation of the cluster and searching the logs. But first, let’s get a high-level overview of what the setup script is doing before going into all the different options that you can set: Creates an AWS Redshift cluster. You can configure the number of servers and which server type should be used. Waits for the cluster to become ready. Creates a SQL table inside the Redshift cluster to load the log files into. Ingests all log files into the Redshift cluster from AWS S3. Cleans up the database and prints the psql access command to connect into the cluster. Be sure to check out the script on GitHub before we go into all the different options that you can set through the .env file. Options to set The following is a list of all the options available to you. You can simply copy the .env.template file to .env and then fill in all the options to get picked up. AWS_ACCESS_KEY_ID AWS key of the account that should run the Redshift cluster. AWS_SECRET_ACCESS_KEY AWS secret key of the account that should run the Redshift cluster. AWS_REGION=us-east-1 AWS region the cluster should run in, default us-east-1. Make sure to use the same region that is used for archiving your logs to S3 to have them close. REDSHIFT_USERNAME Username to connect with psql into the cluster. REDSHIFT_PASSWORD Password to connect with psql into the cluster. S3_AWS_ACCESS_KEY_ID AWS key that has access to the S3 bucket you want to pull your logs from. We run the log analysis cluster in our AWS Sandbox account but pull the logs from our production AWS account so the Redshift cluster doesn’t impact production in any way. S3_AWS_SECRET_ACCESS_KEY AWS secret key that has access to the S3 bucket you want to pull your logs from. PORT=5439 Port to connect to with psql. CLUSTER_TYPE=single-node The cluster type can be single-node or multi-node. Multi-node clusters get auto-balanced which gives you more speed at a higher cost. NODE_TYPE Instance type that’s used for the nodes of the cluster. Check out the Redshift Documentation for details on the instance types and their differences. NUMBER_OF_NODES=10 Number of nodes when running in multi-mode. CLUSTER_IDENTIFIER=log-analysis DB_NAME=log-analysis S3_PATH=s3://your_s3_bucket/papertrail/logs/862693/dt=2015 Database format and failed loads When ingesting log statements into the cluster, make sure to check the amount of failed loads that are happening. You might have to edit the database format to fit to your specific log output style. You can debug this easily by creating a single-node cluster first that only loads a small subset of your logs and is very fast as a result. Make sure to have none or nearly no failed loads before you extend to the whole cluster. In case there are issues, check out the documentation of the copy command which loads your logs into the database and the parameters in the setup script for that. Example and benchmarks It’s a quick thing to set up the whole cluster and run example queries against it. For example, I’ll load all of our logs of the last nine months into a Redshift cluster and run several queries against it. I haven’t spent any time on optimizing the table, but you could definitely gain some more speed out of the whole system if necessary. It’s just fast enough already for us out of the box. As you can see here, loading all logs of May — more than 600 million log lines — took only 12 minutes on a cluster of 10 machines. We could easily load more than one month into that 10-machine cluster since there’s more than enough storage available, but for this post, one month is enough. After that, we’re able to search through the history of all of our applications and past servers through SQL. We connect with our psql client and send of SQL queries against the “events’ database. For example, what if we want to know how many build servers reported logs in May: loganalysis=# select count(distinct(source_name)) from events where source_name LIKE 'i-%'; count ------- 801 (1 row) So in May, we had 801 EC2 build servers running for our customers. That query took ~3 seconds to finish. Or let’s say we want to know how many people accessed the configuration page of our main repository (the project ID is hidden with XXXX): loganalysis=# select count(*) from events where source_name = 'mothership' and program LIKE 'app/web%' and message LIKE 'method=GET path=/projects/XXXX/configure_tests%'; count ------- 15 (1 row) So now we know that there were 15 accesses on that configuration page throughout May. We can also get all the details, including who accessed it when through our logs. This could help in case of any security issues we’d need to look into. The query took about 40 seconds to go though all of our logs, but it could be optimized on Redshift even more. Those are just some of the queries you could use to look through your logs, gaining more insight into your customers’ use of your system. And you et all of that with a setup that costs $2.50 an hour, can be shut down immediately, and recreated any time you need access to that data again. Conclusions Being able to search through and learn from your history is incredibly important for building a large infrastructure. You need to be able to look into your history easily, especially when it comes to security issues. With AWS Redshift, you have a great tool in hand that allows you to start an ad hoc analytics infrastructure that’s fast and cheap for short-term reviews. Of course, Redshift can do a lot more as well. Let us know what your processes and tools around logging, storage, and search are in the comments.
June 21, 2015
by Florian Motlik
· 1,451 Views
article thumbnail
Spring Integration Tests with MongoDB Rulez
Spring integration tests allow you to test functionality against a running application. This article shows proper database set- and clean-up with MongoDB.
June 10, 2015
by Ralf Stuckert
· 21,465 Views · 2 Likes
article thumbnail
Regular Expressions Denial of the Service (ReDOS) Attacks: From the Exploitation to the Prevention
autors :michael hidalgo, dinis cruz introduction when it comes to web application security, one of the recommendations to write software that is resilient to attacks is to perform a correct input data validation. however, as mobile applications and apis (application programming interface) proliferates, the number of untrusted sources where data comes from goes up, and a potential attacker can take advantage of the lack of validations to compromise our applications. regular expressions provides a versatile mechanism to perform input data validation. developers use them to validate email addresses, zip codes, phone numbers and many other task that are easily implemented thought them. unfortunately most of the time software engineers don't fully understand how regular expressions works in the background and by choosing a wrong regular expression pattern they can introduce a risk in the application. in this article we are going to discuss about the so called regular expression denial of the service (redos) vulnerability and how we can identify this problems early in the software development life cycle (sdlc) stages by enforcing a culture focused on unit testing. hardware features for this article in order to provide information about execution time, performance, cpu utilisation and other facts, we are relying on virtual machine that uses windows 7 32-bit operating system, 5.22 gb ram. intel(r) core (tm) it-3820qm cpu @2.7 ghz. we are also using 4 cores. understanding the problem. the owasp foundation (2012) defines a regular regular expression denial of service attack as follows: "the regular expression denial of service (redos) is a denial of service attack, that exploits the fact that most regular expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). an attacker can then cause a program using a regular expression to enter these extreme situations and then hang for a very long time." although a broad explanation about regular expression engines is out of the scope of this article,it is important to understand that, according to stubblebine,t (regular expressions pocket reference), a pattern matching consist of finding a section of text that is described (matched) by a regular expression. two main rules are used to match results: the earliest (leftmost) wins : the regular expression is applied to the input starting at the first character and moving toward the last. as soon as the regular expression engine finds a match,it returns. standard quantifiers are greedy : according to stubblebine, "quantifiers specify how many times something can be repeated. the standard quantifiers attempt to match as many times as possible. the process of giving up characters and trying less-greedy matches is called backtracking." for this article we are focused a regular expression engine called nondeterministic finite automaton (nfa).this engines usually compare each element of the regex to the input string, keeping track of positions where it chose between two options in the regex. if an option fails, the engine backtracks to the most recently saved position.(stubblebine,t 2007). it is important to note that this engine is also implemented in .net, java, python, php and ruby on rails. this article is focused on c# and therefore we are relying on the microsoft .net framework system.text.regularexpression classes which at the heart uses nfa engines. according to bryan sullivan "one important side effect of backtracking is that while the regex engine can fairly quickly confirm a positive match (that is, an input string does match a given regex), confirming a negative match (the input string does not match the regex) can take quite a bit longer. in fact, the engine must confirm that none of the possible “paths” through the input string match the regex, which means that all paths have to be tested. with a simple non-grouping regular expression, the time spent to confirm negative matches is not a huge problem." in order to illustrate the problem, let's use this regular expression (\w+\d+)+c which basically performs the following checks: between one and unlimited times, as many times as possible, giving back as needed. \w+ match any word character a-za-z0-9_ . \d+ match a digit 0-9 matches the character c literally (case sensitive) so matching values are 12c,1232323232c and !!!!cd4c and non matching values are for instance !!!!!c,aaaaaac and abababababc . the following unit test was created to verify both cases. const string regexpattern = @"(\w+\d+)+c"; public void testregularexpression() { var validinput = "1234567c"; var invalidinput = "aaaaaaac"; regex.ismatch(validinput, regexpattern).assert_is_true(); regex.ismatch(invalidinput, regexpattern).assert_is_false(); } execution time : 6 milliseconds now that we've verified that our regular expression works well, let's write a new unit test to understand the backtracking problem and the performance effects. note that the longer the string, the longer the time the regular expression engine will take to resolve it. we will generate 10 random strings, starting at the length of 15 characters, incrementing the length until get to 25 characters,and then we will see the execution times. const string regexpattern = @"(\w+\d+)+c"; [testmethod] public void isvalidinput() { var sw = new stopwatch(); int16 maxiterations = 25; for (var index = 15; index < maxiterations; index++) { sw.start(); //generating x random numbers using fluentsharp api var input = index.randomnumbers() + "!"; regex.ismatch(input, regexpattern).assert_false(); sw.stop(); sw.reset(); } } now let's take a look at the test results: random string character length elapsed time (ms) 360817709111694! 16 16ms 2639383945572745! 17 23ms 57994905459869261! 18 50ms 327218096525942566! 19 106ms 4700367489525396856! 20 207ms 24889747040739379138! 21 394ms 156014309536784168029! 22 795ms 8797112169446577775348! 23 1595ms 41494510101927739218368! 24 3200ms 112649159593822679584363! 25 6323ms by looking at this results we can understand that the execution time (total time to resolve the input text against the regular expression) goes up exponentially to the size of the input. we can also see that when we append a new character, the execution time almost duplicates. this is an important finding because shows how expensive this process is, if we do not have a correct input data validation we can introduce performance issues in our application. a real-life use-case and an appeal for a unit testing approach now that we have seen the problems we can face by selecting a wrong (evil) regular expression, let's discuss about a realistic scenario where we need to validate input data thought regular expressions. we strongly believe that unit testing techniques can not only help to write quality code but also we can use them to find vulnerabilities in the code we are writing. by writing unit test that performs security checks (like input data validation) a common task in web applications consist on request an email address to the user signing in our application. from a ux (user experience perspective) complaining browsers support friendly error messages when an input, that was supposed to be an email address, does not match with the requirements in terms of format. here is a ui validation when a input textbox (with the email type is set) and the value is not a valid email address. however relying on a ui validation is not longer enough. an eavesdropper can easily perform an http request without using a browser (namely by using a proxy to capture data in transit) and then send a payload that can compromise our application. in the following use case, we are using a backend validation for the email address by using a regular expression. we will show you the real power of regular expressions here, we are not only testing that the regular expression validates the input but also how it behaves when it receives any arbitrary input. we are using this evil regular expression to validate the email: ^( 0-9a-za-z @([0-9a-za-z][-\w][0-9a-za-z].)+[a-za-z]{2,9})$ . with the following test we are verifying that a valid email and invalid emails formats are correctly processed by the regular expression, which is the functional aspect from a development point of view. const string emailregex = @"^([0-9a-za-z]([-.\w]*[0-9a-za-z])*@([0-9a-za-z][-\w]*[0-9a-za-z]\.)+[a-za-z]{2,9})$"; [testmethod] public void validateemailaddress() { var validemailaddress = "[email protected]"; var invalidemailaddress = new string[] { "a", "abc.com", "1212", "aa.bb.cc", "aabcr@s" }; regex.ismatch(validemailaddress, emailregex).assert_is_true(); //looping throught invalid email address foreach (var email in invalidemailaddress) { regex.ismatch(email, emailregex).assert_is_false(); } } elapsed time: 6ms. so both cases are validate correctly. one could state that both scenarios supported by the unit test are enough to select this regular expression for our input data validations. however we can do a more extensive testing as you'll see. the exploit so far the previous regular expression selected to valid an email address seems to work well, we have added some unit test that verifies valid an invalid inputs. but how does it behaves when we send an arbitrary input?, from a variable length, do we face a denial of the service attack?. this kind of questions can be solved wit unit testing technique like this one: const string emailregex = @"^([0-9a-za-z]([-.\w]*[0-9a-za-z])*@([0-9a-za-z][-\w]*[0-9a-za-z]\.)+[a-za-z]{2,9})$"; [testmethod] public void validateemailaddress() { var validemailaddress = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"; var watch = new stopwatch(); watch.start(); validemailaddress.regex(emailregex).assert_is_false(); watch.stop(); console.writeline("elapsed time {0}ms", watch.elapsedmilliseconds); watch.reset(); } **elapsed time : ~23 minutes (1423127 milliseconds).** results are disturbing. we can clearly see the performance problem introduced by evaluating the given input.it takes roughly 23 minutes to validate the input given the hardware characteristics described before. in the following images you will see the cpu behaviour when running this unit test. here is another cpu utlization: and this is another image from the cpu utilization while the test is running. fuzzing and unit testing: a perfect combination of techniques in the previous unit test we found that a given input string can lead to have denial of the service issue in our application. note that we didn't need an extreme large payload, in our scenario 34 characters can illustrate this problem or even less. when using any regular expression it is recomendable to always test it against unit testing to cover most of the possible ways a user (which can be a potential attacker) can send. here is where we can use fuzzing. tobias klein in his book a bug hunter's diary a guide tour throught the wilds of sofware security defines fuzzing as "a complete different approach to bug hunting is known as fuzzing. fuzzing is a dynamic-analysis technique that consist of testing an application by providing it with malformed or unexpected input. then klein continues adding that: "it isn't easy to identify the entry points of such complex applications, but complex software often tends to crash while processing malformed input data. page 05" mano paul in his book official (isc)2 guide to the csslp talking about fuzzing states that: "also known as fuzz testing or fault injection testing, fuzzing is a brute-force type of testing in which faults (random and pseudo-random input data) are injected into the software and it's behaviour is observed. it is a test whose results are indicative of the extended and effectiveness of the input validation.page 336". taking previous definitions into consideration, we are going to implement a new unit test that can allow us to generate random input data and test our regular expression. in this case, we are using this email regular expression "^[\w-.]{1,}\@([\w]{1,}.){1,}[a-z]{2,4}$"; and by doing an exhaustive testing we will see if we are not introducing a denial of the service problem. we want to make sure that the elapsed time to resolve if the random string matches the regular expression is evaluated in less than 3 seconds: const string emailregex = @"^[\w-\.]{1,}\@([\w]{1,}\.){1,}[a-z]{2,4}$"; //number of random strings to generte. const int maxiterations = 10000; [testmethod] public void fuzz_emailaddress() { //valid email should return true "[email protected]".regex(emailregex).assert_is_true(); //invalid email should return false "abce" .regex(emailregex).assert_is_false(); //testing maxiterations times for (int index = 0; index < maxiterations; index++) { //generating a random string var fuzzinput = (index * 5).randomstring(); var sw = new stopwatch(); sw.start(); fuzzinput.regex(emailregex).assert_is_false(); //elapsed time should be less than 3 seconds per input. sw.elapsed.seconds().assert_size_is_smaller_than(3); } } under the hardware features described before, this test passes. considering that we are using this computation (index * 5), the largest string generate is of 49995 character (which is 9999 *5). having said that we were able to test a large string against the regular expression and we confirmed that even thought it is quite large input value, the time involved to verify if it was or not a valid email, it was less than 3 seconds. now assuming that a check for the length of the email in the first place, it will guarantee that a malicious user can't inject a large payload in our application. countermeasures provided in microsoft .net 4.5 and upper if you are developing applications in microsoft .net 4.5 then you can take advantage of a new implementation on top of the ismatch method from the regex class . starting from .net 4.5 the ismatch method provides an overload that allows you to enter a timeout. note that this overload is not available in .net 4.0 . this new parameter is called matchtimeout and according to microsoft : "the matchtimeout parameter specifies how long a pattern matching method should try to find a match before it times out. setting a time-out interval prevents regular expressions that rely on excessive backtracking from appearing to stop responding when they process input that contains near matches. for more information, see best practices for regular expressions in the .net framework and backtracking in regular expressions . if no match is found in that time interval, the method throws a regexmatchtimeoutexception exception. matchtimeout overrides any default time-out value defined for the application domain in which the method executes." taken from here . we've written a new unit test where we're using a regular expression that we know can lead to denial of the service. in this case we'll test an email address that previously generated a significant side effect in the performance of the application. we'll see then how we can reduce the impact of this process by setting up a timeout. const string emailregexpattern = @"^([0-9a-za-z]([-.\w]*[0-9a-za-z])*@([0-9a-za-z][-\w]*[0-9a-za-z]\.)+[a-za-z]{2,9})$"; [testmethod] public void validateemailaddress() { var emailaddress = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"; var watch = new stopwatch(); watch.start(); //timeout of 5 seconds try { regex.ismatch(emailaddress, emailregexpattern, regexoptions.ignorecase, timespan.fromseconds(5)); } catch (exception ex) { ex.message.assert_not_null(); ex.gettype().assert_is(typeof(regexmatchtimeoutexception)); } finally { watch.stop(); watch.elapsed.seconds().assert_size_is_smaller_than(5); watch.reset(); } } running this test in visual studio we can confirm it passes, which means that the backtracking mechanism is taking longer than 5 seconds to resolve. it will throw a regexmatchtimeoutexception exception indicating that it might take longer than 5 seconds to evaluate the input. ideally one would expect this process to take less than a second, however several conditions or requirements might lead to allow a timeout in seconds. note how this model provides a very needed defensive programming style where the software engineers make informed decisions on the code they write, in this case we can establish the next steps when our method times and that way we can decrease any denial of the service attack. final thoughts no one size fits all is so cliché that has to be true. we are not sure if the regular expressions you are currently using in your applications are vulnerable to this attack. what we can do for sure is to show you how you can take advantage of unit testing to write secure code. when we write code we want to make sure that each single line of code is covered by a unit testing, which at the end of the day will guarantee early detections of error. however if we can combine this exercise with the adoption and implementation of test that can also try to attack/compromise the application (and we are not talking about anything fancy) like sending random strings, using fuzzing techniques, using combination of characters, exceeding the expected length, we will be helping to write software that is resilient to attacks. as a recommendation always test your regular expressions agains uni test, make sure that they are resilient to the attack we have covered in this article and if you are able to identify those problematic patterns out there, do a contribution and report them so we are not introduce them in the software we write. references 1.cruz,dinis(2013) the email regex that (could had) dosed a site. 2.hollos,s. hollos,r (2013) finite automata and regular expressions problems and solutions. 3.kirrage,j. rathnayake , thielecke, h.: static analysis for regular expression denial-of-service attacks. university of birmingham, uk 4.klein, t. a bug hunter's diary a guided tour through the wilds of software security (2011). 5.the owasp foundation (2012) regular expression denial of service - redos. 6.stubblebine, t(2007) regular expression pocket reference, second edition. 7.sullivan, b (2010) regular expression denial of service attacks and defenses
June 7, 2015
by Michael Hidalgo
· 34,158 Views · 5 Likes
article thumbnail
Easy SQLite on Android with RxJava
Whenever I consider using an ORM library on my Android projects, I always end up abandoning the idea and rolling my own layer instead for a few reasons: My database models have never reached the level of complexity that ORM’s help with. Every ounce of performance counts on Android and I can’t help but fear that the SQL generated will not be as optimized as it should be. Recently, I started using a pretty simple design pattern that uses Rx to offer what I think is a fairly simple way of managing your database access with RxJava. Easy reads One of the important design principles on Android is to never perform I/O on the main thread, and this obviously applies to database access. RxJava turns out to be a great fit for this problem. I usually create one Java class per table and these tables are then managed by my SQLiteOpenHelper. With this new approach, I decided to extend my use of the helper and make it the only point of access to anything that needs to read or write to my SQL tables. Let’s consider a simple example: a USERS table managed by the UserTable class: // MySqliteOpenHelper.java Observable> getUsers(String userId) { return makeObservable(mUserTable.getUsers(getReadableDatabase(), userId)) .subscribeOn(Schedulers:io()) } The problem with this method is that if you’re not careful, you will call it on the main thread, so it’s up to the caller to make sure they are always invoking this method on a background thread (and then to post their UI update back on the main thread, if they are updating the UI). Instead of relying on managing yet another thread pool or, worse, using AsyncTask, we are going to rely on RxJava to take care of the threading model for us. Let’s rewrite this method to return a callable instead: // MySqliteOpenHelper.java private static Observable makeObservable(final Callable func) { return Observable.create( new Observable.OnSubscribe() { @Override public void call(Subscriber subscriber) { try { subscriber.onNext(func.call()); } catch(Exception ex) { Log.e(TAG, "Error reading from the database", ex); } } }); } In effect, we simply refactored our method to return a lazy result, which makes it possible for the database helper to turn this result into an Observable: // MySqliteOpenHelper.java Observable> getUsers(String userId) { return makeObservable(mUserTable.getUsers(getReadableDatabase(), userId)) .subscribeOn(Schedulers:io()) } Notice that on top of turning the lazy result into an Observable, the helper forces the subscription to happen on a background thread (the IO thread here, since we’re accessing the database). This guarantees that callers don’t have to worry about ever blocking the main thread. Finally, the makeObservable method is pretty straightforward (and completely generic): // MySqliteOpenHelper.java private static Observable makeObservable(final Callable func) { return Observable.create( new Observable.OnSubscribe() { @Override public void call(Subscriber subscriber) { try { subscriber.onNext(func.call()); } catch(Exception ex) { Log.e(TAG, "Error reading from the database", ex); } } }); } At this point, all our database reads have become observables that guarantee that the queries run on a background thread. Accessing the database is now pretty standard Rx code: // DisplayUsersFragment.java @Inject MySqliteOpenHelper mDbHelper; // ... mDbHelper.getUsers(userId) .observeOn(AndroidSchedulers.mainThread()) .subscribe(new Action1>()) { @Override public void onNext(List users) { // Update our UI with the users } } } And if you don’t need to update your UI with the results, just observe on a background thread. Since your database layer is now returning observables, it’s trivial to compose and transform these results as they come in. For example, you might decide that your ContactTable is a low layer class that should not know anything about your model (the User class) and that instead, it should only return low level objects (maybe a Cursor or ContentValues). Then you can use use Rx to map these low level values into your model classes for an even cleaner separation of layers. Two additional remarks: Your Table Java classes should contain no public methods: only package protected methods (which are accessed exclusively by your Helper, located in the same package) and private methods. No other classes should ever access these Table classes directly. This approach is extremely compatible with dependency injection: it’s trivial to have both your database helper and your individual tables injected (additional bonus: with Dagger 2, your tables can have their own component since the database helper is the only refence needed to instantiate them). This is a very simple design pattern that has scaled remarkably well for our projects while fully enabling the power of RxJava. I also started extending this layer to provide a flexible update notification mechanism for list view adapters (not unlike what SQLBrite offers), but this will be for a future post. This is still a work in progress, so feedback welcome!
June 4, 2015
by Cedric Beust
· 16,229 Views
article thumbnail
Mounting an EBS Volume to Docker on AWS Elastic Beanstalk
Mounting an EBS volume to a Docker instance running on Amazon Elastic Beanstalk (EB) is surprisingly tricky. The good news is that it is possible. I will describe how to automatically create and mount a new EBS volume (optionally based on a snapshot). If you would prefer to mount a specific, existing EBS volume, you should check out leg100’s docker-ebs-attach (using AWS API to mount the volume) that you can use either in a multi-container setup or just include the relevant parts in your own Dockerfile. The problem with EBS volumes is that, if I am correct, a volume can only be mounted to a single EC2 instance – and thus doesn’t play well with EB’s autoscaling. That is why EB supports only creating and mounting a fresh volume for each instance. Why would you want to use an auto-created EBS volume? You can already use a docker VOLUME to mount a directory on the host system’s ephemeral storage to make data persistent across docker restarts/redeploys. The only advantage of EBS is that it survives restarts of the EC2 instance but that is something that, I suppose, happens rarely. I suspect that in most cases EB actually creates a new EC2 instance and then destroys the old one. One possible benefit of an EBS volume is that you can take a snapshot of it and use that to launch future instances. I’m now inclined to believe that a better solution in most cases is to set up automatic backup to and restore from S3, f.ex. using duplicity with its S3 backend (as I do for my NAS). Anyway, here is how I got EBS volume mounting working. There are 4 parts to the solution: Configure EB to create an EBS mount for your instances Add custom EB commands to format and mount the volume upon first use Restart the Docker daemon after the volume is mounted so that it will see it (see this discussion) Configure Docker to mount the (mounted) volume inside the container 1-3.: .ebextensions/01-ebs.config: # .ebextensions/01-ebs.config commands: 01format-volume: command: mkfs -t ext3 /dev/sdh test: file -sL /dev/sdh | grep -v 'ext3 filesystem' # ^ prints '/dev/sdh: data' if not formatted 02attach-volume: ### Note: The volume may be renamed by the Kernel, e.g. sdh -> xvdh but # /dev/ will then contain a symlink from the old to the new name command: | mkdir /media/ebs_volume mount /dev/sdh /media/ebs_volume service docker restart # We must restart Docker daemon or it wont' see the new mount test: sh -c "! grep -qs '/media/ebs_volume' /proc/mounts" option_settings: # Tell EB to create a 100GB volume and mount it to /dev/sdh - namespace: aws:autoscaling:launchconfiguration option_name: BlockDeviceMappings value: /dev/sdh=:100 4.: Dockerrun.aws.json and Dockerfile: Dockerrun.aws.json: mount the host’s /media/ebs_volume as /var/easydeploy/share inside the container: { "AWSEBDockerrunVersion": "1", "Volumes": [ { "HostDirectory": "/media/ebs_volume", "ContainerDirectory": "/var/easydeploy/share" } ] } Dockerfile: Tell Docker to use a directory on the host system as /var/easydeploy/share – either a randomly generated one or the one given via the -m mount option to docker run: ... VOLUME ["/var/easydeploy/share"] ...
June 3, 2015
by Jakub Holý
· 14,771 Views
article thumbnail
How to Setup Intellij IDE War Exploded Artifact with Multiple CDI Dependent Projects
I have a large Java project with many sub modules, and they have simple top down dependencies like this: ProjectX +-ModuleLibA +-ModuleLibB +-ModuleCdiLibC +-ModuleCdiLibC2 +-ModuleLibD +-ModuleCdiLibE +-ModuleCdiLibE2 +-ModuleCdiLibE3 +-ModuleLibF +-ModuleLibG +-ModuleWebAppX Each of these modules has their own third party dependency jars. When I say top down, it simply means Module from bottom have all the one above sub module and its third party dependencies as well. The project is large, and with many files, but the structure is straight forward. It does have large amount of third party jars though. At the end, the webapp would have over 100 jars packaged in WEB-INF/lib folder! When you create this project structure in IntelliJ IDE (no I do not have the luxury of using Maven in this case), all the third party dependencies are nicely exported and managed from one Module to another as I create my Project with existing source and third parties jars. I do not need to re-define any redudant jars libraries definitions between Modules. When it come to define ModuleWebAppX at the end, all I have to do is to add ModuleLibG as a project dependency, and it brings all the other "transitives" dependent jars in! This is all done by IntelliJ IDE, which is very nice! IntelliJ IDE also let you setup Artifacts from your project to prepare for package and deployment that can run inside your IDE servers. By default, any web application will have an option to create a war:exploded artifact definition, and the IDE will automatically copy and update your project build artifacts into this output folder, and it can be deploy/redeploy into any EE server as exploded mode nicely. All these work really smoothly, until there is one problem that hit hard! The way IntelliJ IDE package default war:exploded artifact is that it will copy all the .class files generated from each Modules into a single "out/artifact/ProjectX_war_exploded" output folder. This works most of the time when our Java package and classes are unique, but not so with resource files that's not unique! My project uses several dependent CDI based modules. As you might know, each CDI module suppose to define their own, one and single location at META-INF/beans.xml to enable it and to customize CDI behavior. But becuase IntelliJ IDE flatten everything into a single output directory, I've lost the unique beans.xml file per each Module! This problem is hard to troubleshoot since it doesn't produce any error at first, nor it stops the web app from running. It just not able to load certain CDI beans that you have customized in the beans.xml!!! To resolve this, I have to make the IntelliJIDE artifact dependent modules to generate it's JAR instead of all copy into a single output. But we still want it to auto copy generated build files into the JAR archive automatically when we make a change. Lukcly IntelliJ has this feature. This is how I do it: 1. Open your project settings then select Artifacts on left. 2. Choose your war:exploded artifacts and look to your right. 3. Under OutputLayout tab, expand WEB-INF/lib, then right clik and "Create Archive" > Enter your moduleX name.jar. 4. Right click this newly created archive moduleX.jar name, then "Add Copy of" > "Module Output" and select one of your dependent module. 5. Repeat for each of the CDI based Modules! I wish there is a easier way to do across all Modules for this, but at least this manual solution works!
June 2, 2015
by Zemian Deng
· 16,533 Views
article thumbnail
Four Ways to Quickly Test Swift Code
As developers, we are always looking for a better, faster way of doing things. Whenever I am learning a new language that typically runs in an IDE, then I begin to look for ways to test code snippets through either the Terminal for Mac or the command prompt on Windows. Swift is no exception. As I’ve been working more and more with this language, I’ve uncovered four ways to quickly test Swift code that are not only great for your day-to-day job, but can be used to collaborate and help others learn this new language. #1 : REPL (Read-Eval-Print-Loop) Xcode’s debugger includes an interactive version of the Swift language, known as the REPL (Read-Eval-Print-Loop). This allows you to try out the Swift language within LLDB in Xcode’s console, or from Terminal. If you have at least Xcode 6.1 or higher, then you can simply open your terminal and type: swift You can also invoke it with the following commands on earlier versions of Xcode 6 : xcrun swift lldb --repl It looks like the following: This is great for quick code snippets that you might want to try without launching Xcode. #2 : Swift playgrounds Swift playgrounds are a way to compile and run Swift code live as you type. The results of each line are presented in a timeline as they execute, and variables can be inspected at any point. Playgrounds are typically created as a standalone project (as the image below indicates), but they can be created within an existing Xcode project as well. There are plenty of sample playgrounds out there, and you are free to usemine to get started. Below you will see an example of the timeline in action, providing a visual look of arrays, for loops and more. The obvious reason to use Swift playgrounds is the rich editor that includes syntax highlighting, code completion and more. The disadvantage is that you have to open Xcode in order to do so. #3 : Using an Online Editor SwiftStub has become one of the most popular ways to compile and run Swift code on the fly without requiring a Mac. All you need is a web-browser open to SwiftStub and off you go. It includes the functionality that you would expect, such as a custom URLs and uploading or saving a playground, but it also supports team collaboration. You can easily add people to your current Swift project and even add audio and group chat if neccessary. #4 : Using iTerm2 with Guard-shell This is my preferred environment, but it is geared towards power users that don’t mind spending a few extra minutes setting it up. Don’t worry if you have never done this before as I’ll walk you through the process, step-by-step. I prefer to use iTerm2. Think of it as a replacement for the Terminal app on Mac. In the words of the authors, “iTerm2 brings the terminal into the modern age with features you never knew you always wanted.” I’ve been using it for a couple of months and couldn’t agree more. We are also going to use the help of Guard-shell to automatically run shell commands when watched files are modified. In this case, we’ll be watching files with the .swift extention. Once you have these applications downloaded, you only need to remember a few commands to get started… Within iTerm2, press ⌘D to get a Vertical Split and ⇧⌘D for a horizontal split. Navigate to your home directory and type: vim Guardfile Once you are inside the Guardfile, you will need to switch to “Insert” mode. Simply type the following and when you are finished press “esc” and then type :w to save the file. Type :x to save and exit vim. source 'https://rubygems.org' gem 'guard-shell' You will now have a file named Gemfile and it is time to install the gem. Simply type: bundle install You should then see the following: Fetching gem metadata from https://rubygems.org/............ Fetching version metadata from https://rubygems.org/.. Resolving dependencies... Using hitimes 1.2.2 Using timers 4.0.1 Using celluloid 0.16.0 Installing coderay 1.1.0 Using ffi 1.9.8 Installing formatador 0.2.5 Using rb-fsevent 0.9.4 Using rb-inotify 0.9.5 Using listen 2.9.0 Installing lumberjack 1.0.9 Installing nenv 0.2.0 Installing shellany 0.0.1 Installing notiffany 0.0.6 Installing method_source 0.8.2 Installing slop 3.6.0 Installing pry 0.10.1 Installing thor 0.19.1 Installing guard 2.12.5 Installing guard-compat 1.2.1 Installing guard-shell 0.7.1 Using bundler 1.8.5 Bundle complete! 1 Gemfile dependency, 21 gems now installed. Now would be a good time to create a directory where you want guard-shell to be monitoring for .swift files that have changed. I created a folder called Swift, then ran the following command : bundle exec guard init shell A new file called Guardfile will be created in that folder. Now type vim Guardfile, enter the following lines and save the file the same way you did before. guard :shell do watch(/(.*).swift/) do |m| puts puts puts puts "Running #{m[0]}" puts `swift #{m[0]}` end end Finally type: bundle exec guard If everything worked successfully, then Guard-shell will inform you that it is watching a folder as shown below: Switch over to your left-hand panel and make sure you are in the folder that Guard is watching and type “vim test.swift” and type the following Swift code: var first = "hello" var second = "world" println("\(first) \(second)") Use :w to save the file and see the output in the right-hand panel as shown below. Wrap-up Hopefully you can find a solution that works for your development process out of the four options that I presented today. I assume that, since you are interested in testing Swift code snippets, you are building Swift apps as well. You may be interested in my article on how to build a task app in Swift as well. In addition, Telerik provides several powerful UI componentsfor iOS such as Charts, Calandar, ListView and more. Thanks for reading and sound off in the comments below with your ideal environment.
May 27, 2015
by Michael Crump
· 9,029 Views
article thumbnail
Server and Storage I/O Benchmark Tools: Microsoft Diskspd (Part I)
A key to improving performance is benchmarking. Read about Microsoft Diskspd's tools for storage and server benchmarking, and boost your I/O performance.
May 22, 2015
by Greg Schulz
· 14,750 Views
article thumbnail
Why Android Studio Is Better For Android Developers Instead Of Eclipse
Besides, Android Studio platform developers also use Eclipse to develop applications, but always thought of Eclipse like a "Student-Project IDE " and learned about it.
May 20, 2015
by Mehul Rajput
· 68,311 Views · 1 Like
article thumbnail
Use RegEx to Test Password Strength in JavaScript
In this post, we learn how to combine JavaScript and RegEx to create scripts that can help us test our password strength.
May 16, 2015
by Nic Raboy
· 93,721 Views · 1 Like
article thumbnail
Log Collection With Graylog on AWS
Log collection is essential to properly analyzing issues in production. An interface to search and be notified about exceptions on all your servers is a must. Well, if you have one server, you can easily ssh to it and check the logs, of course, but for larger deployments, collecting logs centrally is way more preferable than logging to 10 machines in order to find “what happened”. There are many options to do that, roughly separated in two groups – 3rd party services and software to be installed by you. 3rd party (or “cloud-based” if you want) log collection services include Splunk,Loggly, Papertrail, Sumologic. They are very easy to setup and you pay for what you use. Basically, you send each message (e.g. via a custom logback appender) to a provider’s endpoint, and then use the dashboard to analyze the data. In many cases that would be the preferred way to go. In other cases, however, company policy may frown upon using 3rd party services to store company-specific data, or additional costs may be undesired. In these cases extra effort needs to be put into installing and managing an internal log collection software. They work in a similar way, but implementation details may differ (e.g. instead of sending messages with an appender to a target endpoint, the software, using some sort of an agent, collects local logs and aggregates them). Open-source options include Graylog, FluentD, Flume, Logstash. After a very quick research, I considered graylog to fit our needs best, so below is a description of the installation procedure on AWS (though the first part applies regardless of the infrastructure). The first thing to look at are the ready-to-use images provided by graylog, including docker, openstack, vagrant and AWS. Unfortunately, the AWS version has two drawbacks – it’s using Ubuntu, rather than the Amazon AMI. That’s not a huge issue, although some generic scripts you use in your stack may have to be rewritten. The other was the dealbreaker – when you start it, it doesn’t run a web interface, although it claims it should. Only mongodb, elasticsearch and graylog-server are started. Having 2 instances – one web, and one for the rest would complicate things, so I opted for manual installation. Graylog has two components – the server, which handles the input, indexing and searching, and the web interface, which is a nice UI that communicates with the server. The web interface uses mongodb for metadata, and the server uses elasticsearch to store the incoming logs. Below is a bash script (CentOS) that handles the installation. Note that there is no “sudo”, because initialization scripts are executed as root on AWS. #!/bin/bash # install pwgen for password-generation yum upgrade ca-certificates --enablerepo=epel yum --enablerepo=epel -y install pwgen # mongodb cat >/etc/yum.repos.d/mongodb-org.repo <<'EOT' [mongodb-org] name=MongoDB Repository baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/ gpgcheck=0 enabled=1 EOT yum -y install mongodb-org chkconfig mongod on service mongod start # elasticsearch rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch cat >/etc/yum.repos.d/elasticsearch.repo <<'EOT' [elasticsearch-1.4] name=Elasticsearch repository for 1.4.x packages baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos gpgcheck=1 gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch enabled=1 EOT yum -y install elasticsearch chkconfig --add elasticsearch # configure elasticsearch sed -i -- 's/#cluster.name: elasticsearch/cluster.name: graylog2/g' /etc/elasticsearch/elasticsearch.yml sed -i -- 's/#network.bind_host: localhost/network.bind_host: localhost/g' /etc/elasticsearch/elasticsearch.yml service elasticsearch stop service elasticsearch start # java yum -y update yum -y install java-1.7.0-openjdk update-alternatives --set java /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java # graylog wget https://packages.graylog2.org/releases/graylog2-server/graylog-1.0.1.tgz tar xvzf graylog-1.0.1.tgz -C /opt/ mv /opt/graylog-1.0.1/ /opt/graylog/ cp /opt/graylog/bin/graylogctl /etc/init.d/graylog sed -i -e 's/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=graylog.jar}/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=\/opt\/graylog\/graylog.jar}/' /etc/init.d/graylog sed -i -e 's/LOG_FILE=\${LOG_FILE:=log\/graylog-server.log}/LOG_FILE=\${LOG_FILE:=\/var\/log\/graylog-server.log}/' /etc/init.d/graylog cat >/etc/init.d/graylog <<'EOT' #!/bin/bash # chkconfig: 345 90 60 # description: graylog control sh /opt/graylog/bin/graylogctl $1 EOT chkconfig --add graylog chkconfig graylog on chmod +x /etc/init.d/graylog # graylog web wget https://packages.graylog2.org/releases/graylog2-web-interface/graylog-web-interface-1.0.1.tgz tar xvzf graylog-web-interface-1.0.1.tgz -C /opt/ mv /opt/graylog-web-interface-1.0.1/ /opt/graylog-web/ cat >/etc/init.d/graylog-web <<'EOT' #!/bin/bash # chkconfig: 345 91 61 # description: graylog web interface sh /opt/graylog-web/bin/graylog-web-interface > /dev/null 2>&1 & EOT chkconfig --add graylog-web chkconfig graylog-web on chmod +x /etc/init.d/graylog-web #configure mkdir --parents /etc/graylog/server/ cp /opt/graylog/graylog.conf.example /etc/graylog/server/server.conf sed -i -e 's/password_secret =.*/password_secret = '$(pwgen -s 96 1)'/' /etc/graylog/server/server.conf sed -i -e 's/root_password_sha2 =.*/root_password_sha2 = '$(echo -n password | shasum -a 256 | awk '{print $1}')'/' /etc/graylog/server/server.conf sed -i -e 's/application.secret=""/application.secret="'$(pwgen -s 96 1)'"/g' /opt/graylog-web/conf/graylog-web-interface.conf sed -i -e 's/graylog2-server.uris=""/graylog2-server.uris="http:\/\/127.0.0.1:12900\/"/g' /opt/graylog-web/conf/graylog-web-interface.conf service graylog start sleep 30 service graylog-web start You may also want to set a TTL (auto-expiration) for messages, so that you don’t store old logs forever. Here’s how # wait for the index to be created INDEXES=$(curl --silent "http://localhost:9200/_cat/indices") until [[ "$INDEXES" =~ "graylog2_0" ]]; do sleep 5 echo "Index not yet created. Indexes: $INDEXES" INDEXES=$(curl --silent "http://localhost:9200/_cat/indices") done # set each indexed message auto-expiration (ttl) curl -XPUT "http://localhost:9200/graylog2_0/message/_mapping" -d'{"message": {"_ttl" : { "enabled" : true, "default" : "15d" }}' Now you have everything running on the instance. Then you have to do some AWS-specific things (if using CloudFormation, that would include a pile of JSON). Here’s the list: you can either have an auto-scaling group with one instance, or a single instance. I prefer the ASG, though the other one is a bit simpler. The ASG gives you auto-respawn if the instance dies. set the above script to be invoked in the UserData of the launch configuration of the instance/asg (e.g. by getting it from s3 first) allow UDP port 12201 (the default logging port). That should happen for the instance/asg security group (inbound), for the application nodes security group (outbound), and also as a network ACL of your VPC. Test the UDP connection to make sure it really goes through. Keep the access restricted for all sources, except for your instances. you need to pass the private IP address of your graylog server instance to all the application nodes. That’s tricky on AWS, as private IP addresses change. That’s why you need something stable. You can’t use an ELB (load balancer), because it doesn’t support UDP. There are two options: Associate an Elastic IP with the node on startup. Pass that IP to the application nodes. But there’s a catch – if they connect to the elastic IP, that would go via NAT (if you have such), and you may have to open your instance “to the world”. So, you must turn the elastic IP into its corresponding public DNS. The DNS then will be resolved to the private IP. You can do that by manually and hacky: 1 GRAYLOG_ADDRESS="ec2-$GRAYLOG_ADDRESS//./-}.us-west-1.compute.amazonaws.com" or you can use the AWS EC2 CLI to obtain the instance details of the instance that the elastic IP is associated with, and then with another call obtain its Public DNS. Instead of using an Elastic IP, which limits you to a single instance, you can use Route53 (the AWS DNS manager). That way, when a graylog server instance starts, it can append itself to a route53 record, that way allowing for a round-robin DNS of multiple graylog instances that are in a cluster. Manipulating the Route53 records is again done via the AWS CLI. Then you just pass the domain name to applications nodes, so that they can send messages. alternatively, you can install graylog-server on all the nodes (as an agent), and point them to an elasticsearch cluster. But that’s more complicated and probably not the intended way to do it configure your logging framework to send messages to graylog. There are standard GELF (the greylog format) appenders, e.g. this one, and the only thing you have to do is use the Public DNS environment variable in the logback.xml (which supports environment variable resolution). You should make the web interface accessible outside the network, so you can use an ELB for that, or the round-robin DNS mentioned above. Just make sure the security rules are tight and not allowing external tampering with your log data. If you are not running a graylog cluster (which I won’t cover), then the single instance can potentially fail. That isn’t a great loss, as log messages can be obtained from the instances, and they are short-lived anyway. But the metadata of the web interface is important – dashboards, alerts, etc. So it’s good to do regular backups (e.g. with mongodump). Using an EBS volume is also an option. Even though you send your log messages to the centralized log collector, it’s a good idea to also keep local logs, with the proper log rotation and cleanup. It’s not a trivial process, but it’s essential to have log collection, so I hope the guide has been helpful.
May 14, 2015
by Bozhidar Bozhanov
· 19,980 Views
article thumbnail
Inspecting Thread Dumps of Hung Python Processes and Test Runs
Sometimes, moderately complex Python applications with several threads tend to hang on exit. The application refuses to quit and just idles there waiting for something. Often this is because if any of the Python threads are alive when the process tries to exit it will wait any alive thread to terminate, unless Thread.daemon is set to true. In the past, it used to be little painful to figure out which thread and function causes the application to hang, but no longer! Since Python 3.3 CPython interpreter comes with a faulthandler module. faulthandler is a mechanism to tell the Python interpreter to dump the stack trace of every thread upon receiving an external UNIX signal. Here is an example how to figure out why the unit test run, executed with pytest, does not exit cleanly. All tests finish, but the test suite refuses to quit. First we run the tests and set a special environment variable PYTHONFAULTHANDLER telling CPython interpreter to activate the fault handler. This environment variable works regardless how your Python application is started (you run python command, you run a script directly, etc.). PYTHONFAULTHANDLER=true py.test And then the test suite has finished, printing out the last dot… but nothing happens despite our ferocious sipping of coffee. dotdotdotmoredotsthenthenthedotsstopappearing .. How to proceed: Press CTRL-Z to suspend the current active process in UNIX shell. Use the following command to send SIGABRT signal to the suspended process. kill -SIGABRT %1 Voilá – you get the traceback. In this case, it instantly tells SQLAlchemy is waiting for something and most likely the database has deadlocked due to open conflicting transactions. atal Python error: Aborted Thread 0x0000000103538000 (most recent call first): File "/opt/local/Library/Fra% meworks/Python.framework/Versions/3.4/lib/python3.4/socketserver.py", line 154 in _eintr_retry File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/socketserver.py", line 236 in serve_forever File "/Users/mikko/code/trees/pyramid_web20/pyramid_web20/tests/functional.py", line 40 in run File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 921 in _bootstrap_inner File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 889 in _bootstrap Current thread 0x00007fff75128310 (most recent call first): File "/Users/mikko/code/trees/venv/lib/python3.4/site-packages/SQLAlchemy-1.0.0b5-py3.4-macosx-10.9-x86_64.egg/sqlalchemy/engine/default.py", line 442 in do_execute ... File "/Users/mikko/code/trees/venv/lib/python3.4/site-packages/SQLAlchemy-1.0.0b5-py3.4-macosx-10.9-x86_64.egg/sqlalchemy/sql/schema.py", line 3638 in drop_all File "/Users/mikko/code/trees/pyramid_web20/pyramid_web20/tests/conftest.py", line 124 in teardown ... File "/Users/mikko/code/trees/venv/lib/python3.4/site-packages/_pytest/config.py", line 41 in main File "/Users/mikko/code/trees/venv/bin/py.test", line 9 in
April 27, 2015
by Mikko Ohtamaa
· 13,421 Views
article thumbnail
How to Work with Merged Cells in Word Documents Table inside Android Apps
This technical tip shows how developers can work with merged cells in a Word documents inside Android applications. Several cells in a table can be merged together into a single cell. This is useful when crows require a title or large blocks of text which span across the width of the table. This can only be achieved by merging cells in the table into a single cell. Aspose.Words supports merged cells when working with all input formats including when importing HTML content. In Aspose.Words, merged cells are represented by CellFormat.HorizontalMerge and CellFormat.VerticalMerge. The CellFormat.HorizontalMerge property describes if the cell is part of a horizontal merge of cells. Likewise the CellFormat.VerticalMerge property describes if the cell is a part of a vertical merge of cells. The values of these properties are what define the merge behavior of cells. The first cell in a sequence of merged cells will have CellMerge.First. Any subsequent merged cells has CellMerge.Previous. A cell which is not merged has CellMerge.None. Sometimes when you load an existing document cells in a table will appear merged. However these can be in fact one long cell. Microsoft Word at times is known to export merged cells in this way. This can cause confusion when attempting to work with individual cells. There appears to be no particular pattern as to when this happens. //Checking if a Cell is Merged // Prints the horizontal and vertical merge type of a cell. public void checkCellsMerged() throws Exception { Document doc = new Document(getMyDir() + "Table.MergedCells.doc"); // Retrieve the first table in the document. Table table = (Table)doc.getChild(NodeType.TABLE, 0, true); for (Row row : table.getRows()) { for (Cell cell : row.getCells()) { System.out.println(printCellMergeType(cell)); } } } public String printCellMergeType(Cell cell) { boolean isHorizontallyMerged = cell.getCellFormat().getHorizontalMerge() != CellMerge.NONE; boolean isVerticallyMerged = cell.getCellFormat().getVerticalMerge() != CellMerge.NONE; String cellLocation = MessageFormat.format("R{0}, C{1}", cell.getParentRow().getParentTable().indexOf(cell.getParentRow()) + 1, cell.getParentRow().indexOf(cell) + 1); if (isHorizontallyMerged && isVerticallyMerged) return MessageFormat.format("The cell at {0} is both horizontally and vertically merged", cellLocation); else if (isHorizontallyMerged) return MessageFormat.format("The cell at {0} is horizontally merged.", cellLocation); else if (isVerticallyMerged) return MessageFormat.format("The cell at {0} is vertically merged", cellLocation); else return MessageFormat.format("The cell at {0} is not merged", cellLocation); } //Merging Cells in a Table //Creates a table with two rows with cells in the first row horizontally merged. Document doc = new Document(); DocumentBuilder builder = new DocumentBuilder(doc); builder.insertCell(); builder.getCellFormat().setHorizontalMerge(CellMerge.FIRST); builder.write("Text in merged cells."); builder.insertCell(); // This cell is merged to the previous and should be empty. builder.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS); builder.endRow(); builder.insertCell(); builder.getCellFormat().setHorizontalMerge(CellMerge.NONE); builder.write("Text in one cell."); builder.insertCell(); builder.write("Text in another cell."); builder.endRow(); builder.endTable(); //Example: Merging Cells Vertically //Creates a table with two columns with cells merged vertically in the first column. Document doc = new Document(); DocumentBuilder builder = new DocumentBuilder(doc); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.FIRST); builder.write("Text in merged cells."); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.NONE); builder.write("Text in one cell"); builder.endRow(); builder.insertCell(); // This cell is vertically merged to the cell above and should be empty. builder.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.NONE); builder.write("Text in another cell"); builder.endRow(); builder.endTable(); //Merging all Cells in a Range //A method which merges all cells of a table in the specified range of cells /** * Merges the range of cells found between the two specified cells both horizontally and vertically. Can span over multiple rows. */ public static void mergeCells(Cell startCell, Cell endCell) { Table parentTable = startCell.getParentRow().getParentTable(); // Find the row and cell indices for the start and end cell. Point startCellPos = new Point(startCell.getParentRow().indexOf(startCell), parentTable.indexOf(startCell.getParentRow())); Point endCellPos = new Point(endCell.getParentRow().indexOf(endCell), parentTable.indexOf(endCell.getParentRow())); // Create the range of cells to be merged based off these indices. Inverse each index if the end cell if before the start cell. Rectangle mergeRange = new Rectangle(Math.min(startCellPos.x, endCellPos.x), Math.min(startCellPos.y, endCellPos.y), Math.abs(endCellPos.x - startCellPos.x) + 1, Math.abs(endCellPos.y - startCellPos.y) + 1); for (Row row : parentTable.getRows()) { for(Cell cell : row.getCells()) { Point currentPos = new Point(row.indexOf(cell), parentTable.indexOf(row)); // Check if the current cell is inside our merge range then merge it. if (mergeRange.contains(currentPos)) { if (currentPos.x == mergeRange.x) cell.getCellFormat().setHorizontalMerge(CellMerge.FIRST); else cell.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS); if (currentPos.y == mergeRange.y) cell.getCellFormat().setVerticalMerge(CellMerge.FIRST); else cell.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS); } } } } //Merging Cells between Two Cells // Merges the range of cells between the two specified cells // We want to merge the range of cells found inbetween these two cells. Cell cellStartRange = table.getRows().get(2).getCells().get(2); Cell cellEndRange = table.getRows().get(3).getCells().get(3); // Merge all the cells between the two specified cells into one. mergeCells(cellStartRange, cellEndRange);
April 22, 2015
by David Zondray
· 6,378 Views
article thumbnail
Mockito & DBUnit: Implementing a Mocking Structure Focused and Independent to Automated Tests on Java
On this post, we will make a hands-on about Mockito and DBUnit, two libraries from Java's open source ecosystem which can help us in improving our JUnit tests on focus and independence. But why mocking is so important on our unit tests? Focusing the tests Let's imagine a Java back-end application with a tier-like architecture. On this application, we could have 2 tiers: The service tier, which have the business rules and make as a interface for the front-end; The entity tier, which have the logic responsible for making calls to a database, utilizing techonologies like JDBC or JPA; Of course, on a architecture of this kind, we will have the following dependence of our tiers: Service >>> Entity On this kind of architecture, the most common way of building our automated tests is by creating JUnit Test Classes which test each tier independently, thus we can make running tests that reflect only the correctness of the tier we want to test. However, if we simply create the classes without any mocking, we will got problems like the following: On the JUnit tests of our service tier, for example, if we have a problem on the entity tier, we will have also our tests failed, because the error from the entity tier will reverberate across the tiers; If we have a project where different teams are working on the same system, and one team is responsible for the construction of the service tier, while the other is responsible for the construction of the entity tier, we will have a dependency of one team with the other before the tests could be made; To resolve such issues, we could mock the entity tier on the service tier's unit test classes, so we can have independence and focus of our tests on the service tier, which it belongs. independence One point that it is specially important when we make our JUnit test classes in the independence department is the entity tier. Since in our example this tier is focused in the connection and running of SQL commands on a database, it makes a break on our independence goal, since we will need a database so we can run our tests. Not only that, if a test breaks any structure that it is used by the subsequent tests, all of them will also fail. It is on this point that enters our other library, DBUnit. With DBUnit, we can use embedded databases, such as HSQLDB, to make our database exclusive to the running of our tests. So, without further delay, let's begin our hands-on! Hands-on For this lab, we will create a basic CRUD for a Client entity. The structure will follow the simple example we talked about previously, with the DAO (entity) and Service tiers. We will use DBUnit and JUnit to test the DAO tier, and Mockito with JUnit to test the Service tier. First, let's create a Maven project, without any archetype and include the following dependencies on pom.xml: . . . junit junit 4.12 org.dbunit dbunit 2.5.0 org.mockito mockito-all 1.10.19 org.hibernate hibernate-entitymanager 4.3.8.Final org.hsqldb hsqldb 2.3.2 org.springframework spring-core 4.1.4.RELEASE org.springframework spring-context 4.1.5.RELEASE org.springframework spring-test 4.1.5.RELEASE org.springframework spring-tx 4.1.5.RELEASE org.springframework spring-orm 4.1.5.RELEASE . . . On the previous snapshot, we included not only the Mockito, DBUnit and JUnit libraries, but we also included Hibernate to implement the persistence layer and Spring 4 to use the IoC container and the transaction management. We also included the Spring Test library, which includes some features that we will use later on this lab. Finally, to simplify the setup and remove the need of installing a database to run the code, we will use HSQLDB as our database. Our lab will have the following structure: One class will represent the application itself, as a standalone class, where we will consume the tiers, like a real application would do; We will have another 2 classes, each one with JUnit tests, that will test each tier independently; First, we define a persistence unit, where we define the name of the unit and the properties to make Hibernate create the table for us and populate her with some initial rows. The code of the persistence.xml can be seen bellow: com.alexandreesl.handson.model.Client And the initial data to populate the table can be seen bellow: insert into Client(id,name,sex, phone) values (1,'Alexandre Eleuterio Santos Lourenco','M','22323456'); insert into Client(id,name,sex, phone) values (2,'Lucebiane Santos Lourenco','F','22323876'); insert into Client(id,name,sex, phone) values (3,'Maria Odete dos Santos Lourenco','F','22309456'); insert into Client(id,name,sex, phone) values (4,'Eleuterio da Silva Lourenco','M','22323956'); insert into Client(id,name,sex, phone) values (5,'Ana Carolina Fernandes do Sim','F','22123456'); In order to not making the post burdensome, we will not discuss the project structure during the lab, but just show the final structure at the end. The code can be found on a Github repository, at the end of the post. With the persistence unit defined, we can start coding! First, we create the entity class: package com.alexandreesl.handson.model; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.Id; import javax.persistence.Table; @Table(name = "Client") @Entity public class Client { @Id private long id; @Column(name = "name", nullable = false, length = 50) private String name; @Column(name = "sex", nullable = false) private String sex; @Column(name = "phone", nullable = false) private long phone; public long getId() { return id; } public void setId(long id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getSex() { return sex; } public void setSex(String sex) { this.sex = sex; } public long getPhone() { return phone; } public void setPhone(long phone) { this.phone = phone; } } In order to create the persistence-related beans to enable Hibernate and the transaction manager, alongside all the rest of the beans necessary for the application, we use a Java-based Spring configuration class. The code of the class can be seen bellow: package com.alexandreesl.handson.core; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; import org.springframework.jdbc.datasource.DriverManagerDataSource; import org.springframework.orm.jpa.JpaTransactionManager; import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean; import org.springframework.orm.jpa.vendor.Database; import org.springframework.orm.jpa.vendor.HibernateJpaDialect; import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter; import org.springframework.transaction.annotation.EnableTransactionManagement; @Configuration @EnableTransactionManagement @ComponentScan({ "com.alexandreesl.handson.dao", "com.alexandreesl.handson.service" }) public class AppConfiguration { @Bean public DriverManagerDataSource dataSource() { DriverManagerDataSource dataSource = new DriverManagerDataSource(); dataSource.setDriverClassName("org.hsqldb.jdbcDriver"); dataSource.setUrl("jdbc:hsqldb:mem://standalone"); dataSource.setUsername("sa"); dataSource.setPassword(""); return dataSource; } @Bean public JpaTransactionManager transactionManager() { JpaTransactionManager transactionManager = new JpaTransactionManager(); transactionManager.setEntityManagerFactory(entityManagerFactory() .getNativeEntityManagerFactory()); transactionManager.setDataSource(dataSource()); transactionManager.setJpaDialect(jpaDialect()); return transactionManager; } @Bean public HibernateJpaDialect jpaDialect() { return new HibernateJpaDialect(); } @Bean public HibernateJpaVendorAdapter jpaVendorAdapter() { HibernateJpaVendorAdapter jpaVendor = new HibernateJpaVendorAdapter(); jpaVendor.setDatabase(Database.HSQL); jpaVendor.setDatabasePlatform("org.hibernate.dialect.HSQLDialect"); return jpaVendor; } @Bean public LocalContainerEntityManagerFactoryBean entityManagerFactory() { LocalContainerEntityManagerFactoryBean entityManagerFactory = new LocalContainerEntityManagerFactoryBean(); entityManagerFactory .setPersistenceXmlLocation("classpath:META-INF/persistence.xml"); entityManagerFactory.setPersistenceUnitName("persistence"); entityManagerFactory.setDataSource(dataSource()); entityManagerFactory.setJpaVendorAdapter(jpaVendorAdapter()); entityManagerFactory.setJpaDialect(jpaDialect()); return entityManagerFactory; } } And finally, we create the classes that represent the tiers itself. This is the DAO class: package com.alexandreesl.handson.dao; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import org.springframework.stereotype.Component; import org.springframework.transaction.annotation.Transactional; import com.alexandreesl.handson.model.Client; @Component public class ClientDAO { @PersistenceContext private EntityManager entityManager; @Transactional(readOnly = true) public Client find(long id) { return entityManager.find(Client.class, id); } @Transactional public void create(Client client) { entityManager.persist(client); } @Transactional public void update(Client client) { entityManager.merge(client); } @Transactional public void delete(Client client) { entityManager.remove(client); } } And this is the service class: package com.alexandreesl.handson.service; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Component; import com.alexandreesl.handson.dao.ClientDAO; import com.alexandreesl.handson.model.Client; @Component public class ClientService { @Autowired private ClientDAO clientDAO; public ClientDAO getClientDAO() { return clientDAO; } public void setClientDAO(ClientDAO clientDAO) { this.clientDAO = clientDAO; } public Client find(long id) { return clientDAO.find(id); } public void create(Client client) { clientDAO.create(client); } public void update(Client client) { clientDAO.update(client); } public void delete(Client client) { clientDAO.delete(client); } } The reader may notice that we created a getter/setter to the DAO class on the Service class. This is not necessary for the Spring injection, but we made this way to get easier to change the real DAO by a Mockito's mock on the tests class. Finally, we code the class we talked about previously, the one that consume the tiers: package com.alexandreesl.handson.core; import org.springframework.context.ApplicationContext; import org.springframework.context.annotation.AnnotationConfigApplicationContext; import com.alexandreesl.handson.model.Client; import com.alexandreesl.handson.service.ClientService; public class App { public static void main(String[] args) { ApplicationContext context = new AnnotationConfigApplicationContext( AppConfiguration.class); ClientService service = (ClientService) context .getBean(ClientService.class); System.out.println(service.find(1).getName()); System.out.println(service.find(3).getName()); System.out.println(service.find(5).getName()); Client client = new Client(); client.setId(6); client.setName("Celina do Sim"); client.setPhone(44657688); client.setSex("F"); service.create(client); System.out.println(service.find(6).getName()); System.exit(0); } } If we run the class, we can see that the console print all the clients we searched for and that Hibernate is initialized properly, proving our implementation is a success: Mar 28, 2015 1:09:22 PM org.springframework.context.annotation.AnnotationConfigApplicationContext prepareRefresh INFO: Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@6433a2: startup date [Sat Mar 28 13:09:22 BRT 2015]; root of context hierarchy Mar 28, 2015 1:09:22 PM org.springframework.jdbc.datasource.DriverManagerDataSource setDriverClassName INFO: Loaded JDBC driver: org.hsqldb.jdbcDriver Mar 28, 2015 1:09:22 PM org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean createNativeEntityManagerFactory INFO: Building JPA container EntityManagerFactory for persistence unit 'persistence' Mar 28, 2015 1:09:22 PM org.hibernate.jpa.internal.util.LogHelper logPersistenceUnitInformation INFO: HHH000204: Processing PersistenceUnitInfo [ name: persistence ...] Mar 28, 2015 1:09:22 PM org.hibernate.Version logVersion INFO: HHH000412: Hibernate Core {4.3.8.Final} Mar 28, 2015 1:09:22 PM org.hibernate.cfg.Environment INFO: HHH000206: hibernate.properties not found Mar 28, 2015 1:09:22 PM org.hibernate.cfg.Environment buildBytecodeProvider INFO: HHH000021: Bytecode provider name : javassist Mar 28, 2015 1:09:22 PM org.hibernate.annotations.common.reflection.java.JavaReflectionManager INFO: HCANN000001: Hibernate Commons Annotations {4.0.5.Final} Mar 28, 2015 1:09:23 PM org.hibernate.dialect.Dialect INFO: HHH000400: Using dialect: org.hibernate.dialect.HSQLDialect Mar 28, 2015 1:09:23 PM org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory INFO: HHH000397: Using ASTQueryTranslatorFactory Mar 28, 2015 1:09:23 PM org.hibernate.tool.hbm2ddl.SchemaExport execute INFO: HHH000227: Running hbm2ddl schema export Mar 28, 2015 1:09:23 PM org.hibernate.tool.hbm2ddl.SchemaExport execute INFO: HHH000230: Schema export complete Alexandre Eleuterio Santos Lourenco Maria Odete dos Santos Lourenco Ana Carolina Fernandes do Sim Celina do Sim Now, let's move on for the tests themselves. For the DBUnit tests, we create a Base class, which will provide the base DB operations which all of our JUnit tests will benefit. On the @PostConstruct method, which is fired after all the injections of the Spring context are made - reason why we couldn't use the @BeforeClass annotation, because we need Spring to instantiate and inject the EntityManager first - we use DBUnit to make a connection to our database, with the class DatabaseConnection and populate the table using the DataSet class we created, passing a XML structure that represents the data used on the tests. This operation of populating the table is made by the DatabaseOperation class, which we use with the CLEAN_INSERT operation, that truncate the table first and them insert the data on the dataset. Finally, we use one of JUnit's event listeners, the @After event, which is called after every test case. On our scenario, we use this event to call the clear() method on the EntityManager, which forces Hibernate to query against the Database for the first time at every test case, thus eliminating possible problems we could have between our test cases because of data that it is different on the second level cache than it is on the DB. The code for the base class is the following: package com.alexandreesl.handson.dao.test; import java.io.InputStream; import java.sql.SQLException; import javax.annotation.PostConstruct; import javax.persistence.EntityManager; import javax.persistence.EntityManagerFactory; import javax.persistence.PersistenceUnit; import org.dbunit.DatabaseUnitException; import org.dbunit.database.DatabaseConfig; import org.dbunit.database.DatabaseConnection; import org.dbunit.database.IDatabaseConnection; import org.dbunit.dataset.IDataSet; import org.dbunit.dataset.xml.FlatXmlDataSetBuilder; import org.dbunit.ext.hsqldb.HsqldbDataTypeFactory; import org.dbunit.operation.DatabaseOperation; import org.hibernate.HibernateException; import org.hibernate.internal.SessionImpl; import org.junit.After; public class BaseDBUnitSetup { private static IDatabaseConnection connection; private static IDataSet dataset; @PersistenceUnit public EntityManagerFactory entityManagerFactory; private EntityManager entityManager; @PostConstruct public void init() throws HibernateException, DatabaseUnitException, SQLException { entityManager = entityManagerFactory.createEntityManager(); connection = new DatabaseConnection( ((SessionImpl) (entityManager.getDelegate())).connection()); connection.getConfig().setProperty( DatabaseConfig.PROPERTY_DATATYPE_FACTORY, new HsqldbDataTypeFactory()); FlatXmlDataSetBuilder flatXmlDataSetBuilder = new FlatXmlDataSetBuilder(); InputStream dataSet = Thread.currentThread().getContextClassLoader() .getResourceAsStream("test-data.xml"); dataset = flatXmlDataSetBuilder.build(dataSet); DatabaseOperation.CLEAN_INSERT.execute(connection, dataset); } @After public void afterTests() { entityManager.clear(); } } The xml structure used on the test cases is the following: And the code of our test class of the DAO tier is the following: package com.alexandreesl.handson.dao.test; import static org.junit.Assert.assertNotNull; import static org.junit.Assert.assertNull; import org.junit.Test; import org.junit.runner.RunWith; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import org.springframework.test.context.transaction.TransactionConfiguration; import org.springframework.transaction.annotation.Transactional; import com.alexandreesl.handson.core.test.AppTestConfiguration; import com.alexandreesl.handson.dao.ClientDAO; import com.alexandreesl.handson.model.Client; @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(classes = AppTestConfiguration.class) @TransactionConfiguration(defaultRollback = true) public class ClientDAOTest extends BaseDBUnitSetup { @Autowired private ClientDAO clientDAO; @Test public void testFind() { Client client = clientDAO.find(1); assertNotNull(client); client = clientDAO.find(2); assertNotNull(client); client = clientDAO.find(3); assertNull(client); client = clientDAO.find(4); assertNull(client); client = clientDAO.find(5); assertNull(client); } @Test @Transactional public void testInsert() { Client client = new Client(); client.setId(3); client.setName("Celina do Sim"); client.setPhone(44657688); client.setSex("F"); clientDAO.create(client); } @Test @Transactional public void testUpdate() { Client client = clientDAO.find(1); client.setPhone(12345678); clientDAO.update(client); } @Test @Transactional public void testRemove() { Client client = clientDAO.find(1); clientDAO.delete(client); } } The code is very self explanatory so we will just focus on explaining the annotations at the top-level class. The @RunWith(SpringJUnit4ClassRunner.class) annotationchanges the JUnit base class that runs our test cases, using rather one made by Spring that enable support of the IoC container and the Spring's annotations. The @TransactionConfiguration(defaultRollback = true) annotation is from Spring's test library and change the behavior of the @Transactional annotation, making the transactions to roll back after execution, instead of a commit. That ensures that our test cases wont change the structure of the DB, so a test case wont break the execution of his followers. The reader may notice that we changed the configuration class to another one, exclusive for the test cases. It is essentially the same beans we created on the original configuration class, just changing the database bean to point to a different DB then the previously one, showing that we can change the database of our tests without breaking the code. On a real world scenario, the configuration class of the application would be pointing to a relational database like Oracle, DB2, etc and the test cases would use a embedded database such as HSQLDB, which we are using on this case: package com.alexandreesl.handson.core.test; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; import org.springframework.jdbc.datasource.DriverManagerDataSource; import org.springframework.orm.jpa.JpaTransactionManager; import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean; import org.springframework.orm.jpa.vendor.Database; import org.springframework.orm.jpa.vendor.HibernateJpaDialect; import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter; import org.springframework.transaction.annotation.EnableTransactionManagement; @Configuration @EnableTransactionManagement @ComponentScan({ "com.alexandreesl.handson.dao", "com.alexandreesl.handson.service" }) public class AppTestConfiguration { @Bean public DriverManagerDataSource dataSource() { DriverManagerDataSource dataSource = new DriverManagerDataSource(); dataSource.setDriverClassName("org.hsqldb.jdbcDriver"); dataSource.setUrl("jdbc:hsqldb:mem://standalone-test"); dataSource.setUsername("sa"); dataSource.setPassword(""); return dataSource; } @Bean public JpaTransactionManager transactionManager() { JpaTransactionManager transactionManager = new JpaTransactionManager(); transactionManager.setEntityManagerFactory(entityManagerFactory() .getNativeEntityManagerFactory()); transactionManager.setDataSource(dataSource()); transactionManager.setJpaDialect(jpaDialect()); return transactionManager; } @Bean public HibernateJpaDialect jpaDialect() { return new HibernateJpaDialect(); } @Bean public HibernateJpaVendorAdapter jpaVendorAdapter() { HibernateJpaVendorAdapter jpaVendor = new HibernateJpaVendorAdapter(); jpaVendor.setDatabase(Database.HSQL); jpaVendor.setDatabasePlatform("org.hibernate.dialect.HSQLDialect"); return jpaVendor; } @Bean public LocalContainerEntityManagerFactoryBean entityManagerFactory() { LocalContainerEntityManagerFactoryBean entityManagerFactory = new LocalContainerEntityManagerFactoryBean(); entityManagerFactory .setPersistenceXmlLocation("classpath:META-INF/persistence.xml"); entityManagerFactory.setPersistenceUnitName("persistence"); entityManagerFactory.setDataSource(dataSource()); entityManagerFactory.setJpaVendorAdapter(jpaVendorAdapter()); entityManagerFactory.setJpaDialect(jpaDialect()); return entityManagerFactory; } } If we run the test class, we can see that it runs the test cases successfully, showing that our code is a success. If we see the console, we can see that transactions were created and rolled back, respecting our configuration: . . . ar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext startTransaction INFO: Began transaction (1) for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@1a411233, testMethod = testInsert@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]; transaction manager [org.springframework.orm.jpa.JpaTransactionManager@7c2327fa]; rollback [true] Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext endTransaction INFO: Rolled back transaction for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@1a411233, testMethod = testInsert@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]. Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext startTransaction INFO: Began transaction (1) for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@2adddc06, testMethod = testRemove@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]; transaction manager [org.springframework.orm.jpa.JpaTransactionManager@7c2327fa]; rollback [true] Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext endTransaction INFO: Rolled back transaction for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@2adddc06, testMethod = testRemove@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]. Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext startTransaction INFO: Began transaction (1) for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@4905c46b, testMethod = testUpdate@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]; transaction manager [org.springframework.orm.jpa.JpaTransactionManager@7c2327fa]; rollback [true] Mar 28, 2015 2:29:55 PM org.springframework.test.context.transaction.TransactionContext endTransaction INFO: Rolled back transaction for test context [DefaultTestContext@644abb8f testClass = ClientDAOTest, testInstance = com.alexandreesl.handson.dao.test.ClientDAOTest@4905c46b, testMethod = testUpdate@ClientDAOTest, testException = [null], mergedContextConfiguration = [MergedContextConfiguration@70325d20 testClass = ClientDAOTest, locations = '{}', classes = '{class com.alexandreesl.handson.core.test.AppTestConfiguration}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{}', contextLoader = 'org.springframework.test.context.support.DelegatingSmartContextLoader', parent = [null]]]. Now let's move on to the Service tests, with the help of Mockito. The class to test the Service tier is very simple, as we can see bellow: package com.alexandreesl.handson.service.test; import static org.junit.Assert.assertEquals; import org.junit.BeforeClass; import org.junit.Test; import org.mockito.Mockito; import org.mockito.invocation.InvocationOnMock; import org.mockito.stubbing.Answer; import com.alexandreesl.handson.dao.ClientDAO; import com.alexandreesl.handson.model.Client; import com.alexandreesl.handson.service.ClientService; public class ClientServiceTest { private static ClientDAO clientDAO; private static ClientService clientService; @BeforeClass public static void beforeClass() { clientService = new ClientService(); clientDAO = Mockito.mock(ClientDAO.class); clientService.setClientDAO(clientDAO); Client client = new Client(); client.setId(0); client.setName("Mocked client!"); client.setPhone(11111111); client.setSex("M"); Mockito.when(clientDAO.find(Mockito.anyLong())).thenReturn(client); Mockito.doThrow(new RuntimeException("error on client!")) .when(clientDAO).delete((Client) Mockito.any()); Mockito.doNothing().when(clientDAO).create((Client) Mockito.any()); Mockito.doAnswer(new Answer
April 14, 2015
by Alexandre Lourenco
· 21,780 Views · 2 Likes
article thumbnail
Patterns of API Virtualization
[This article was written by Matthew Heusser.] When Christopher Alexander wrote A Pattern Language in 1977, he was looking for a more powerful way to describe how towns and buildings were laid out. These patterns would allow architects, builders and planners to work together, to use the same words, mean the same thing, and create systems that were beautiful and worked, instead of more urban sprawl. Twenty years later, Gamma, Helms, Johnson and Vlissdes took the pattern idea and applied it to object-oriented software, which at the time was struggling to figure out how to create windows-based applications. Today the struggle is figuring out how to break software into small components that can be tested independently, and then having those components interact, typically over internet protocols. Raw SQL commands are giving way to service oriented systems that interact through APIs, sometimes all within one company, sometimes outside with Microsoft, Google, Amazon, or other APIs like a manufacturing company or supplier. While I do not claim to be Christopher Alexander or the Gang of Four, I am seeing some patterns emerge – a set of solutions to a defined problem – and would like to share a few of those today. What do you mean API? Alistair Cockburn’s Hexagonal Architecture (below) presents a way to think about APIs. The application we want to develop is in the middle and has a set of adapters to the external world. Those adapters might be an API we expose, like a ‘search’ interface to an online catalog, or the API’s we call, including the database, an email gateway, or the ‘permissions’ service, to see what types of search results we should show to this user. Cockburn’s Hexagonal Architecture gives us two ways to think about APIs: Our own, and the services we call. (Source: http://alistair.cockburn.us/Hexagonal+architecture) That’s a lot of APIs. Let’s explore about some ways to virtualize these services – and why. Automated Build and Continuous Integration Say, for example, you are working on a piece of software to analyze trending terms on social media – such as a customer complaint that is being liked and tweeted. You want companies to find these problems when they start to trend up, then reach out to the customer and solve it, or, perhaps, reach out to say “thank you” and amplify it. Modern build systems, like Jenkins, TFS, and TeamCity can compile, deploy, and even run the system to check for known scenarios. The trouble is those pesky adapters to external systems, like Twitter and Facebook. The software could do its job, but there is no way to know if the application is correct in its guesses about trends and importance. Getting the data from the providers can turn a quick build into a slow process that uses a lot of network traffic. By recording and storing known answers to predictable requests, then simulating the service and playing back known (“canned”) data, API Virtualization allows build systems to do more, with faster, more predictable results. This does not remove the need for end-to-end testing, but it does allow the team to have more confidence with each build. Performance Testing Your Application Like build/deploy systems, performance testing the application (the inside of the hexagon) with live, external services can cause problems. All that extra traffic can cause problems with the actual company network infrastructure; it could cause bandwidth problems at the point of the ISP. Some 3rd Party APIs charge a micro-fee per transaction, or limit bandwidth. Many of them lack a ‘test’ sandbox to develop in, so performance testing could interact with real, production work. Standing up a virtual server to return pre-planned data means you can performance test your application – not the third party – prevent bandwidth throttles, not step on production data, and avoid paying fees intended for real (production) use that is actually being used to test our environment. Avoid Integration Environment Inconsistency A few years ago I worked at a large organization that was wrapping old code in proxy services, so they could be consumed by other teams. Login, add-to-cart, search catalog, create custom catalog, permissions, all of it was possible to access through API calls, most of it as simple as a web URL that returned some text. The problem was the “System Integration Test” environment, or SIT. Every team tested its services in SIT, which meant about a third of the time, something was broken. After finding a bug in the current build, we would track it back to the catalog service, walk over to that team, bring up the issue, and they would say “thanks, we are testing a new build of catalog.” We expected catalog to work in SIT. Anything else meant a waste of someone’s time. Automated tools reporting false errors were even worse. When teams performance tested their services, everything calling the service got slow, if it worked at all. By virtualizing services we could test our application end-to-end against known data, without the troubles of SIT, or having to build additional expensive test-lab-like copies of production. Best of all, creating the virtual services is a snap – just record the live service with a tool and instruct it to play back similar requests. Flip Integration Tests from Virtual To Real for Final Checking All this API virtualization creates a risk that the team will move from test to production and something will be different between the Virtual API and the live one. If the Virtual API server is just returning the same thing product did when we recorded it and we have automated checks in place, we can change our test server to point to the real service and re-run all the automated checks. As long as the source data hasn’t changed and we are reading, not writing, from production, the checks should all pass. If the production API has changed, we will get failures, and they will be easy enough to fix and retest. Simulate Slow or Unresponsive Service In The Middle Of A Long Running Transaction Sometimes you want to test if a server is overloaded or down. Calling Facebook and asking them to turn off their servers is unlikely to work; even just coordinating with the team down the hall could create a lot of overhead. You also might want to test this often – every day or every hour – and manually pulling a plug or coordinating with the Login team every hour might not be realistic. The trick is to bring the service down once and record the exact behavior of the system, then use a virtual server to simulate that behavior, over and over again, every day. That means you’ll get the exact behavior, not a guess, and know exactly how the application under test can deal with it. Early Development of System against an Undeployed API Sometimes the API you are testing against does not exist, even in test. It’s still possible to create a Virt (virtual API) which returns some roughly equivalent data, and makes it possible to move forward on the core application without introducing new risks. Avoid Configuration and Copying Hassles Many companies use a test system that is a copy of production, and then refresh the system periodically. Sometimes, you want test scenarios that do not exist in production, so you have to create them … and lose them during a refresh. The same problem happens with 3rd party APIs, when, for example, a part is discontinued, and you are testing ordering that part, or the sample person you check for insurance coverage leaves the company. If the request for the part of the coverage goes through an API, you can record known good results that don’t change, even after a database refresh – then leave the real, end-to-end testing for an exploratory step that will be lighter, quicker, more accurate, and have more confidence. A Fistful of Techniques Today we discussed a half-dozen common patterns to API virtualization, mostly around testing systems in isolation that consume data through an API, like a 3rd party or an internal service. These ideas are new, and evolving. What are a few of your favorites?
April 9, 2015
by Denis Goodwin
· 4,126 Views
article thumbnail
Package by Component and Architecturally-aligned Testing
i've seen and had lots of discussion about "package by layer" vs "package by feature" over the past couple of weeks. they both have their benefits but there's a hybrid approach i now use that i call "package by component". to recap... package by layer let's assume that we're building a web application based upon the web-mvc pattern. packaging code by layer is typically the default approach because, after all, that's what the books, tutorials and framework samples tell us to do. here we're organising code by grouping things of the same type. there's one top-level package for controllers, one for services (e.g. "business logic") and one for data access. layers are the primary organisation mechanism for the code. terms such as "separation of concerns" are thrown around to justify this approach and generally layered architectures are thought of as a "good thing". need to switch out the data access mechanism? no problem, everything is in one place. each layer can also be tested in isolation to the others around it, using appropriate mocking techniques, etc. the problem with layered architectures is that they often turn into a big ball of mud because, in java anyway, you need to mark your classes as public for much of this to work. package by feature instead of organising code by horizontal slice, package by feature seeks to do the opposite by organising code by vertical slice. now everything related to a single feature (or feature set) resides in a single place. you can still have a layered architecture, but the layers reside inside the feature packages. in other words, layering is the secondary organisation mechanism. the often cited benefit is that it's "easier to navigate the codebase when you want to make a change to a feature", but this is a minor thing given the power of modern ides. what you can do now though is hide feature specific classes and keep them out of sight from the rest of the codebase. for example, if you need any feature specific view models, you can create these as package-protected classes. the big question though is what happens when that new feature set c needs to access data from features a and b? again, in java, you'll need to start making classes publicly accessible from outside of the packages and the big ball of mud will again emerge. package by layer and package by feature both have their advantages and disadvantages. to quote jason gorman from schools of package architecture - an illustration , which was written seven years ago. to round off, then, i would urge you to be mindful of leaning to far towards either school of package architecture. don't just mindlessly put socks in the sock draw and pants in the pants draw, but don't be 100% driven by package coupling and cohesion to make those decisions, either. the real skill is finding the right balance, and creating packages that make stuff easier to find but are as cohesive and loosely coupled as you can make them at the same time. package by component this is a hybrid approach with increased modularity and an architecturally-evident coding style as the primary goals. the basic premise here is that i want my codebase to be made up of a number of coarse-grained components, with some sort of presentation layer (web ui, desktop ui, api, standalone app, etc) built on top. a "component" in this sense is a combination of the business and data access logic related to a specific thing (e.g. domain concept, bounded context, etc). as i've described before , i give these components a public interface and package-protected implementation details, which includes the data access code. if that new feature set c needs to access data related to a and b, it is forced to go through the public interface of components a and b. no direct access to the data access layer is allowed, and you can enforce this if you use java's access modifiers properly. again, "architectural layering" is a secondary organisation mechanism. for this to work, you have to stop using the public keyword by default . this structure raises some interesting questions about testing, not least about how we mock-out the data access code to create quick-running "unit tests". architecturally-aligned testing the short answer is don't bother, unless you really need to. i've spoken about and written about this before, but architecture and testing are related. instead of the typical testing triangle (lots of "unit" tests, fewer slower running "integration" tests and even fewer slower ui tests), consider this. i'm trying to make a conscious effort to not use the term "unit testing" because everybody has a different view of how big a "unit" is. instead, i've adopted a strategy where some classes can and should be tested in isolation. this includes things like domain classes, utility classes, web controllers (with mocked components), etc. then there are some things that are easiest to test as components, through the public interface. if i have a component that stores data in a mysql database, i want to test everything from the public interface right back to the mysql database. these are typically called "integration tests", but again, this term means different things to different people. of course, treating the component as a black box is easier if i have control over everything it touches. if you have a component that is sending asynchronous messages or using an external, third-party service, you'll probably still need to consider adding dependency injection points (e.g. ports and adapters) to adequately test the component, but this is the exception not the rule. all of this still applies if you are building a microservices style of architecture. you'll probably have some low-level class tests, hopefully a bunch of service tests where you're testing your microservices though their public interface, and some system tests that run scenarios end-to-end. oh, and you can still write all of this in a test-first, tdd style if that's how you work. i'm using this strategy for some systems that i'm building and it seems to work really well. i have a relatively simple, clean and (to be honest) boring codebase with understandable dependencies, minimal test-induced design damage and a manageable quantity of test code. this strategy also bridges the model-code gap , where the resulting code actually reflects the architectural intent. in other words, we often draw "components" on a whiteboard when having architecture discussions, but those components are hard to find in the resulting codebase. packaging code by layer is a major reason why this mismatch between the diagram and the code exists. those of you who are familiar with my c4 model will probably have noticed the use of the terms "class" and "component". this is no coincidence. architecture and testing are more related than perhaps we've admitted in the past. p.s. i'll be speaking about this topic over the next few months at events across europe, the us and (hopefully) australia
April 4, 2015
by Simon Brown
· 11,063 Views
article thumbnail
Fork/Join Framework vs. Parallel Streams vs. ExecutorService: The Ultimate Fork/Join Benchmark
How does the Fork/Join framework act under different configurations? Just like the upcoming episode of Star Wars, there has been a lot of excitement mixed with criticism around Java 8 parallelism. The syntactic sugar of parallel streams brought some hype almost like the new lightsaber we’ve seen in the trailer. With many ways now to do parallelism in Java, we wanted to get a sense of the performance benefits and the dangers of parallel processing. After over 260 test runs, some new insights rose from the data and we wanted to share these with you in this post. Fork/Join Framework vs. Parallel Streams vs. ExecutorService: The Ultimate Fork/Join Benchmark http://t.co/CMNfYZe58Z pic.twitter.com/6WExlmbyo6 — Takipi (@takipid) January 20, 2015 ExecutorService vs. Fork/Join Framework vs. Parallel Streams A long time ago, in a galaxy far, far away.... I mean, some 10 years ago concurrency was available in Java only through 3rd party libraries. Then came Java 5 and introduced the java.util.concurrent library as part of the language, strongly influenced by Doug Lea. The ExecutorService became available and provided us a straightforward way to handle thread pools. Of course java.util.concurrent keeps evolving and in Java 7 the Fork/Join framework was introduced, building on top of the ExecutorService thread pools. With Java 8 streams, we’ve been provided an easy way to use Fork/Join that remains a bit enigmatic for many developers. Let’s find out how they compare to one another. We’ve taken 2 tasks, one CPU-intensive and the other IO-intensive, and tested 4 different scenarios with the same basic functionality. Another important factor is the number of threads we use for each implementation, so we tested that as well. The machine we used had 8 cores available so we had variations of 4, 8, 16 and 32 threads to get a sense of the general direction the results are going. For each of the tasks, we’ve also tried a single threaded solution, which you’ll not see in the graphs since, well, it took much much longer to execute. To learn more about exactly how the tests ran you can check out the groundwork section below. Now, let’s get to it. Indexing a 6GB file with 5.8M lines of text In this test, we’ve generated a huge text file, and created similar implementations for the indexing procedure. Here’s what the results looked like: ** Single threaded execution: 176,267msec, or almost 3 minutes. ** Notice the graph starts at 20000 milliseconds. 1. Fewer threads will leave CPUs unutilized, too many will add overhead The first thing you notice in the graph is the shape the results are starting to take - you can get an impression of how each implementation behaves from only these 4 data points. The tipping point here is between 8 and 16 threads, since some threads are blocking in file IO, and adding more threads than cores helped utilize them better. When 32 threads are in, performance got worse because of the additional overhead. 2. Parallel Streams are the best! Almost 1 second better than the runner up: using Fork/Join directly Syntactic sugar aside (lambdas! we didn’t mention lambdas), we’ve seen parallel streams perform better than the Fork/Join and the ExecutorService implementations. 6GB of text indexed in 24.33 seconds. You can trust Java here to deliver the best result. 3. But… Parallel Streams also performed the worst: The only variation that went over 30 seconds This is another reminder of how parallel streams can slow you down. Let’s say this happens on machines that already run multithreaded applications. With a smaller number of threads available, using Fork/Join directly could actually be better than going through parallel streams - a 5 second difference, which makes for about an 18% penalty when comparing these 2 together. 4. Don’t go for the default pool size with IO in the picture When using the default pool size for Parallel Streams, the same number of cores on the machine (which is 8 here), performed almost 2 seconds worse than the 16 threads version. That’s a 7% penalty for going with the default pool size. The reason this happens is related with blocking IO threads. There’s more waiting going on, so introducing more threads lets us get more out of the CPU cores involved while other threads wait to be scheduled instead of being idle. How do you change the default Fork/Join pool size for parallel streams? You can either change the common Fork/Join pool size using a JVM argument: [java] -Djava.util.concurrent.ForkJoinPool.common.parallelism=16 [/java] (All Fork/Join tasks are using a common static pool the size of the number of your cores by default. The benefit here is reducing resource usage by reclaiming the threads for other tasks during periods of no use.) Or... You can use this trick and run Parallel Streams within a custom Fork/Join pool. This overrides the default use of the common Fork/Join pool and lets you use a pool you’ve set up yourself. Pretty sneaky. In the tests, we’ve used the common pool. 5. Single threaded performance was 7.25x worse than the best result Parallelism provided a 7.25x improvement, and considering the machine had 8 cores, it got pretty close to the theoretic 8x prediction! We can attribute the rest to overhead. With that being said, even the slowest parallelism implementation we tested, which this time was parallel streams with 4 threads (30.24sec), performed 5.8x better than the single threaded solution (176.27sec). What happens when you take IO out of the equation? Checking if a number is prime For the next round of tests, we’ve eliminated IO altogether and examined how long it would take to determine if some really big number is prime or not. How big? 19 digits. 1,530,692,068,127,007,263, or in other words: one quintillion seventy nine quadrillion three hundred sixty four trillion thirty eight billion forty eight million three hundred five thousand thirty three. Argh, let me get some air. Anyhow, we haven’t used any optimization other than running to its square root, so we checked all even numbers even though our big number doesn’t divide by 2 just to make it process longer. Spoiler alert: it’s a prime, so each implementation ran the same number of calculations. Here’s how it turned out: ** Single threaded execution: 118,127msec, or almost 2 minutes. ** Notice the graph starts at 20000 milliseconds 1. Smaller differences between 8 and 16 threads Unlike the IO test, we don’t have IO calls here so the performance of 8 and 16 threads was mostly similar, except for the Fork/Join solution. We’ve actually ran a few more sets of tests to make sure we’re getting good results here because of this “anomaly” but it turned out very similar time after time. We’d be glad to hear your thoughts about this in the comment section below. 2. The best results are similar for all methods We see that all implementations share a similar best result of around 28 seconds. No matter which way we tried to approach it, the results came out the same. This doesn’t mean that we’re indifferent to which method to use. Check out the next insight. 3. Parallel streams handle the thread overload better than other implementations This is the more interesting part. With this test, we see again that the the top results for running 16 threads are coming from using parallel streams. Moreover, in this version, using parallel streams was a good call for all variations of thread numbers. 4. Single threaded performance was 4.2x worse than the best result In addition, the benefit of using parallelism when running computationally intensive tasks is almost 2 times worse than the IO test with file IO. This makes sense since it’s a CPU intensive test, unlike the previous one where we could get an extra benefit from cutting down the time our cores were waiting on threads stuck with IO. Conclusion I’d recommend going to the source to learn more about when to use parallel streams and applying careful judgement anytime you do parallelism in Java. The best path to take would be running similar tests to these in a staging environment where you can try and get a better sense of what you’re up against. The factors you have to be mindful of are of course the hardware you’re running on (and the hardware you’re testing on), and the total number of threads in your application. This includes the common Fork/Join pool and code other developers on your team are working on. So try to keep those in check and get a full view of your application before adding parallelism of your own. Groundwork To run this test we’ve used an EC2 c3.2xlarge instance with 8 vCPUs and 15GB of RAM. A vCPU means there’s hyperthreading in place so in fact we have here 4 physical cores that each act as if it were 2. As far as the OS scheduler is concerned, we have 8 cores here. To try and make it as fair as we could, each implementation ran 10 times and we’ve taken the average run time of runs 2 through 9. That’s 260 test runs, phew! Another thing that was important is the processing time. We’ve chosen tasks that would take well over 20 seconds to process so the differences will be easier to spot and less affected by external factors. What’s next? The raw results are available right here, and the code is on GitHub. Please feel free to tinker around with it and let us know what kind of results you’re getting. If you have any more interesting insights or explanations for the results that we’ve missed, we’d be happy to read them and add it to the post. Originally posted on Takipi's blog
April 1, 2015
by Chen Harel
· 16,718 Views
article thumbnail
CompletableFuture Can't Be Interrupted
I wrote a lot about InterruptedException and interrupting threads already. In short if you call Future.cancel() not inly given Future will terminate pending get(), but also it will try to interrupt underlying thread. This is a pretty important feature that enables better thread pool utilization. I also wrote to always prefer CompletableFuture over standardFuture. It turns out the more powerful younger brother of Future doesn't handle cancel() so elegantly. Consider the following task, which we'll use later throughout the tests: class InterruptibleTask implements Runnable { private final CountDownLatch started = new CountDownLatch(1) private final CountDownLatch interrupted = new CountDownLatch(1) @Override void run() { started.countDown() try { Thread.sleep(10_000) } catch (InterruptedException ignored) { interrupted.countDown() } } void blockUntilStarted() { started.await() } void blockUntilInterrupted() { assert interrupted.await(1, TimeUnit.SECONDS) } } Client threads can examine InterruptibleTask to see whether it has started or was interrupted. First let's see how InterruptibleTask reacts to cancel() from outside: def "Future is cancelled without exception"() { given: def task = new InterruptibleTask() def future = myThreadPool.submit(task) task.blockUntilStarted() and: future.cancel(true) when: future.get() then: thrown(CancellationException) } def "CompletableFuture is cancelled via CancellationException"() { given: def task = new InterruptibleTask() def future = CompletableFuture.supplyAsync({task.run()} as Supplier, myThreadPool) task.blockUntilStarted() and: future.cancel(true) when: future.get() then: thrown(CancellationException) } So far so good. Clearly both Future and CompletableFuture work pretty much the same way - retrieving result after it was canceled throws CancellationException. But what about thread in myThreadPool? I thought it will be interrupted and thus recycled by the pool, how wrong was I! def "should cancel Future"() { given: def task = new InterruptibleTask() def future = myThreadPool.submit(task) task.blockUntilStarted() when: future.cancel(true) then: task.blockUntilInterrupted() } @Ignore("Fails with CompletableFuture") def "should cancel CompletableFuture"() { given: def task = new InterruptibleTask() def future = CompletableFuture.supplyAsync({task.run()} as Supplier, myThreadPool) task.blockUntilStarted() when: future.cancel(true) then: task.blockUntilInterrupted() } First test submits ordinary to and waits until it's started. Later we cancel and wait until is observed. will return when underlying thread is interrupted. Second test, however, fails. will never interrupt underlying thread, so despite looking as if it was cancelled, backing thread is still running and no is thrown from . Bug or a feature? , so unfortunately a feature: Parameters:mayInterruptIfRunning - this value has no effect in this implementation because interrupts are not used to control processing. RTFM, you say, but why CompletableFuture works this way? First let's examine how "old" Future implementations differ from CompletableFuture. FutureTask, returned from ExecutorService.submit() has the following cancel() implementation (I removed Unsafe with similar non-thread safe Java code, so treat it as pseudo code only): public boolean cancel(boolean mayInterruptIfRunning) { if (state != NEW) return false; state = mayInterruptIfRunning ? INTERRUPTING : CANCELLED; try { if (mayInterruptIfRunning) { try { Thread t = runner; if (t != null) t.interrupt(); } finally { // final state state = INTERRUPTED; } } } finally { finishCompletion(); } return true; } FutureTask has a state variable that follows this state diagram: In case of cancel() we can either enter CANCELLED state or go to INTERRUPTEDthrough INTERRUPTING. The core part is where we take runner thread (if exists, i.e. if task is currently being executed) and we try to interrupt it. This branch takes care of eager and forced interruption of already running thread. In the end we must notify all threads blocked on Future.get() in finishCompletion() (irrelevant here). So it's pretty obvious how old Future cancels already running tasks. What aboutCompletableFuture? Pseudo-code of cancel(): public boolean cancel(boolean mayInterruptIfRunning) { boolean cancelled = false; if (result == null) { result = new AltResult(new CancellationException()); cancelled = true; } postComplete(); return cancelled || isCancelled(); } Quite disappointing, we barely set result to CancellationException, ignoringmayInterruptIfRunning flag. postComplete() has a similar role tofinishCompletion() - notifies all pending callbacks registered on that future. Its implementation is rather unpleasant (using non-blocking Treiber stack) but it definitely doesn't interrupt any underlying thread. Reasons and implications Limited cancel() in case of CompletableFuture is not a bug, but a design decision.CompletableFuture is not inherently bound to any thread, while Future almost always represents background task. It's perfectly fine to create CompletableFuture from scratch (new CompletableFuture<>()) where there is simply no underlying thread to cancel. Still I can't help the feeling that majority of CompletableFutures will have an associated task and background thread. In that case malfunctioning cancel() is a potential problem. I no longer advice blindly replacing Future with CompletableFutureas it might change the behavior of applications relying on cancel(). This meansCompletableFuture intentionally breaks Liskov substitution principle - and this is a serious implication to consider.
March 30, 2015
by Tomasz Nurkiewicz
· 17,530 Views · 7 Likes
article thumbnail
How to Read Call Logs Programmatically From Android
It’s fairly easy. You need to add the following uses-permission in the Android manifest to get call history programmatically. interface in your activity. It has three methods. abstract Loader onCreateLoader(int id, Bundle args) //Instantiate and return a new Loader for the given ID. abstract void onLoadFinished(Loader loader, D data) //Called when a previously created loader has finished its load. abstract void onLoaderReset(Loader loader) //Called when a previously created loader is being reset, and thus making its data unavailable. To initialize a query, we need to call LoaderManager.initLoader() at the very first place. We are going to add a button and call this in that button events here and after this background framework will be initialized. As soon as the background framework is initialized, it calls your implementation of onCreateLoader(). To start the query, we have to return a CursorLoader from this method. @Override public Loader onCreateLoader(int loaderID, Bundle args) { Log.d(TAG, "onCreateLoader() >> loaderID : " + loaderID); switch (loaderID) { case URL_LOADER: // Returns a new CursorLoader return new CursorLoader( this, // Parent activity context CallLog.Calls.CONTENT_URI, // Table to query null, // Projection to return null, // No selection clause null, // No selection arguments null // Default sort order ); default: return null; } } We are going access our expected data from a Cursor. And we will get this in theonLoadFinished() method. @Override public void onLoadFinished(Loader loader, Cursor managedCursor) { Log.d(TAG, "onLoadFinished()"); StringBuilder sb = new StringBuilder(); int number = managedCursor.getColumnIndex(CallLog.Calls.NUMBER); int type = managedCursor.getColumnIndex(CallLog.Calls.TYPE); int date = managedCursor.getColumnIndex(CallLog.Calls.DATE); int duration = managedCursor.getColumnIndex(CallLog.Calls.DURATION); sb.append("Call Log Details "); sb.append("\n"); sb.append("\n"); sb.append(""); while (managedCursor.moveToNext()) { String phNumber = managedCursor.getString(number); String callType = managedCursor.getString(type); String callDate = managedCursor.getString(date); Date callDayTime = new Date(Long.valueOf(callDate)); String callDuration = managedCursor.getString(duration); String dir = null; int callTypeCode = Integer.parseInt(callType); switch (callTypeCode) { case CallLog.Calls.OUTGOING_TYPE: dir = "Outgoing"; break; case CallLog.Calls.INCOMING_TYPE: dir = "Incoming"; break; case CallLog.Calls.MISSED_TYPE: dir = "Missed"; break; } sb.append("") .append("Phone Number: ") .append("") .append(phNumber) .append(""); sb.append(""); sb.append(""); sb.append("") .append("Call Type:") .append("") .append(dir) .append(""); sb.append(""); sb.append(""); sb.append("") .append("Date & Time:") .append("") .append(callDayTime) .append(""); sb.append(""); sb.append(""); sb.append("") .append("Call Duration (Seconds):") .append("") .append(callDuration) .append(""); sb.append(""); sb.append(""); sb.append(""); } sb.append(""); managedCursor.close(); callLogsTextView.setText(Html.fromHtml(sb.toString())); } Output: Full Source code: https://github.com/rokon12/call-log
March 27, 2015
by A N M Bazlur Rahman DZone Core CORE
· 40,416 Views · 2 Likes
article thumbnail
Spock 1.0 with Groovy 2.4 Configuration Comparison in Maven and Gradle
Spock 1.0 has been finally released. About new features and enhancements I already wrote two blog posts. One of the recent changes was a separation on artifacts designed for specific Groovy versions: 2.0, 2.2, 2.3 and 2.4 to minimize a chance to come across a binary incompatibility in runtime (in the past there were only versions for Groovy 1.8 and 2.0+). That was done suddenly and based on the messages on the mailing list it confused some people. After being twice asked to help properly configure two projects I decided to write a short post presenting how to configure Spock 1.0 with Groovy 2.4 in Maven and Gradle. It is also a great place to compare how much work is required to do it in those two very popular build systems. Maven Maven does not natively support other JVM languages (like Groovy or Scala). To use it in the Maven project it is required to use a third party plugin. For Groovy the best option seems to be GMavenPlus (a rewrite of no longer maintained GMaven plugin). An alternative is a plugin which allows to use Groovy-Eclipse compiler with Maven, but it is not using official groovyc and in the past there were problems with being up-to-date with the new releases/features of Groovy. Sample configuration of GMavenPlus plugin could look like: org.codehaus.gmavenplus gmavenplus-plugin 1.4 compile testCompile As we want to write tests in Spock which recommends to name files with Spec suffix (from specification) in addition it is required to tell Surefire to look for tests also in those files: maven-surefire-plugin ${surefire.version} **/*Spec.java **/*Test.java Please notice that it is needed to include **/*Spec.java not **/*Spec.groovy to make it work. Also dependencies have to be added: org.codehaus.groovy groovy-all 2.4.1 org.spockframework spock-core 1.0-groovy-2.4 test It is very important to take a proper version of Spock. For Groovy 2.4 version 1.0-groovy-2.4 is required. For Groovy 2.3 version 1.0-groovy-2.3. In case of mistake Spock protests with a clear error message: Could not instantiate global transform class org.spockframework.compiler.SpockTransform specified at jar:file:/home/foo/.../spock-core-1.0-groovy-2.3.jar!/META-INF/services/org.codehaus.groovy.transform.ASTTransformation because of exception org.spockframework.util.IncompatibleGroovyVersionException: The Spock compiler plugin cannot execute because Spock 1.0.0-groovy-2.3 is not compatible with Groovy 2.4.0. For more information, see http://versioninfo.spockframework.org Together with other mandatory pom.xml elements the file size increased to over 50 lines of XML. Quite much just for Groovy and Spock. Let’s see how complicated it is in Gradle. Gradle Gradle has built-in support for Groovy and Scala. Without further ado Groovy plugin just has to be applied. apply plugin: 'groovy' Next the dependencies has to be added: compile 'org.codehaus.groovy:groovy-all:2.4.1' testCompile 'org.spockframework:spock-core:1.0-groovy-2.4' and the information where Gradle should look for them: repositories { mavenCentral() } Together with defining package group and version it took 15 lines of code in Groovy-based DSL. Btw, in case of Gradle it is also very important to match Spock and Groovy version, e.g. Groovy 2.4.1 and Spock 1.0-groovy-2.4. Summary Thanks to embedded support for Groovy and compact DSL Gradle is preferred solution to start playing with Spock (and Groovy in general). Nevertheless if you prefer Apache Maven with a help of GMavenPlus (and XML) it is also possible to build project tested with Spock. The minimal working project with Spock 1.0 and Groovy 2.4 configured in Maven and Gradle can be cloned from my GitHub. Note 1. I haven’t been using Maven in my project for over 2 years (I prefer Gradle), so if there is a better/easier way to configure Groovy and Spock with Maven just let me know in the comments. Note 2. The configuration examples assume that Groovy is used only for tests and the production code is written in Java. It is possible to mix Groovy and Java code together, but then the configuration is a little more complicated.
March 19, 2015
by Marcin Zajączkowski
· 12,459 Views · 3 Likes
  • Previous
  • ...
  • 232
  • 233
  • 234
  • 235
  • 236
  • 237
  • 238
  • 239
  • 240
  • 241
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×