Databases Resources

The Latest Databases Topics

RESTEasy (and Jersey as well) contain a minimal web server within their libraries which enables their users to start up a tiny web server.

March 13, 2015

by Mark Paluch

· 311,668 Views · 6 Likes

R/dplyr: Extracting Data Frame Column Value for Filtering With %in%

I’ve been playing around with dplyr over the weekend and wanted to extract the values from a data frame column to use in a later filtering step. I had a data frame: library(dplyr) df = data.frame(userId = c(1,2,3,4,5), score = c(2,3,4,5,5)) And wanted to extract the userIds of those people who have a score greater than 3. I started with: highScoringPeople = df %>% filter(score > 3) %>% select(userId) > highScoringPeople userId 1 3 2 4 3 5 And then filtered the data frame expecting to get back those 3 people: > df %>% filter(userId %in% highScoringPeople) [1] userId score <0 rows> (or 0-length row.names) No rows! I created vector with the numbers 3-5 to make sure that worked: > df %>% filter(userId %in% c(3,4,5)) userId score 1 3 4 2 4 5 3 5 5 That works as expected so highScoringPeople obviously isn’t in the right format to facilitate an ‘in lookup’. Let’s explore: > str(c(3,4,5)) num [1:3] 3 4 5 > str(highScoringPeople) 'data.frame': 3 obs. of 1 variable: $ userId: num 3 4 5 Now it’s even more obvious why it doesn’t work – highScoringPeople is still a data frame when we need it to be a vector/list. One way to fix this is to extract the userIds using the $ syntax instead of the select function: highScoringPeople = (df %>% filter(score > 3))$userId > str(highScoringPeople) num [1:3] 3 4 5 > df %>% filter(userId %in% highScoringPeople) userId score 1 3 4 2 4 5 3 5 5 Or if we want to do the column selection using dplyr we can extract the values for the column like this: highScoringPeople = (df %>% filter(score > 3) %>% select(userId))[[1]] > str(highScoringPeople) num [1:3] 3 4 5 Not so difficult after all.

March 12, 2015

by Mark Needham

· 15,228 Views

Why I Use OrientDB on Production Applications

Like many other Java developers, when i start a new Java development project that requires a database, i have hopes and dreams of what my database looks like: Java API (of course) Embeddable Pure Java Simple jar file for inclusion in my project Database stored in a directory on disk Faster than a rocket First I’m going to review these points, and then i’m going to talk about the database i chose for my latest project, which is in production now with hundreds of users accessing the web application each month. What I Want from My Database Here’s what i’m looking for in my database. These are the things that literally make me happy and joyous when writing code. Java API I code in Java. It’s natural for me to want to use a modern Java API for my database work. Embeddable My development productivity and programming enjoyment skyrocket when my database is embedded. The database starts and stops with my application. It’s easy to destroy my database and restart from scratch. I can upgrade my database by updating my database jar file. It’s easy to deploy my application into testing and production, because there’s no separate database server to startup and manage. (I know about the issue with clustering and an embedded database, but i’ll get to that.) Pure Java Back when i developed software that would be deployed on all manner of hardware, i was a stickler that all my code be pure Java, so that i could be confident that my code would run wherever customers and users deployed it. In this day of SaaS, i’m less picky. I develop on the Mac. I test and run in production on Linux. Those are the systems i care about, so if my database has some platform-specific code in it to make it run fast and well, i’m fine with that. Just as long as that platform-specific configuration is not exposed to me as the developer. Simple Jar File for Inclusion in My Project I really just want one database jar file to add to my project. And i don’t want that jar file messing with my code or the dependencies i include in my project. If the database uses Guava 1.2, and i’m using Guava 0.8, that can mess me up. I want my database to not interfere with jars that i use by introducing newer or older versions of class files that i already reference in my project’s jars. Database Stored in a Directory on Disk I like to destroy my database by deleting a directory. I like to run multiple, simultaneous databases by configuring each database to use a separate directory. That makes me super productive during development, and it makes it more fun for me to program to a database. Faster Than a Rocket I think that’s just a given. My Latest Project That Needs a Database My latest project is Floify.com. Floify is a Mortgage Borrower Portal, automating the process of collecting mortgage loan documents from borrowers and emailing milestone loan status updates to real estate agents and borrowers. Mortgage loan originators use Floify to automate the labor-intensive parts of their loan processes. The web application receives about 500 unique visitors per month. Floify experienced 28% growth in january 2015. Floify’s vital statistics are: 38,301 loan documents under management 3,619 registered users 3,113 loan packages under management The Database I Chose for My Latest Project When i started Floify, i looked for a database that met all the criteria i’ve described above. I decided against databases that were server-based (Postgres, etc). I decided against databases that weren’t Java-based (MongoDB, etc). I decided against databases that didn’t support ACID transactions. I narrowed my choices to OrientDB and Neo4j. It’s been a couple years since that decision process occurred, but i distinctly remember a few reasons why i ultimately chose OrientDB over Neo4j: Performance benchmarks for OrientDB were very impressive. The OrientDB development team was very active. Cost. OrientDB is free. Neo4j cost more than what i was willing to pay or what i could afford. I forget which it was. My Favourite OrientDB Features Here are some of my favourite features in OrientDB. These are not competitive advantages to OrientDB. It’s just some of the things that make me happy when coding against an embeddable database. I can create the database in code. I don’t have to use SQL for querying, but most of the time, i do. I already know SQL, and it’s just easy for me. I use the document database, and it’s very pleasant inserting new documents in Java. I can store multi-megabyte binary objects directly in the database. My database is stored in a directory on disk. When scalability demands it, i can upgrade to a two-server distributed database. I haven’t been there yet. Speed. For me, OrientDB is very fast, and in the few years i’ve been using it, it’s become faster. OrientDB doesn’t come in a single jar file, as would be my ideal. I have to include a few different jars, but that’s an easy tradeoff for me. Future In the future, as Floify’s performance and scalability needs demand it, i’ll investigate a multi-server database configuration on OrientDB. In the meantime, i’m preparing to upgrade to OrientDB 2.0, which was recently released and promises even more speed. Go speed. :-)

March 5, 2015

by Dave Sims

· 18,361 Views · 6 Likes

Using MongoDB with Hadoop & Spark: Part 2 - Hive Example

Originally Written by Matt Kalan Welcome to part two of our three-part series on MongoDB and Hadoop. In part one, we introduced Hadoop and how to set it up. In this post, we'll look at a Hive example. Introduction & Setup of Hadoop and MongoDB Hive Example Spark Example & Key Takeaways For more detail on the use case, see the first paragraph of part 1. Summary Use case: aggregating 1 minute intervals of stock prices into 5 minute intervals Input:: 1 minute stock prices intervals in a MongoDB database Simple Analysis: performed in: - Hive - Spark Output: 5 minute stock prices intervals in Hadoop Hive Example I ran the following example from the Hive command line (simply typing the command “hive” with no parameters), not Cloudera’s Hue editor, as that would have needed additional installation steps. I immediately noticed the criticism people have with Hive, that everything is compiled into MapReduce which takes considerable time. I ran most things with just 20 records to make the queries run quickly. This creates the definition of the table in Hive that matches the structure of the data in MongoDB. MongoDB has a dynamic schema for variable data shapes but Hive and SQL need a schema definition. CREATE EXTERNAL TABLE minute_bars ( id STRUCT, Symbol STRING, Timestamp STRING, Day INT, Open DOUBLE, High DOUBLE, Low DOUBLE, Close DOUBLE, Volume INT ) STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler' WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id", "Symbol":"Symbol", "Timestamp":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}') TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minbars'); Recent changes in the Apache Hive repo make the mappings necessary even if you are keeping the field names the same. This should be changed in the MongoDB Hadoop Connector soon if not already by the time you read this. Then I ran the following command to create a Hive table for the 5 minute bars: CREATE TABLE five_minute_bars ( id STRUCT, Symbol STRING, Timestamp STRING, Open DOUBLE, High DOUBLE, Low DOUBLE, Close DOUBLE ); This insert statement uses the SQL windowing functions to group 5 1-minute periods and determine the OHLC for the 5 minutes. There are definitely other ways to do this but here is one I figured out. Grouping in SQL is a little different from grouping in the MongoDB aggregation framework (in which you can pull the first and last of a group easily), so it took me a little while to remember how to do it with a subquery. The subquery takes each group of 5 1-minute records/documents, sorts them by time, and takes the open, high, low, and close price up to that record in each 5-minute period. Then the outside WHERE clause selects the last 1-minute bar in that period (because that row in the subquery has the correct OHLC information for its 5-minute period). I definitely welcome easier queries to understand but you can run the subquery by itself to see what it’s doing too. INSERT INTO TABLE five_minute_bars SELECT m.id, m.Symbol, m.OpenTime as Timestamp, m.Open, m.High, m.Low, m.Close FROM (SELECT id, Symbol, FIRST_VALUE(Timestamp) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as OpenTime, LAST_VALUE(Timestamp) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as CloseTime, FIRST_VALUE(Open) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as Open, MAX(High) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as High, MIN(Low) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as Low, LAST_VALUE(Close) OVER ( PARTITION BY floor(unix_timestamp(Timestamp, 'yyyy-MM-dd HH:mm')/(5*60)) ORDER BY Timestamp) as Close FROM minute_bars) as m WHERE unix_timestamp(m.CloseTime, 'yyyy-MM-dd HH:mm') - unix_timestamp(m.OpenTime, 'yyyy-MM-dd HH:mm') = 60*4; I can definitely see the benefit of being able to use SQL to access data in MongoDB and optionally in other databases and file formats, all with the same commands, while the mapping differences are handled in the table declarations. The downside is that the latency is quite high, but that could be made up some with the ability to scale horizontally across many nodes. I think this is the appeal of Hive for most people - they can scale to very large data volumes using traditional SQL, and latency is not a primary concern. Post #3 in this blog series shows similar examples using Spark. Introduction & Setup of Hadoop and MongoDB Hive Example Spark Example & Key Takeaways To learn more, watch our video on MongoDB and Hadoop. We will take a deep dive into the MongoDB Connector for Hadoop and how it can be applied to enable new business insights. WATCH MONGODB & HADOOP << Read Part 1

March 2, 2015

by Francesca Krihely

· 10,932 Views

Standing Up a Local Netflix Eureka

Here I will consider two different ways of standing up a local instance of Netflix Eureka. If you are not familiar with Eureka, it provides a central registry where (micro)services can register themselves and client applications can use this registry to look up specific instances hosting a service and to make the service calls. Approach 1: Native Eureka Library The first way is to simply use the archive file generated by the Netflix Eureka build process: 1. Clone the Eureka source repository here: https://github.com/Netflix/eureka 2. Run "./gradlew build" at the root of the repository, this should build cleanly generating a war file in eureka-server/build/libs folder 3. Grab this file, rename it to "eureka.war" and place it in the webapps folder of either tomcat or jetty. For this exercise I have used jetty. 4. Start jetty, by default jetty will boot up at port 8080, however I wanted to instead bring it up at port 8761, so you can start it up this way, "java -jar start.jar -Djetty.port=8761" The server should start up cleanly and can be verified at this endpoint - "http://localhost:8761/eureka/v2/apps" Approach 2: Spring-Cloud-Netflix Spring-Cloud-Netflix provides a very neat way to bootstrap Eureka. To bring up Eureka server using Spring-Cloud-Netflix the approach that I followed was to clone the sample Eureka server application available here: https://github.com/spring-cloud-samples/eureka 1. Clone this repository 2. From the root of the repository run "mvn spring-boot:run", and that is it!. The server should boot up cleanly and the REST endpoint should come up here: "http://localhost:8761/eureka/apps". As a bonus, Spring-Cloud-Netflix provides a neat UI showing the various applications who have registered with Eureka at the root of the webapp at "http://localhost:8761/". Just a few small issues to be aware of, note that the context url's are a little different in the two cases "eureka/v2/apps" vs "eureka/apps", this can be adjusted on the configurations of the services which register with Eureka. Conclusion Your mileage with these approaches may vary. I have found Spring-Cloud-Netflix a little unstable at times but it has mostly worked out well for me. The documentation at the Spring-Cloud site is also far more exhaustive than the one provided at the Netflix Eureka site.

February 26, 2015

by Biju Kunjummen

· 13,332 Views

Redirecting All Kinds of stdout in Python

A common task in Python (especially while testing or debugging) is to redirect sys.stdout to a stream or a file while executing some piece of code. However, simply "redirecting stdout" is sometimes not as easy as one would expect; hence the slightly strange title of this post. In particular, things become interesting when you want C code running within your Python process (including, but not limited to, Python modules implemented as C extensions) to also have its stdout redirected according to your wish. This turns out to be tricky and leads us into the interesting world of file descriptors, buffers and system calls. But let's start with the basics. Pure Python The simplest case arises when the underlying Python code writes to stdout, whether by calling print, sys.stdout.write or some equivalent method. If the code you have does all its printing from Python, redirection is very easy. With Python 3.4 we even have a built-in tool in the standard library for this purpose - contextlib.redirect_stdout. Here's how to use it: from contextlib import redirect_stdout f = io.StringIO() with redirect_stdout(f): print('foobar') print(12) print('Got stdout: "{0}"'.format(f.getvalue())) When this code runs, the actual print calls within the with block don't emit anything to the screen, and you'll see their output captured by in the stream f. Incidentally, note how perfect the with statement is for this goal - everything within the block gets redirected; once the block is done, things are cleaned up for you and redirection stops. If you're stuck on an older and uncool Python, prior to 3.4 [1], what then? Well, redirect_stdout is really easy to implement on your own. I'll change its name slightly to avoid confusion: from contextlib import contextmanager @contextmanager def stdout_redirector(stream): old_stdout = sys.stdout sys.stdout = stream try: yield finally: sys.stdout = old_stdout So we're back in the game: f = io.StringIO() with stdout_redirector(f): print('foobar') print(12) print('Got stdout: "{0}"'.format(f.getvalue())) Redirecting C-level streams Now, let's take our shiny redirector for a more challenging ride: import ctypes libc = ctypes.CDLL(None) f = io.StringIO() with stdout_redirector(f): print('foobar') print(12) libc.puts(b'this comes from C') os.system('echo and this is from echo') print('Got stdout: "{0}"'.format(f.getvalue())) I'm using ctypes to directly invoke the C library's puts function [2]. This simulates what happens when C code called from within our Python code prints to stdout - the same would apply to a Python module using a C extension. Another addition is the os.system call to invoke a subprocess that also prints to stdout. What we get from this is: this comes from C and this is from echo Got stdout: "foobar 12 " Err... no good. The prints got redirected as expected, but the output from puts and echo flew right past our redirector and ended up in the terminal without being caught. What gives? To grasp why this didn't work, we have to first understand what sys.stdout actually is in Python. Detour - on file descriptors and streams This section dives into some internals of the operating system, the C library, and Python [3]. If you just want to know how to properly redirect printouts from C in Python, you can safely skip to the next section (though understanding how the redirection works will be difficult). Files are opened by the OS, which keeps a system-wide table of open files, some of which may point to the same underlying disk data (two processes can have the same file open at the same time, each reading from a different place, etc.) File descriptors are another abstraction, which is managed per-process. Each process has its own table of open file descriptors that point into the system-wide table. Here's a schematic, taken from The Linux Programming Interface: File descriptors allow sharing open files between processes (for example when creating child processes with fork). They're also useful for redirecting from one entry to another, which is relevant to this post. Suppose that we make file descriptor 5 a copy of file descriptor 4. Then all writes to 5 will behave in the same way as writes to 4. Coupled with the fact that the standard output is just another file descriptor on Unix (usually index 1), you can see where this is going. The full code is given in the next section. File descriptors are not the end of the story, however. You can read and write to them with the read and write system calls, but this is not the way things are typically done. The C runtime library provides a convenient abstraction around file descriptors - streams. These are exposed to the programmer as the opaque FILE structure with a set of functions that act on it (for example fprintf and fgets). FILE is a fairly complex structure, but the most important things to know about it is that it holds a file descriptor to which the actual system calls are directed, and it provides buffering, to ensure that the system call (which is expensive) is not called too often. Suppose you emit stuff to a binary file, a byte or two at a time. Unbuffered writes to the file descriptor with write would be quite expensive because each write invokes a system call. On the other hand, using fwrite is much cheaper because the typicall call to this function just copies your data into its internal buffer and advances a pointer. Only occasionally (depending on the buffer size and flags) will an actual write system call be issued. With this information in hand, it should be easy to understand what stdout actually is for a C program. stdout is a global FILE object kept for us by the C library, and it buffers output to file descriptor number 1. Calls to functions like printf and puts add data into this buffer. fflush forces its flushing to the file descriptor, and so on. But we're talking about Python here, not C. So how does Python translate calls to sys.stdout.write to actual output? Python uses its own abstraction over the underlying file descriptor - a file object. Moreover, in Python 3 this file object is further wrapper in an io.TextIOWrapper, because what we pass to print is a Unicode string, but the underlying write system calls accept binary data, so encoding has to happen en route. The important take-away from this is: Python and a C extension loaded by it (this is similarly relevant to C code invoked via ctypes) run in the same process, and share the underlying file descriptor for standard output. However, while Python has its own high-level wrapper around it - sys.stdout, the C code uses its own FILE object. Therefore, simply replacing sys.stdout cannot, in principle, affect output from C code. To make the replacement deeper, we have to touch something shared by the Python and C runtimes - the file descriptor. Redirecting with file descriptor duplication Without further ado, here is an improved stdout_redirector that also redirects output from C code [4]: from contextlib import contextmanager import ctypes import io import os, sys import tempfile libc = ctypes.CDLL(None) c_stdout = ctypes.c_void_p.in_dll(libc, 'stdout') @contextmanager def stdout_redirector(stream): # The original fd stdout points to. Usually 1 on POSIX systems. original_stdout_fd = sys.stdout.fileno() def _redirect_stdout(to_fd): """Redirect stdout to the given file descriptor.""" # Flush the C-level buffer stdout libc.fflush(c_stdout) # Flush and close sys.stdout - also closes the file descriptor (fd) sys.stdout.close() # Make original_stdout_fd point to the same file as to_fd os.dup2(to_fd, original_stdout_fd) # Create a new sys.stdout that points to the redirected fd sys.stdout = io.TextIOWrapper(os.fdopen(original_stdout_fd, 'wb')) # Save a copy of the original stdout fd in saved_stdout_fd saved_stdout_fd = os.dup(original_stdout_fd) try: # Create a temporary file and redirect stdout to it tfile = tempfile.TemporaryFile(mode='w+b') _redirect_stdout(tfile.fileno()) # Yield to caller, then redirect stdout back to the saved fd yield _redirect_stdout(saved_stdout_fd) # Copy contents of temporary file to the given stream tfile.flush() tfile.seek(0, io.SEEK_SET) stream.write(tfile.read()) finally: tfile.close() os.close(saved_stdout_fd) There are a lot of details here (such as managing the temporary file into which output is redirected) that may obscure the key approach: using dup and dup2 to manipulate file descriptors. These functions let us duplicate file descriptors and make any descriptor point at any file. I won't spend more time on them - go ahead and read their documentation, if you're interested. The detour section should provide enough background to understand it. Let's try this: f = io.BytesIO() with stdout_redirector(f): print('foobar') print(12) libc.puts(b'this comes from C') os.system('echo and this is from echo') print('Got stdout: "{0}"'.format(f.getvalue().decode('utf-8'))) Gives us: Got stdout: "and this is from echo this comes from C foobar 12 " Success! A few things to note: The output order may not be what we expected. This is due to buffering. If it's important to preserve order between different kinds of output (i.e. between C and Python), further work is required to disable buffering on all relevant streams. You may wonder why the output of echo was redirected at all? The answer is that file descriptors are inherited by subprocesses. Since we rigged fd 1 to point to our file instead of the standard output prior to forking to echo, this is where its output went. We use a BytesIO here. This is because on the lowest level, the file descriptors are binary. It may be possible to do the decoding when copying from the temporary file into the given stream, but that can hide problems. Python has its in-memory understanding of Unicode, but who knows what is the right encoding for data printed out from underlying C code? This is why this particular redirection approach leaves the decoding to the caller. The above also makes this code specific to Python 3. There's no magic involved, and porting to Python 2 is trivial, but some assumptions made here don't hold (such as sys.stdout being a io.TextIOWrapper). Redirecting the stdout of a child process We've just seen that the file descriptor duplication approach lets us grab the output from child processes as well. But it may not always be the most convenient way to achieve this task. In the general case, you typically use the subprocess module to launch child processes, and you may launch several such processes either in a pipe or separately. Some programs will even juggle multiple subprocesses launched this way in different threads. Moreover, while these subprocesses are running you may want to emit something to stdout and you don't want this output to be captured. So, managing the stdout file descriptor in the general case can be messy; it is also unnecessary, because there's a much simpler way. The subprocess module's swiss knife Popen class (which serve as the basis for much of the rest of the module) accepts a stdout parameter, which we can use to ask it to get access to the child's stdout: import subprocess echo_cmd = ['echo', 'this', 'comes', 'from', 'echo'] proc = subprocess.Popen(echo_cmd, stdout=subprocess.PIPE) output = proc.communicate()[0] print('Got stdout:', output) The subprocess.PIPE argument can be used to set up actual child process pipes (a la the shell), but in its simplest incarnation it captures the process's output. If you only launch a single child process at a time and are interested in its output, there's an even simpler way: output = subprocess.check_output(echo_cmd) print('Got stdout:', output) check_output will capture and return the child's standard output to you; it will also raise an exception if the child exist with a non-zero return code. Conclusion I hope I covered most of the common cases where "stdout redirection" is needed in Python. Naturally, all of the same applies to the other standard output stream - stderr. Also, I hope the background on file descriptors was sufficiently clear to explain the redirection code; squeezing this topic in such a short space is challenging. Let me know if any questions remain or if there's something I could have explained better. Finally, while it is conceptually simple, the code for the redirector is quite long; I'll be happy to hear if you find a shorter way to achieve the same effect. [1] Do not despair. As of February 2015, a sizable chunk of the worldwide Python programmers are in the same boat. [2] Note that bytes passed to puts. This being Python 3, we have to be careful since libc doesn't understand Python's unicode strings. [3] The following description focuses on Unix/POSIX systems; also, it's necessarily partial. Large book chapters have been written on this topic - I'm just trying to present some key concepts relevant to stream redirection. [4] The approach taken here is inspired by this Stack Overflow answer.

February 23, 2015

by Eli Bendersky

· 19,726 Views

Sneak Peek into the JCache API (JSR 107)

This post covers the JCache API at a high level and provides a teaser – just enough for you to (hopefully) start itching about it ;-) In this post …. JCache overview JCache API, implementations Supported (Java) platforms for JCache API Quick look at Oracle Coherence Fun stuff – Project Headlands (RESTified JCache by Adam Bien) , JCache related talks at Java One 2014, links to resources for learning more about JCache What is JCache? JCache (JSR 107) is a standard caching API for Java. It provides an API for applications to be able to create and work with in-memory cache of objects. Benefits are obvious – one does not need to concentrate on the finer details of implementing the Caching and time is better spent on the core business logic of the application. JCache components The specification itself is very compact and surprisingly intuitive. The API defines high level components (interfaces) some of which are listed below Caching Provider – used to control Caching Managers and can deal with several of them, Cache Manager – deals with create, read, destroy operations on a Cache Cache – stores entries (the actual data) and exposes CRUD interfaces to deal with the entries Entry – abstraction on top of a key-value pair akin to a java.util.Map Hierarchy of JCache API components JCache Implementations JCache defines the interfaces which of course are implemented by different vendors a.k.a Providers. Oracle Coherence Hazelcast Infinispan ehcache Reference Implementation – this is more for reference purpose rather than a production quality implementation. It is per the specification though and you can be rest assured of the fact that it does in fact pass the TCK as well From the application point of view, all that’s required is the implementation to be present in the classpath. The API also provides a way to further fine tune the properties specific to your provider via standard mechanisms. You should be able to track the list of JCache reference implementations from the JCP website link public class JCacheUsage{ public static void main(String[] args){ //bootstrap the JCache Provider CachingProvider jcacheProvider = Caching.getCachingProvider(); CacheManager jcacheManager = jcacheProvider.getCacheManager(); //configure cache MutableConfiguration jcacheConfig = new MutableConfiguration<>(); jcacheConfig.setTypes(String.class, MyPreciousObject.class); //create cache Cache cache = jcacheManager.createCache("PreciousObjectCache", jcacheConfig); //play around String key = UUID.randomUUID().toString(); cache.put(key, new MyPreciousObject()); MyPreciousObject inserted = cache.get(key); cache.remove(key); cache.get(key); //will throw javax.cache.CacheException since the key does not exist } } JCache provider detection JCache provider detection happens automatically when you only have a single JCache provider on the class path You can choose from the below options as well //set JMV level system property -Djavax.cache.spi.cachingprovider=org.ehcache.jcache.JCacheCachingProvider //code level config System.setProperty("javax.cache.spi.cachingprovider","org.ehcache.jcache.JCacheCachingProvider //you want to choose from multiple JCache providers at runtime CachingProvider ehcacheJCacheProvider = Caching.getCachingProvider("org.ehcache.jcache.JCacheCachingProvider"); //which JCache providers do I have on the classpath? Iterable jcacheProviders = Caching.getCachingProviders(); Java Platform support Compliant with Java SE 6 and above Does not define any details in terms of Java EE integration. This does not mean that it cannot be used in a Java EE environment – it’s just not standardized yet. Could not be plugged into Java EE 7 as a tried and tested standard Candidate for Java EE 8 Project Headlands: Java EE and JCache in tandem By none other than Adam Bien himself ! Java EE 7, Java SE 8 and JCache in action Exposes the JCache API via JAX-RS (REST) Uses Hazelcast as the JCache provider Highly recommended ! Oracle Coherence This post deals with high level stuff w.r.t JCache in general. However, a few lines about Oracle Coherence in general would help put things in perspective Oracle Coherence is a part of Oracle’s Cloud Application Foundation stack It is primarily an in-memory data grid solution Geared towards making applications more scalable in general What’s important to know is that from version 12.1.3 onwards, Oracle Coherence includes a reference implementation for JCache (more in the next section) JCache support in Oracle Coherence Support for JCache implies that applications can now use a standard API to access the capabilities of Oracle Coherence This is made possible by Coherence by simply providing an abstraction over its existing interfaces (NamedCache etc). Application deals with a standard interface (JCache API) and the calls to the API are delegated to the existing Coherence core library implementation Support for JCache API also means that one does not need to use Coherence specific APIs in the application resulting in vendor neutral code which equals portability How ironic – supporting a standard API and always keeping your competitors in the hunt ;-) But hey! That’s what healthy competition and quality software is all about ! Talking of healthy competition – Oracle Coherence does support a host of other features in addition to the standard JCache related capabilities. The Oracle Coherence distribution contains all the libraries for working with the JCache implementation The service definition file in the coherence-jcache.jar qualifies it as a valid JCache provider implementation Curious about Oracle Coherence ? Quick Starter page Documentation Installation Further reading about Coherence and JCache combo – Oracle Coherence documentation JCache at Java One 2014 Couple of great talks revolving around JCache at Java One 2014 Come, Code, Cache, Compute! by Steve Millidge Using the New JCache by Brian Oliver and Greg Luck Hope this was fun :-) Cheers !

February 23, 2015

by Abhishek Gupta

CORE

· 6,357 Views · 1 Like

Getting Started with Dropwizard: Authentication, Configuration and HTTPS

Basic Authentication is the simplest way to secure access to a resource.

February 10, 2015

by Dmitry Noranovich

· 48,883 Views · 1 Like

The API Gateway Pattern: Angular JS and Spring Security Part IV

Written by Dave Syer in the Spring blog In this article we continue our discussion of how to use Spring Security with Angular JS in a “single page application”. Here we show how to build an API Gateway to control the authentication and access to the backend resources using Spring Cloud. This is the fourth in a series of articles, and you can catch up on the basic building blocks of the application or build it from scratch by reading the first article, or you can just go straight to the source code in Github. In the last article we built a simple distributed application that used Spring Session to authenticate the backend resources. In this one we make the UI server into a reverse proxy to the backend resource server, fixing the issues with the last implementation (technical complexity introduced by custom token authentication), and giving us a lot of new options for controlling access from the browser client. Reminder: if you are working through this article with the sample application, be sure to clear your browser cache of cookies and HTTP Basic credentials. In Chrome the best way to do that for a single server is to open a new incognito window. Creating an API Gateway An API Gateway is a single point of entry (and control) for front end clients, which could be browser based (like the examples in this article) or mobile. The client only has to know the URL of one server, and the backend can be refactored at will with no change, which is a significant advantage. There are other advantages in terms of centralization and control: rate limiting, authentication, auditing and logging. And implementing a simple reverse proxy is really simple with Spring Cloud. If you were following along in the code, you will know that the application implementation at the end of the last article was a bit complicated, so it’s not a great place to iterate away from. There was, however, a halfway point which we could start from more easily, where the backend resource wasn’t yet secured with Spring Security. The source code for this is a separate project in Github so we are going to start from there. It has a UI server and a resource server and they are talking to each other. The resource server doesn’t have Spring Security yet so we can get the system working first and then add that layer. Declarative Reverse Proxy in One Line To turn it into an API Gateawy, the UI server needs one small tweak. Somewhere in the Spring configuration we need to add an @EnableZuulProxy annotation, e.g. in the main (only)application class: @SpringBootApplication @RestController @EnableZuulProxy public class UiApplication { ... } and in an external configuration file we need to map a local resource in the UI server to a remote one in the external configuration (“application.yml”): security: ... zuul: routes: resource: path: /resource/** url: http://localhost:9000 This says “map paths with the pattern /resource/** in this server to the same paths in the remote server at localhost:9000”. Simple and yet effective (OK so it’s 6 lines including the YAML, but you don’t always need that)! All we need to make this work is the right stuff on the classpath. For that purpose we have a few new lines in our Maven POM: org.springframework.cloud spring-cloud-starter-parent 1.0.0.BUILD-SNAPSHOT pom import org.springframework.cloud spring-cloud-starter-zuul ... Note the use of the “spring-cloud-starter-zuul” - it’s a starter POM just like the Spring Boot ones, but it governs the dependencies we need for this Zuul proxy. We are also using because we want to be able to depend on all the versions of transitive dependencies being correct. Consuming the Proxy in the Client With those changes in place our application still works, but we haven’t actually used the new proxy yet until we modify the client. Fortunately that’s trivial. We just need to go from this implementation of the “home” controller: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('http://localhost:9000/').success(function(data) { $scope.greeting = data; }) }); to a local resource: angular.module('hello', [ 'ngRoute' ]) ... .controller('home', function($scope, $http) { $http.get('resource/').success(function(data) { $scope.greeting = data; }) }); Now when we fire up the servers everything is working and the requests are being proxied through the UI (API Gateway) to the resource server. Further Simplifications Even better: we don’t need the CORS filter any more in the resource server. We threw that one together pretty quickly anyway, and it should have been a red light that we had to do anything as technically focused by hand (especially where it concerns security). Fortunately it is now redundant, so we can just throw it away, and go back to sleeping at night! Securing the Resource Server You might remember in the intermediate state that we started from there is no security in place for the resource server. Aside: Lack of software security might not even be a problem if your network architecture mirrors the application architecture (you can just make the resource server physically inaccessible to anyone but the UI server). As a simple demonstration of that we can make the resource server only accessible on localhost. Just add this to application.properties in the resource server: server.address: 127.0.0.1 Wow, that was easy! Do that with a network address that’s only visible in your data center and you have a security solution that works for all resource servers and all user desktops. Suppose that we decide we do need security at the software level (quite likely for a number of reasons). That’s not going to be a problem, because all we need to do is add Spring Security as a dependency (in the resource server POM): org.springframework.boot spring-boot-starter-security That’s enough to get us a secure resource server, but it won’t get us a working application yet, for the same reason that it didn’t in Part III: there is no shared authentication state between the two servers. Sharing Authentication State We can use the same mechanism to share authentication (and CSRF) state as we did in the last, i.e. Spring Session. We add the dependency to both servers as before: org.springframework.session spring-session 1.0.0.RELEASE org.springframework.boot spring-boot-starter-redis but this time the configuration is much simpler because we can just add the same Filterdeclaration to both. First the UI server (adding @EnableRedisHttpSession): @SpringBootApplication @RestController @EnableZuulProxy @EnableRedisHttpSession public class UiApplication { ... } and then the resource server. There are two changes to make: one is adding@EnableRedisHttpSession and a HeaderHttpSessionStrategy bean to theResourceApplication: @SpringBootApplication @RestController @EnableRedisHttpSession class ResourceApplication { ... @Bean HeaderHttpSessionStrategy sessionStrategy() { new HeaderHttpSessionStrategy(); } } and the other is to explicitly ask for a non-stateless session creation policy inapplication.properties: security.sessions: NEVER As long as redis is still running in the background (use the fig.yml if you like to start it) then the system will work. Load the homepage for the UI at http://localhost:8080 and login and you will see the message from the backend rendered on the homepage. How Does it Work? What is going on behind the scenes now? First we can look at the HTTP requests in the UI server (and API Gateway): VERB PATH STATUS RESPONSE GET / 200 index.html GET /css/angular-bootstrap.css 200 Twitter bootstrap CSS GET /js/angular-bootstrap.js 200 Bootstrap and Angular JS GET /js/hello.js 200 Application logic GET /user 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /resource 302 Redirect to login page GET /login 200 Whitelabel login page (ignored) GET /login.html 200 Angular login form partial POST /login 302 Redirect to home page (ignored) GET /user 200 JSON authenticated user GET /resource 200 (Proxied) JSON greeting That’s identical to the sequence at the end of Part II except for the fact that the cookie names are slightly different (“SESSION” instead of “JSESSIONID”) because we are using Spring Session. But the architecture is different and that last request to “/resource” is special because it was proxied to the resource server. We can see the reverse proxy in action by looking at the “/trace” endpoint in the UI server (from Spring Boot Actuator, which we added with the Spring Cloud dependencies). Go tohttp://localhost:8080/trace in a browser and scroll to the end (if you don’t have one already get a JSON plugin for your browser to make it nice and readable). You will need to authenticate with HTTP Basic (browser popup), but the same credentials are valid as for your login form. At or near the end you should see a pair of requests something like this: { "timestamp": 1420558194546, "info": { "method": "GET", "path": "/", "query": "" "remote": true, "proxy": "resource", "headers": { "request": { "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260", "x-forwarded-prefix": "/resource", "x-forwarded-host": "localhost:8080" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } }, } }, { "timestamp": 1420558200232, "info": { "method": "GET", "path": "/resource/", "headers": { "request": { "host": "localhost:8080", "accept": "application/json, text/plain, */*", "x-xsrf-token": "542c7005-309c-4f50-8a1d-d6c74afe8260", "cookie": "SESSION=c18846b5-f805-4679-9820-cd13bd83be67; XSRF-TOKEN=542c7005-309c-4f50-8a1d-d6c74afe8260" }, "response": { "Content-Type": "application/json;charset=UTF-8", "status": "200" } } } }, The second entry there is the request from the client to the gateway on “/resource” and you can see the cookies (added by the browser) and the CSRF header (added by Angular as discussed inPart II). The first entry has remote: true and that means it’s tracing the call to the resource server. You can see it went out to a uri path “/” and you can see that (crucially) the cookies and CSRF headers have been sent too. Without Spring Session these headers would be meaningless to the resource server, but the way we have set it up it can now use those headers to re-constitute a session with authentication and CSRF token data. So the request is permitted and we are in business! Conclusion We covered quite a lot in this article but we got to a really nice place where there is a minimal amount of boilerplate code in our two servers, they are both nicely secure and the user experience isn’t compromised. That alone would be a reason to use the API Gateway pattern, but really we have only scratched the surface of what that might be used for (Netflix uses it for a lot of things). Read up on Spring Cloud to find out more on how to make it easy to add more features to the gateway. The next article in this series will extend the application architecture a bit by extracting the authentication responsibilities to a separate server (the Single Sign On pattern).

February 9, 2015

by Pieter Humphrey

· 16,349 Views

Introducing the Database Selection Matrix

Originally Written by Mat Keep For the better part of a generation, the database landscape had changed very little. No one could say “this is not your father’s database.” They had become, in a word, boring. Then a combination of factors catalyzed an era of innovation in database technologies: cheap storage and compute resources; pervasive connectivity; social networks; smartphones; the proliferation of sensors; open source software. Data volumes grew (and are growing) at exponential rates. Over 80% of today’s data no longer fits neatly into the normalized row and column table formats of the past. And so developers began engineering solutions to a new set of problems with a very different set of resources and assumptions. Today these new options include a variety of database architectures built around diverse data models – from key-value to document to wide-column and graph. And of course you still have the option of the venerable relational database. For the enterprise these new technologies hold great promise. They open the door to new applications that could not be imagined before, or to more efficiently solve existing problems. They attract new technical talent. They facilitate the migration of systems to more cost effective infrastructure based on commodity hardware and cloud platforms. But at the same time, evaluation of these new options requires careful consideration. Selecting the appropriate database for a new project requires evaluation against multiple criteria, including: Development considerations: includes the data model, query functionality, available drivers, data consistency. These factors dictate the functionality of your application, and how quickly you can build it. Operational considerations: performance and scalability, high availability, data center awareness, security, management and backups. Over the application’s lifetime, operational costs will contribute a significant percentage to the project’s Total Cost of Ownership (TCO), and so these factors constitute your ability to meet SLAs while minimizing administrative overhead. Commercial considerations: licensing, pricing and support. You need to know that the database you choose is available in a way that is aligned with how you do business. Each these considerations need to be evaluated in context of specific application requirements as well as internal technology standards, skills availability and integration with your existing enterprise architecture. So, where to start? The Database Selection Matrix is designed to serve as a decision framework by teams responsible for database selection. It has been developed in collaboration with several large enterprises who have the choice of running multiple databases in production, and who wanted to institute a systematic methodology for database evaluation. Responses to questions in the matrix helped them identify key requirements and guide selection. And it can do the same for you. Lets illustrate how the Database Selection Matrix can be used by working through a practical example. The Database Selection Matrix in Action! ACME Retail Corporation runs a large vehicle fleet to distribute produce to its nationwide network of stores. The CEO is intent on improving distribution efficiency and so tasks her enterprise architects to build a new platform that can utilize sensor data generated by the company’s trucks. By capturing and analyzing this data, the organization believes it can optimize route planning, improve delivery times, cut wastage and reduce business interruptions caused by breakdowns. ACME Retail Corp is typical of many enterprises that see the opportunity to unlock new efficiencies by leveraging the “Internet of Things”. As Morgan Stanley stated in the “Internet of Things is Now” research “We do not believe traditional data storage architectures are well- suited to accommodate the volume, velocity, and variety of IoT data”. For this reason, enterprises are looking beyond traditional RDBMS technology to the swathe of new database options available to them. Bosch SI did exactly this when it took the decision to use MongoDB to power the Bosch IoT Suite. Of course, MongoDB may not be a perfect fit for every IoT project. There are many choices available – as there for every new type of project – and the ACME architecture group needs a way to navigate the complex landscape of modern databases. Using the Database Selection Matrix, they have the framework to ask the key questions that will guide their technology decisions. So lets put it into practice. Development Considerations In this opening phase, the architects need to evaluate how their shortlisted database options meet the functional requirements of the app that is being built. This is impacted directly by multiple factors – and these are the questions they will need to ask. The Data Model: Will the application need to handle data of varying structure and types? How large can each data type be – is our data made up of simple integers, strings and timestamps or can it also be large binary files such as images or videos? Can our data just be represented as a set of opaque values, or does it need to be typed so other applications can make sense of it? Do we know the data structure will remain constant, or will it vary as we introduce new sensor data and as the business updates application requirements? Does the application require its data to be strongly consistent (i.e. read our own writes), or can eventually consistent data be tolerated (and do our developers know how to handle the complexity it introduces?). Do we end up trading performance and availability if we configure the database to only return the freshest data? The Query Model: What sort of queries are we going to run against the database? Is it simple key-value lookups that we know in advance or do we need to execute ad-hoc queries and complex aggregations to support real-time analytics that the business wants to see? Can we run analytics directly against the database, or do we need to replicate data to dedicated search or analytics engines? Will the application be handling geospatial queries and text search? Does the data need to be integrated with our BI & analytics tools, and what about our new Hadoop cluster, or the data warehouse? Which languages will our engineers be using to develop the application, and does the database have drivers available for them? Operational Considerations In this second phase, the ACME Retail architects need to evaluate how each database would run in production. No-one wants to hand-feed a custom technology, so they need to understand if the database can meet the availability, scalability and security needs of the business, and interoperate with the existing management frameworks. Service Availability: What is the application’s availability SLA? What are our RTO and RPO objectives? Will our operations teams manage failure recovery, or is this something that should be fully automated by the database? What capabilities does the database offer to maintain availability during routine maintenance? Are there tools available to manage this or do we need to script something ourselves? Are there specific requirements to replicate data between our data centers to support disaster recovery? Scalability: How do we expect this application to grow? Will the database need to scale beyond the limits of just a few servers? If data is to be distributed across multiple nodes, will it be partitioned in such a way that it is still optimized for the application’s query patterns? Do we need to scale this across data centers? Can we write and read data locally to reduce the effects of geographic latency? Can we scale storage capacity and I/O by compressing the data, and are different compression algorithms available to optimize compression ratio to CPU overhead? Security: What types of data access control do we need? Can we just use authentication controls within the database or do we need to integrate with our existing LDAP infrastructure? What type of authorization controls are available, and how granular can we get? Do these controls needs to extend down to the level of individual attributes within a document? Is encryption needed, and will those pesky compliance officers need to audit every action taken against the database? Administration: How are going to run this thing? Does the database provide tools to automate provisioning and upgrades or do we need to create our own scripts? How about backups? Can we get incremental backups. How about point in time backups? And then monitoring. We need to know, for example, if disk utilization is peaking above 60% so we can take action before we hit a problem. Can we add these alerts into our existing operational workflow tools? Can we integrate the database’s management platform into our own operational tooling so we don’t need to leave our single screen? Commercial Considerations Once the ACME architects have profiled their technology requirements, they will need to understand how the database is licensed and priced, before legal and procurement come knocking at the door: Licensing: what license is used, and is this acceptable to our legal team? Are commercial licenses available? Support: What support options are open to me? Can I get support SLAs from my vendor, even if I use a community version of their product? What is the SLA I can expect if I do hit an issue? What sort of training is available? Is my only option to send my engineers to public classes, or can we get trained on-demand, at our own pace? What’s Next? The ACME example is designed to illustrate some of the key questions engineering teams need to ask. It is true that the database landscape is more complex than ever. But it needn’t be bewildering – the Database Selection Matrix is designed to help you identify and compare what is most critical as you build your next app, so go ahead and download it now. Looking for additional information about database selection? Learn why organizations choose MongoDB to deliver applications and outcomes that were never previously possible. Download the white paper below: THE VALUE OF DATABASE SELECTION

February 9, 2015

by Francesca Krihely

· 11,450 Views · 1 Like

NetBeans in the Classroom: MySQL JDBC Connection Pool & JDBC Resource for GlassFish

This tutorial assumes that you have installed the Java EE version of NetBeans 8.02. It is further assumed that you have replaced the default GlassFish server instance in NetBeans with a new instance with its own folder for the domain. This is required for all platforms. See my previous article “Creating a New Instance of GlassFish in NetBeans IDE” The other day I presented to my students the steps necessary to make GlassFish responsible for JDBC connections. As I went through the steps I realized that I needed to record the steps for my students to reference. Here then are these steps. Steps 1 through 4 are required, Step 5 is optional. Step 1a: Manually Adding the MySQL driver to the GlassFish Domain Before we even start NetBeans we must do a bit of preliminary work. With the exception of Derby, GlassFish does not include the MySQL driver or any other driver in its distribution. Go to the MySql Connector/J download site at http://dev.mysql.com/downloads/connector/j/ and download the latest version. I recommend downloading the Platform Independent version. If your OS is Windows download the ZIP archive otherwise download the TAR archive. You are looking for the driver file named mysql-connector-java-5.1.34-bin.jar in the archive. Copy the driver file to the lib folder in the directory where you placed your domain. On my system the folder is located at C:\Users\Ken\personal_domain\lib. If GlassFish is already running then you will have to restart it so that it picks up the new library. Step 1b: Automatically Adding the MySQL driver to the GlassFish Domain NetBeans has a feature that deploys the database driver to the domain’s lib folder if that driver is in NetBeans’ folder of drivers. On my Windows 8.1 system the MySQL driver can be found in C:\Program Files\NetBeans 8.0.2\ide\modules\ext. Start NetBeans and go to the Services tab, expand Servers and right mouse click on your GlassFish Server. Click on Properties and the Servers dialog will appear. On this dialog you will see a check box labelled Enable JDBC Driver Deployment. By default it is checked. NetBeans determines the driver to copy to GlassFish from the file glassfish-resources.xml that we will create in Step 4 of this tutorial. Without this file and if you have not copied the driver into GlassFish manually then GlassFish will not be able to connect to the database. Any code in your web application will not work and all you will likely see are blank pages. Step 1a or Step 1b? I recommend Step 1a and manually add the driver. The reason I prefer this approach is that I can be certain that the most recent driver is in use. As of this writing NetBeans contains version 5.1.23 of the connector but the current version is 5.1.34. If you copy a driver into the lib folder then NetBeans will not replace it with an older driver even if the check box on the Server dialog is checked. NetBeans does not replace a driver if one is already in place. If you need a driver that NetBeans does have a copy of then Step 1b is your only choice. Step 2: Create a Database Connection in NetBeans One feature I have always liked in NetBeans is that it has an interface for working with databases. All that is required is that you create a connection to the database. It also has additional features for managing a MySQL server but we won’t need those. If you have not already started your MySQL DBMS then do that now. I assume that the database you wish to connect to already exists. Go to the Services tab and right mouse click on New Connection. In the next dialog you must choose the database driver you wish to use. It defaults to Java DB (Embedded). Pull down the combobox labeled Driver: and select MySQL (Connector/J driver). Click on Next and you will now see the Customize Connection dialog. Here you can enter the details of the connection. On my system the server is localhost and the database name is Aquarium. Here is what my dialog looks like. Notice the Test Connection button. I have clicked on mine and so I have the message Connection Succeeded. Click on Next. There is nothing to do on this dialog so click on Next. On this last dialog you have the option of assigning a name to the connection. By default it uses the URL but I prefer a more meaningful name. I have used AquariumMySQL. Click on Finish and the connection will appear under Databases. If the icon next to AquariumMySQL has what looks like a crack in it similar to the jdbc:derby connection then this means that a connection to the database could not be made. Verify that the database is running and is accessible. If it is then delete the connection and start over. Having a connection to the database in NetBeans is invaluable. You can interact with the database directly and issue SQL commands. As a MySQL user this means that I do not need to run the MySQL command line program to interact with the database. Step 3: Create a Web Application Project in NetBeans If you have not already done so create a New Project in NetBeans. I require my students to create a New Project in the Maven category of a Web Application project. Click on Next. In this dialog you can give the project a name and a location in your file system. The Artifact Id, Group Id and Version are used by Maven. The final dialog lets you select the application server that your application will use and the version of Java EE that your code must be compliant with. Here is my project ready for the next step. Step 4: Create the GlassFish JDBC Resource For GlassFish to manage your database connection you need to set up two resources, a JDBC Connection Pool and a JDBC Resource. You can create both in one step by creating a GlassFish JDBC Resource because you can create the Connection Pool as part of the same operation. Right mouse click on the project name and select New and then Other … Scroll down the Categories list and select GlassFish. In the File Types list select JDBC Resource. Click on Next. The next dialog is the General Attributes. Click on the radio button for Create New JDBC Connection Pool. In the text field JNDI Name enter a name that is unique for the project. JNDI names for connection resources always begin with jdbc/ followed by a name that starts with a lower case letter. I have used jdbc/myAquarium. Do not prefix the name with java:app/ as some tutorials suggest. An upcoming article will explain why. Click on Next. There is nothing for us to enter on the Properties dialog. Click on Next. On the Choose Database Connection dialog we will give our connection pool a name and select the database connection we created in Step 2. Notice that in the list of available connections you are shown the connection URL and not the name you assigned to it back in Step 2. Click on Next. On the Add Connection Pool Properties dialog you will see the connection URL and the user name and password. We do need to make one change. The resource type shows javax.sql.DataSource and we must change it to javax.sql.ConnectionPoolDataSource. Click on Next. There is nothing we need to change on Add Connection Pool Optional Properties so click on Finish. A new folder has appeared in the Projects view named Other Sources. It contains a sub folder named setup. In this folder is the file glassfish-resources.xml. The glassfish-resources.xml file will contain the following. I have reformatted the file for easier viewing. OPTIONAL Step 5: Configure GlassFish with glassfish-resources.xml The glassfish-resources.xml file, when included in the application’s WAR file in the WEB-INF folder, can configure the resource and pool for the application when it is deployed in GlassFish. When the application is un-deployed the resource and pool are removed. If you want to set up the resource and pool permanently in GlassFish then follow these steps. Go to the Services tab and select Servers and then right mouse click on GlassFish. If GlassFish is not running then click on Start. With the server started click on View Domain Admin Console. Your web browser will now open and show you the GlassFish console. If you assigned a user name and password to the server you will have to enter this information before you see the console. In the Common Tasks tree select Resources. You should now see in the panel adjacent to the tree the following: Click on Add Resources. You should now see: In the Location click on Choose File and locate your glassfish-resources.xml file. Mine is found at D:\NetBeansProjects\GlassFishTutorial\src\main\setup. You should now see: Click on OK. If everything has gone well you should see: The final task in this step is to test if the connection works. In the Common Tasks tree select Resources, JDBC, JDBC Connection Pools and aquariumPool. Click on Ping. You should see: The most common reason for the Ping to fail is that the database driver is not in the domain’s lib folder. Go to Step 1a and manually add the driver. The resources are now visible in NetBeans. Having the resource and pool add to GlassFish permanently will allow other applications to share this same resource and pool. You are now ready to code!

February 9, 2015

by Ken Fogel

· 52,720 Views · 3 Likes

Microservices: Five Architectural Constraints

Microservices is a new software architecture and delivery paradigm, where applications are composed of several small runtime services. The current mainstream approach for software delivery is to build, integrate, and test entire applications as a monolith. This approach requires any software change, however small, to require a full test cycle of the entire application. With Microservices a software module is delivered as an independent runtime service with a well defined API. The Microservices approach allow faster delivery of smaller incremental changes to an application. There are several tradeoffs to consider with the Microservices architecture. On one hand, the Microservices approach builds on several best practices and patterns for software design, architecture, and DevOps style organization. On the other hand, Microservices requires expertise in distributed programming and can become an operational nightmare without proper tooling in place. There are several good posts that highlight the pros-and-cons of Microservices, and I have added in the references section. In the remainder of this post, I will define five architectural constraints (principles that drive desired properties) for the Microservices architectural style. To be a Microservice, a service must be: Elastic Resilient Composable Minimal, and; Complete Microservice Constraint #1 - Elastic A microservice must be able to scale, up or down, independently of other services in the same application. This constraint implies that based on load, or other factors, you can fine tune your applications performance, availability, and resource usage. This constraint can be realized in different ways, but a popular pattern is to architect the system so that you can run multiple stateless instances of each microservice, and there is a mechanism for Service naming, registration, and discovery along with routing and load-balancing of requests. Microservice Constraint #2 - Resilient A microservice must fail without impacting other services in the same application. A failure of a single service instance should have minimal impact on the application. A failure of all instances of a microservice, should only impact a single application function and users should be able to continue using the rest of the application without impact. Adrian Cockroft describes Microservices as loosely coupled service oriented architecture with bounded contexts [3]. To be resilient a service has to be loosely coupled with other services, and a bounded context limits a service’s failure domain. Microservice Constraint #3 - Composable A microservice must offer an interface that is uniform and is designed to support service composition. Microservice APIs should be designed with a common way of identifying, representing, and manipulating resources, describing the API schema and supported API operations. The ‘Uniform Interfaces constraint of the REST architectural style describes this in detail. Service Composition is a SOA principle that has fairly obvious benefits, but few guidelines on how it can be achieved. A Microservice interface should be designed to support composition patterns like aggregation, linking, and higher-level functions such as caching, proxies and gateways. I previously discussed REST constraints and elements in as two part blog post: REST is not about APIs Microservice Constraint #4 - Minimal A microservice must only contain highly cohesive entities In software, cohesion is a measure of whether things belong together. A module is said to have high cohesion if all objects and functions in it are focused on the same tasks. Higher cohesion leads to more maintainable software. A Microservice should perform a single business function, which implies that all of its components are highly cohesive. This is also an Single Responsibility Principle (SRP) of object-oriented design [5] Microservice Constraint #5 - Complete A microservice must be functionally complete Bjarne Stroustrup, the creator of C++, stated that a good interface must be, “minimal but complete” i.e. as small as possible, and no smaller. Similarly, a Microservice must offer a complete function, with minimal dependencies (loose coupling) to other services in the application. This is important, as otherwise its becomes impossible to version and upgrade individual services. This constraint is designed to oppose the minimal constraint. Put together a microservice must be “minimal but complete.” Conclusions Designing a Microservices application requires application of several principles, patterns, and best practices of modular design and service-oriented architectures. In this post, I've outlined five architectural constraints which can help guide and retain the key benefits of a Microservices-style architecture. For example, Microservices Constraint# 1 - Elastic steers implementations towards separating the data tier from the application tier, and leads to stateless services. At Nirmata we have built our solution, that makes it easy to deploy and operate microservices applications, using these very same principles. We believe that Microservices style applications, running in containers, will power the next generation of software innovation. If you are using, or interested in using microservices, I would love to hear from you. Jim Bugwadia Founder and CEO Nirmata -- For additional content and articles follow us at @NirmataCloud. -- If you are in the San Francisco Bay Area, come join our Microservices meetup group. References [1] Microservices, Martin Fowler and James Lewis, http://martinfowler.com/articles/microservices.html [2] Microservices Are Not a free lunch!, Benjamin Wootton, http://contino.co.uk/microservices-not-a-free-lunch/ [3] State of the Art in Microservices, Adrian Cockroft, http://thenewstack.io/dockercon-europe-adrian-cockcroft-on-the-state-of-microservices/ [4] The Principles of Object-Oriented Design, Robert C. Martin, http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod

February 5, 2015

by Jim Bugwadia

· 13,307 Views · 7 Likes

Dropwizard vs Spring Boot—A Comparison Matrix

Of late, I have been looking into Microservice containers that are available out there to help speed up the development. Although, Microservice is a generic term however there is some consensus with respect to what it means. Hence, we may conveniently refer to the definition Microservice as an "architectural design pattern, in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are small, highly decoupled and focus on doing a small task." There are several Microservice containers out there. However, in my experience I have found Dropwizard and Spring-boot to have had received more attention and they appear to be widely used compared to the rest. In my current role, I was asked create a comparison matrix between the two, so it's here below. Dropwizard Spring-Boot What is it? Dropwizard pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done. [more...] Takes an opinionated view of building production-ready Spring applications. Spring Boot favours convention over configuration and is designed to get you up and running as quickly as possible. [more...] Overview? Dropwizard straddles the line between being a library and a framework. Provide performant, reliable implementations of everything a production-ready web application needs. [more...] Spring-boot takes an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration. [more...] Out of the box features? Dropwizard has out-of-the-box support for sophisticated configuration, application metrics, logging, operational tools, and much more, allowing you and your team to ship a production-quality web service in the shortest time possible. [more...] Spring-boot provides a range of non-functional features that are common to large classes of projects (e.g. embedded servers, security, metrics, health checks, externalized configuration). [more...] Libraries Core: Jetty, Jersey, Jackson and Matrics Others: Guava, Liquibase and Joda Time. Spring, JUnit, Logback, Guava. There are several starter POM files covering various use cases, which can be included in the POM to get started. Dependency Injection? No built in Dependency Injection. Requires a 3rd party dependency injection framework such as Guice, CDI or Dagger. [Ref...] Built in Dependency Injection provided by Spring Dependency Injection container. [Ref...] Types of Services i.e. REST, SOAP Has some support for other types of services but primarily is designed for performant HTTP/REST LAYER. If ever need to integrate SOAP, there is a dropwizard bundle for building SOAP web services using JAX-WS API is provided here but it’s not official drop-wizard sub project. [more...] As well as supporting REST Spring-boot has support for other types of services such as JMS, Advanced Message Queuing Protocol, SOAP based Web Services to name a few. [more...] Deployment? How it creates the Executable Jar? Uses Shading to build executable fat jars, where a shaded jar spackages all classes, from all jars, into a single 'uber jar'. [Ref...] Spring-boot adopts a different approach and avoids shaded jars, as it becomes hard to see which libraries you are actually using in your application. It can also be problematic if the same filename is used in Shaded jars. Instead it uses “Nested Jar” approach where all classes from all jars do not need to be included into a single “uber jar” instead all dependent jars should be in the “lib” folder, spring loader loads them appropriately. [Ref...] Contract First Web Services? No built in support. Would have to refer to 3rd party library (CXF or any other JAX-WS implementation) if needed a solution for the Contract First SOAP based services. Contract First services support is available with the help of spring-boot-starter-ws starter application. [Ref...] Externalised Configuration for properties and YAML Supports both Properties and YAML Supports both Properties and YAML Concluding Remarks If dealing with only REST micro services, drop wizard is an excellent choice. Where Spring-boot shines is the types of services supported i.e. REST, JMS, Messaging, and Contract First Services. Not least a fully built in Dependency Injection container. Disclaimer: The matrix is purely based on my personal views and experiences, having tried both frameworks and is by no means an exhaustive guide. Readers are requested to do their own research before making a strategic decision between the two very formidable frameworks.

February 2, 2015

by Rizwan Ullah

· 74,169 Views · 9 Likes

Why Customers Choose Datical DB to Automate Database Deployments

Over the past year, Datical has had amazing success with our flagship product, Datical DB. We’ve seen multiple visionary, sector-leading companies select Datical DB to drive their Application Schema changes. Now that the number has grown rapidly over the past year, we can begin to see patterns in why customers choose Datical DB. One of them turns out to be pretty emblematic of our other customers. So, let’s examine the reasons why they chose to adopt Datical DB. Customer Facing Applications are the Front Door When your competitor is only a mobile screen swipe away, this Datical customer focuses on brand reputation and customer satisfaction. They know that application uptime and fast delivery are key to customer retention and account expansion. All applications have a database backend. Though, when something goes wrong with the database, it is not apparent to the consumer. The consumer just knows that the app isn’t working. An unresponsive mobile app or website is often enough to make the customer take their business elsewhere. Customer stickiness due to the hassle of changing providers continues to lessen as the cost to change continues to drop. Therefore, immediate and fast access is a must have for today’s companies. Remember: Consumer facing applications don’t have business hours. They must ALWAYS be open. Cross Team Collaboration This Datical customer had difficulty in determining who made what change to the database. Moreover, answering which database and why was near impossible. All of the changes were stored in a single document. That solution is not multi-tenant, meaning that the team had to distribute the document to share information. Moreover, there was a single point of failure in the tracking process. Too often, people were required to visit the datacenter for days at a time during application changes. There was simply no method to notify team members of changes, when they occurred, and the change impact. This lack of communication led to almost 80% of all database change requests being rejected. There was clearly a need to increase communication of changes and increase the requested changes’ quality. Increase Staff Productivity With over 70% of the Database team’s time spent on managing change due to application requirements, the Database team was taxed in meeting other demands. Needs to improve scalability and reliability of the database servers became a lower priority as the DBAs struggled to keep up with change requests. Furthermore, development teams spent almost 10% of their time managing these change requests, including reworks of failed requests. By eliminating a manual, time-consuming process, this customer can now focus resources on addressing other needs such as managing continuity during server failure, better allocation of server resources, and certainly scalability concerns. Go Faster With the adoption of IBM UrbanCode Deploy, this customer quickly streamlined their deployments with all but the database automated. With all other components automated, the database changes required by application changes was clearly the weak link in the deployment chain. Truly, database automation is the last mile necessary to realize the promise of Agile, Continuous Delivery, and DevOps. Until the customer was able to apply Agile and Continuous Delivery to database changes, the entire application stack was, in effect, still using out-of-date development and deployment methods. To see the complete benefits of Agile and Continuous Delivery, automation throughout the entire stack, including the database, was absolutely necessary. Leverage Existing Infrastructure and Processes With out of the box integration with UrbanCode Deploy, Datical DB did not require an independent separate server. Furthermore, with Datical DB’s ability to utilize the customer’s existing source code control repository, the implementation cycle was shortened significantly. Too often, customers are asked by vendors to spend money on more server resources or make strange, unnatural changes to their network security to support potential solutions. By utilizing existing infrastructure and processes, Datical DB was able to deliver to this customer lightening quick ROI. We measure ROI in days and weeks, not months and years. Please join us for a webcast next Wednesday, February 4th, from 12:00 – 1:00 pm EST, as we discuss these customer benefits in detail and show how Datical DB integrates seamlessly with IBM UrbanCode Deploy.

January 30, 2015

by Robert Reeves

· 8,406 Views

Bulk Data Insertion into Oracle Database in C#

Bulk insertion of data into database table is a big overhead in application development.

January 28, 2015

by Ayobami Adewole

· 44,844 Views

Importing Big Tables With Large Indexes With Myloader MySQL Tool

originally written by david ducos mydumper is known as the faster (much faster) mysqldump alternative. so, if you take a logical backup you will choose mydumper instead of mysqldump. but what about the restore? well, who needs to restore a logical backup? it takes ages! even with myloader. but this could change just a bit if we are able to take advantage of fast index creation. as you probably know, mydumper and mysqldump export the struct of a table, with all the indexes and the constraints, and of course, the data. then, myloader and mysql import the struct of the table and import the data. the most important difference is that you can configure myloader to import the data using a certain amount of threads. the import steps are: create the complete struct of the table import the data when you execute myloader, internally it first creates the tables executing the “-schema.sql” files and then takes all the filenames without “schema.sql” and puts them in a task queue. every thread takes a filename from the queue, which actually is a chunk of the table, and executes it. when finished it takes another chunk from the queue, but if the queue is empty it just ends. this import procedure works fast for small tables, but with big tables with large indexes the inserts are getting slower caused by the overhead of insert the new values in secondary indexes. another way to import the data is: split the table structure into table creation with primary key, indexes creation and constraint creation create tables with primary key per table do: load the data create index create constraints this import procedure is implemented in a branch of myloader that can be downloaded from here or directly executing bzr with the repository: bzr branch lp:~david-ducos/mydumper/mydumper the tool reads the schema files and splits them into three separate statements which create the tables with the primary key, the indexes and the constraints. the primary key is kept in the table creation in order to avoid the recreation of the table when a primary key is added and the “key” and “constraint” lines are removed. these lines are added to the index and constraint statements, respectively. it processes tables according to their size starting with the largest because creating the indexes of a big table could take hours and is single-threaded. while we cannot process other indexes at the time, we are potentially able to create other tables with the remaining threads. it has a new thread (monitor_process) that decides which chunk of data will be put in the task queue and a communication queue which is used by the task processes to tell the monitor_process which chunk has been completed. i run multiple imports on an aws m1.xlarge machine with one table comparing myloader and this branch and i found that with large indexes the times were: as you can see, when you have less than 150m rows, import the data and then create the indexes is higher than import the table with the indexes all at once. but everything changes after 150m rows, import 200m takes 64 minutes more for myloader but just 24 minutes for the new branch. on a table of 200m rows with a integer primary key and 9 integer columns, you will see how the time increases as the index gets larger: where: 2-2-0: two 1-column and two 2-column index 2-2-1: two 1-column, two 2-column and one 3-column index 2-3-1: two 1-column, three 2-column and one 3-column index 2-3-2: two 1-column, three 2-column and two 3-column index conclusion this branch can only import all the tables with this same strategy, but with this new logic in myloader, in a future version it could be able to import each table with the best strategy reducing the time of the restore considerably.

January 27, 2015

by Peter Zaitsev

· 5,292 Views

A very quick guide to deadlock diagnosis in SQL Server

Recently I was asked about diagnosing deadlocks in SQL Server – I’ve done a lot of work in this area way back in 2008, so I figure it’s time for a refresher. If there’s a lot of interest in exploring SQL Server and deadlocks further, I’m happy to write an extended article going into far more detail. Just let me know. Before we get into diagnosis and investigation, it’s a good time to pose the question: “what is a deadlock?”: From TechNet: A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock. The following graph presents a high level view of a deadlock state where: Task T1 has a lock on resource R1 (indicated by the arrow from R1 to T1) and has requested a lock on resource R2 (indicated by the arrow from T1 to R2). Task T2 has a lock on resource R2 (indicated by the arrow from R2 to T2) and has requested a lock on resource R1 (indicated by the arrow from T2 to R1). Because neither task can continue until a resource is available and neither resource can be released until a task continues, a deadlock state exists. The SQL Server Database Engine automatically detects deadlock cycles within SQL Server. The Database Engine chooses one of the sessions as a deadlock victim and the current transaction is terminated with an error to break the deadlock. Basically, it’s a resource contention issue which blocks one process or transaction from performing actions on resources within SQL Server. This can be a serious condition, not just for SQL Server as processes become suspended, but for the applications which rely on SQL Server as well. The T-SQL Approach A fast way to respond is to execute a bit of T-SQL on SQL Server, making use of System Views. The following T-SQL will show you the “victim” processes, much like activity monitor does: select * from sys.sysprocesses where blocked > 0 Which is not particularly useful (but good to know, so you can see the blocked count). To get to the heart of the deadlock, this is what you want (courtesy of this SO question/answer): SELECT Blocker.text –, Blocker.*, * FROM sys.dm_exec_connections AS Conns INNER JOIN sys.dm_exec_requests AS BlockedReqs ON Conns.session_id = BlockedReqs.blocking_session_id INNER JOIN sys.dm_os_waiting_tasks AS w ON BlockedReqs.session_id = w.session_id CROSS APPLY sys.dm_exec_sql_text(Conns.most_recent_sql_handle) AS Blocker This will show you line and verse (the actual statement causing the resource block) – see the attached screenshot for an example. However, the generally accepted way to determine and diagnose deadlocks is through the use of SQL Server trace flags. SQL Trace Flags They are (usually) set temporarily, and they cause deadlocking information to be dumped to the SQL management logs. The flags that are useful are flags 1204 and 1222. From TechNet: https://technet.microsoft.com/en-us/library/ms178104%28v=sql.105%29.aspx Trace flags are set on or off by using either of the following methods: · Using the DBCC TRACEON and DBCC TRACEOFF commands. For example, DBCC TRACEON 2528: To enable the trace flag globally, use DBCC TRACEON with the -1 argument: DBCC TRACEON (2528, -1). To turn off a global trace flag, use DBCC TRACEOFF with the -1 argument. · Using the -T startup option to specify that the trace flag be set on during startup. The -T startup option enables a trace flag globally. You cannot enable a session-level trace flag by using a startup option. So to enable or disable deadlock trace flags globally, you’d use the following T-SQL: DBCC TRACEON (1204, -1) DBCC TRACEON (1222, -1) DBCC TRACEOFF (1204, -1) DBCC TRACEOFF (1222, -1) Due to the overhead, it’s best to enable the flag at runtime rather than on start up. Note that the scope of a non-startup trace flag can be global or session-level. Basic Deadlock Simulation By way of a very simple scenario, you can make use of SQL Management Studio (and breakpoints) to roughly simulate a deadlock scenario. Given the following basic table schema: CREATE TABLE [dbo].[UploadedFile]( [Id] [int] NOT NULL, [Filename] [nvarchar](50) NOT NULL, [DateCreated] [datetime] NOT NULL, [DateModified] [datetime] NULL, CONSTRAINT [PK_UploadedFile] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ) With some basic test data in it: If you create two separate queries in SQL Management Studio, use the following transaction (Query #1) to lock rows in the table: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE BEGIN TRANSACTION SELECT [Id],[Filename],[DateCreated],[DateModified] FROM [dbo].[UploadedFile] WHERE DateCreated > ‘2015-01-01′ ROLLBACK TRANSACTION Now add a “victim” script (Query #2) in a separate query session: UPDATE [dbo].[UploadedFile] SET [DateModified] = ‘2014-12-31′ WHERE DateCreated > ‘2015-01-01′ As long as you set a breakpoint on the ROLLBACK TRANSACTION statement, you’ll block the second query due to the isolation level of the transaction which wraps query #1. Now you can use the diagnostic T-SQL to examine the victim and the blocking transaction. Enjoy!

January 27, 2015

by Rob Sanders

· 165,267 Views

Improving Lock Performance in Java

After we introduced locked thread detection to Plumbr couple of months ago, we have started to receive queries similar to “hey, great, now I understand what is causing my performance issues, but what I am supposed to do now?” We are working hard to build the solution instructions into our own product, but in this post I am going to share several common techniques you can apply independent of the tool used for detecting the lock. The methods include lock splitting, concurrent data structures, protecting the data instead of the code and lock scope reduction. Locking is not evil, lock contention is Whenever you face a performance problem with the threaded code there is a chance that you will start blaming locks. After all, common “knowledge” is that locks are slow and limit scalability. So if you are equipped with this “knowledge” and start to optimize the code and getting rid of locks there is a chance that you end up introducing nasty concurrency bugs that will surface later on. So it is important to understand the difference between contended and uncontended locks. Lock contention occurs when a thread is trying to enter the synchronized block/method currently executed by another thread. This second thread is now forced to wait until the first thread has completed executing the synchronized block and releases the monitor. When only one thread at a time is trying to execute the synchronized code, the lock stays uncontended. As a matter of fact, synchronization in JVM is optimized for the uncontended case and for the vast majority of the applications, uncontended locks pose next to no overhead during execution. So, it is not locks you should blame for performance, but contended locks. Equipped with this knowledge, lets see what we can do to reduce either the likelihood of contention or the length of the contention. Protect the data not the code A quick way to achieve thread-safety is to lock access to the whole method. For example, take look at the following example, illustrating a naive attempt to build an online poker server: class GameServer { public Map<> tables = new HashMap>(); public synchronized void join(Player player, Table table) { if (player.getAccountBalance() > table.getLimit()) { List tablePlayers = tables.get(table.getId()); if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } public synchronized void leave(Player player, Table table) {/*body skipped for brevity*/} public synchronized void createTable() {/*body skipped for brevity*/} public synchronized void destroyTable(Table table) {/*body skipped for brevity*/} } The intentions of the author have been good - when new players join() the table, there must be a guarantee that the number of players seated at the table would not exceed the table capacity of nine. But whenever such a solution would actually be responsible for seating players to tables - even on a poker site with moderate traffic, the system would be doomed to constantly trigger contention events by threads waiting for the lock to be released. Locked block contains account balance and table limit checks which potentially can involve expensive operations both increasing the likelihood and length of the contention. First step towards solution would be making sure we are protecting the data, not the code by moving the synchronization from the method declaration to the method body. In the minimalistic example above, it might not change much at the first place. But lets consider the whole GameServerinterface, not just the single join() method: class GameServer { public Map> tables = new HashMap>(); public void join(Player player, Table table) { synchronized (tables) { if (player.getAccountBalance() > table.getLimit()) { List tablePlayers = tables.get(table.getId()); if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } } public void leave(Player player, Table table) {/* body skipped for brevity */} public void createTable() {/* body skipped for brevity */} public void destroyTable(Table table) {/* body skipped for brevity */} } What originally seemed to be a minor change, now affects the behaviour of the whole class. Whenever players were joining tables, the previously synchronized methods locked on theGameServer instance (this) and introduced contention events to players trying to simultaneouslyleave() tables. Moving the lock from the method signature to the method body postpones the locking and reduces the contention likelihood. Reduce the lock scope Now, after making sure it is the data we actually protect, not the code, we should make sure our solution is locking only what is necessary - for example when the code above is rewritten as follows: public class GameServer { public Map> tables = new HashMap>(); public void join(Player player, Table table) { if (player.getAccountBalance() > table.getLimit()) { synchronized (tables) { List tablePlayers = tables.get(table.getId()); if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } } //other methods skipped for brevity } then the potentially time-consuming operation of checking player account balance (which potentially can involve IO operations) is now outside the lock scope. Notice that the lock was introduced only to protect against exceeding the table capacity and the account balance check is not anyhow part of this protective measure. Split your locks When we look at the last code example, you can clearly notice that the whole data structure is protected by the same lock. Considering that we might hold thousands of poker tables in this structure, it still poses a high risk for contention events as we have to protect each table separately from overflowing in capacity. For this there is an easy way to introduce individual locks per table, such as in the following example: public class GameServer { public Map> tables = new HashMap>(); public void join(Player player, Table table) { if (player.getAccountBalance() > table.getLimit()) { List tablePlayers = tables.get(table.getId()); synchronized (tablePlayers) { if (tablePlayers.size() < 9) { tablePlayers.add(player); } } } } //other methods skipped for brevity } Now, if we synchronize the access only to the same table instead of all the tables, we have significantly reduced the likelihood of locks becoming contended. Having for example 100 tables in our data structure, the likelihood of the contention is now 100x smaller than before. Use concurrent data structures Another improvement is to drop the traditional single-threaded data structures and use data structures designed explicitly for concurrent usage. For example, when picking ConcurrentHashMapto store all your poker tables would result in code similar to following: public class GameServer { public Map> tables = new ConcurrentHashMap>(); public synchronized void join(Player player, Table table) {/*Method body skipped for brevity*/} public synchronized void leave(Player player, Table table) {/*Method body skipped for brevity*/} public synchronized void createTable() { Table table = new Table(); tables.put(table.getId(), table); } public synchronized void destroyTable(Table table) { tables.remove(table.getId()); } } The synchronization in join() and leave() methods is still behaving as in our previous example, as we need to protect the integrity of individual tables. So no help from ConcurrentHashMap in this regards. But as we are also creating new tables and destroying tables in createTable() and destroyTable()methods, all these operations to the ConcurrentHashMap are fully concurrent, permitting to increase or reduce the number of tables in parallel. Other tips and tricks Reduce the visibility of the lock. In the example above, the locks are declared public and are thus visible to the world, so there is there is a chance that someone else will ruin your work by also locking on your carefully picked monitors. Check out java.util.concurrent.locks to see whether any of the locking strategies implemented there will improve the solution. Use atomic operations. The simple counter increase we are actually conducting in example above does not actually require a lock. Replacing the Integer in count tracking withAtomicInteger would most suit this example just fine. Hope the article helped you to solve the lock contention issues, independent of whether you are using Plumbr automatic lock detection solution or manually extracting the information from thread dumps.

January 22, 2015

by Nikita Salnikov-Tarnovski

· 11,610 Views

Avoiding MySQL ALTER Table Downtime

Originally Written byAndrew Moore MySQL table alterations can interrupt production traffic causing bad customer experience or in worst cases, loss of revenue. Not all DBAs, developers, syadmins know MySQL well enough to avoid this pitfall. DBAs usually encounter these kinds of production interruptions when working with upgrade scripts that touch both application and database or if an inexperienced admin/dev engineer perform the schema change without knowing how MySQL operates internally. Truths * Direct MySQL ALTER table locks for duration of change (pre-5.6) * Online DDL in MySQL 5.6 is not always online and may incurr locks * Even with Percona Toolkit‘s pt-online-schema-change there are several workloads that can experience blocking Here on the Percona MySQL Managed Services team we encourage our clients to work with us when planning and performing schema migrations. We aim to ensure that we are using the best method available in their given circumstance. Our intentions to avoid blocking when performing DDL on large tables ensures that business can continue as usual whilst we strive to improve response time or add application functionality. The bottom line is that a business relying on access to its data cannot afford to be down during core trading hours. Many of the installations we manage are still below MySQL 5.6, which requires us to seek workarounds to minimize the amount of disruption a migration can cause. This may entail slave promotion or changing the schema with an ‘online schema change’ tool. MySQL version 5.6 looks to address this issue by reducing the number of scenarios where a table is rebuilt and locked but it doesn’t yet cover all eventualities, for example when changing the data type of a column a full table rebuild is necessary. The topic of 5.6 Online Schema Change was discussed in great detail last year in the post, “Schema changes – what’s new in MySQL 5.6?” by Przemysław Malkowski With new functionality arriving in MySQL 5.7, we look forward to non-blocking DDL operations such as; OPTIMIZE TABLE and RENAME INDEX. (More info) The best advice for MySQL 5.6 users is to review the matrix to familiarize with situations where it might be best to look outside of MySQL to perform schema changes, the good news is that we’re on the right path to solving this natively. Truth be told, a blocking alter is usually going to go unnoticed on a 30MB table and we tend to use a direct alter in this situation, but on a 30GB or 300GB table we have some planning to do. If there is a period of time where activity is low and the this is permissive of locking the table then sometimes it is better execute within this window. Frequently though we are reactive to new SQL statements or a new performance issue and an emergency index is required to reduce load on the master in order to improve the response time. To pt-osc or not to pt-osc? As mentioned, pt-online-schema-change is a fixture in our workflow. It’s usually the right way to go but we still have occasions where pt-online-schema-change cannot be used, for example; when a table already uses triggers. It’s an important to remind ourselves of the the steps that pt-online-schema-change traverses to complete it’s job. Lets look at the source code to identify these; [moore@localhost]$ egrep 'Step' pt-online-schema-change # Step 1: Create the new table. # Step 2: Alter the new, empty table. This should be very quick, # Step 3: Create the triggers to capture changes on the original table and <--(metadata lock) # Step 4: Copy rows. # Step 5: Rename tables: orig -> old, new -> orig <--(metadata lock) # Step 6: Update foreign key constraints if there are child tables. # Step 7: Drop the old table. [moore@localhost]$egrep'Step'pt-online-schema-change # Step 1: Create the new table. # Step 2: Alter the new, empty table. This should be very quick, # Step 3: Create the triggers to capture changes on the original table and <--(metadata lock) # Step 4: Copy rows. # Step 5: Rename tables: orig -> old, new -> orig <--(metadata lock) # Step 6: Update foreign key constraints if there are child tables. # Step 7: Drop the old table. I pick out steps 3 and 5 from above to highlight a source of a source of potential downtime due to locks, but step 6 is also an area for concern since foreign keys can have nested actions and should be considered when planning these actions to avoid related tables from being rebuilt with a direct alter implicitly. There are several ways to approach a table with referential integrity constraints and they are detailed within the pt-osc documentation a good preparation step is to review the structure of your table including the constraints and how the ripples of the change can affect the tables around it. Recently we were alerted to an incident after a client with a highly concurrent and highly transactional workload ran a standard pt-online-schema-change script over a large table. This appeared normal to them and a few hours later our pager man was notified that this client was experiencing max_connections limit reached. So what was going on? When pt-online-schema-change reached step 5 it tried to acquire a metadata lock to rename the the original and the shadow table, however this wasn’t immediately granted due to open transactions and thus threads began to queue behind the RENAME command. The actual effect this had on the client’s application was downtime. No new connections could be made and all existing threads were waiting behind the RENAME command. Metadata locks Introduced in 5.5.3 at server level. When a transaction starts it will acquire a metadata lock (independent of storage engine) on all tables it uses and then releases them when it’s finished it’s work. This ensures that nothing can alter the table definition whilst a transaction is open. With some foresight and planning we can avoid these situations with non-default pt-osc options, namely –nodrop-new-table and –no-swap-tables. This combination leaves both the shadow table and the triggers inplace so that we can instigate an atomic RENAME when load permits. EDIT: as of percona-toolkit version 2.2 we have a new variable –tries which in conjunction with –set-vars has been deployed to cover this scenario where various pt-osc operations could block waiting for a metadata lock. The default behaviour of pt-osc (–set-vars) is to set the following session variables when it connects to the server; wait_timeout=10000 innodb_lock_wait_timeout=1 lock_wait_timeout=60 when using –tries we can granularly identify the operation, try count and the wait interval between tries. This combination will ensure that pt-osc will kill it’s own waiting session in good time to avoid the thread pileup and provide us with a loop to attempt to acquire our metadata lock for triggers|rename|fk management; –tries swap_tables:5:0.5,drop_triggers:5:0.5 The documentation is here http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html#cmdoption-pt-online-schema-change–tries This illustrates that even with a tool like pt-online-schema-change it is important to understand the caveats presented with the solution you think is most adequate. To help decide the direction to take use the flow chart to ensure you’re taking into account some of the caveats of the MySQL schema change. Be sure to read up on the recommended outcome though as there are uncharted areas such as disk space, IO load that are not featured on the diagram. Choosing the right DDL option Ensure you know what effect ALTER TABLE will have on your platform and pick the right method to suit your uptime. Sometimes that means delaying the change until a period of lighter use or utilising a tool that will avoid holding a table locked for the duration of the operation. A direct ALTER is sometimes the answer like when you have triggers installed on a table. – In most cases pt-osc is exactly what we need – In many cases pt-osc is needed but the way in which it’s used needs tweaking – In few cases pt-osc isn’t the right tool/method and we need to consider native blocking ALTER or using failovers to juggle the change into place on all hosts in the replica cluster. If you want to learn more about avoiding avoidable downtime please tune into my webinar Wednesday, November 19 at 10 a.m. PST. It’s titled “Tips from the Trenches: A Guide to Preventing Downtime for the Over-Extended DBA.” Register now!(If you miss it, don’t worry: Catch the recording and download the slides from that same registration page.)

January 20, 2015

by Peter Zaitsev

· 15,317 Views · 1 Like

ORM and Angular -- Make Your App Smarter

Posted by Gilad F on Back& Blog. Current approaches to web development rely upon having two kinds of intelligence built into your application – business intelligence in the server, and presentation intelligence on the client side. This institutes a clear delineation in responsibilities, which is often desirable from an architectural standpoint. However, this approach does have some drawbacks. Processing time for business logic, for example, is centralized on the server. This can introduce bottlenecks in the application’s performance, or add complexity when it comes to cross-server communication. For smaller applications that nonetheless have a large user base, this can often be the single greatest performance concern – the time spent computing solutions by the server. One way this can be offset is through the use of Object-Relational Mapping, or ORM. Below we’ll look at the concept of ORM, and how creating an ORM system in Angular can help make your application smarter. What is an ORM? Simply put, Object-Relational Mapping is the concept of creating representations of your underlying data that know how to manage themselves. Most web applications boil down to four basic actions, known as the “CRUD” approach – Create a record, Retrieve records, Update a record, or Delete a record. With an ORM, you simply encapsulate each of these functions within a class that represents a given record in the database. In essence, the objects you create to represent your data on the front end also know how to manipulate that data on the back end. Why Use an ORM? The primary benefit of an ORM is that it hides a lot of the functional complexity of database integration behind an established API. Communication with the database to implement each of the CRUD methods can be complex, but once it’s been accomplished for one model it can be easily ported to all of the other models in your system. An ORM focuses on hiding as much of this code as possible, allowing your models to care only about how they are represented – and how they interact with other elements in the system. A series of calls to establish a connection to the database, for example, becomes a single call to a method named “Save” on the model instance. This also allows you to centralize your database code, giving you only one location where you need to look for database-related bugs instead of having to search a complex code base for different custom data communication handlers. Why Use an ORM in Angular? While the JavaScript stack is particularly performant when compared to more heavyweight offerings such as Rails and Django, it still faces the issues common to the standard web application architecture – the server has the potential to be a bottleneck, handling the incoming traffic from a number of locations. By focusing your development efforts to create a pure CRUD API in your server, and developing a rudimentary ORM in Angular, you can offload a lot of that processing load to the client machines – in essence parallelizing the process at the expense of increased network communication. This allows you to reduce the overall dependence of your application on the server, making the server a “thin” client that simply updates the database based upon the API calls issued by the client. After a certain point, your back-end can be outsourced completely to an external provider that specializes in providing this type of access – such as Backand – allowing you to completely offload scalability and security concerns. In essence, it allows you to focus on your application as opposed to focusing on the attendant resources. Conclusion Object-Relational Mapping is a powerful paradigm that eases communication with a database for the basic CRUD activities associated with web applications. As most existing web development environments focus on implementing ORM on the server side, this can result in performance and communication bottlenecks – not to mention increased infrastructure costs. By offloading some of these ORM tasks to AngularJS, you can parallelize many of these tasks and reduce overall server load, in some cases obviating the need for the server entirely. If your application is facing a bloated back-end communication pattern, it might be worth your time to look at working towards implementation of a client-side ORM system. Build your Angular app and connect it to any database with Backand today. – Get started now.translate in hindi

January 16, 2015

by Itay Herskovits

· 8,887 Views