DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Performance Topics

article thumbnail
How to Develop and Monitor Thread Pool Services Using Spring
Thread Pools are very important to execute synchronous & asynchronous processes. This article shows how to develop and monitor Thread Pool Services by using Spring. Creating Thread Pool has been explained via two alternative methods. Used Technologies : JDK 1.6.0_21 Spring 3.0.5 Maven 3.0.2 STEP 1 : CREATE MAVEN PROJECT A maven project is created as below. (It can be created by using Maven or IDE Plug-in). STEP 2 : LIBRARIES Spring dependencies are added to Maven’ s pom.xml. ? org.springframework spring-core ${spring.version} org.springframework spring-context ${spring.version} For creating runnable-jar, below plugin can be used. ? org.apache.maven.plugins maven-shade-plugin 1.3.1 package shade com.otv.exe.Application META-INF/spring.handlers META-INF/spring.schemas STEP 3 : CREATE TASK CLASS A new TestTask Class is created by implementing Runnable Interface. This class shows to be executed tasks. ? package com.otv.task; import org.apache.log4j.Logger; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class TestTask implements Runnable { private static Logger log = Logger.getLogger(TestTask.class); String taskName; public TestTask() { } public TestTask(String taskName) { this.taskName = taskName; } public void run() { try { log.debug(this.taskName + " : is started."); Thread.sleep(10000); log.debug(this.taskName + " : is completed."); } catch (InterruptedException e) { log.error(this.taskName + " : is not completed!"); e.printStackTrace(); } } @Override public String toString() { return (getTaskName()); } public String getTaskName() { return taskName; } public void setTaskName(String taskName) { this.taskName = taskName; } } STEP 4 : CREATE TestRejectedExecutionHandler CLASS TestRejectedExecutionHandler Class is created by implementing RejectedExecutionHandler Interface. If there is no idle thread and queue overflows, tasks will be rejected. This class handles rejected tasks. ? package com.otv.handler; import java.util.concurrent.RejectedExecutionHandler; import java.util.concurrent.ThreadPoolExecutor; import org.apache.log4j.Logger; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class TestRejectedExecutionHandler implements RejectedExecutionHandler { private static Logger log = Logger.getLogger(TestRejectedExecutionHandler.class); public void rejectedExecution(Runnable runnable, ThreadPoolExecutor executor) { log.debug(runnable.toString() + " : has been rejected"); } } STEP 5 : CREATE ITestThreadPoolExecutorService INTERFACE ITestThreadPoolExecutorService Interface is created. ? package com.otv.srv; import java.util.concurrent.ThreadPoolExecutor; import com.otv.handler.TestRejectedExecutionHandler; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public interface ITestThreadPoolExecutorService { public ThreadPoolExecutor createNewThreadPool(); public int getCorePoolSize(); public void setCorePoolSize(int corePoolSize); public int getMaxPoolSize(); public void setMaxPoolSize(int maximumPoolSize); public long getKeepAliveTime(); public void setKeepAliveTime(long keepAliveTime); public int getQueueCapacity(); public void setQueueCapacity(int queueCapacity); public TestRejectedExecutionHandler getTestRejectedExecutionHandler(); public void setTestRejectedExecutionHandler(TestRejectedExecutionHandler testRejectedExecutionHandler); } STEP 6 : CREATE TestThreadPoolExecutorService CLASS TestThreadPoolExecutorService Class is created by implementing ITestThreadPoolExecutorService Interface. This class creates a new Thread Pool. ? package com.otv.srv; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.ThreadPoolExecutor; import java.util.concurrent.TimeUnit; import com.otv.handler.TestRejectedExecutionHandler; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class TestThreadPoolExecutorService implements ITestThreadPoolExecutorService { private int corePoolSize; private int maxPoolSize; private long keepAliveTime; private int queueCapacity; TestRejectedExecutionHandler testRejectedExecutionHandler; public ThreadPoolExecutor createNewThreadPool() { ThreadPoolExecutor executor = new ThreadPoolExecutor(getCorePoolSize(), getMaxPoolSize(), getKeepAliveTime(), TimeUnit.SECONDS, new ArrayBlockingQueue(getQueueCapacity()), getTestRejectedExecutionHandler()); return executor; } public int getCorePoolSize() { return corePoolSize; } public void setCorePoolSize(int corePoolSize) { this.corePoolSize = corePoolSize; } public int getMaxPoolSize() { return maxPoolSize; } public void setMaxPoolSize(int maxPoolSize) { this.maxPoolSize = maxPoolSize; } public long getKeepAliveTime() { return keepAliveTime; } public void setKeepAliveTime(long keepAliveTime) { this.keepAliveTime = keepAliveTime; } public int getQueueCapacity() { return queueCapacity; } public void setQueueCapacity(int queueCapacity) { this.queueCapacity = queueCapacity; } public TestRejectedExecutionHandler getTestRejectedExecutionHandler() { return testRejectedExecutionHandler; } public void setTestRejectedExecutionHandler(TestRejectedExecutionHandler testRejectedExecutionHandler) { this.testRejectedExecutionHandler = testRejectedExecutionHandler; } } STEP 7 : CREATE IThreadPoolMonitorService INTERFACE IThreadPoolMonitorService Interface is created. ? package com.otv.monitor.srv; import java.util.concurrent.ThreadPoolExecutor; public interface IThreadPoolMonitorService extends Runnable { public void monitorThreadPool(); public ThreadPoolExecutor getExecutor(); public void setExecutor(ThreadPoolExecutor executor); } STEP 8 : CREATE ThreadPoolMonitorService CLASS ThreadPoolMonitorService Class is created by implementing IThreadPoolMonitorService Interface. This class monitors created thread pool. ? package com.otv.monitor.srv; import java.util.concurrent.ThreadPoolExecutor; import org.apache.log4j.Logger; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class ThreadPoolMonitorService implements IThreadPoolMonitorService { private static Logger log = Logger.getLogger(ThreadPoolMonitorService.class); ThreadPoolExecutor executor; private long monitoringPeriod; public void run() { try { while (true){ monitorThreadPool(); Thread.sleep(monitoringPeriod*1000); } } catch (Exception e) { log.error(e.getMessage()); } } public void monitorThreadPool() { StringBuffer strBuff = new StringBuffer(); strBuff.append("CurrentPoolSize : ").append(executor.getPoolSize()); strBuff.append(" - CorePoolSize : ").append(executor.getCorePoolSize()); strBuff.append(" - MaximumPoolSize : ").append(executor.getMaximumPoolSize()); strBuff.append(" - ActiveTaskCount : ").append(executor.getActiveCount()); strBuff.append(" - CompletedTaskCount : ").append(executor.getCompletedTaskCount()); strBuff.append(" - TotalTaskCount : ").append(executor.getTaskCount()); strBuff.append(" - isTerminated : ").append(executor.isTerminated()); log.debug(strBuff.toString()); } public ThreadPoolExecutor getExecutor() { return executor; } public void setExecutor(ThreadPoolExecutor executor) { this.executor = executor; } public long getMonitoringPeriod() { return monitoringPeriod; } public void setMonitoringPeriod(long monitoringPeriod) { this.monitoringPeriod = monitoringPeriod; } } STEP 9 : CREATE Starter CLASS Starter Class is created. ? package com.otv.start; import java.util.concurrent.ThreadPoolExecutor; import org.apache.log4j.Logger; import com.otv.handler.TestRejectedExecutionHandler; import com.otv.monitor.srv.IThreadPoolMonitorService; import com.otv.monitor.srv.ThreadPoolMonitorService; import com.otv.srv.ITestThreadPoolExecutorService; import com.otv.srv.TestThreadPoolExecutorService; import com.otv.task.TestTask; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class Starter { private static Logger log = Logger.getLogger(TestRejectedExecutionHandler.class); IThreadPoolMonitorService threadPoolMonitorService; ITestThreadPoolExecutorService testThreadPoolExecutorService; public void start() { // A new thread pool is created... ThreadPoolExecutor executor = testThreadPoolExecutorService.createNewThreadPool(); executor.allowCoreThreadTimeOut(true); // Created executor is set to ThreadPoolMonitorService... threadPoolMonitorService.setExecutor(executor); // ThreadPoolMonitorService is started... Thread monitor = new Thread(threadPoolMonitorService); monitor.start(); // New tasks are executed... for(int i=1;i<10;i++) { executor.execute(new TestTask("Task"+i)); } try { Thread.sleep(40000); } catch (Exception e) { log.error(e.getMessage()); } for(int i=10;i<19;i++) { executor.execute(new TestTask("Task"+i)); } // executor is shutdown... executor.shutdown(); } public IThreadPoolMonitorService getThreadPoolMonitorService() { return threadPoolMonitorService; } public void setThreadPoolMonitorService(IThreadPoolMonitorService threadPoolMonitorService) { this.threadPoolMonitorService = threadPoolMonitorService; } public ITestThreadPoolExecutorService getTestThreadPoolExecutorService() { return testThreadPoolExecutorService; } public void setTestThreadPoolExecutorService(ITestThreadPoolExecutorService testThreadPoolExecutorService) { this.testThreadPoolExecutorService = testThreadPoolExecutorService; } } STEP 10 : CREATE Application CLASS Application Class is created. This class runs the application. ? package com.otv.start; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class Application { public static void main(String[] args) { ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml"); Starter starter = (Starter) context.getBean("Starter"); starter.start(); } } STEP 11 : CREATE applicationContext.xml applicationContext.xml is created. ? STEP 12 : ALTERNATIVE METHOD TO CREATE THREAD POOL ThreadPoolTaskExecutor Class provided by Spring can also be used to create Thread Pool. ? STEP 13 : BUILD PROJECT After OTV_Spring_ThreadPool Project is build, OTV_Spring_ThreadPool-0.0.1-SNAPSHOT.jar will be created. STEP 14 : RUN PROJECT After created OTV_Spring_ThreadPool-0.0.1-SNAPSHOT.jar file is run, below output logs will be shown : ? 18.10.2011 20:08:48 DEBUG (TestRejectedExecutionHandler.java:19) - Task7 : has been rejected 18.10.2011 20:08:48 DEBUG (TestRejectedExecutionHandler.java:19) - Task8 : has been rejected 18.10.2011 20:08:48 DEBUG (TestRejectedExecutionHandler.java:19) - Task9 : has been rejected 18.10.2011 20:08:48 DEBUG (TestTask.java:25) - Task1 : is started. 18.10.2011 20:08:48 DEBUG (TestTask.java:25) - Task6 : is started. 18.10.2011 20:08:48 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 2 - CompletedTaskCount : 0 - TotalTaskCount : 5 - isTerminated : false 18.10.2011 20:08:48 DEBUG (TestTask.java:25) - Task5 : is started. 18.10.2011 20:08:53 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 0 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:08:58 DEBUG (TestTask.java:27) - Task6 : is completed. 18.10.2011 20:08:58 DEBUG (TestTask.java:27) - Task1 : is completed. 18.10.2011 20:08:58 DEBUG (TestTask.java:25) - Task3 : is started. 18.10.2011 20:08:58 DEBUG (TestTask.java:25) - Task2 : is started. 18.10.2011 20:08:58 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 2 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:08:58 DEBUG (TestTask.java:27) - Task5 : is completed. 18.10.2011 20:08:58 DEBUG (TestTask.java:25) - Task4 : is started. 18.10.2011 20:09:03 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 3 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:08 DEBUG (TestTask.java:27) - Task2 : is completed. 18.10.2011 20:09:08 DEBUG (TestTask.java:27) - Task3 : is completed. 18.10.2011 20:09:08 DEBUG (TestTask.java:27) - Task4 : is completed. 18.10.2011 20:09:08 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:13 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:18 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:23 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:28 DEBUG (TestTask.java:25) - Task10 : is started. 18.10.2011 20:09:28 DEBUG (TestRejectedExecutionHandler.java:19) - Task16 : has been rejected 18.10.2011 20:09:28 DEBUG (TestRejectedExecutionHandler.java:19) - Task17 : has been rejected 18.10.2011 20:09:28 DEBUG (TestRejectedExecutionHandler.java:19) - Task18 : has been rejected 18.10.2011 20:09:28 DEBUG (TestTask.java:25) - Task14 : is started. 18.10.2011 20:09:28 DEBUG (TestTask.java:25) - Task15 : is started. 18.10.2011 20:09:28 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 6 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:33 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 6 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:38 DEBUG (TestTask.java:27) - Task10 : is completed. 18.10.2011 20:09:38 DEBUG (TestTask.java:25) - Task11 : is started. 18.10.2011 20:09:38 DEBUG (TestTask.java:27) - Task14 : is completed. 18.10.2011 20:09:38 DEBUG (TestTask.java:27) - Task15 : is completed. 18.10.2011 20:09:38 DEBUG (TestTask.java:25) - Task12 : is started. 18.10.2011 20:09:38 DEBUG (TestTask.java:25) - Task13 : is started. 18.10.2011 20:09:38 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 9 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:43 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 9 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:48 DEBUG (TestTask.java:27) - Task11 : is completed. 18.10.2011 20:09:48 DEBUG (TestTask.java:27) - Task13 : is completed. 18.10.2011 20:09:48 DEBUG (TestTask.java:27) - Task12 : is completed. 18.10.2011 20:09:48 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 12 - TotalTaskCount : 12 - isTerminated : true 18.10.2011 20:09:53 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 12 - TotalTaskCount : 12 - isTerminated : true STEP 15 : DOWNLOAD OTV_Spring_ThreadPool
November 28, 2014
by Eren Avsarogullari
· 30,090 Views
article thumbnail
Sharding Pitfalls Part III: Chunk Balancing and Collection Limits
In Parts 1 and 2 we have covered a number of common issues people run into when managing a sharded MongoDB cluster. In this final post of the series we will cover a subtle, but important distinction in terms of balancing a sharded cluster as well as an interesting limitation that can be worked around relatively easily, but is nonetheless surprising when it comes up. 6. Chunk balancing != data balancing != traffic balancing The balancer in a sharded cluster cares about just one thing: Are chunks for a given collection evenly balanced across all shards? If they are not, then it will take steps to rectify that imbalance. This all sounds perfectly logical, and even with extra complexity like tagging involved the logic is pretty straight forward. If we assume that all chunks are equal, then we can rest assured that our data is being evenly balanced across all the shards in our cluster and rest easy at night. Although that is sometimes, perhaps even frequently, the case it is not always true - chunks are not always equal. There can be massive “jumbo” chunks that exceed the maximum chunk size (64MiB), completely empty chunks and everything in between. Let’s use an example from our first pitfall, the monotonically increasing shard key. For our example, we have picked just such a key to shard on (date), and up until this point we have had just one shard and had not sharded the collection. We are about to add a second shard to our cluster and so we enable sharding on the collection and do the necessary admin work to add the new shard into the cluster. Once the collection is enabled for sharding, the first shard contains all the newly minted chunks. Let’s represent them in a simplified table of 10 chunks. This is not representative of a real data set, but it will do for illustrative purposes: Table 1 - Initial Chunk Layout Now we add our second shard. The balancer will kick in and attempt to distribute the chunks evenly. It will do this by moving the lowest range chunks to the new shard until the counts are identical. Once it is finished balancing, our table now looks like this: Table 2 - Balanced Chunk Layout That looks pretty good at the moment, but lets imagine that more recent chunks are more likely to have more activity (updates say) than older chunks. Adding the traffic share estimates for each chunk shows that shard1 is taking far more traffic (72%) than shard2 (28%) despite the chunks seeming balanced overall based on the approximate size. Hence, chunk balancing is not equal to traffic balancing. Using that same example, let’s add another wrinkle - periodic deletion of old data. Every 3 months we run a job to delete any data older than 12 months. Let’s look at the impact of that on our table after we run it for the first time (assuming the first run happens on July 1st 2015). Table 3 - Post-Delete Chunk Layout The distribution of data is now completely skewed toward shard1 - shard2 is in fact empty! However, the balancer is completely unaware of this imbalance - the chunk count has remained the same the entire time, and as far as it is concerned the system is in a steady state. With no data on shard2, our traffic imbalance as seen above will be even worse, and we have essentially negated the benefit of having a second shard for this collection. Possible Mitigation Strategies If data and traffic balance are important, select an appropriate shard key Move chunks manually to address the imbalances - swap “hot” chunks for “cool” chunks, empty chunks for larger chunks 7. Waiting too long to shard a collection (collection too large) This is not very common, but when it falls on your shoulders, it can be quite challenging to solve. There is a maximum data size for a collection when when it is initially split which is a function of the chunk size and data size as noted on the limits page. If your collection contains less than 256GiB of data, then there will be no issue. If the collection size exceeds 256GiB but is less than 400GiB, then MongoDB may be able to do an initial split without any special measures being taken. Otherwise, with larger initial data sizes and the default settings, the initial split will fail. It is worth noting that once split the collection may grow as needed and without any real limitations as long as you can continue to add shards as data size grows. Possible Mitigation Strategies Since the limit is dictated by the chunk size and the data size, and assuming there is not much to be done about the data size, then the remaining variable is the chunk size. This is adjustable (default is 64MiB) and can be raised in order to let a large collection split initially and then reduced once that has been completed. The required chunk size increase will depend on the actual data size. However, this is relatively easy to work out - simply divide your data size by 256GB and then multiply that figure by 64MiB (and round up if it is not a nice even number). As an example, let’s consider a 4TiB collection: 4TiB divided by 256GiB = 16 64MiB x 16 = 1024MiB Hence, set the max chunk size to 1024MiB, then perform the initial sharding of the collection, and then finally reduce the chunk size back to 64MiB using the same procedure. . Thanks for reading through the Sharding Pitfall series! If you want to learn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations. Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.
October 27, 2014
by Francesca Krihely
· 4,263 Views
article thumbnail
Sharding Pitfalls Part II: Running a Sharded Cluster
By Adam Comerford, Senior Solutions Engineer In Part I we discussed important considerations when picking a shard key. In this post we will go through some recommendations when running a sharded cluster at scale. Scalability is one of the core benefits of sharding in MongoDB but this can give you a false sense of security; even with that flexibility, you still have to make smart decisions about how and when you deploy resources. In this post, we will cover a couple of common mistakes that people tend to make when it comes to running a sharded cluster. 3. Waiting too long to add a new shard (overloaded) You sharded your database and scaled horizontally for a reason, perhaps it was to add more memory or disk capacity. Whatever the reason, if your application usage grows over time so (generally) does your database utilization. Eventually, your current sharded cluster will pass a certain point, let’s call it 80% utilized (as a nice round estimate), such that it becomes problematic to add another shard. Why? Well, adding a new shard to a cluster is not free, and it is not instantaneous. It consumes resources and (initially) accepts very little traffic. Essentially, at the start of its existence, a newly added shard costs you capacity instead of adding capacity. The length of time it will stay in this state will depend on the balancer and how long it takes for a significant portion of “busy/active” chunks to move onto the new shard. It can often be easier to visualize this process, so let’s make up some hypothetical numbers and set the bar relatively low. Our imaginary existing cluster will be a set of 2 shards, with 2000 chunks (500 considered “active”) and to that we need to add a 3rd shard. This 3rd shard will eventually store one third of the active chunks (and total chunks). The question is, when does this shard stop adding overhead overall and instead become an asset? In reality, this will vary from cluster to cluster and have a lot of dependencies and variables - in other words you need to have good metrics about your cluster, particularly your load bottleneck. Therefore we will once again use our imaginations and go with a relatively low bar: when 5% of active chunks—that is, those chunks seeing most traffic—have migrated to the new shard, you should expect a net gain in performance. In our imaginary system we have evaluated our load levels, the expected impact of migrations and have determine that once that 5% threshold of active chunks has been migrated to the new shard it can be considered a net gain for the overall system. Once all chunks have been balanced, then the migration overhead disappears, but initially this will be an expected trade off. This chart shows how long it would take for new shards to reach net positive contribution in your cluster (the dotted line implies net gain): In this fabricated example, it takes almost 2 hours for the new shard to attain a viable level of active chunks and be considered a net gain for the overall system. Although these numbers are fictional, these numbers are based on setups we have seen in real systems with moderate load. From there it is relatively easy to imagine this set of migrations taking even longer on an overloaded set of shards, and taking far longer for our newly added shard to cross the threshold and become a net gain. As such it is best to be proactive and add capacity before it becomes a necessity. Possible Mitigation Strategies Manual balancing of targeted “hot” chunks (chunk that is being accessed more than others) to move activity to the new shard more quickly Add the shard at low traffic time so that there is less competition for resources Disable balancing on some collections, prioritise balancing busy collections first 4. Under-provisioning Config Servers Provisioning enough resources without being wasteful is always tricky, and all the more so in a complicated distributed system like a MongoDB sharded cluster. Everyone wants to use their hardware, virtual instances, virtual machines, containers and the like in the most efficient way possible, and get the best bang for their buck. Hence it is only natural to take a look at the various pieces of a distributed cluster and look for lower utilized pieces that could be put on less expensive resources. The most common pitfall here with MongoDB are the config servers, which are often neglected when stress testing a cluster. In testing environments and smaller deployments (unless specific measures are taken to stress them) they are relatively lightly loaded and usually identified as candidates for lesser instances/hardware. The problem is that these are critical pieces of infrastructure. They may not be heavily loaded all the time, but when they do see load and struggle to service requests, that can impact all queries (reads, writes, authentication) and add latency to all requests made of the cluster in question. In particular, the first config server in the list supplied to your mongos processes is vital. This is the config server that all mongos processes will default to read from when fetching or refreshing their view of the data distribution in your cluster. Similarly, this is the server that will be hit when attempting to authenticate a user. If it is under-provisioned and cannot service queries, or if it has problems with networking (packet loss, congestion), then the effects will be significant. Possible Mitigation Strategies Ensure the config servers are load tested, slightly over-provisioned (the first config server in particular) If using virtual machines or cloud based instances, investigate increasing available resources Turning off the balancer, disabling chunk splitting will reduce the chances of high read traffic to the config servers (no migrations, no meta data refresh) but this is only a temporary fix unless you have a perfect write distribution and may not eliminate issues completely. 5. Using the count() command on sharded collections This pitfall is very common, and it seems to hit somewhat randomly in terms of how long someone has been running a sharded environment. At some point, a question will arise along the lines of: “How are we tracking/verifying/checking how many documents we have in each collection on each shard, how balanced are they and do they agree with ?” Hopefully no one is actually constructing questions this way in your organization, but you get the basic idea. The most obvious way to do a quick check on this type of thing is to count the documents and see if the numbers make sense and/or agree with counts elsewhere. That thinking naturally leads people to the count command and they proceed to use it to gather figures for their documents and collections. Unfortunately, on a busy, mature sharded cluster, the results will very rarely be what is expected. The reason for this is that the count command as implemented today has several optimizations in place to make it faster to run in general and those speed optimizations essentially bypass a key piece of the sharding functionality needed to return accurate results in this case. This is a known bug and is being tracked in SERVER-3645, but does not stop people from consistently hitting this issue. The nature of the issue means that count will report documents in the results that it should not, for example: Documents that are being deleted as part of a chunk migrations Documents that have been left behind from previous chunk migrations (also known as orphans) Documents currently being copied as part of an in-flight chunk migration A regular query (rather than a count) will have its results filtered by the respective primary and not suffer from the same problem. Hence, if you were to manually count the results from a query client-side you would get an accurate result. This quirk of sharded environments will eventually be fixed, but for now it will inevitably crop up from time to time in all active sharded clusters used by a large team. Possible Mitigation Strategies Do counts on the client side, or use targeted, range based queries (with a primary read preference) to count instead Use cleanUpOrphaned and disable the balancer (make sure it has finished current round) when performing counts across the cluster If you want tolearn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations. Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.
October 21, 2014
by Francesca Krihely
· 4,720 Views
article thumbnail
Do it in Java 8: Automatic memoization
Memoization is a technique used to speed up functions. Memoization may be done manually. It may also be done automatically. We can find many examples of automatic memoization on Internet. In this article, I will show how Java 8 makes it very easy to memoize functions. What is memoization Memoization consist in caching the results of functions in order to speed them up when they are called several times with the same argument. The first call implies computing and storing the result in memory before returning it. Subsequent calls with the same parameter imply only fetching the previously stored value and returning it. How to apply memoization Memoization may be applied manually by hard coding it in every function that may benefit from it. If it takes a long time to compute the return value, memoization will speed up the program. For functions that take less time to evaluate than fetching the previously stored value from memory, memoization is clearly not a good option. Hard coding memoization by hand in each function is not a good option neither because it is repeating the same principle again and again. That is why automatic memoization is desirable. What to memoize Memoization applies to functions. Prior to Java 8, Java had no functions. However, it was perfectly possible to define some. Furthermore, we used to create “functional” methods, that is methods taking an argument and returning a value based only upon this argument. These kind of method may benefit from memoization. By the way, there is a match between functional methods and functions. For example, the following method: Integer doubleValue(Integer x) { return x * 2; } corresponds to: Integer doubleValue(Integer x) { if (cache.containsKey(x)) { return cache.get(x); } else { Integer result = x * 2; cache.put(x, result) ; return result; } } In Java 8, we can make this much cleaner: Map cache = new ConcurrentHashMap<>(); Integer doubleValue(Integer x) { return cache.computeIfAbsent(x, y -> y * 2); } Our function may be modified to use the same technique: Function doubleValue = x -> cache.computeIfAbsent(x, y -> y * 2); This is pretty simple, but it has two main drawbacks: We have to repeat this modification for all functions. The map we use is exposed and could potentially be modified by another thread having nothing to do with the function. The second problem is quite easy to address. We may put the method or the function in a separate class, including the map, with private access. For example, for the method case: public class Doubler { private static Map cache = new ConcurrentHashMap<>(); public static Integer doubleValue(Integer x) { return cache.computeIfAbsent(x, y -> y * 2); } } We may then instantiate that class and use it each time we want to compute a value: Integer y = Doubler.doubleValue(x); With this solution, the map is no longer accessible from outside. We can't do the same for functions because functions are anonymous classes and such classes may not have static members. One possibility would be to pass the map to the function as an additional argument. This may be done through a closure: class Doubler { private static Map cache = new ConcurrentHashMap<>(); public static Function doubleValue = x->cache.computeIfAbsent(x, y -> y * 2); } We can use this function as follows: Integer y = Doubler.doubleValue.apply(x); This gives no advantage compared to the “method” solution. However, we may also use this function in more idiomatic examples, such as: IntStream.range(1, 10).boxed().map(Doubler.doubleValue); This is equivalent to using the method version with the following syntax: IntStream.range(1, 10).boxed().map(ThisClass::doubleValue); The main problem is that while solving the second issue, we have made the first one more acute, which make automatic memoization more desirable. Automatic memoization: requirements What we need is a way to do the following: Function f = x -> x * 2; Function g = Memoizer.memoize(f); so that we may use the memoized function as a drop in replacement for the original one. All values returned by function g will be calculated through the original function f the first time, and returned from the cache for all subsequent accesses. By contrast, if we create a third function: Function f = x -> x * 2; Function g = Memoizer.memoize(f); Function h = Memoizer.memoize(f); the values cached by g will not be returned by h. In other words, g and h will use separate caches. Implementation The Memoizer class is quite simple: public class Memoizer { private final Map cache = new ConcurrentHashMap<>(); private Memoizer() {} private Function doMemoize(final Function function) { return input -> cache.computeIfAbsent(input, function::apply); } public static Function memoize(final Function function) { return new Memoizer().doMemoize(function); } } Using this class is also extremely simple: Integer longCalculation(Integer x) { try { Thread.sleep(1_000); } catch (InterruptedException ignored) { } return x * 2; } Function f = this::longCalculation; Function g = Memoizer.memoize(f); public void automaticMemoizationExample() { long startTime = System.currentTimeMillis(); Integer result1 = g.apply(1); long time1 = System.currentTimeMillis() - startTime; startTime = System.currentTimeMillis(); Integer result2 = g.apply(1); long time2 = System.currentTimeMillis() - startTime; System.out.println(result1); System.out.println(result2); System.out.println(time1); System.out.println(time2); } Running the automaticMemoizationExample method will produce the following result: 2 2 1000 0 We can now make memoized function out of ordinary ones by just calling a single method! What about functions with several arguments? Short answer: nothing. There are no such things in this world as functions with several arguments. Functions are applications of one set (the source set) to another set (the target set). So, they simply can't have several arguments. But this does not solve our problem. What is the functional equivalent to a method with several arguments? Long answer: what people generally consider as functions with several arguments are in fact either: Functions of tuples Function returning functions returning functions … returning a result In either cases, we are only concerned with functions of one argument, so we can easily use our Memoizer class. Using functions of tuples would probably be the simplest choice... if Java had tuples! We could of course write tuples. But to store tuples in maps, we would have to implement equals and hashcode for them, plus we would have to define tuples for two elements (pairs), tuple for three elements, and so on. Who knows where to stop? The second option is much easier. It is based upon currying, which means applying each argument one after the other instead of applying them as a whole (the tuple). Currying a function is very easy. The only problem, in Java 8, is that writing the types is really cumbersome. Currying a “function of two arguments” (in fact a function of a pair) is easy once you master the type. Java has in fact a shortcut for functions of tuple2 which is called BiFunction. We will take this as an example. The two following functions are equivalent (from the result point of view): BiFunction h = (x, y) -> x + y; Function> hc = x -> y -> x + y; Not considering the types, there are very little differences. In the first case, the two arguments are put between parentheses, separated by a comma, which is, by the way, how tuples are written in most languages which have them! Remove the parentheses and separate the arguments with an arrow and you get the curried version. We can only regret that we have to write the type as: Function> when other languages use a simplified syntax such as: Integer -> Integer -> Integer From this, it is easy to memoized this curried version, although we cant use the same simple form as previously. We have to memoize each function: Function> mhc = Memoizer.memoize(x -> Memoizer.memoize(y -> x + y)); Same thing for a function of (a tuple of) 3 arguments (which by the way has no equivalent in Java): Function>> f3 = x -> y -> z -> x + y - z; Function>> f3m = Memoizer.memoize(x -> Memoizer.memoize(y -> Memoizer.memoize(z -> x + y – z)); Here is an example of using this memoized function “of three arguments”: Function>> f3 = x -> y -> z -> longCalculation(x) + longCalculation(y) - longCalculation(z); Function>> f3m = Memoizer.memoize(x -> Memoizer.memoize(y -> Memoizer.memoize(z -> longCalculation(x) + longCalculation(y) - longCalculation(z)))); public void automaticMemoizationExample2() { long startTime = System.currentTimeMillis(); Integer result1 = f3m.apply(2).apply(3).apply(4); long time1 = System.currentTimeMillis() - startTime; startTime = System.currentTimeMillis(); Integer result2 = f3m.apply(2).apply(3).apply(4); long time2 = System.currentTimeMillis() - startTime; System.out.println(result1); System.out.println(result2); System.out.println(time1); System.out.println(time2); } This example produces the following output: 2 2 3002 0 showing that the first access to method longCalculation has taken 3000 milliseconds and the second has return immediately. On the other hand, using a function of tuple may seem easier once you have the Tuple class defined. Here is an example of Tuple3: public class Tuple3 { public final T _1; public final U _2; public final V _3; public Tuple3(T t, U u, V v) { _1 = Objects.requireNonNull(t); _2 = Objects.requireNonNull(u); _3 = Objects.requireNonNull(v); } @Override public boolean equals(Object o) { if (!(o instanceof Tuple3)) return false; else { Tuple3 that = (Tuple3) o; return _1.equals(that._1) && _2.equals(that._2) && _3.equals(that._3); } } @Override public int hashCode() { return _1.hashCode() + _2.hashCode() + _3.hashCode(); } } Using this class, we may rewrite the previous example as: Function, Integer> ft = x -> longCalculation(x._1) + longCalculation(x._2) - longCalculation(x._3); Function, Integer> ftm = Memoizer.memoize(ft); public void automaticMemoizationExample3() { long startTime = System.currentTimeMillis(); Integer result1 = ftm.apply(new Tuple3<>(2, 3, 4)); long time1 = System.currentTimeMillis() - startTime; startTime = System.currentTimeMillis(); Integer result2 = ftm.apply(new Tuple3<>(2, 3, 4)); long time2 = System.currentTimeMillis() - startTime; System.out.println(result1); System.out.println(result2); System.out.println(time1); System.out.println(time2); } Conclusion Memoizing is about maintaining state between function calls. A memoized function is a function which behavior is dependent upon the current state. However, it will always return the same value for the same argument. Only the time needed to return the value will be different. So the memoized function is still a pure function if the original function is pure. However, there is a kind of function that may pose a problem: recursive functions that call themselves several times with the same argument may not be memoized this way. This will be addressed in a next article.
September 30, 2014
by Pierre-Yves Saumont
· 69,123 Views · 19 Likes
article thumbnail
How to Setup Custom Remote Deployment Repositories for JBoss BPM Suite
In this article we wanted to share another configuration property that can provide surprising help when setting up your JBoss BPM Suite. Previously we outlined a basic set of configuration properties to provide you with a few tricks when installing your own JBoss BRMS or JBoss BPM Suite products. As the JBoss BPM Suite is a super set, including full JBoss BRMS functionality, the rest of this article will refer only to JBoss BPM Suite but apply to both products. In this article we will show you how to modify your JBoss EAP container configuration to point the products at a custom deployment repository by adjusting a single configuration property. Maven repository The default setup is that the products will look for your maven setting in the default settings.xml as found set in theM2_HOME variable or in the users home directory at .m2/settings.xml. The following system property can be added to JBoss EAP standalone.xml configuration file to point to any file containing your custom settings. kie.maven.settings.custom Location of the maven configuration file where it can find it's settings. Default: the M2_HOME/conf/settings.xml or users home directory .m2/settings.xml Example usage in JBoss EAP When initially setting up the product for use on JBoss EAP containers, one can adjust configuration with the help of system properties. Below we show how to configure an installation to point to our custom maven deployment repository by using a custom settings file we will call bpmsuite-settings.xml We hope this helps you with configuring your own custom deployment repositories and enables you to tie into existing continuous integration infrastructures that might exist in your organization.
September 19, 2014
by Eric D. Schabell DZone Core CORE
· 6,169 Views · 1 Like
article thumbnail
MySQL 101: Monitor Disk I/O with pt-diskstats
Originally Written by Muhammad Irfan Here on the Percona Support team we often ask customers to retrieve disk stats to monitor disk IO and to measure block devices iops and latency. There are a number of tools available to monitor IO on Linux. iostat is one of the popular tools and Percona Toolkit, which is free, contains the pt-diskstats tool for this purpose. The pt-diskstats tool is similar to iostat but it’s more interactive and contains extended information. pt-diskstats reports current disk activity and shows the statistics for the last second (which by default is 1 second) and will continue until interrupted. The pt-diskstats tool collects samples of /proc/diskstats. In this post, I will share some examples about how to monitor and check to see if the IO subsystem is performing properly or if any disks are a limiting factor – all this by using the pt-diskstats tool. pt-diskstats output consists on number of columns and in order to interpret pt-diskstats output we need to know what each column represents. rd_s tells about number of reads per second while wr_s represents number of writes per second. rd_rt and wr_rt shows average response time in milliseconds for reads & writes respectively, which is similar to iostat tool output await column but pt-diskstats shows individual response time for reads and writes at disk level. Just a note, modern iostat splits read and write latency out, but most distros don’t have the latest iostat in their systat (or equivalent) package. rd_mrg and wr_mrg are other two important columns in pt-diskstats output. *_mrg is telling us how many of the original operations the IO elevator (disk scheduler) was able to merge to reduce IOPS, so *_mrg is telling us a quite important thing by letting us know that the IO scheduler was able to consolidate many or few operations. If rd_mrg/wr_mrg is high% then the IO workload is sequential on the other hand, If rd_mrg/wr_mrg is a low% then IO workload is all random. Binary logs, redo logs (aka ib_logfile*), undo log and doublewrite buffer all need sequential writes. qtime and stime are last two columns in pt-diskstats output where qtime reflects to time spent in disk scheduler queue i.e. average queue time before sending it to physical device and on the other hand stime is average service time which is time accumulated to process the physical device request. Note, that qtime is not discriminated between reads and writes and you can check if response time is higher for qtime than it signal towards disk scheduler. Also note that service time (stime field and svctm field in in pt-diskstats & iostat output respectively) is not reliable on Linux. If you read the iostat manual you will see it is deprecated. Along with that, there are many other parameters for pt-diskstats – you can found full documentation here. Below is an example of pt-disktats in action. I used the –devices-regex option which prints only device information that matches this Perl regex. $ pt-diskstats --devices-regex=sd --interval 5 #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 1.1 sda 21.6 22.8 0.5 45% 1.2 29.4 275.5 4.0 1.1 0% 40.0 145.1 65% 158 297.1 155.0 2.1 1.1 sdb 15.0 21.0 0.3 33% 0.1 5.2 0.0 0.0 0.0 0% 0.0 0.0 11% 1 15.0 0.5 4.7 1.1 sdc 5.6 10.0 0.1 0% 0.0 5.2 1.9 6.0 0.0 33% 0.0 2.0 3% 0 7.5 0.4 3.6 1.1 sdd 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 5.0 sda 17.0 14.8 0.2 64% 3.1 66.7 404.9 4.6 1.8 14% 140.9 298.5 100% 111 421.9 277.6 1.9 5.0 sdb 14.0 19.9 0.3 48% 0.1 5.5 0.4 174.0 0.1 98% 0.0 0.0 11% 0 14.4 0.9 2.4 5.0 sdc 3.6 27.1 0.1 61% 0.0 3.5 2.8 5.7 0.0 30% 0.0 2.0 3% 0 6.4 0.7 2.4 5.0 sdd 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 These are the stats from 7200 RPM SATA disks. As you can see, the write-response time is very high and most of that is made up of IO queue time. This shows the problem exactly. The problem is that the IO subsystem is not able to handle the write workload because the amount of writes that are being performed are way beyond what it can handle. It means the disks cannot service every request concurrently. The workload would actually depend a lot on where the hot data is stored and as we can see in this particular case the workload only hits a single disk out of the 4 disks. A single 7.2K RPM disk can only do about 100 random writes per second which is not a lot considering heavy workload. It’s not particularly a hardware issue but a hardware capacity issue. The kind of workload that is present and the amount of writes that are performed per second are not something that the IO subsystem is able to handle in an efficient manner. Mostly writes are generated on this server as can be seen by the disk stats. Let me show you a second example. Here you can see read latency. rd_rt is consistently between 10ms-30ms. It depends on how fast the disks are spinning and the number of disks. To deal with it possible solutions would be to optimize queries to avoid table scans, use memcached where possible, use SSD’s as it can provide good I/O performance with high concurrency. You will find this post useful on SSD’s from our CEO, Peter Zaitsev. #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 1.0 sdb 33.0 29.1 0.9 0% 1.1 34.7 7.0 10.3 0.1 61% 0.0 0.4 99% 1 40.0 2.2 19.5 1.0 sdb1 0.0 0.0 0.0 0% 0.0 0.0 7.0 10.3 0.1 61% 0.0 0.4 1% 0 7.0 0.0 0.4 1.0 sdb2 33.0 29.1 0.9 0% 1.1 34.7 0.0 0.0 0.0 0% 0.0 0.0 99% 1 33.0 3.5 30.2 1.0 sdb 81.9 28.5 2.3 0% 1.1 14.0 0.0 0.0 0.0 0% 0.0 0.0 99% 1 81.9 2.0 12.0 1.0 sdb1 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 1.0 sdb2 81.9 28.5 2.3 0% 1.1 14.0 0.0 0.0 0.0 0% 0.0 0.0 99% 1 81.9 2.0 12.0 1.0 sdb 50.0 25.7 1.3 0% 1.3 25.1 13.0 11.7 0.1 66% 0.0 0.7 99% 1 63.0 3.4 11.3 1.0 sdb1 25.0 21.3 0.5 0% 0.6 25.2 13.0 11.7 0.1 66% 0.0 0.7 46% 1 38.0 3.2 7.3 1.0 sdb2 25.0 30.1 0.7 0% 0.6 25.0 0.0 0.0 0.0 0% 0.0 0.0 56% 0 25.0 3.6 22.2 From the below diskstats output it seems that IO is saturated between both reads and writes. This can be noticed with high value for columns rd_s and wr_s. In this particular case, consider having disks in either RAID 5 (better for read only workload) or RAID 10 array is good option along with battery-backed write cache (BBWC) as single disk can really be bad for performance when you are IO bound. device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime sdb1 362.0 27.4 9.7 0% 2.7 7.5 525.2 20.2 10.3 35% 6.4 8.0 100% 0 887.2 7.0 0.9 sdb1 439.9 26.5 11.4 0% 3.4 7.7 545.7 20.8 11.1 34% 9.8 11.9 100% 0 985.6 9.6 0.8 sdb1 576.6 26.5 14.9 0% 4.5 7.8 400.2 19.9 7.8 34% 6.7 10.9 100% 0 976.8 8.6 0.8 sdb1 410.8 24.2 9.7 0% 2.9 7.1 403.1 18.3 7.2 34% 10.8 17.7 100% 0 813.9 12.5 1.0 sdb1 378.4 24.6 9.1 0% 2.7 7.3 506.1 16.5 8.2 33% 5.7 7.6 100% 0 884.4 6.6 0.9 sdb1 572.8 26.1 14.6 0% 4.8 8.4 422.6 17.2 7.1 30% 1.7 2.8 100% 0 995.4 4.7 0.8 sdb1 429.2 23.0 9.6 0% 3.2 7.4 511.9 14.5 7.2 31% 1.2 1.7 100% 0 941.2 3.6 0.9 The following example reflects write heavy activity but write-response time is very good, under 1ms, which shows disks are healthy and capable of handling high number of IOPS. #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 1.0 dm-0 530.8 16.0 8.3 0% 0.3 0.5 6124.0 5.1 30.7 0% 1.7 0.3 86% 2 6654.8 0.2 0.1 2.0 dm-0 633.1 16.1 10.0 0% 0.3 0.5 6173.0 6.1 36.6 0% 1.7 0.3 88% 1 6806.1 0.2 0.1 3.0 dm-0 731.8 16.0 11.5 0% 0.4 0.5 6064.2 5.8 34.1 0% 1.9 0.3 90% 2 6795.9 0.2 0.1 4.0 dm-0 711.1 16.0 11.1 0% 0.3 0.5 6448.5 5.4 34.3 0% 1.8 0.3 92% 2 7159.6 0.2 0.1 5.0 dm-0 700.1 16.0 10.9 0% 0.4 0.5 5689.4 5.8 32.2 0% 1.9 0.3 88% 0 6389.5 0.2 0.1 6.0 dm-0 774.1 16.0 12.1 0% 0.3 0.4 6409.5 5.5 34.2 0% 1.7 0.3 86% 0 7183.5 0.2 0.1 7.0 dm-0 849.6 16.0 13.3 0% 0.4 0.5 6151.2 5.4 32.3 0% 1.9 0.3 88% 3 7000.8 0.2 0.1 8.0 dm-0 664.2 16.0 10.4 0% 0.3 0.5 6349.2 5.7 35.1 0% 2.0 0.3 90% 2 7013.4 0.2 0.1 9.0 dm-0 951.0 16.0 14.9 0% 0.4 0.4 5807.0 5.3 29.9 0% 1.8 0.3 90% 3 6758.0 0.2 0.1 10.0 dm-0 742.0 16.0 11.6 0% 0.3 0.5 6461.1 5.1 32.2 0% 1.7 0.3 87% 1 7203.2 0.2 0.1 Let me show you a final example. I used –interval and –iterations parameters for pt-diskstats which tells us to wait for a number of seconds before printing the next disk stats and to limit the number of samples respectively. If you notice, you will see in 3rd iteration high latency (rd_rt, wr_rt) mostly for reads. Also, you can notice a high value for queue time (qtime) and service time (stime) where qtime is related to disk IO scheduler settings. For MySQL database servers we usually recommends noop/deadline instead of default cfq. $ pt-diskstats --interval=20 --iterations=3 #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 10.4 hda 11.7 4.0 0.0 0% 0.0 1.1 40.7 11.7 0.5 26% 0.1 2.1 10% 0 52.5 0.4 1.5 10.4 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.4 7.0 0.0 43% 0.0 0.1 0% 0 0.4 0.0 0.1 10.4 hda3 0.0 0.0 0.0 0% 0.0 0.0 0.4 107.0 0.0 96% 0.0 0.2 0% 0 0.4 0.0 0.2 10.4 hda5 0.0 0.0 0.0 0% 0.0 0.0 0.7 20.0 0.0 80% 0.0 0.3 0% 0 0.7 0.1 0.2 10.4 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.1 4.0 0.0 0% 0.0 4.0 0% 0 0.1 0.0 4.0 10.4 hda9 11.7 4.0 0.0 0% 0.0 1.1 39.2 10.7 0.4 3% 0.1 2.7 9% 0 50.9 0.5 1.8 10.4 drbd1 11.7 4.0 0.0 0% 0.0 1.1 39.1 10.7 0.4 0% 0.1 2.8 9% 0 50.8 0.5 1.7 20.0 hda 14.6 4.0 0.1 0% 0.0 1.4 39.5 12.3 0.5 26% 0.3 6.4 18% 0 54.1 2.6 2.7 20.0 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.4 9.1 0.0 56% 0.0 42.0 3% 0 0.4 0.0 42.0 20.0 hda3 0.0 0.0 0.0 0% 0.0 0.0 1.5 22.3 0.0 82% 0.0 1.5 0% 0 1.5 1.2 0.3 20.0 hda5 0.0 0.0 0.0 0% 0.0 0.0 1.1 18.9 0.0 79% 0.1 21.4 11% 0 1.1 0.1 21.3 20.0 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.8 10.4 0.0 62% 0.0 1.5 0% 0 0.8 1.3 0.2 20.0 hda9 14.6 4.0 0.1 0% 0.0 1.4 35.8 11.7 0.4 3% 0.2 4.9 18% 0 50.4 0.5 3.5 20.0 drbd1 14.6 4.0 0.1 0% 0.0 1.4 36.4 11.6 0.4 0% 0.2 5.1 17% 0 51.0 0.5 3.4 20.0 hda 0.9 4.0 0.0 0% 0.2 251.9 28.8 61.8 1.7 92% 4.5 13.1 31% 2 29.6 12.8 0.9 20.0 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.6 8.3 0.0 52% 0.1 98.2 6% 0 0.6 48.9 49.3 20.0 hda3 0.0 0.0 0.0 0% 0.0 0.0 2.0 23.2 0.0 83% 0.0 1.4 0% 0 2.0 1.2 0.3 20.0 hda5 0.0 0.0 0.0 0% 0.0 0.0 4.9 249.4 1.2 98% 4.0 13.2 9% 0 4.9 12.9 0.3 20.0 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 20.0 hda9 0.9 4.0 0.0 0% 0.2 251.9 21.3 24.2 0.5 32% 0.4 12.9 31% 2 22.2 10.2 9.7 20.0 drbd1 0.9 4.0 0.0 0% 0.2 251.9 30.6 17.0 0.5 0% 0.7 24.1 30% 5 31.4 21.0 9.5 You can see the busy column in pt-diskstats output which is the same as the util column in iostat – which points to utilization. Actually, pt-diskstats is quite similar to the iostat tool but pt-diskstats is more interactive and has more information. The busy percentage is only telling us for how long the IO subsystem was busy, but is not indicating capacity. So the only time you care about %busy is when it’s 100% and at the same time latency (await in iostat and rd_rt/wr_rt in diskstats output) increases over -say- 5ms. You can estimate capacity of your IO subsystem and then look at the IOPS being consumed (r/s + w/s columns). Also, the system can process more than one request in parallel (in case of RAID) so %busy can go beyond 100% in pt-diskstats output. If you need to check disk throughput, block device IOPS run the following to capture metrics from your IO subsystem and see if utilization matches other worrisome symptoms. I would suggest capturing disk stats during peak load. Output can be grouped by sample or by disk using the –group-by option. You can use the sysbench benchmark tool for this purpose to measure database server performance. You will find this link useful for sysbench tool details. $ pt-diskstats --group-by=all --iterations=7200 > /tmp/pt-diskstats.out; Conclusion: pt-diskstats is one of the finest tools from Percona Toolkit. By using this tool you can easily spot disk bottlenecks, measure the IO subsystem and identify how much IOPS your drive can handle (i.e. disk capacity).
September 19, 2014
by Peter Zaitsev
· 5,239 Views
article thumbnail
Introducing BIRT iHub F-Type: Installing on Windows
Originally written by Virgil Dodson Actuate recently released a new, free BIRT server called the BIRT iHub F-Type. It incorporates all the functionality of BIRT iHub and is limited only by the capacity of output it can deliver on a daily basis. It is ideal for departmental and smaller scale applications. When BIRT F-Type reaches its maximum output capacity, additional capacity can be purchased on a subscription based model. Some of the key features of BIRT iHub F-Type that will help improve your BIRT content applications are: Interactivity – Allow end-users to modify and personalize reports, and answer questions themselves. Scheduling – Automate report generation based on rules and calendar, and then notify users. Sharing – Secure document management and distribution that allows users to only access content/data they are entitled to. Excel Emitter – Export as native Excel (not CSV) with formulas/pivot tables/worksheets/charts. Integration – JavaScript API to embed dynamic reports and visualizations in your web app. Downloading BIRT iHub F-Type Before we get started with the installation process, we need to download BIRT iHub F-Type. There are three downloads available: Windows, Linux, and a VMware image. This blog will cover the Windows installation. If you’re installing either of the other types, you’ll find links to guides for them at the bottom of this blog post. Once you click on your chosen download, you’ll be asked to register. If you’ve already registered, click the “Click to Login” button. If not, fill out the short registration form to get started. Next, read and accept the license agreement. Once you’ve done that, click the checkbox, and a link for the download will appear. Click that to start your download. At this point, you should also receive an email with an activation code. Be sure to check your spam folder if you don’t see it in your inbox. Installing BIRT iHub F-Type After the download is complete, launch the executable file named ActuateBIRTiHubFType.exe. A welcome message will appear. Press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation. The default is C:\Actuate\BIRTiHub. If you have existing BIRT designs that depend on a JDBC database driver, you can optionally specify the folder where these drivers are located. Press Next to continue. Once the installation has finished, press Finish to launch the BIRT iHub F-Type. A desktop shortcut is also created that points to the iHub F-Type URL at http://localhost:8700/iportal. The first time you launch the BIRT iHub F-Type, you will need to activate it. Enter the activation code that you should have received in an e-mail. After entering a valid activation code, you should receive a message that the code was accepted and the BIRT iHub F-Type should start initializing services. Once that has completed, you will be presented with the login screen. The default user name is “administrator” and the password is blank for your first log in. You’ll be able to change this after you have logged in. Press “Log In” to continue. The first time you launch the BIRT iHub F-Type, you will be in tutorial mode which will help you get started loading your BIRT content and required resources. You can bypass the tutorial mode at any time by pressing the “Exit Tutorial” button at the top right. Select a BIRT design (*.rptdesign) file and press the Upload button. If you don’t have a BIRT design, you can download a sample from the link on the same page. The BIRT design file is automatically inspected and if there are any dependent files needed, like images, data files, BIRT report libraries, CSS styles, or other linked BIRT designs, you will be asked to upload those files as well. Once your BIRT design and dependent files are uploaded, your BIRT report will be displayed in the BIRT iHub F-Type and is now ready to explore. Thanks for reading. Now, it’s time to unleash the full power of BIRT into your application. If you have any questions or comments, please feel free to use the comments section below or visit the BIRT iHub F-Type forum. -Virgil For more blogs in the “Introducing BIRT iHub F-Type” series, see the list below: Installing iHub F-Type: Linux | VMWare Image
September 12, 2014
by Michael Singer
· 7,010 Views
article thumbnail
Jar Hell Made Easy - Demystifying the Classpath
Some of the hardest problems a Java Developer will ever have to face are classpath errors: ClassNotFoundException, NoClassDefFoundError, Jar Hell, Xerces Hell and company. In this post we will go through the root causes of these problems, and see how a minimal tool (JHades) can help solving them quickly. We will see why Maven cannot (always) prevent classpath duplicates, and also: The only way to deal with Jar Hell Class loaders The Class loader chain Class loader priority: Parent First vs Parent Last Debugging server startup problems Making sense of Jar Hell with jHades Simple strategy for avoiding classpath problems The classpath gets fixed in Java 9? The only way to deal with Jar Hell Classpath problems can be time-consuming to debug, and tend to happen at the worst possible times and places: before releases, and often in environments where there is little to no access by the development team. They can also happen at the IDE level, and become a source of reduced productivity. We developers tend to find these problems early and often, and this is the usual response: Let's try to save us some hair and get to the bottom of this. These type of problems are hard to approach via trial and error. The only real way to solve them is to really understand what is going on, but where to start? It turns out that Jar Hell problems are simpler than what they look, and only a few concepts are needed to solve them. In the end, the common root causes for Jar Hell problems are: a Jar is missing there is one Jar too many a class is not visible where it should be But if it's that simple, then why are classpath problems so hard to debug? Jar Hell stack traces are incomplete One reason is that the stack traces for classpath problems have a lot of information missing that is needed to troubleshoot the problem. Take for example this stack trace: java.lang.IncompatibleClassChangeError: Class org.jhades.SomeServiceImpl does not implement the requested interfaceorg.jhades.SomeService org.jhades.TestServlet.doGet(TestServlet.java:19) It says that a class does not implement a certain interface. But if we look at the class source: publicclassSomeServiceImpl implementsSomeService { @Override publicvoiddoSomething() { System.out.println( "Call successful!"); } Well, the class clearly implements the missing interface! So what is going on then? The problem is that the stack trace is missing a lot of information that is critical to understanding the problem. The stack trace should have probably contained an error message such as this (we will learn what this means): The Class SomeServiceImpl of class loader /path/to/tomcat/lib does not implement the interface SomeService loaded from class loader Tomcat - WebApp - /path/to/tomcat/webapps/test This would be at least an indication of where to start: Someone new learning Java would at least know that there is this notion of class loader that is essential to understand what is going on It would make clear that one class involved was not being loaded from a WAR, but somehow from some directory on the server (SomeServiceImpl). What is a Class Loader? To start, a Class Loader is just a Java class, more exactly an instance of a class at runtime. It is NOT an inaccessible internal component of the JVM like for example the garbage collector. Take for example the WebAppClassLoader of Tomcat, here is it's javadoc. As you can see it's just a plain Java class, we can even write our own class loader if needed. Any subclass of ClassLoader will qualify as a class loader. The main responsibilities of a class loader is to known where class files are located, and then load classes on JVM demand. Everything is linked to a class loader Each object in the JVM is linked to it's Class via getClass(), and each class is linked to a class loader via getClassLoader(). This means that: Every object in the JVM is linked to a class loader! Let's see how this fact can be used to troubleshoot a classpath error scenario. How-To find where a class file really is Let's take an object and see where it's class file is located in the file system: System.out.println(service.getClass() .getClassLoader() .getResource("org/jhades/SomeServiceImpl.class")); This is the full path to the class file: jar:file:/Users/user1/.m2/repository/org/jhades/jar-2/1.0-SNAPSHOT/jar-2-1.0-SNAPSHOT.jar!/org/jhades/SomeServiceImpl.class As we can see the class loader is just a runtime component that knowns where in the file system to look for class files and how to load them. But what happens if the class loader cannot find a given class? The Class loader Chain By default in the JVM, if a class loader does not find a class, it will then ask it's parent class loader for that same class and so forth. This continues all the way up until the JVM bootstrap class loader (more on this later). This chain of class loaders is the class loader delegation chain. Class loader priority: Parent First vs Parent Last Some class loaders delegate requests immediately to the parent class loader, without searching first in their own known set of directories for the class file. A class loader operating on this mode is said to be in Parent First mode. If a class loader first looks for a class locally and only after queries the parent if the class is not found, then that class loader is said to be working in Parent Last mode. Do all applications have a class loader chain ? Even the most simple Hello World main method has 3 class loaders: The Application class loader, responsible for loading the application classes (parent first) The Extensions class loader, that loads jars from $JAVA_HOME/jre/lib/ext (parent first) The Bootstrap class loader, that loads any class shipped with the JDK such as java.lang.String (no parent class loader) What does the class loader chain of a WAR application look like? In the case of application servers like Tomcat or Websphere, the class loader chain is configured differently than a simple Hello World main method program. Take for example the case of the Tomcat class loader chain: Here we wee that each WAR runs in a WebAppClassLoader, that works in parent last mode (it can be set to parent first as well). The Common class loader loads libraries installed at the level of the server. What does the Servlet spec say about class loading? Only a small part of the class loader chain behavior is defined by the Servlet container specification: The WAR application runs on it's own application class loader, that might be shared with other applications or not The files in WEB-INF/classes take precedence over everything else After that, it's anyones guess! The rest is completely open for interpretation by container providers. Why isn't there a common approach for class loading across vendors? Usually open source containers like Tomcat or Jetty are configured by default to look for classes in the WAR first, and only then search in server class loaders. This allows for applications to use their own versions of libraries that override the ones available on the server. What about the big iron servers? Commercial products like Websphere will try to 'sell' you their own server provided libraries, that by default take precedence over the ones installed on the WAR. This is done assuming that if you bought the server you want also to use the JEE libraries and versions it provides, which is often NOT the case. This makes deploying to certain commercial products a huge hassle, as they behave differently then the Tomcat or Jetty that developers use to run applications in their workstation. We will see further on a solution for this. Common Problem: duplicate class versions At this moment you probably have a huge question: What if there are two jars inside a WAR that contain the exact same class? The answer is that the behavior is undetermined and only at runtime one of the two classes will be chosen. Which one gets chosen depends on the internal implementation of the class loader, there is no way to know upfront. But luckily most projects these days use Maven, and Maven solves this problem by ensuring only one version of a given jar is added to the WAR. So a Maven project is immune to this particular type of Jar Hell, right? Why Maven does not prevent classpath duplicates Unfortunately Maven cannot help in all Jar Hell situations. In fact, many Maven projects that don't use certain quality control plugins can have hundreds of duplicate class files on the classpath (I saw trunks with over 500 duplicates). There are several reasons for that: Library publishers occasionally change the artifact name of a jar: This happens due to re-branding or other reasons. Take for example the example of the JAXB jar. There is no way Maven can identify those artifacts as being the same jar! Some jars are published with and without dependencies: Some library providers provide a 'with dependencies' version of a jar, which includes other jars inside. If we have transitive dependencies with the two versions, we will end up with duplicates. Some classes are copied between jars: Some library creators, when faced with the need for a certain class will just grab it from another project and copy it to a new jar without changing the package name. Are all class files duplicates dangerous? If the duplicate class files exist inside the same class loader, and the two duplicate class files are exactly identical then it does not matter which one gets chosen first - this situation is not dangerous. If the two class files are inside the same class loader and they are not identical, then there is no way which one will be chosen at runtime - this is problematic and can manifest itself when deploying to different environments. If the class files are in two different class loaders, then they are never considered identical (see the class identity crisis section further on). How can WAR classpath duplicates be avoided? This problem can be avoided for example by using the Maven Enforcer Plugin, with the extra rule of Ban Duplicate Classes turned on. You can quickly check if your WAR is clean using the JHades WAR duplicate classes report as well. This tool has an option to filter 'harmless' duplicates (same class file size). But even a clean WAR might have deployment problems: Classes missing, classes taken from the server instead of the WAR and thus with the wrong version, class cast exceptions, etc. Debugging the classpath with JHades Classpath problems often show up when the application server is starting up, which is a particularly bad moment specially when deploying to an environment where there is limited access. JHades is a tool to help deal it with Jar Hell (disclaimer: I wrote it). It's a single Jar with no dependencies other than the JDK7 itself. This is an example of how to use it: newJHades() .printClassLoaders() .printClasspath() .overlappingJarsReport() .multipleClassVersionsReport() .findClassByName("org.jhades.SomeServiceImpl") This prints to the screen the class loader chain, jars, duplicate classes, etc. Debugging server startup problems JHades works works well in scenarios where the server does not start properly. A servlet listener is provided that allows to print classpath debugging information even before any other component of the application starts running. ClassCastException and the Class Identity Crisis When troubleshooting Jar Hell, beware of ClassCastExceptions. A class is identified in the JVM not only by it's fully qualified class name, but also by it's class loader. This is counterintuitive but in hindsight makes sense: We can create two different classes with the same package and name, ship them in two jars and put them in two different class loaders. One let's say extends ArrayList and the other is a Map. The classes are therefore completely different (despite the same name) and cannot be cast to each other! The runtime will throw a CCE to prevent this potential error case, because there is no guarantee that the classes are castable. Adding the class loader to the class identifier was the outcome of the Class Identity Crisis that occurred in earlier Java days. A Strategy for Avoiding Classpath Problems This is easier said then done, but the best way to avoid classpath related deployment problems is to run the production server in Parent Last mode. This way the class versions of the WAR take precedence over the ones on the server, and the same classes are used in production and in a developer workstation where it's likely that Tomcat, Jetty or other open source Parent Last server is being used. In certain servers like Websphere, this is not sufficient and you also have to provide special properties on the manifest file to explicitly turn off certain libraries like for example JAX-WS. Fixing the classpath in Java 9 In Java 9 the classpath gets completely revamped with the new Jigsaw modularity system. In Java 9 a jar can be declared as a module and it will run in it's own isolated class loader, that reads class files from other similar module class loaders in an OSGI sort of way. This will allow multiple versions of the same Jar to coexist in the same application if needed. Conclusions In the end, Jar Hell problems are not that low level or unapproachable as they might seem at first. It's all about zip files (jars) being present/ not being present in certain directories, how to find those directories, and how to debug the classpath in environments with limited access. By knowing a limited set of concepts such as Class Loaders, the Class Loader Chain and Parent First / Parent Last modes, these problems can be tackled effectively. External links This presentation Do you really get class loaders from Jevgeni Kabanov of ZeroTurnaround (JRebel company) is a great resource about Jar Hell and the different type of classpath related exceptions.
September 8, 2014
by Vasco Cavalheiro
· 55,024 Views · 7 Likes
article thumbnail
Fibonacci Tutorial with Java 8 Examples: recursive and corecursive
Learn Fibonacci Series patterns and best practices with easy Java 8 source code examples in this outstanding tutorial by Pierre-Yves Saumont
September 5, 2014
by Pierre-Yves Saumont
· 49,719 Views · 6 Likes
article thumbnail
Securing JBoss EAP 6 - Implementing SSL
Security is one of the most important features while running a JBoss server in a production environment. Implementing SSL and securing communications is a must do, to avoid malicious use. This blogs details the steps you could take to secure JBoss EAP 6 running in Domain mode. These are probably documented by RedHat but the documentation seems a bit scattered. The idea behind this blog is to put together everything in one place. In Order to enhance security in JBoss EAP 6, SSL/encryption can be implemented for the following Admin console access – enable https access for admin console Domain Controller – Host controller communication – Communication between the main domain controller and all the other host controllers should be secured. Jboss CLI – enable ssl for the command line interface The below example uses a single keystore being both the key and truststore and also uses CA signed certificates. You could use self-signed certificates and/or separated keystores and truststores if required. Create the keystores (certificates for each of the servers) keytool -genkeypair -alias testServer.prd -keyalg RSA -keysize 2048 -validity 730 -keystore testServer.prd.jks Generate a certificate signing request (CSR) for the Java keystore keytool -certreq -alias testServer.prd -keystore testServer.prd.jks -file testServer.prd.csr Get the CSR signed by the Certificate Authorities Import a root or intermediate CA certificate to the existing Java keystore keytool -import -trustcacerts -alias root -file rootCA.crt -keystore testServer.prd.jks Import the signed primary certificate to the existing Java keystore. Keytool -importcert -keystore testServer.prd.jks -trustcacerts -alias testServer.prd -file testServer.prd.crt Repeat steps 1-6 for each of the servers. In order to establish trust between the master and slave hosts, Import the signed certificates of all the (slave) servers that the Domain Controller must trust onto the Domain Controllers Keystore keytool -importcert -keystore testServer.prd.jks -trustcacerts -alias slaveServer.prd -file slaveServers.prd.crt repeat step for all slave hosts. Import the signed certificate of the Domain controller onto the slave hosts keytool -importcert -keystore slaveServer.prd.jks -trustcacerts -alias testServer.prd -file testServer.prd.crt repeat steps for all slave hosts This has be to done because (as per RedHat’s Documentation) There is a problem with this methodology when trying to configure one way SSL between the servers, because there the HC's and the DC (depending on what action is being performed) switch roles (client, server). Because of this one way SSL configuration will not work and it is recommended that if you need SSL between these two endpoints that you configure two way SSL Once this is done, we now have signed certificates loaded onto the java keystore. In Jboss EAP 6 , the http-interface which provides access to the admin console, by default uses the ManagementRealm to provide file based authentication. (mgmt.-users.properties).The next step is to modify the configurations in the host.xml, to make the ManagementRealm use the certificates we created above. The host.xml should be modified to look like: view source print? 01. 02. 03. 04. 05. 06. 07. 08. 09. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. On the Slave hosts, In addition to the above configuration, the following needs to be changed view source print? 1. 2. 3. " 4. 5. Once you make the above changes and restart the servers, you should be able to access the admin console via https. https://testServer.prd:9443/console Finally, in order to secure cli authentication Modify /opt/jboss/jboss-eap-6.1/bin/jboss-cli.xml for each server and add view source print? 01. 02. 03. testServer.prd 04. 05. /opt/jboss/jboss-eap-6.1/domain/configuration/testServer.prd.jks 06. 07. xxxx 08. 09. /opt/jboss/jboss-eap-6.1/domain/configuration/testServer.prd.jks 10. 11. xxxx 12. 13. true 14. 15.
August 28, 2014
by Arvind Anandam
· 11,397 Views
article thumbnail
Setting up Java Applications to Communicate with MongoDB, Kerberos and SSL
By Alex Komyagin, Technical Services Engineer at MongoDB Setting up Kerberos authentication and SSL encryption in a MongoDB Java application is not as simple as other languages. In this post, I’m going to show you how to create a Kerberos and SSL enabled Java application that communicates with MongoDB. My original setup consists of the following: 1) KDC server: kdc.mongotest.com kerberos config file (/etc/krb5.conf): [logging] default = FILE:/var/log/krb5libs.log kdc = FILE:/var/log/krb5kdc.log admin_server = FILE:/var/log/kadmind.log [libdefaults] default_realm = MONGOTEST.COM dns_lookup_realm = false dns_lookup_kdc = false ticket_lifetime = 24h renew_lifetime = 7d forwardable = true [realms] MONGOTEST.COM = { kdc = kdc.mongotest.com admin_server = kdc.mongotest.com } [domain_realm] .mongotest.com = MONGOTEST.COM mongotest.com = MONGOTEST.COM KDC has the following principals: [email protected] - user principle (for java app) mongodb/[email protected] - service principle (for mongodb server) 2) MongoDB server: rhel64.mongotest.com MongoDB version: 2.6.0 MongoDB config file: dbpath= logpath= fork=true auth = true setParameter = authenticationMechanisms=GSSAPI sslOnNormalPorts = true sslPEMKeyFile = /etc/ssl/mongodb.pem This server also has the global environment variable $KRB5_KTNAME set to the keytab file exported from KDC. Application user is configured in the admin database like this: { "_id" : "[email protected]", "user" : "[email protected]", "db" : "$external", "credentials" : { "external" : true }, "roles" : [ { "role" : "readWrite", "db" : "test" } ] } Download the Java driver: wget http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.1/mongo-java-driver-2.12.1.jar Install java and jdk: sudo yum install java-1.7.0 sudo yum install java-1.7.0-devel Create a certificate store for Java and store the server certificate there, so that Java knows who it should trust: keytool -importcert -file mongodb.crt -alias mongoCert -keystore firstTrustStore (mongodb.crt is just a public certificate part of mongodb.pem) Copy kerberos config file to the application server: /etc/krb5.conf or ““C:\WINDOWS\krb5.ini“` (otherwise you’ll have to specify kdc and realm as Java runtime options) Use kinit to store the principal password on the application server: kinit [email protected] As an alternative to kinit, you can use JAAS to cache kerberos credentials. Compile and run the Java program javac -cp ../mongo-java-driver-2.12.1.jar SSLApp.java java -cp .:../mongo-java-driver-2.12.1.jar -Djavax.net.ssl.trustStore=firstTrustStore -Djavax.net.ssl.trustStorePassword=changeme -Djavax.security.auth.useSubjectCredsOnly=false SSLApp It is important to specify useSubjectCredsOnly=false, otherwise you’ll get the “No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)” exception from Java. As we discovered, this is not strictly necessary in all cases, but it is if you are relying on kinit to get the service ticket. The Java driver needs to construct MongoDB service principal name in order to request the Kerberos ticket. The service principal is constructed based on the server name you provide (unless you explicitly asked to canonicalize server name). For example, if I change rhel64.mongotest.com to the host IP address in the connection URI, I would be getting Kerberos exceptions No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]. So be sure you specify the same server host name as you used in the Kerberos principal (). Adding -Dsun.security.krb5.debug=true to Java runtime options helps a lot in debugging kerberos auth issues. These steps should help simplify the process of connecting Java applications with SSL. Before deploying any application with MongoDB, be sure to read through our Security Checklist which outlines recommended security measures to protect your MongoDB installation. More information on configuring MongoDB Security can be found in the MongoDB Manual. For further questions, feel free to reach out to the MongoDB team through google-groups.
August 26, 2014
by Francesca Krihely
· 8,241 Views
article thumbnail
Introducing BIRT iHub F-Type
Actuate recently released a new, free BIRT server called the BIRT iHub F-Type. It incorporates all the functionality of BIRT iHub and is limited only by the capacity of output it can deliver on a daily basis. It is ideal for departmental and smaller scale applications. When BIRT F-Type reaches its maximum output capacity, additional capacity can be purchased on a subscription based model. Some of the key features of BIRT iHub F-Type that will help improve your BIRT content applications are: Interactivity – Allow end-users to modify and personalize reports, and answer questions themselves. Scheduling – Automate report generation based on rules and calendar, and then notify users. Sharing – Secure document management and distribution that allows users to only access content/data they are entitled to. Excel Emitter – Export as native Excel (not CSV) with formulas/pivot tables/worksheets/charts. Integration – JavaScript API to embed dynamic reports and visualizations in your web app. Downloading BIRT iHub F-Type Before we get started with the installation process, we need to download BIRT iHub F-Type. There are three downloads available: Windows, Linux, and a VMware image. This blog will cover the Windows installation. If you’re installing either of the other types, you’ll find links to guides for them at the bottom of this blog post. Once you click on your chosen download, you’ll be asked to register. If you’ve already registered, click the “Click to Login” button. If not, fill out the short registration form to get started. Next, read and accept the license agreement. Once you’ve done that, click the checkbox, and a link for the download will appear. Click that to start your download. At this point, you should also receive an email with an activation code. Be sure to check your spam folder if you don’t see it in your inbox. Installing BIRT iHub F-Type After the download is complete, launch the executable file named ActuateBIRTiHubFType.exe. A welcome message will appear. Press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation. The default is C:\Actuate\BIRTiHub. If you have existing BIRT designs that depend on a JDBC database driver, you can optionally specify the folder where these drivers are located. Press Next to continue. Once the installation has finished, press Finish to launch the BIRT iHub F-Type. A desktop shortcut is also created that points to the iHub F-Type URL at http://localhost:8700/iportal. The first time you launch the BIRT iHub F-Type, you will need to activate it. Enter the activation code that you should have received in an e-mail. After entering a valid activation code, you should receive a message that the code was accepted and the BIRT iHub F-Type should start initializing services. Once that has completed, you will be presented with the login screen. The default user name is “administrator” and the password is blank for your first log in. You’ll be able to change this after you have logged in. Press “Log In” to continue. The first time you launch the BIRT iHub F-Type, you will be in tutorial mode which will help you get started loading your BIRT content and required resources. You can bypass the tutorial mode at any time by pressing the “Exit Tutorial” button at the top right. Select a BIRT design (*.rptdesign) file and press the Upload button. If you don’t have a BIRT design, you can download a sample from the link on the same page. The BIRT design file is automatically inspected and if there are any dependent files needed, like images, data files, BIRT report libraries, CSS styles, or other linked BIRT designs, you will be asked to upload those files as well. Once your BIRT design and dependent files are uploaded, your BIRT report will be displayed in the BIRT iHub F-Type and is now ready to explore. Thanks for reading. Now, it’s time to unleash the full power of BIRT into your application. If you have any questions or comments, please feel free to use the comments section below or visit the BIRT iHub F-Type forum. -Virgil For more blogs in the “Introducing BIRT iHub F-Type” series, see the list below: Installing iHub F-Type: Linux | VMWare Image - See more at: http://blogs.actuate.com/introducing-birt-ihub-f-type-installing-on-windows/#sthash.QPJhv2gw.dpuf
August 6, 2014
by Michael Singer
· 1,900 Views
article thumbnail
JBoss Data Grid: Installation and Development
In this blog, we will discuss one particular data grid platform from Redhat namely JBoss Data Grid (JDG). We will firstly cover how to access and install this data grid platform and then we will demonstrate how to develop and deploy a simple remote client/server data grid application which utilises the HotRod protocol. We will be using the latest release JDG 6.2 from Redhat in this article. Installation Overview To start using JDG, firstly log on to the redhat site https://access.redhat.com/home and download the software from the Downloads section of the site. We wish to download JDG 6.2 server by clicking on the appropriate links in the Downloads section. For future reference, it is also useful to download the quickstart and maven repository zip files. To install JDG, we simply unzip the JDG server package into an appropriate directory in your environment. JDG Overview In this section, we will provide a brief overview of the contents of the JDG installation package and the most notable configuration options available to users. Out of the box, users are provided with two runtime options either to run JDG in standalone or clustered mode. We can start JDG in either mode by invoking the stanadalone or clustered start up scripts in the / bin directory. To configure the JDG in either mode we need to configure the files standalone.xml and clustered.xml. In our case we will creating a distributed cache which will run on 3 node JDG cluster so we will be utilizing the clustered startup script. In order to set up and add new cache instances to JDG, we modify the infinispan subsystems in the appropriate xml configuration file above. We should also note the principal difference between the standalone and clustered configuration file is that in the clustered configuration file there is a JGroups subsystem configured element which allows for communication and messaging between configured cache instances running in a JDG cluster. Development Environment Setup and Configuration In this section, we will detail how to develop and configure a simple datagrid application which will be deployed to a 3 node JDG cluster. We will demonstrate how to configure and deploy a distributed cache in JDG and also show how to develop a HotRod Java client application which will be used to insert, update and display entries in the distributed cache. We will firstly discuss setting a new distributed cache on a 3 node JDG cluster. In this example, we will run our JDG cluster on a single machine by running each JDG instance on different ports. Firstly, we will create 3 instances of JDG by creating 3 directories (server1, server2, server3) on our host machine and unzipping each JDG installation into each directory. We will now configure each node in our cluster by copying and renaming the clustered.xml configuration file in the \server1\jboss-datagrid-6.2.0-server\standalone\configuration directory. We will name each of the cluster configuration files as "clustered1.xml", "clustered2.xml" and "clustered3.xml" for the JDG instances denoted by "server1", "server2" and "server3" respectively. We will now set up a new distributed cache on our JDG cluster by modifying the infinispan subsystem element in each clustered.xml file. We will demonstrate this for the node denoted "server1" here by modifying the file "clustered1.xml". The cache configuration shown here will be the same across all 3 nodes. To setup a new distributed cache named "directory-dist-cache", we configure the following elements in the file named "clustered1.xml" ......... ...... .............. ...... ...... /socket-binding-group> We will discuss the key elements and attributes relating to the configuration above. In the infinispan endpoint subsystem, we will configure hotrod clients to connect to the JDG server instance on socket 11222. The name of the cache container to host each of the cache instances will be held in the container named "clusteredcache". We have configured the infinispan core subsystem to the default cache container named "clusteredcacahe" whereby we will allow for jmx statistics to be collected relating the configured cache entries i.e statistics="true" We have created a new distributed cache named "directory-dist-cache" whereby there will be two copies of each cache entry held on two of the 3 cluster nodes. We have also set up an eviction policy whereby should there be more than 20 entries in our cache then cache entries will be removed using the LRU algorithm We should have configured nodes "server2" and "server3" to start up with a port offset of 100 and 200 respectively by configuring the socketing binding group element appropriately. Please view the socket bindings noted below. To set the socket binding element with a port offset of 100 on "server2", we configure "clustered2.xml" with the following entry: ...... ...... /socket-binding-group> To set the socket binding element with a port offset of 200 on "server3", we configure "clustered3.xml" with the following entry: ...... ...... /socket-binding-group> Before discussing the setup and configuration of our Hotrod client which will be used to interact with our JDG clustered HotRod server, we will start up each server instance to ensure our newly configured JDG distributed cache starts up correctly. Open up 3 Windows or Linux consoles and execute the following start up commands: Console 1: 1) Navigate to \server1\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the first instance of our JDG cluster denoted "server1": clustered -c=clustered1.xml -Djboss.node.name=server1 Console 2: 1) Navigate to \server2\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the second instance of our JDG cluster denoted "server2": clustered -c=clustered2.xml -Djboss.node.name=server2 Console 3: 1) Navigate to \server3\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the third instance of our JDG cluster denoted "server3": clustered -c=clustered3.xml -Djboss.node.name=server3 Providing all 3 JDG instances have started up correctly, you should see output in the console window whereby we can see there are 3 JDG instances in the JGroups view: HotRod Client Development Setup Now that the Hotrod server is up and running, we need to develop a Hotrod Java client which will interact with the clustered server application. The development environment consists of the following tools. 1) JDK Hotspot 1.7.0_45 2) IDE - Eclipse Kepler Build id: 20130919-0819 The HotRod client application is a simple application consisting of two Java classes. The application allows users to retrieve a reference to the distributed cache from the JDG server and then perform these actions: a) add new cinema objects. b) add and remove shows to each cinema object. c) print the list of all cinemas and shows stored in our distributed cache. The source code can be downloaded from github @ https://github.com/davewinters/JDG. We could use maven here to build and execute our application by configuring the maven settings.xml to point to the maven repository files we downloaded earlier and set up a maven project file (pom.xml) to build and execute the client application. In this article we will build our application using the Eclipse IDE and run the client application on the command line. To create a HotRod client application and execute the sample application, one should complete the following steps: 1) Create a new Java Project in Eclipse 2) Create a new package named uk.co.c2b2.jdg.hotrod and import the source code that has been downloaded from Github mentioned previously. 3) Now we need to configure the build path in Eclipse to contain the appropriate JDG client jar files which are required to compile the application. You should include all the client jar files in the project build path. These jar files are contained in the JDG installation zip file. For example on my machine these jar files are located in the directory: \server1\jboss-datagrid-6.2.0-server\client\hotrod\java 4. Providing the Eclipse build path has been configured appropriately, the application source should compile without issue. 5. We will need to execute the Hotrod application by opening the console window and executing the following command. Note the path specified here will differ depending on where the JDG client jar files and application class files are located in your environment: java -classpath ".;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\commons-pool-1.6-redhat-4.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-client-hotrod-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-commons-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-query-dsl-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-remote-query-client-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-logging-3.1.2.GA-redhat-1.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-river-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protobuf-java-2.5.0.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protostream-1.0.0.CR1-redhat-1.jar" uk/co/c2b2/jdg/hotrod/CinemaDirectory 6. The Hotrod client at runtime provides the end user with a number of different options to interact with the distributed cache as we can view from the console window below. Client Application Principal API Details We will not provide a detailed overview of the Hotrod application code however we will describe the principal API and code details briefly. In order to interact with the distributed cache on the JDG cluster using the Hotrod protocol, we will use the RemoteCacheManager Object which will allow us to retrieve a remote reference to the distributed cache. We have initialised a Properties object with the list of JDG instances and the associated with HotRod server port on each instance. We can add Cinema objects into the distributed cache using the RemoteCache.put() method. private RemoteCacheManager cacheManager; private RemoteCache cache; ..... Properties properties = new Properties(); properties.setProperty(ConfigurationProperties.SERVER_LIST, "127.0.0.1:11222;127.0.0.1:11322;127.0.0.1:11422"); cacheManager = new RemoteCacheManager(properties); cache = cacheManager.getCache("directory-dist-cache"); ..... cache.put(cinemaKey, cinemalist); In the webinar below, I describe in further detail how to set up a JDG cluster and how to develop and run the JDG application discussed above. For further details on JDG please visit: http://www.redhat.com/products/jbossenterprisemiddleware/data-grid/ Webinar: Introduction to JBoss Data Grid -- Installation, Configuration and Development In this webinar we will look at the basics of setting up JBoss Data Grid covering installation, configuration and development. We will look at practical examples of storing data, viewing the data in the cache and removing it. We will also take a look at the different clustered modes and what effect these have on the storage of your data:
July 25, 2014
by David Winters
· 16,038 Views
article thumbnail
Option.fold() Considered Unreadable
We had a lengthy discussion recently during code review whether scala.Option.fold() is idiomatic and clever or maybe unreadable and tricky? Let's first describe what the problem is. Option.fold does two things: maps a function f overOption's value (if any) or returns an alternative alt if it's absent. Using simple pattern matching we can implement it as follows: val option: Option[T] = //... def alt: R = //... def f(in: T): R = //... val x: R = option match { case Some(v) => f(v) case None => alt } If you prefer one-liner, fold is actually a combination of map and getOrElse val x: R = option map f getOrElse alt Or, if you are a C programmer that still wants to write in C, but using Scala compiler: val x: R = if (option.isDefined) f(option.get) else alt Interestingly this is similar to how fold() is actually implemented, but that's an implementation detail. OK, all of the above can be replaced with single Option.fold(): val x: R = option.fold(alt)(f) Technically you can even use /: and \: operators (alt /: option) - but that would be simply masochistic. I have three problems with option.fold() idiom. First of all - it's anything but readable. We are folding (reducing) over Option - which doesn't really make much sense. Secondly it reverses the ordinary positive-then-negative-case flow by starting with failure (absence, alt) condition followed by presence block (f function; see also: Refactoring map-getOrElse to fold). Interestingly this method would work great for me if it was named mapOrElse: ** * Hypothetical in Option */ def mapOrElse[B](f: A => B, alt: => B): B = this map f getOrElse alt Actually there is already such method in Scalaz, called OptionW.cata. cata. Here is whatMartin Odersky has to say about it: "I personally find methods like cata that take two closures as arguments are often overdoing it. Do you really gain in readability over map + getOrElse? Think of a newcomer to your code[...]"While cata has some theoretical background, Option.fold just sounds like a random name collision that doesn't bring anything to the table, apart from confusion. I know what you'll say, that TraversableOnce has fold and we are sort-of doing the same thing. Why it's a random collision rather than extending the contract described inTraversableOnce? fold() method in Scala collections typically just delegates to one offoldLeft()/foldRight() (the one that works better for given data structure), thus it doesn't guarantee order and folding function has to be associative. But inOption.fold() the contract is different: folding function takes just one parameter rather than two. If you read my previous article about folds you know that reducing function always takes two parameters: current element and accumulated value (initial value during first iteration). But Option.fold() takes just one parameter: current Option value! This breaks the consistency, especially when realizing Option.foldLeft() andOption.foldRight() have correct contract (but it doesn't mean they are more readable). The only way to understand folding over option is to imagine Option as a sequence with0 or 1 elements. Then it sort of makes sense, right? No. def double(x: Int) = x * 2 Some(21).fold(-1)(double) //OK: 42 None.fold(-1)(double) //OK: -1 but: Some(21).toList.fold(-1)(double) : error: type mismatch; found : Int => Int required: (Int, Int) => Int Some(21).toList.fold(-1)(double) ^ If we treat Option[T] as a List[T], awkward Option.fold() breaks because it has different type than TraversableOnce.fold(). This is my biggest concern. I can't understand why folding wasn't defined in terms of the type system (trait?) and implemented strictly. As an example take a look at: Data.Foldable in Haskell (advanced) Data.Foldable typeclass describes various flavours of folding in Haskell. There are familiar foldl/foldr/foldl1/foldr1, in Scala namedfoldLeft/foldRight/reduceLeft/reduceRight accordingly. They have the same type as Scala and behave unsurprisingly with all types that you can fold over, including Maybe, lists, arrays, etc. There is also a function named fold, but it has a completely different meaning: class Foldable t where fold :: Monoid m => t m -> m While other folds are quite complex, this one barely takes a foldable container of ms (which have to be Monoids) and returns the same Monoid type. A quick recap: a type can be aMonoid if there exists a neutral value of that type and an operation that takes two values and produces just one. Applying that function with one of the arguments being neutral value yields the other argument. String ([Char]) is a good example with empty string being neutral value (mempty) and string concatenation being such operation (mappend). Notice that there are two different ways you can construct monoids for numbers: under addition with neutral value being 0 (x + 0 == 0 + x == x for any x) and under multiplication with neutral 1 (x * 1 == 1 * x == x for any x). Let's stick to strings. If I fold empty list of strings, I'll get an empty string. But when a list contains many elements, they are being concatenated: > fold ([] :: [String]) "" > fold [] :: String "" > fold ["foo", "bar"] "foobar" In the first example we have to explicitly say what is the type of empty list []. Otherwise Haskell compiler can't figure out what is the type of elements in a list, thus which monoid instance to choose. In second example we declare that whatever is returned from fold [], it should be a String. From that the compiler infers that [] actually must have a type of [String]. Last fold is the simplest: the program folds over elements in list and concatenates them because concatenation is the operation defined in Monoid Stringtypeclass instance. Back to options (or more precisely Maybe). Folding over Maybe monad having type parameter being Monoid (I can't believe I just said it) has an interesting interpretation: it either returns value inside Maybe or a default Monoid value: > fold (Just "abc") "abc" > fold Nothing :: String "" Just "abc" is same as Some("abc") in Scala. You can see here that if Maybe Stringis Nothing, neutral String monoid value is returned, that is an empty string. Summary Haskell shows that folding (also over Maybe) can be at least consistent. In ScalaOption.fold is unrelated to List.fold, confusing and unreadable. I advise avoiding it and staying with slightly more verbose map/getOrElse transformations or pattern matching. PS: Did I mention there is also Either.fold() (with even different contract) but noTry.fold()?
June 26, 2014
by Tomasz Nurkiewicz
· 9,579 Views
article thumbnail
Anomaly Detection : A Survey
this post is summary of the “anomaly detection : a survey”. anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. these non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in different application domains. anomalies are patterns in data that do not conform to a well defined notion of normal behavior. interesting to analyze unwanted noise in the data also can be found in there. novelty detection which aims at detecting previously unobserved (emergent, novel) patterns in the data challenges for anomaly detection drawing the boundary between normal and anomalous behavior availability of labeled data noisy data type of anomaly anomalies can be classified into following three categories point anomalies - an individual data instance can be considered as anomalous with respect to the rest of data contextual anomalies - a data instance is anomalous in a specific context (but not otherwise), then it is termed as a contextual anomaly (also referred as conditional anomaly). each data instance is defined using following two sets of attributes contextual attributes. the contextual attributes are used to determine the context (or neighborhood) for that instance eg: in time- series data, time is a contextual attribute which determines the position of an instance on the entire sequence behavioral attributes. the behavioral attributes define the non-contextual characteristics of an instance eg: in a spatial data set describing the average rainfall of the entire world, the amount of rainfall at any location is a behavioral attribute to explain this we will look into "exchange rate history for converting united states dollar (usd) to sri lankan rupee (lkr)"[1] contextual anomaly t2 in a exchange rate time series. note that the exchange rate at time t1 is same as that at time t2 but occurs in a different context and hence is not considered as an anomaly 3. collective anomalies - a collection of related data instances is anomalous with respect to the entire data set data labels the labels associated with a data instance denote if that instance is normal or anomalous. depending labels availability, anomaly detection techniques can be operated in one of the following three modes supervised anomaly detection - techniques trained in supervised mode assume the availability of a training data set which has labeled instances for normal as well as anomaly class semi-supervised anomaly detection - techniques that operate in a semi-supervised mode, assume that the training data has labeled instances for only the normal class. since they do not require labels for the anomaly class unsupervised anomaly detection - techniques that operate in unsupervised mode do not require training data, and thus are most widely applicable. the techniques implicit assume that normal instances are far more frequent than anomalies in the test data. if this assumption is not true then such techniques suffer from high false alarm rate output of anomaly detection anomaly detection have two types of output techniques scores. scoring techniques assign an anomaly score to each instance in the test data depending on the degree to which that instance is considered an anomaly labels. techniques in this category assign a label (normal or anomalous) to each test instance applications of anomaly detection intrusion detection intrusion detection refers to detection of malicious activity. the key challenge for anomaly detection in this domain is the huge volume of data. thus, semi-supervised and unsupervised anomaly detection techniques are preferred in this domain.denning[3] classifies intrusion detection systems into host based and net- work based intrusion detection systems. host based intrusion detection systems - this deals with operating system call traces network intrusion detection systems - these systems deal with detecting intrusions in network data. the intrusions typically occur as anomalous patterns (point anomalies) though certain techniques model[4] the data in a sequential fashion and detect anomalous subsequences (collective anomalies). a challenge faced by anomaly detection techniques in this domain is that the nature of anomalies keeps changing over time as the intruders adapt their network attacks to evade the existing intrusion detection solutions. fraud detection fraud detection refers to detection of criminal activities occurring in commercial organizations such as banks, credit card companies, insurance agencies, cell phone companies, stock market, etc. the organizations are interested in immediate detection of such frauds to prevent economic losses. detection techniques used for credit card fraud and network intrusion detection as below. statistical profiling using histograms parametric statisti- cal modeling non-parametric sta- tistical modeling bayesian networks neural networks support vector ma- chines rule-based clustering based nearest neighbor based spectral information theoretic here are some domain in fraud detections credit card fraud detection mobile phone fraud detection insurance claim fraud detection insider trading detection medical and public health anomaly detection anomaly detection in the medical and public health domains typically work with pa- tient records. the data can have anomalies due to several reasons such as abnormal patient condition or instrumentation errors or recording errors. thus the anomaly detection is a very critical problem in this domain and requires high degree of accuracy. industrial damage detection such damages need to be detected early to prevent further escalation and losses. fault detection in mechanical units structural defect detection image processing anomaly detection techniques dealing with images are either interested in any changes in an image over time (motion detection) or in regions which appear ab- normal on the static image. this domain includes satellite imagery. anomaly detection in text data anomaly detection techniques in this domain primarily detect novel topics or events or news stories in a collection of documents or news articles. the anomalies are caused due to a new interesting event or an anomalous topic. sensor networks since the sensor data collected from various wireless sensors has several unique characteristics. references [1] http://themoneyconverter.com/usd/lkr.aspx [2] varun chandola, arindam banerjee, and vipin kumar. 2009. anomaly detection: a survey. acm comput. surv. 41, 3, article 15 (july 2009), 58 pages. doi=10.1145/1541880.1541882 http://doi.acm.org/10.1145/1541880.1541882 [3] denning, d. e. 1987. an intrusion detection model. ieee transactions of software engineer-ing 13, 2, 222–232. [4]gwadera, r., atallah, m. j., and szpankowski, w. 2004. detection of significant sets of episodes in event sequences. in proceedings of the fourth ieee international conference on data mining. ieee computer society, washington, dc, usa, 3–10.
June 16, 2014
by Madhuka Udantha
· 12,782 Views
article thumbnail
Introducing Partitioned Collections for MongoDB Applications
TokuMX 1.5 is around the corner. The big feature will be something we discussed briefly when talking about replication changes in 1.4: partitioned collections. Before introducing the feature, I wanted to mention the following. Although TokuMX 1.5 is not available as of this writing, we would love to hear feedback on partitioned collections, which we think are wonderful for time-series data, as I describe below. If you are interested in trying out the feature, email [email protected] for a pre-release version of TokuMX 1.5. What is a partitioned collection? A partitioned collection is analogous to a partitioned table in relational databases. Oracle, MySQL, SQL Server, and Postgres all support partitioned tables. We are happy to bring this functionality to TokuMX. So, if the remainder of this blog is unclear, and you have friends in the office who are familiar with relational databases, you may want to ask them for more information :). Nevertheless, a partitioned collection is a collection that underneath the covers is broken into (or partitioned into) several individual collections, based on ranges of a “partition key”. From the application developer’s point of view, the collection is just another collection. Queries, inserts, updates, and deletes just work with no syntactical changes. Secondary indexes and replication work as well. But underneath the covers, the data will be broken into several collections, with each collection responsible for all data for a range of the partition key. If you are running TokuMX 1.4, a simple example is the oplog, which is a partitioned collection. Any normal query works just fine on the oplog. However, if you look in your data directory, you will see several .tokumx files named “local_oplog_rs_p…”. These files are the individual partitions that break up the data. Each partition stores a range of _id fields in the oplog. Why should I bother using a partitioned collection? This will be its own post with longer examples, but here is a summary. Partitioned collections have two big advantages: Large chunks of data can be deleted very efficiently by dropping partitions. The cost is that of performing an “rm” on some files in the filesystem. This is really fast and efficient. Queries that include the partition key may be isolated to individual partitions, and therefore run faster. This is similar to “query isolation” for shard keys. So, one scenario you may want a partitioned collection for is where the oldest data gets dropped periodically, and many queries benefit from a time based key. That will be a good fit. In short: time series data. If you have a time-series application where you want to keep a rolling period of data (e.g. the last 6 months worth), then using a partitioned collection will be great, and is preferable to using a TTL index or a capped collection. In a future blog post, I will expand on this. How do I use a partitioned collection in TokuMX 1.5? Basically, just like a normal collection, except with some commands added to create a partitioned collection, add partitions, and drop partitions. Below, I explain the shell commands added for this functionality. Our documentation contains the full commands so that they may be called by any driver’s runCommand method. Ok, so how do I create a partitioned collection? The first thing to consider is what your partition key should be. That is, what key do you want to use ranges of to partition your data? This key has similarities with a shard key. It should be a key that can be used to isolate partitions, the way a shard key is used to isolate shards (as explained here). Also, it should be a key that contains a range of data you would like to delete all at once. With time series data, that key will likely be a timestamp. In TokuMX, the partition key is always the primary key. To create a partitioned collection, “foo”, with a timestamp field, “ts”, used for the partitioning run the following: > db.createCollection("foo", { partitioned : 1 , primaryKey : { ts : 1 , _id : 1 } }) { "ok" : 1 } Note that in TokuMX, the primary key must have the _id field appended to it to ensure uniqueness. As a side note, we do not support hash based partitioning, only range based partitioning. Adding partitions? In TokuMX, partitions can only be appended to the end. Individual partitions cannot be split. So, say we have a collection that partitions on the _id field, where all _id’s happen to be integers. Suppose we have three partitions with the following ranges: _id <= 0 0 < _id <= 1000 _id > 1000 With this collection we cannot create a partition with the range 500 < _id <= 1000, because that would split the second partition. All we can do is add a new partition to the end, and “cap” the current last partition with a new maximum value. This new maximum value must be greater than or equal to the primary key (or in this case, _id) of the last partition’s last document. So, if the last partition’s last document has an _id of 2500, we can only partitions that create a range whose maximum is at least 2500. There are two ways to add a partition. The first method peeks at the last document in the current last partition, caps the partition with the primary key of that last document, and creates a new partition. To do so, one does: > db.foo.addPartition() { "ok" : 1 } In the above example, the partitioned collection would now have partitions with the following ranges: _id <= 0 0 < _id <= 1000 1000 < _id <= 2500 _id > 2500 Alternatively, we can specify what the new maximum of the existing last partition may be, provided it is greater than the last document in last partition (which in this example is 2500). To do so, we simply pass in the new maximum as a parameter: > db.foo.addPartition({ _id : 3000 }); { "ok" : 1 } This would make the collection have partitions with the following ranges: _id <= 0 0 < _id <= 1000 1000 < _id <= 3000 _id > 3000 Dropping partitions? Dropping partitions is simple. First, see what the partitions are with the following shell command: > db.foo.getPartitionInfo() { "numPartitions" : NumberLong(4), "partitions" : [ { "_id" : NumberLong(0), "max" : { "_id" : 0 }, "createTime" : ISODate("2014-05-29T01:50:15.839Z") }, { "_id" : NumberLong(1), "max" : { "_id" : 1000 }, "createTime" : ISODate("2014-05-29T01:50:27.049Z") }, { "_id" : NumberLong(2), "max" : { "_id" : 2500 }, "createTime" : ISODate("2014-05-29T01:50:30.549Z") }, { "_id" : NumberLong(3), "max" : { "_id" : { "$maxKey" : 1 } }, "createTime" : ISODate("2014-05-29T01:50:35.903Z") } ], "ok" : 1 } This lists each partition, what the maximum value that each partition may hold (thus defining the range of the partition), and the id of the partition (in the _id field). So, in the example we used for adding partitions, we have four partitions with _ids 0 through 3. To drop a partition, we run the following command and pass the _id of the partition we want to drop. To drop partition 0, we run: > db.foo.dropPartition(0) { "ok" : 1 } Looking at the list of partitions after this operation, we see the partition is dropped: > db.foo.getPartitionInfo() { "numPartitions" : NumberLong(3), "partitions" : [ { "_id" : NumberLong(1), "max" : { "_id" : 1000 }, "createTime" : ISODate("2014-05-29T01:50:27.049Z") }, { "_id" : NumberLong(2), "max" : { "_id" : 2500 }, "createTime" : ISODate("2014-05-29T01:50:30.549Z") }, { "_id" : NumberLong(3), "max" : { "_id" : { "$maxKey" : 1 } }, "createTime" : ISODate("2014-05-29T01:50:35.903Z") } ], "ok" : 1 } This covers how to use partitioned collections. We hope users in the MongoDB ecosystem find this feature as useful as relational database users do. In the comments section below, feel free to leave questions and/or feedback.
June 4, 2014
by Zardosht Kasheff
· 12,443 Views
article thumbnail
Open Session In View Design Tradeoffs
The Open Session in View (OSIV) pattern gives rise to different opinions in the Java development community. Let's go over OSIV and some of the pros and cons of this pattern. The problem The problem that OSIV solves is a mismatch between the Hibernate concept of session and it's lifecycle and the way that many server-side view technologies work. In a typical Java frontend application the service layer starts by querying some of the data needed to build the view. The remaining data needed can be lazy-loaded later, with the condition that the Hibernate session remains open - and there lies the problem. Between the moment that the service layer method finishes it's execution and the moment that the view is rendered, Hibernate has already committed the transaction and closed the session. When the view tries to lazy load the extra data that it needs, if finds the Hibernate session closed, causing a LazyInitializationException. The OSIV solution OSIV tackles this problem by ensuring that the Hibernate session is kept open all the way up to the rendering of the view - hence the name of the pattern. Because the session is kept open, no more LazyInitializationExceptions occur. The session or entity manager is kept open by means of a filter that is added to the request processing chain. In the case of JPA the OpenEntityManagerInViewFilter will create an entity manager at the beginning of the request, and then bind it to the request thread. The service layer will then be executed and the business transaction committed or rolled back, but the transaction manager will not remove the entity manager from the thread after the commit. When the view rendering starts, the transaction manager will then check if there is already an entity manager binded to the thread, and if so use it instead of creating a new one. After the request is processed, the filter will then unbind the entity manager from the thread. The end result is that the same entity manager used to commit the business transaction was kept around in the request thread, allowing the view rendering code to lazy load the needed data. Going back to the original problem Let's step back a moment and go back to the initial problem: the LazyInitializationException. Is this exception really a problem? This exception can also be seen as a warning sign of a wrongly written query in the service layer. When building a view and it's backing services, the developer knows upfront what data is needed, and can make sure that the needed data is loaded before the rendering starts. Several relation types such as one-to-many use lazy-loading by default, but that default setting can be overridden if needed at query time using the following syntax: select p FROM Person p left join fetch p.invoices This means that the lazy loading can be turned off on a case by case basis depending on the data needed by the view. OSIV in projects I've worked In projects I have worked that used OSIV, we could see via query logging that the database was getting hit with a high number of SQL queries, sometimes to the point that developers had to turn off the Hibernate SQL logging. The performance of these application was impacted, but it was kept manageable using second-level caches, and due to the fact that these where intranet-based applications with a limited number of users. Pros of OSIV The main advantage of OSIV is that it makes working with ORM and the database more transparent: Less queries need to be manually written Less awareness is required about the Hibernate session and how to solve LazyInitializationExceptions. Cons of OSIV OSIV seems to be easy to misuse and can accidentally introduce N+1 performance problems in the application. On projects I've worked OSIV did not work out well in the long-term. The alternative of writing custom queries that eager fetch data depending on the use case is manageable and turned out well in other projects I've worked. Alternatives to OSIV Besides the application-level solution of writing custom queries to pre-fetch the needed data, there are other framework-level aproaches to OSIV. The Seam Framework was built by some of the same developers as Hibernate , and solves the problem by introducing the notion of conversation. Can you let me know in the comments bellow your thoughts and experiences with OSIV, thanks for reading.
April 30, 2014
by Vasco Cavalheiro
· 19,105 Views · 3 Likes
article thumbnail
Circuit Breaker Pattern in Apache Camel
Camel is very often used in distributed environments for accessing remote resources. Remote services may fail for various reasons and periods. For services that are temporarily unavailable and recoverable after short period of time, a retry strategy may help. But some services can fail or hang for longer period of time making the calling application unresponsive and slow. A good strategy to prevent from cascading failures and exhaustion of critical resources is the Circuit Breaker pattern described by Michael Nygard in the Release It! book. Circuit Breaker is a stateful pattern that wraps the failure-prone resource and monitors for errors. Initially the Circuit Breaker is in closed state and passes all calls to the wrapped resource. When the failures reaches a certain threshold, the circuit moves to open state where it returns error to the caller without actually calling the wrapped resource. This prevents from overloading the already failing resource. While at this state, we need a mechanism to detect whether the failures are over and start calling the protected resource. This is where the third state called half-open comes into play. This state is reached after a certain time following the last failure. At this state, the calls are passed through to the protected resource, but the result of the call is important. If the call is successful, it is assumed that the protected resource has recovered and the circuit is moved into closed state, and if the call fails, the timeout is reset, and the circuit is moved back to open state where all calls are rejected. Here is the state diagram of Circuit Breaker from Martin Fowler's post: How Circuit Breaker is implemented in Camel? Circuit Breaker is available in the latest snapshot version of Camel as a Load balancer policy. Camel Load Balancer already has policies for Round Robin, Random, Failover, etc. and now also CircuiBreaker policy. Here is an example load balancer that uses Circuit Breaker policy with threshold of 2 errors and halfOpenAfter timeout of 1 second. Notice also that this policy applies only to errors caused by MyCustomException. new RouteBuilder() { public void configure() { from("direct:start").loadBalance() .circuitBreaker(2, 1000L, MyCustomException.class) .to("mock:result"); } }; And here is the same example using Spring XML DSL: MyCustomException
April 16, 2014
by Bilgin Ibryam
· 18,384 Views · 1 Like
article thumbnail
Be a Lazy But Productive Android Developer, Part 5: Image Loading Library
Welcome to part 5 of “Be a lazy but a productive android developer” series. If you are a lazy Android developer and looking for image loading library, which could help you to load image(s) asynchronously without writing a logic for downloading and caching images then this article is for you. This series so far: Part 1: We looked at RoboGuice, a dependency injection library by which we can reduce the boiler plate code, save time and there by achieve productivity during Android app development. Part 2: We saw and explored about Genymotion, which is a rocket speed emulator and super-fast emulator as compared to native emulator. And we can use Genymotion while developing apps and can quickly test apps and there by can achieve productivity. Part 3: We understood and explored about JSON Parsing libraries (GSON and Jackson), using which we can increase app performance, we can decrease boilerplate code and there by can optimize productivity. Part 4: We talked about Card UI and explored card library, also created a basic card and simple card list demo. In this part In this part, we are going to talk about some image libraries using which we can load image(s) asynchronously, can cache images and also can download images into the local storage. Required features for loading images Almost every android app has a need to load remote images. While loading remote images, we have to take care of below things: Image loading process must be done in background (i.e. asynchronously) to avoid blocking UI main thread. Image recycling image should be done. Image should be displayed once its loaded successfully. Images should be cached in local memory for the later use. If remote image gets failed (due to network connection or bad url or any other reasons) to load then it should be managed perfectly for avoiding duplicate requests to load the same again, instead it should load if and only if net connection is available. Memory management should be done efficiently. In short, we have to write a code to manage each and every aspects of image loading but there are some awesome libraries available, using which we can load/download image asynchronously. We just have to call the load image method and success/failure callbacks. Asynchronous image loading Consider a case where we are having 50 images and 50 titles and we try to load all the images/text into the listview, it won’t display anything until all the images get downloaded. Here Asynchronous image loading process comes in picture. Asynchronous image loading is nothing but a loading process which happens in background so that it doesn’t block main UI thread and let user to play with other loaded data on the screen. Images will be getting displayed as and when it gets downloaded from background threads. Asynchronous image loading libraries Nostra’s Universal Image loader – https://github.com/nostra13/Android-Universal-Image-Loader Picasso – http://square.github.io/picasso/ UrlImageViewHelper by Koush Volley - By Android team members @ Google Novoda’s Image loader – https://github.com/novoda/ImageLoader Let’s have a look at examples using Picasso and Universal Image loader libraries. Example 1: Nostra’s Universal Image loader Step 1: Initialize ImageLoader configuration ? public class MyApplication extends Application{ @Override public void onCreate() { // TODO Auto-generated method stub super.onCreate(); // Create global configuration and initialize ImageLoader with this configuration ImageLoaderConfiguration config = new ImageLoaderConfiguration.Builder(getApplicationContext()).build(); ImageLoader.getInstance().init(config); } } Step 2: Declare application class inside Application tag in AndroidManifest.xml file ? Step 3: Load image and display into ImageView ? ImageLoader.getInstance().displayImage(objVideo.getThumb(), holder.imgVideo); Now, Universal Image loader also provides a functionality to implement success/failure callback to check whether image loading is failed or successful. ? ImageLoader.getInstance().displayImage(photoUrl, imgView, new ImageLoadingListener() { @Override public void onLoadingStarted(String arg0, View arg1) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.VISIBLE); } @Override public void onLoadingFailed(String arg0, View arg1, FailReason arg2) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } @Override public void onLoadingComplete(String arg0, View arg1, Bitmap arg2) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } @Override public void onLoadingCancelled(String arg0, View arg1) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } }); Example 2: Picasso Image loading straight way: ? Picasso.with(context).load("http://postimg.org/image/wjidfl5pd/").into(imageView); Image re-sizing: ? Picasso.with(context) .load(imageUrl) .resize(100, 100) .centerCrop() .into(imageView) Example 3: UrlImageViewHelper library It’s an android library that sets an ImageView’s contents from a url, manages image downloading, caching, and makes your coffee too. UrlImageViewHelper will automatically download and manage all the web images and ImageViews. Duplicate urls will not be loaded into memory twice. Bitmap memory is managed by using a weak reference hash table, so as soon as the image is no longer used by you, it will be garbage collected automatically. Image loading straight way: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png"); Placeholder image when image is being downloaded: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png", R.drawable.loadingPlaceHolder); Cache images for a minute only: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png", null, 60000); Example 4: Volley library Yes Volley is a library developed and being managed by some android team members at Google, it was announced by Ficus Kirkpatrick during the last I/O. I wrote an article about Volley library 10 months back , read it and give it a try if you haven’t used it yet. Let’s look at an example of image loading using Volley. Step 1: Take a NetworkImageView inside your xml layout. ? Step 2: Define a ImageCache class Yes you are reading title perfectly, we have to define an ImageCache class for initializing ImageLoader object. ? public class BitmapLruCache extends LruCache implements ImageLoader.ImageCache { public BitmapLruCache() { this(getDefaultLruCacheSize()); } public BitmapLruCache(int sizeInKiloBytes) { super(sizeInKiloBytes); } @Override protected int sizeOf(String key, Bitmap value) { return value.getRowBytes() * value.getHeight() / 1024; } @Override public Bitmap getBitmap(String url) { return get(url); } @Override public void putBitmap(String url, Bitmap bitmap) { put(url, bitmap); } public static int getDefaultLruCacheSize() { final int maxMemory = (int) (Runtime.getRuntime().maxMemory() / 1024); final int cacheSize = maxMemory / 8; return cacheSize; } } Step 3: Create an ImageLoader object and load image Create an ImageLoader object and initialize it with ImageCache object and RequestQueue object. ? ImageLoader.ImageCache imageCache = new BitmapLruCache(); ImageLoader imageLoader = new ImageLoader(Volley.newRequestQueue(context), imageCache); Step 4: Load an image into ImageView ? NetworkImageView imgAvatar = (NetworkImageView) findViewById(R.id.imgDemo); imageView.setImageUrl(url, imageLoader); Which library to use? Can you decide which library you would use? Let us know which and what are the reasons? Selection of the library is always depends on the requirement. Let’s look at the few fact points about each library so that you would able to compare exactly and can take decision. Picasso: It’s just a one liner code to load image using Picasso. No need to initialize ImageLoader and to prepare a singleton instance of image loader. Picasso allows you to specify exact target image size. It’s useful when you have memory pressure or performance issues, you can trade off some image quality for speed. Picasso doesn’t provide a way to prepare and store thumbnails of local images. Sometimes you need to check image loading process is in which state, loading, finished execution, failed or cancelled image loading. Surprisingly It doesn’t provide a callback functionality to check any state. “fetch()” dose not pass back anything. “get()” is for synchronously read, and “load()” is for asynchronously draw a view. Universal Image loader (UIL): It’s the most popular image loading library out there. Actually, it’s based on the Fedor Vlasov’s project which was again probably a very first complete solution and also a most voted answer (for the image loading solution) on Stackoverflow. UIL library is better in documentation and even there’s a demo example which highlights almost all the features. UIL provides an easy way to download image. UIL uses builders for customization. Almost everything can be configured. UIL doesn’t not provide a way to specify image size directly you want to load into a view. It uses some rules based on the size of the view. Indirectly you can do it by mentioning ImageSize argument in the source code and bypass the view size checking. It’s not as flexible as Picasso. Volley: It’s officially by Android dev team, Google but still it’s not documented. It’s just not an image loading library only but an asynchronous networking library Developer has to define ImageCache class their self and has to initialize ImageLoader object with RequestQueue and ImageCache objects. So now I am sure now you can be able to compare libraries. Choosing library is a bit difficult talk because it always depends on the requirement and type of projects. If the project is large then you should go for Picasso or Universal Image loader. If the project is small then you can consider to use Volley librar, because Volley isn’t an image loading library only but it tries to solve a more generic solution.). I suggest you to start with Picasso. If you want more control and customization, go for UIL. Read more: http://blog.bignerdranch.com/3177-solving-the-android-image-loading-problem-volley-vs-picasso/ http://stackoverflow.com/questions/19995007/local-image-caching-solution-for-android-square-picasso-vs-universal-image-load https://plus.google.com/103583939320326217147/posts/bfAFC5YZ3mq Hope you liked this part of “Lazy android developer: Be productive” series. Till the next part, keep exploring image loading libraries mentioned above and enjoy!
April 11, 2014
by Paresh Mayani
· 63,969 Views · 2 Likes
article thumbnail
Measuring Code Coverage by Protractor End-to-End Tests
Was just setting up new JavaScript project based on Grunt. I scaffolded the project template by Yeoman with usage of angular-fullstack generator. I decided to try MEAN stack without MongoDB for my new project (DB isn't needed). Next step was integrating Require.JS and configuring measurement of code coverage on client and server by Instanbul. When was this all done I was wondering if it is possible to measure code coverage by Protractor end-to-end testing. After quick search I found that Ryan Bridges recently released grunt-protractor-coverage plugin. Interesting coincidence. So I decided to try it and can confirm that it's working fine with mentioned stack. Configuration was smooth and Ryan fixed small issue very promptly. It's based on grunt-protractor-runner plugin. I created separate Grunt configuration file just for this purpose not to mess around with normal build. I had also problems to run 'makeReport' task of grunt-istanbul plugin for two different directories (Mocha server side code coverage measurement is using same task). So here is the Grunt flow. First we need to copy non JavaScript files into target directory. It needs to be in separate directory because JavaScript files will need to be instrumented. copy: { coverageE2E: { files: [{ expand: true, dot: true, cwd: '', dest: '/app', src: [ '*.{ico,png,txt}', '.htaccess', 'bower_components/**/*', 'images/**/*', 'fonts/**/*', 'views/**/*', 'styles/**/*', ] }] }, }, Next step is instrumentation of the code. It is needed for gathering coverage stats. Each line is decorated by special instructions that helps during measurement. Pay attention to fact that we are instrumenting server and client side code. Instrumented code is placed into target directory represented by placeholder . instrument: { files: ['server/**/*.js', 'app/scripts/**/*.js'], options: { lazy: true, basePath: '/' } }, Next we start Express from target directory. express: { options: { port: process.env.PORT || 9000 }, coverageE2E: { options: { script: '/lib/server.js', debug: true } }, }, And the protractor_coverage task of grunt-protractor-coverage plugin. Configuration should be the same as for grunt-protractor-runner plugin. protractor_coverage: { options: { configFile: 'test/protractor/protractorConf.js', // Default config file keepAlive: true, // If false, the grunt process stops when the test fails. noColor: false, // If true, protractor will not use colors in its output. coverageDir: '', args: {} }, chrome: { options: { args: { baseUrl: 'http://localhost:3000/', // Arguments passed to the command 'browser': 'chrome' } } }, }, Last step is generation of coverage report. makeReport: { src: '/*.json', options: { type: 'html', dir: '/reports', print: 'detail' } }, Finally, this is grunt task gathering all steps. grunt.registerTask('default', [ 'clean:coverageE2E', 'copy:coverageE2E', 'instrument', 'express:coverageE2E', 'protractor_coverage:chrome', 'makeReport' ]); EDIT: Notice that following Github link was changed to branch, because project structure was significantly changed: Source code for this project can be found on GitHub. To run this Grunt configuration file: grunt --gruntfile Gruntfile-e2e.js For running end to end Protractor test you have to have webdriver-manager running. See Protractor documentation.
April 9, 2014
by Lubos Krnac
· 17,692 Views · 1 Like
  • Previous
  • ...
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×