DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Performance Topics

article thumbnail
Ode to a Workstation
Every now and then I get work done in the home office. I’ve written previously about my setup, but after churning out some solution design today, I sat back and really took some time to appreciate the workspace. I’m really pleased with the configuration, it’s probably the best setup I’ve had in years. The desk is a former QLD police desk from the 1940s, so it wasn’t built for modern computers – not a problem, the cables run down the back which is just a minor annoyance. The keyboard and mouse are gaming varieties so that they perform well – the old Sennheiser (RF) wireless headset has been with me since 2006 and still works very well. The wooden clock (recently reviewed) acts as external speakers, a Bluetooth receiver and has a built in microphone so it can be used as a hands-free option for conference calls. It also features Qi wireless charging capability and also features a thermostat. Under the second monitor is a HDD caddy which supports USB3, and features 4 bays which can be used in parallel. I try to keep the desk reasonably neat, and there’s plenty of space so it doesn’t get too cluttered. I have a nice view out the window to a small courtyard which gets early morning sun.
June 21, 2015
by Rob Sanders
· 1,061 Views
article thumbnail
How to to Backup Linux with Snapshots
While working on different web projects I have accumulated a large pool of tools and services to facilitate the work of developers, system administrators and DevOps One of the first challenges, that every developer faces at the end of each project is backup configuration and maintenance of media files, UGC, databases, application and servers' data (e.g. configuration files). Nowadays, there are a lot of solutions to make a snapshot backup of the entire server, and I decided to make a list of most convinient and really useful tools and services. rsync - http://linux.die.net/man/1/rsync Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It's a build in Linux tool. Real hardcore =) rsnapshot - http://rsnapshot.org/ rsnapshot is a filesystem snapshot utility for making backups of local and remote systems. Using rsync and hard links, it is possible to keep multiple, full backups instantly available. The disk space required is just a little more than the space of one full backup, plus incrementals. Depending on your configuration, it is quite possible to set up in just a few minutes. Files can be restored by the users who own them, without the root user getting involved. Stackoverflow users recommended it to me couple of years ago and I thinks that is a really good solutions, which unites the best from rsync and Linux filesystem. Snapper - http://snapper.io/ Snapper is a tool for Linux filesystem snapshot management. Apart from the obvious creation and deletion of snapshots, it can compare snapshots and revert differences between snapshots. In simple terms, this allows root and non-root users to view older versions of files and revert changes. Allows you to configure schedule for the backups, automatically deletes old snapshots. The only sad thing is that Snapper has no updates since 2014. backup2l - http://backup2l.sourceforge.net/ backup2l is a lightweight command line tool for generating, maintaining and restoring backups on a mountable file system (e. g. hard disk). The main design goals are are low maintenance effort, efficiency, transparency and robustness. In a default installation, backups are created autonomously by a cron script. The script, that I've found on sourceforge and even used on couple of my projects 5 years ago. But it is not being updated since 2009. FlyBack - https://www.linuxlinks.com/flyback/ FlyBack is software for system backup and restore, which offers similar functionality to the Mac OS X Leopard's Time Machine. Linux has almost all of the required technology already built in to recreate it. FlyBack is a snapshot-based backup tool based on rsync. It creates successive backup directories mirroring the files users want to backup, but hard-links unchanged files to the previous backup. Plenty of settings, mostly build for desktop computers, simple UI. TimeVault - https://wiki.ubuntu.com/TimeVault TimeVault monitors files for changes and takes snapshots after some user-specified delay. It is a simple front-end for making snapshots of a set of directories. Snapshots are a copy of a directory structure or file at a certain point in time. They use very little space for files which have not changed since the last snapshot was made, as they use hard links that point to existing backups. TimeVault makes all the work silently in background and is fully automated solution. Currently it gets no updates but it was a good solution when it just released. Box Backup - http://www.boxbackup.org/ Box Backup is an open source, completely automatic, on-line backup system. A backup daemon runs on systems to be backed up, and copies encrypted data to the server when it notices changes - so backups are continuous and up-to-date (although traditional snapshot backups are possible too). All backed up data is stored on the server in files on a filesystem - no tape, archive or other special devices are required. BitCalm - https://bitcalm.com BitCalm makes it easy for web developers to set up backup of applications on Linux servers just in one minute. It is SaaS for server backups. After installing python client user can manage backups for files and even databases in web-interface. Service provides Amazon S3 as a storage and allows users to connect their own storage for backups. All backups are incremental. Service is built for servers and supports all popular Linux based OS: Ubuntu, Debian, CentOS, ArchLinux. To let user be calm, service sends daily reports and notifications. BitCalm allows to manage multiple backups in a single account and user can restore the backup to any server added to service.
June 11, 2015
by Tom Cooper
· 52,185 Views · 1 Like
article thumbnail
Top 80 Thread- Java Interview Questions and Answers (Part 2)
PART 1 > THREADS - Top 80 interview questions and answers (detailed explanation with programs) Question 61. class MyRunnable implements Runnable{ public void run(){ for(int i=0;i<3;i++){ System.out.println("i="+i+" ,ThreadName="+Thread.currentThread().getName()); } } } public class MyClass { public static void main(String...args){ MyRunnable runnable=new MyRunnable(); System.out.println("start main() method"); Thread thread1=new Thread(runnable); Thread thread2=new Thread(runnable); thread1.start(); thread2.start(); System.out.println("end main() method"); } } Answer. Thread behaviour is unpredictable because execution of Threads depends on Thread scheduler, start main() method will be the printed first, but after that we cannot guarantee the order of thread1, thread2 and main thread they might run simultaneously or sequentially, so order of end main() method will not be guaranteed. /*OUTPUT start main() method end main() method i=0 ,ThreadName=Thread-0 i=0 ,ThreadName=Thread-1 i=1 ,ThreadName=Thread-0 i=2 ,ThreadName=Thread-0 i=1 ,ThreadName=Thread-1 i=2 ,ThreadName=Thread-1 */ Question 62. class MyRunnable implements Runnable{ public void run(){ for(int i=0;i<3;i++){ System.out.println("i="+i+" ,ThreadName="+Thread.currentThread().getName()); } } } public class MyClass { public static void main(String...args) throws InterruptedException{ System.out.println("In main() method"); MyRunnable runnable=new MyRunnable(); Thread thread1=new Thread(runnable); Thread thread2=new Thread(runnable); thread1.start(); thread1.join(); thread2.start(); thread2.join(); System.out.println("end main() method"); } } Answer. We use join() methodto ensure all threads that started from main must end in order in which they started and also main should end in last. In other words join() method waited for this thread to die. /*OUTPUT In main() method i=0 ,ThreadName=Thread-0 i=1 ,ThreadName=Thread-0 i=2 ,ThreadName=Thread-0 i=0 ,ThreadName=Thread-1 i=1 ,ThreadName=Thread-1 i=2 ,ThreadName=Thread-1 end main() method */ Question 63. class MyRunnable implements Runnable { public void run() { try { while (!Thread.currentThread().isInterrupted()) { Thread.sleep(1000); System.out.println("x"); } } catch (InterruptedException e) { System.out.println(Thread.currentThread().getName() + " ENDED"); } } } public class MyClass { public static void main(String args[]) throws Exception { MyRunnable obj = new MyRunnable(); Thread t = new Thread(obj, "Thread-1"); t.start(); System.out.println("press enter"); System.in.read(); t.interrupt(); } } Answer. "press enter" will be printed first then thread1 will keep on printing x until enter is pressed, once enter is pressed "Thread-1 ENDED" will be printed. System.in.read() causes main thread to go from running to waiting state (thread waits for user input) /* OUTPUT press enter x x x x Thread-1 ENDED */ Question 64. class MyRunnable implements Runnable{ public void run(){ synchronized (this) { System.out.println("1 "); try { this.wait(); System.out.println("2 "); } catch (InterruptedException e) { e.printStackTrace(); } } } } public class MyClass { public static void main(String[] args) { MyRunnable myRunnable=new MyRunnable(); Thread thread1=new Thread(myRunnable,"Thread-1"); thread1.start(); } } Answer. Thread acquires lock on myRunnable object so 1 was printed but notify wasn't called so 2 will never be printed, this is called frozen process. Deadlock is formed, These type of deadlocksare called Frozen processes. /*OUTPUT 1 */ Question 65. import java.util.ArrayList; /* Producer is producing, Producer will allow consumer to * consume only when 10 products have been produced (i.e. when production is over). */ class Producer implements Runnable{ ArrayList sharedQueue; Producer(){ sharedQueue=new ArrayList(); } @Override public void run(){ synchronized (this) { for(int i=1;i<=3;i++){ //Producer will produce 10 products sharedQueue.add(i); System.out.println("Producer is still Producing, Produced : "+i); try{ Thread.sleep(1000); }catch(InterruptedException e){e.printStackTrace();} } System.out.println("Production is over, consumer can consume."); this.notify(); } } } class Consumer extends Thread{ Producer prod; Consumer(Producer obj){ prod=obj; } public void run(){ synchronized (this.prod) { System.out.println("Consumer waiting for production to get over."); try{ this.prod.wait(); }catch(InterruptedException e){e.printStackTrace();} } int productSize=this.prod.sharedQueue.size(); for(int i=0;i Q61- Q80
June 6, 2015
by Ankit Mittal
· 13,683 Views · 3 Likes
article thumbnail
Top 80 Thread- Java Interview Questions and Answers (Part 1)
Question 1. What is Thread in java? Answer. Threads consumes CPU in best possible manner, hence enables multi processing. Multi threading reduces idle time of CPU which improves performance of application. Thread are light weight process. A thread class belongs to java.lang package. We can create multiple threads in java, even if we don’t create any Thread, one Thread at least do exist i.e. main thread. Multiple threads run parallely in java. Threads have their own stack. Advantage of Thread : Suppose one thread needs 10 minutes to get certain task, 10 threads used at a time could complete that task in 1 minute, because threads can run parallely. Question 2. What is difference between Process and Thread in java? Answer. One process can have multiple Threads, Thread are subdivision of Process. One or more Threads runs in the context of process. Threads can execute any part of process. And same part of process can be executed by multiple Threads. Processes have their own copy of the data segment of the parent process while Threads have direct access to the data segment of its process. Processes have their own address while Threads share the address space of the process that created it. Process creation needs whole lot of stuff to be done, we might need to copy whole parent process, but Thread can be easily created. Processes can easily communicate with child processes but interprocess communication is difficult. While, Threads can easily communicate with other threads of the same process using wait() and notify() methods. In process all threads share system resource like heap Memory etc. while Thread has its own stack. Any change made to process does not affect child processes, but any change made to thread can affect the behavior of the other threads of the process. Example to see where threads on are created on different processes and same process. Question 3. How to implement Threads in java? Answer. This is very basic threading question. Threads can be created in two ways i.e. by implementing java.lang.Runnable interface or extending java.lang.Thread class and then extending run method. Thread has its own variables and methods, it lives and dies on the heap. But a thread of execution is an individual process that has its own call stack. Thread are lightweight process in java. Thread creation by implementingjava.lang.Runnableinterface. We will create object of class which implements Runnable interface : MyRunnable runnable=new MyRunnable(); Thread thread=new Thread(runnable); 2) And then create Thread object by calling constructor and passing reference of Runnable interface i.e. runnable object : Thread thread=new Thread(runnable); Question 4 . Does Thread implements their own Stack, if yes how? (Important) Answer. Yes, Threads have their own stack. This is very interesting question, where interviewer tends to check your basic knowledge about how threads internally maintains their own stacks. I’ll be explaining you the concept by diagram. Question 5. We should implement Runnable interface or extend Thread class. What are differences between implementing Runnable and extending Thread? Answer. Well the answer is you must extend Thread only when you are looking to modify run() and other methods as well. If you are simply looking to modify only the run() method implementing Runnable is the best option (Runnable interface has only one abstract method i.e. run() ). Differences between implementing Runnable interface and extending Thread class - Multiple inheritance in not allowed in java : When we implement Runnable interface we can extend another class as well, but if we extend Thread class we cannot extend any other class because java does not allow multiple inheritance. So, same work is done by implementing Runnable and extending Thread but in case of implementing Runnable we are still left with option of extending some other class. So, it’s better to implement Runnable. Thread safety : When we implement Runnable interface, same object is shared amongst multiple threads, but when we extend Thread class each and every thread gets associated with new object. Inheritance (Implementing Runnable is lightweight operation) : When we extend Thread unnecessary all Thread class features are inherited, but when we implement Runnable interface no extra feature are inherited, as Runnable only consists only of one abstract method i.e. run() method. So, implementing Runnable is lightweight operation. Coding to interface : Even java recommends coding to interface. So, we must implement Runnable rather than extending thread. Also, Thread class implements Runnable interface. Don’t extend unless you wanna modify fundamental behaviour of class, Runnable interface has only one abstract method i.e. run() : We must extend Thread only when you are looking to modify run() and other methods as well. If you are simply looking to modify only the run() method implementing Runnable is the best option (Runnable interface has only one abstract method i.e. run() ). We must not extend Thread class unless we're looking to modify fundamental behaviour of Thread class. Flexibility in code when we implement Runnable : When we extend Thread first a fall all thread features are inherited and our class becomes direct subclass of Thread , so whatever action we are doing is in Thread class. But, when we implement Runnable we create a new thread and pass runnable object as parameter,we could pass runnable object to executorService & much more. So, we have more options when we implement Runnable and our code becomes more flexible. ExecutorService : If we implement Runnable, we can start multiple thread created on runnable object with ExecutorService (because we can start Runnable object with new threads), but not in the case when we extend Thread (because thread can be started only once). Question 6. How can you say Thread behaviour is unpredictable? (Important) Answer. The solution to question is quite simple, Thread behaviour is unpredictable because execution of Threads depends on Thread scheduler, thread scheduler may have different implementation on different platforms like windows, unix etc. Same threading program may produce different output in subsequent executions even on same platform. To achieve we are going to create 2 threads on same Runnable Object, create for loop in run() method and start both threads. There is no surety that which threads will complete first, both threads will enter anonymously in for loop. Question 7 . When threads are not lightweight process in java? Answer. Threads are lightweight process only if threads of same process are executing concurrently. But if threads of different processes are executing concurrently then threads are heavy weight process. Question 8. How can you ensure all threads that started from main must end in order in which they started and also main should end in last? (Important) Answer. Interviewers tend to know interviewees knowledge about Thread methods. So this is time to prove your point by answering correctly. We can use join() methodto ensure all threads that started from main must end in order in which they started and also main should end in last.In other words waits for this thread to die. Calling join() method internally calls join(0); DETAILED DESCRIPTION : Join() method - ensure all threads that started from main must end in order in which they started and also main should end in last. Types of join() method with programs- 10 salient features of join. Question 9.What is difference between starting thread with run() and start() method? (Important) Answer. This is quite interesting question, it might confuse you a bit and at time may make you think is there really any difference between starting thread with run() and start() method. When you call start() method, main thread internally calls run() method to start newly created Thread, so run() method is ultimately called by newly created thread. When you call run() method main thread rather than starting run() method with newly thread it start run() method by itself. Question 10. What is significance of using Volatile keyword? (Important) Answer. Java allows threads to access shared variables. As a rule, to ensure that shared variables are consistently updated, a thread should ensure that it has exclusive use of such variables by obtaining a lock that enforces mutual exclusion for those shared variables. If a field is declared volatile, in that case the Java memory model ensures that all threads see a consistent value for the variable. Few small questions> Q. Can we have volatile methods in java? No, volatile is only a keyword, can be used only with variables. Q. Can we have synchronized variable in java? No, synchronized can be used only with methods, i.e. in method declaration. Question 11. Differences between synchronized and volatile keyword in Java? (Important) Answer.Its very important question from interview perspective. Volatilecan be used as a keyword against the variable, we cannot use volatile against method declaration. volatile void method1(){} //it’s illegal, compilation error. While synchronization can be used in method declaration or we can create synchronization blocks (In both cases thread acquires lock on object’s monitor). Variables cannot be synchronized. Synchronized method: synchronized void method2(){} //legal Synchronized block: void method2(){ synchronized (this) { //code inside synchronized block. } } Synchronized variable (illegal): synchronized int i;//it’s illegal, compilatiomn error. Volatile does not acquire any lock on variable or object, but Synchronization acquires lock on method or block in which it is used. Volatile variables are not cached, but variables used inside synchronized method or block are cached. When volatile is used will never create deadlock in program, as volatile never obtains any kind of lock . But in case if synchronization is not done properly, we might end up creating dedlock in program. Synchronization may cost us performance issues, as one thread might be waiting for another thread to release lock on object. But volatile is never expensive in terms of performance. DETAILED DESCRIPTION : Differences between synchronized and volatile keyword in detail with programs. Question 12. Can you again start Thread? Answer.No, we cannot start Thread again, doing so will throw runtimeException java.lang.IllegalThreadStateException. The reason is once run() method is executed by Thread, it goes into dead state. Let’s take an example- Thinking of starting thread again and calling start() method on it (which internally is going to call run() method) for us is some what like asking dead man to wake up and run. As, after completing his life person goes to dead state. Question 13. What is race condition in multithreading and how can we solve it? (Important) Answer. This is very important question, this forms the core of multi threading, you should be able to explain about race condition in detail. When more than one thread try to access same resource without synchronization causes race condition. So we can solve race condition by using either synchronized block or synchronized method. When no two threads can access same resource at a time phenomenon is also called as mutual exclusion. Few sub questions> What if two threads try to read same resource without synchronization? When two threads try to read on same resource without synchronization, it’s never going to create any problem. What if two threads try to write to same resource without synchronization? When two threads try to write to same resource without synchronization, it’s going to create synchronization problems. Question 14. How threads communicate between each other? Answer. This is very must know question for all the interviewees, you will most probably face this question in almost every time you go for interview. Threads can communicate with each other by using wait(), notify() and notifyAll() methods. Question 15. Why wait(), notify() and notifyAll() are in Object class and not in Thread class? (Important) Answer. Every Object has a monitor, acquiring that monitors allow thread to hold lock on object. But Thread class does not have any monitors. wait(), notify() and notifyAll()are called on objects only >When wait() method is called on object by thread it waits for another thread on that object to release object monitor by calling notify() or notifyAll() method on that object. When notify() method is called on object by thread it notifies all the threads which are waiting for that object monitor that object monitor is available now. So, this shows that wait(), notify() and notifyAll() are called on objects only. Now, Straight forward question that comes to mind is how thread acquires object lock by acquiring object monitor? Let’s try to understand this basic concept in detail? Wait(), notify() and notifyAll() method being in Object class allows all the threads created on that object to communicate with other. . As multiple threads exists on same object. Only one thread can hold object monitor at a time. As a result thread can notify other threads of same object that lock is available now. But, thread having these methods does not make any sense because multiple threads exists on object its not other way around (i.e. multiple objects exists on thread). Now let’s discuss one hypothetical scenario, what will happen if Thread class contains wait(), notify() and notifyAll() methods? Having wait(), notify() and notifyAll() methods means Thread class also must have their monitor. Every thread having their monitor will create few problems - >Thread communication problem. >Synchronization on object won’t be possible- Because object has monitor, one object can have multiple threads and thread hold lock on object by holding object monitor. But if each thread will have monitor, we won’t have any way of achieving synchronization. >Inconsistency in state of object (because synchronization won't be possible). Question 16. Is it important to acquire object lock before calling wait(), notify() and notifyAll()? Answer.Yes, it’s mandatory to acquire object lock before calling these methods on object. As discussed above wait(), notify() and notifyAll() methods are always called from Synchronized block only, and as soon as thread enters synchronized block it acquires object lock (by holding object monitor). If we call these methods without acquiring object lock i.e. from outside synchronize block then java.lang. IllegalMonitorStateException is thrown at runtime. Wait() method needs to enclosed in try-catch block, because it throws compile time exception i.e. InterruptedException. Question 17. How can you solve consumer producer problem by using wait() and notify() method? (Important) Answer. Here come the time to answer very very important question from interview perspective. Interviewers tends to check how sound you are in threads inter communication. Because for solving this problem we got to use synchronization blocks, wait() and notify() method very cautiously. If you misplace synchronization block or any of the method, that may cause your program to go horribly wrong. So, before going into this question first i’ll recommend you to understand how to use synchronized blocks, wait() and notify() methods. Key points we need to ensure before programming : >Producer will produce total of 10 products and cannot produce more than 2 products at a time until products are being consumed by consumer. Example> when sharedQueue’s size is 2, wait for consumer to consume (consumer will consume by calling remove(0) method on sharedQueue and reduce sharedQueue’s size). As soon as size is less than 2, producer will start producing. >Consumer can consume only when there are some products to consume. Example> when sharedQueue’s size is 0, wait for producer to produce (producer will produce by calling add() method on sharedQueue and increase sharedQueue’s size). As soon as size is greater than 0, consumer will start consuming. Explanation of Logic > We will create sharedQueue that will be shared amongst Producer and Consumer. We will now start consumer and producer thread. Note: it does not matter order in which threads are started (because rest of code has taken care of synchronization and key points mentioned above) First we will start consumerThread > consumerThread.start(); consumerThread will enter run method and call consume() method. There it will check for sharedQueue’s size. -if size is equal to 0 that means producer hasn’t produced any product, wait for producer to produce by using below piece of code- synchronized (sharedQueue) { while (sharedQueue.size() == 0) { sharedQueue.wait(); } } -if size is greater than 0, consumer will start consuming by using below piece of code. synchronized (sharedQueue) { Thread.sleep((long)(Math.random() * 2000)); System.out.println("consumed : "+ sharedQueue.remove(0)); sharedQueue.notify(); } Than we will start producerThread > producerThread.start(); producerThread will enter run method and call produce() method. There it will check for sharedQueue’s size. -if size is equal to 2 (i.e. maximum number of products which sharedQueue can hold at a time), wait for consumer to consume by using below piece of code- synchronized (sharedQueue) { while (sharedQueue.size() == maxSize) { //maxsize is 2 sharedQueue.wait(); } } -if size is less than 2, producer will start producing by using below piece of code. synchronized (sharedQueue) { System.out.println("Produced : " + i); sharedQueue.add(i); Thread.sleep((long)(Math.random() * 1000)); sharedQueue.notify(); } DETAILED DESCRIPTION with program : Solve Consumer Producer problem by using wait() and notify() methods in multithreading. Question 18. How to solve Consumer Producer problem without using wait() and notify() methods, where consumer can consume only when production is over.? Answer. In this problem, producer will allow consumer to consume only when 10 products have been produced (i.e. when production is over). We will approach by keeping one boolean variable productionInProcess and initially setting it to true, and later when production will be over we will set it to false. Question 19. How can you solve consumer producer pattern by using BlockingQueue? (Important) Answer. Now it’s time to gear up to face question which is most probably going to be followed up by previous question i.e. after how to solve consumer producer problem using wait() and notify() method. Generally you might wonder why interviewer's are so much interested in asking about solving consumer producer problem using BlockingQueue, answer is they want to know how strong knowledge you have about java concurrent Api’s, this Api use consumer producer pattern in very optimized manner, BlockingQueue is designed is such a manner that it offer us the best performance. BlockingQueue is a interface and we will use its implementation class LinkedBlockingQueue. Key methods for solving consumer producer pattern are > put(i); //used by producer to put/produce in sharedQueue. take();//used by consumer to take/consume from sharedQueue. Question 20. What is deadlock in multithreading? Write a program to form DeadLock in multi threading and also how to solve DeadLock situation. What measures you should take to avoid deadlock? (Important) Answer. This is very important question from interview perspective. But, what makes this question important is it checks interviewees capability of creating and detecting deadlock. If you can write a code to form deadlock, than I am sure you must be well capable in solving that deadlock as well. If not, later on this post we will learn how to solve deadlock as well. First question comes to mind is, what is deadlock in multi threading program? Deadlock is a situation where two threads are waiting for each other to release lock holded by them on resources. But how deadlock could be formed : Thread-1 acquires lock on String.class and then calls sleep() method which gives Thread-2 the chance to execute immediately after Thread-1 has acquired lock on String.class and Thread-2 acquires lock on Object.class then calls sleep() method and now it waits for Thread-1 to release lock on String.class. Conclusion: Now, Thread-1 is waiting for Thread-2 to release lock on Object.class and Thread-2 is waiting for Thread-1 to release lock on String.class and deadlock is formed. //Code called by Thread-1 public void run() { synchronized (String.class) { Thread.sleep(100); synchronized (Object.class) { } } } //Code called by Thread-2 publicvoid run() { synchronized (Object.class) { Thread.sleep(100); synchronized (String.class) { } } } Here comes the important part, how above formed deadlock could be solved : Thread-1 acquires lock on String.class and then calls sleep() method which gives Thread-2 the chance to execute immediately after Thread-1 has acquired lock on String.class and Thread-2 tries to acquire lock on String.class but lock is holded by Thread-1. Meanwhile, Thread-1 completes successfully. As Thread-1 has completed successfully it releases lock on String.class, Thread-2 can now acquire lock on String.class and complete successfully without any deadlock formation. Conclusion: No deadlock is formed. //Code called by Thread-1 publicvoid run() { synchronized (String.class) { Thread.sleep(100); synchronized (Object.class) { } } } //Code called by Thread-2 publicvoid run() { synchronized (String.class) { Thread.sleep(100); synchronized (Object.class) { } } } Few important measures to avoid Deadlock > Lock specific member variables of class rather than locking whole class: We must try to lock specific member variables of class rather than locking whole class. Use join() method: If possible try touse join() method, although it may refrain us from taking full advantage of multithreading environment because threads will start and end sequentially, but it can be handy in avoiding deadlocks. If possible try avoid using nested synchronization blocks. Question 21. Have you ever generated thread dumps or analyzed Thread Dumps? (Important) Answer. Answering this questions will show your in depth knowledge of Threads. Every experienced must know how to generate Thread Dumps. VisualVM is most popular way to generate Thread Dump and is most widely used by developers. It’s important to understand usage of VisualVM for in depth knowledge of VisualVM. I’ll recommend every developer must understand this topic to become master in multi threading. It helps us in analyzing threads performance, thread states, CPU consumed by threads, garbage collection and much more. For detailed information see Generating and analyzing Thread Dumps using VisualVM - step by step detail to setup VisualVM with screenshots jstack is very easy way to generate Thread dump and is widely used by developers. I’ll recommend every developer must understand this topic to become master in multi threading. For creating Thread dumps we need not to download any jar or any extra software. For detailed information see Generating and analyzing Thread Dumps using JSATCK - step by step detail to setup JSTACK with screenshots. Question 22. What is life cycle of Thread, explain thread states? (Important) Answer. Thread states/ Thread life cycle is very basic question, before going deep into concepts we must understand Thread life cycle. Thread have following states > New Runnable Running Waiting/blocked/sleeping Terminated (Dead) Thread states/ Thread life cycle in diagram > Thread states in detail > New : When instance of thread is created using new operator it is in new state, but the start() method has not been invoked on the thread yet, thread is not eligible to run yet. Runnable : When start() method is called on thread it enters runnable state. Running : Thread scheduler selects thread to go fromrunnable to running state. In running state Thread starts executing by entering run() method. Waiting/blocked/sleeping : In this state a thread is not eligible to run. >Thread is still alive, but currently it’s not eligible to run. In other words. > How can Thread go from running to waiting state? By calling wait()method thread go from running to waiting state. In waiting state it will wait for other threads to release object monitor/lock. > How can Thread go from running to sleeping state? By calling sleep() methodthread go from running to sleeping state. In sleeping state it will wait for sleep time to get over. Terminated (Dead) : A thread is considered dead when its run() method completes. Question 23. Are you aware of preemptive scheduling and time slicing? Answer. In preemptive scheduling, the highest priority thread executes until it enters into the waiting or dead state. In time slicing, a thread executes for a certain predefined time and then enters runnable pool. Than thread can enter running state when selected by thread scheduler. Question 24. What are daemon threads? Answer.Daemon threads are low priority threads which runs intermittently in background for doing garbage collection. 12 Few salient features of daemon() threads> Thread scheduler schedules these threads only when CPU is idle. Daemon threads are service oriented threads, they serves all other threads. These threads are created before user threads are created and die after all other user threads dies. Priority of daemon threads is always 1 (i.e. MIN_PRIORITY). User created threads are non daemon threads. JVM can exit when only daemon threads exist in system. we can use isDaemon() method to check whether thread is daemon thread or not. we can use setDaemon(boolean on) method to make any user method a daemon thread. If setDaemon(boolean on) is called on thread after calling start() method than IllegalThreadStateException is thrown. You may like to see how daemon threads work, for that you can use VisualVM or jStack. I have provided Thread dumps over there which shows daemon threads which were intermittently running in background. Some of the daemon threads which intermittently run in background are > "RMI TCP Connection(3)-10.175.2.71" daemon"RMI TCP Connection(idle)" daemon"RMI Scheduler(0)" daemon"C2 CompilerThread1" daemon "GC task thread#0 (ParallelGC)" Question 25. Why suspend() and resume() methods are deprecated? Answer.Suspend() method is deadlock prone. If the target thread holds a lock on object when it is suspended, no thread can lock this object until the target thread is resumed. If the thread that would resume the target thread attempts to lock this monitor prior to calling resume, it results in deadlock formation. These deadlocksare generally called Frozen processes. Suspend() method puts thread from running to waiting state. And thread can go from waiting to runnable state only when resume() method is called on thread. It is deprecated method. Resume() method is only used with suspend() method that’s why it’s also deprecated method. Question 26. Why destroy() methods is deprecated? Answer. This question is again going to check your in depth knowledge of thread methods i.e. destroy() method is deadlock prone. If the target thread holds a lock on object when it is destroyed, no thread can lock this object (Deadlock formed are similar to deadlock formed when suspend() and resume() methods are used improperly). It results in deadlock formation. These deadlocksare generally called Frozen processes. Additionally you must know calling destroy() method on Threads throw runtimeException i.e. NoSuchMethodError. Destroy() method puts thread from running to dead state. Question 27. As stop() method is deprecated, How can we terminate or stop infinitely running thread in java? (Important) Answer. This is very interesting question where interviewees thread basics basic will be tested. Interviewers tend to know user’s knowledge about main thread’s and thread invoked by main thread. We will try to address the problem by creating new thread which will run infinitely until certain condition is satisfied and will be called by main Thread. Infinitely running thread can be stopped using boolean variable. Infinitely running thread can be stopped using interrupt() method. Let’s understand Why stop() method is deprecated : Stopping a thread with Thread.stop() causes it to release all of the monitors that it has locked. If any of the objects previously protected by these monitors were in an inconsistent state, the damaged objects become visible to other threads, which might lead to unpredictable behavior. Question 28. what is significance of yield() method, what state does it put thread in? yield() is a native method it’s implementation in java 6 has been changed as compared to its implementation java 5. As method is native it’s implementation is provided by JVM. In java 5, yield() method internally used to call sleep() method giving all the other threads of same or higher priority to execute before yielded thread by leaving allocated CPU for time gap of 15 millisec. But java 6, calling yield() method gives a hint to the thread scheduler that the current thread is willing to yield its current use of a processor. The thread scheduler is free to ignore this hint. So, sometimes even after using yield() method, you may not notice any difference in output. salient features of yield() method > Definition : yield() method when called on thread gives a hint to the thread scheduler that the current thread is willing to yield its current use of a processor.The thread scheduler is free to ignore this hint. Thread state : when yield() method is called on thread it goes from running to runnable state, not in waiting state. Thread is eligible to run but not running and could be picked by scheduler at anytime. Waiting time : yield() method stops thread for unpredictable time. Static method : yield()is a static method, hence calling Thread.yield() causes currently executing thread to yield. Native method : implementation of yield() method is provided by JVM. Let’s see definition of yield() method as given in java.lang.Thread - public static native void yield(); synchronized block : thread need not to to acquire object lock before calling yield()method i.e. yield() method can be called from outside synchronized block. Question 29.What is significance of sleep() method in detail, what statedoes it put thread in ? sleep() is a native method, it’s implementation is provided by JVM. 10 salient features of sleep() method > Definition : sleep() methods causes current thread to sleep for specified number of milliseconds (i.e. time passed in sleep method as parameter). Ex- Thread.sleep(10) causes currently executing thread to sleep for 10 millisec. Thread state : when sleep() is called on thread it goes from running to waiting state and can return to runnable state when sleep time is up. Exception : sleep() method must catch or throw compile time exception i.e. InterruptedException. Waiting time : sleep() method have got few options. sleep(long millis) - Causes the currently executing thread to sleep for the specified number of milliseconds public static native void sleep(long millis) throws InterruptedException; sleep(long millis, int nanos) - Causes the currently executing thread to sleep for the specified number of milliseconds plus the specified number of nanoseconds. public static native void sleep(long millis,int nanos) throws InterruptedException; static method : sleep()is a static method, causes the currently executing thread to sleep for the specified number of milliseconds. Belongs to which class :sleep() method belongs to java.lang.Thread class. synchronized block : thread need not to to acquire object lock before calling sleep()method i.e. sleep() method can be called from outside synchronized block. Question 30. Difference between wait() and sleep() ? (Important) Answer. Should be called from synchronized block :wait() method is always called from synchronized block i.e. wait() method needs to lock object monitor before object on which it is called. But sleep() method can be called from outside synchronized block i.e. sleep() method doesn’t need any object monitor. IllegalMonitorStateException : if wait() method is called without acquiring object lock than IllegalMonitorStateException is thrown at runtime, but sleep() methodnever throws such exception. Belongs to which class : wait() method belongs to java.lang.Object class but sleep() method belongs to java.lang.Thread class. Called on object or thread : wait() method is called on objects but sleep() method is called on Threads not objects. Thread state : when wait() method is called on object, thread that holded object’s monitor goes from running to waiting state and can return to runnable state only when notify() or notifyAll()method is called on that object. And later thread scheduler schedules that thread to go from from runnable to running state. when sleep() is called on thread it goes from running to waiting state and can return to runnable state when sleep time is up. When called from synchronized block :when wait() method is called thread leaves the object lock. But sleep()method when called from synchronized block or method thread doesn’t leaves object lock. Question 31. Differences and similarities between yield() and sleep()? Answer. Differences yield() and sleep() : Definition : yield() method when called on thread gives a hint to the thread scheduler that the current thread is willing to yield its current use of a processor.The thread scheduler is free to ignore this hint. sleep() methods causes current thread to sleep for specified number of milliseconds (i.e. time passed in sleep method as parameter). Ex- Thread.sleep(10) causes currently executing thread to sleep for 10 millisec. Thread state : when sleep() is called on thread it goes from running to waiting state and can return to runnable state when sleep time is up. when yield() method is called on thread it goes from running to runnable state, not in waiting state. Thread is eligible to run but not running and could be picked by scheduler at anytime. Exception : yield() method need not to catch or throw any exception. But sleep() method must catch or throw compile time exception i.e. InterruptedException. Waiting time : yield() method stops thread for unpredictable time, that depends on thread scheduler. But sleep() method have got few options. sleep(long millis) - Causes the currently executing thread to sleep for the specified number of milliseconds sleep(long millis, int nanos) - Causes the currently executing thread to sleep for the specified number of milliseconds plus the specified number of nanoseconds. similarity between yield() and sleep(): > yield() and sleep() method belongs to java.lang.Thread class. > yield() and sleep() method can be called from outside synchronized block. > yield() and sleep() method are called on Threads not objects. Question 32. Mention some guidelines to write thread safe code, most important point we must take care of in multithreading programs? Answer. In multithreading environment it’s important very important to write thread safe code, thread unsafe code can cause a major threat to your application. I have posted many articles regarding thread safety. So overall this will be revision of what we have learned so far i.e. writing thread safe healthy code and avoiding any kind of deadlocks. If method is exposed in multithreading environment and it’s not synchronized (thread unsafe) than it might lead us to race condition, we must try to use synchronized block and synchronized methods. Multiple threads may exist on same object but only one thread of that object can enter synchronized method at a time, though threads on different object can enter same method at same time. Even static variables are not thread safe, they are used in static methods and if static methods are not synchronized then thread on same or different object can enter method concurrently. Multiple threads may exist on same or different objects of class but only one thread can enter static synchronized method at a time, we must consider making static methods as synchronized. If possible, try to use volatile variables. If a field is declared volatile all threads see a consistent value for the variable. Volatile variables at times can be used as alternate to synchronized methods as well. Final variables are thread safe because once assigned some reference of object they cannot point to reference of other object. s is pointing to String object. public class MyClass { final String s=new String("a"); void method(){ s="b"; //compilation error, s cannot point to new reference. } } If final is holding some primitive value it cannot point to other value. public class MyClass { final inti=0; void method(){ i=0; //compilation error, i cannot point to new value. } } Usage of local variables : If possible try to use local variables, local variables are thread safe, because every thread has its own stack, i.e. every thread has its own local variables and its pushes all the local variables on stack. public class MyClass { void method(){ inti=0; //Local variable, is thread safe. } } Using thread safe collections : Rather than using ArrayList we must Vector and in place of using HashMap we must use ConcurrentHashMap or HashTable. We must use VisualVM or jstack to detect problems such as deadlocks and time taken by threads to complete in multi threading programs. Using ThreadLocal:ThreadLocal is a class which provides thread-local variables. Every thread has its own ThreadLocal value that makes ThreadLocal value threadsafe as well. Rather than StringBuffer try using immutable classes such as String. Any change to String produces new String. Question 33. How thread can enter waiting, sleeping and blocked state and how can they go to runnable state ? Answer. This is very prominently asked question in interview which will test your knowledge about thread states. And it’s very important for developers to have in depth knowledge of this thread state transition. I will try to explain this thread state transition by framing few sub questions. I hope reading sub questions will be quite interesting. > How can Thread go from running to waiting state ? By calling wait()method thread go from running to waiting state. In waiting state it will wait for other threads to release object monitor/lock. > How can Thread return from waiting to runnable state ? Once notify() or notifyAll()method is called object monitor/lock becomes available and thread can again return to runnable state. > How can Thread go from running to sleeping state ? By calling sleep() methodthread go from running to sleeping state. In sleeping state it will wait for sleep time to get over. > How can Thread return from sleeping to runnable state ? Once specified sleep time is up thread can again return to runnable state. Suspend() method can be used to put thread in waiting state and resume() method is the only way which could put thread in runnable state. Thread also may go from running to waiting state if it is waiting for some I/O operation to take place. Once input is available thread may return to running state. >When threads are in running state, yield()method can make thread to go in Runnable state. Question 34. Difference between notify() and notifyAll() methods, can you write a code to prove your point? Answer. Goodness. Theoretically you must have heard or you must be aware of differences between notify() and notifyAll().But have you created program to achieve it? If not let’s do it. First, I will like give you a brief description of what notify() and notifyAll() methods do. notify()- Wakes up a single thread that is waiting on this object's monitor. If any threads are waiting on this object, one of them is chosen to be awakened. The choice is random and occurs at the discretion of the implementation. A thread waits on an object's monitor by calling one of the wait methods. The awakened threads will not be able to proceed until the current thread relinquishes the lock on this object. public final native void notify(); notifyAll()- Wakes up all threads that are waiting on this object's monitor. A thread waits on an object's monitor by calling one of the wait methods. The awakened threads will not be able to proceed until the current thread relinquishes the lock on this object. public final native void notifyAll(); Now it’s time to write down a program to prove the point. Question 35. Does thread leaves object lock when sleep() method is called? Answer. When sleep() method is called Thread does not leaves object lock and goes from running to waiting state. Thread waits for sleep time to over and once sleep time is up it goes from waiting to runnable state. Question 36. Does thread leaves object lock when wait() method is called? Answer. When wait() method is called Thread leaves the object lock and goes from running to waiting state. Thread waits for other threads on same object to call notify() or notifyAll() and once any of notify() or notifyAll() is called it goes from waiting to runnable state and again acquires object lock. Question 37. What will happen if we don’t override run method? Answer. This question will test your basic knowledge how start and run methods work internally in Thread Api. When we call start() method on thread, it internally calls run() method with newly created thread. So, if we don’t override run() method newly created thread won’t be called and nothing will happen. class MyThread extends Thread { //don't override run() method } publicclass DontOverrideRun { publicstaticvoid main(String[] args) { System.out.println("main has started."); MyThread thread1=new MyThread(); thread1.start(); System.out.println("main has ended."); } } /*OUTPUT main has started. main has ended. */ As we saw in output, we didn’t override run() method that’s why on calling start() method nothing happened. Question 38. What will happen if we override start method? Answer. This question will again test your basic core java knowledge how overriding works at runtime, what what will be called at runtime and how start and run methods work internally in Thread Api. When we call start() method on thread, it internally calls run() method with newly created thread. So, if we override start() method, run() method will not be called until we write code for calling run() method. class MyThread extends Thread { @Override publicvoid run() { System.out.println("in run() method"); } @Override publicvoid start(){ System.out.println("In start() method"); } } publicclass OverrideStartMethod { publicstaticvoid main(String[] args) { System.out.println("main has started."); MyThread thread1=new MyThread(); thread1.start(); System.out.println("main has ended."); } } /*OUTPUT main has started. In start() method main has ended. */ If we note output. we have overridden start method and didn’t called run() method from it, so, run() method wasn’t call. Question 39. Can we acquire lock on class? What are ways in which you can acquire lock on class? Answer. Yes, we can acquire lock on class’s class object in 2 ways to acquire lock on class. Thread can acquire lock on class’s class object by- Entering synchronized block or Let’s say there is one class MyClass. Now we can create synchronization block, and parameter passed with synchronization tells which class has to be synchronized. In below code, we have synchronized MyClass synchronized (MyClass.class) { //thread has acquired lock on MyClass’s class object. } by entering static synchronized methods. public staticsynchronizedvoid method1() { //thread has acquired lock on MyRunnable’s class object. } As soon as thread entered Synchronization method, thread acquired lock on class’s class object. Thread will leave lock when it exits static synchronized method. Question 40. Difference between object lock and class lock? Answer. It is very important question from multithreading point of view. We must understand difference between object lock and class lock to answer interview, ocjp answers correctly. Object lock Class lock Thread can acquire object lock by- Entering synchronized block or by entering synchronized methods. Thread can acquire lock on class’s class object by- Entering synchronized block or by entering static synchronized methods. Multiple threads may exist on same object but only one thread of that object can enter synchronized method at a time. Threads on different object can enter same method at same time. Multiple threads may exist on same or different objects of class but only one thread can enter static synchronized method at a time. Multiple objects of class may exist and every object has it’s own lock. Multiple objects of class may exist but there is always one class’s class object lock available. First let’s acquire object lock by entering synchronized block. Example- Let’s say there is one class MyClassand we have created it’s object and reference to that object is myClass. Now we can create synchronization block, and parameter passed with synchronization tells which object has to be synchronized. In below code, we have synchronized object reference by myClass. MyClass myClass=newMyclass(); synchronized (myClass) { } As soon thread entered Synchronization block, thread acquired object lock on object referenced by myClass (by acquiring object’s monitor.) Thread will leave lock when it exits synchronized block. First let’s acquire lock on class’s class object by entering synchronized block. Example- Let’s say there is one class MyClass. Now we can create synchronization block, and parameter passed with synchronization tells which class has to be synchronized. In below code, we have synchronized MyClass synchronized (MyClass.class) { } As soon as thread entered Synchronization block, thread acquired MyClass’s class object. Thread will leave lock when it exits synchronized block. publicsynchronizedvoid method1() { } As soon as thread entered Synchronization method, thread acquired object lock. Thread will leave lock when it exits synchronized method. public staticsynchronizedvoid method1() {} As soon as thread entered static Synchronization method, thread acquired lock on class’s class object. Thread will leave lock when it exits synchronized method. Let’s me give you some tricky situation based question, Question 41. Suppose you have 2 threads (Thread-1 and Thread-2) on same object. Thread-1 is in synchronized method1(), can Thread-2 enter synchronized method2() at same time? Answer.No, here when Thread-1 is in synchronized method1() it must be holding lock on object’s monitor and will release lock on object’s monitor only when it exits synchronized method1(). So, Thread-2 will have to waitfor Thread-1 to release lock on object’s monitor so that it could enter synchronized method2(). Likewise, Thread-2 even cannot enter synchronized method1() which is being executed by Thread-1. Thread-2 will have to wait for Thread-1 to release lock on object’s monitor so that it could enter synchronized method1(). Now, let’s see a program to prove our point. Question 42. Suppose you have 2 threads (Thread-1 and Thread-2) on same object. Thread-1 is in static synchronized method1(), can Thread-2 enter static synchronized method2() at same time? Answer.No, here when Thread-1 is in static synchronized method1() it must be holding lock on class class’s object and will release lock on class’s classobject only when it exits static synchronized method1(). So, Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method2(). Likewise, Thread-2 even cannot enter static synchronized method1() which is being executed by Thread-1. Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method1(). Now, let’s see a program to prove our point. Question 43. Suppose you have 2 threads (Thread-1 and Thread-2) on same object. Thread-1 is in synchronized method1(), can Thread-2 enter static synchronized method2() at same time? Answer.Yes, here when Thread-1 is in synchronized method1() it must be holding lock on object’s monitor and Thread-2 can enter static synchronized method2() by acquiring lock on class’s class object. Now, let’s see a program to prove our point. Question 44. Suppose you have thread and it is in synchronized method and now can thread enter other synchronized method from that method? Answer.Yes, here when thread is in synchronized method it must be holding lock on object’s monitor and using that lock thread can enter other synchronized method. Now, let’s see a program to prove our point. Question 45. Suppose you have thread and it is in static synchronized method and now can thread enter other static synchronized method from that method? Answer. Yes, here when thread is in static synchronized method it must be holding lock on class’s class object and using that lock thread can enter other static synchronized method. Now, let’s see a program to prove our point. Question 46. Suppose you have thread and it is in static synchronized method and now can thread enter other non static synchronized method from that method? Answer.Yes, here when thread is in static synchronized method it must be holding lock on class’s class object and when it enters synchronized method it will hold lock on object’s monitor as well. So, now thread holds 2 locks (it’s also called nested synchronization)- >first one on class’s class object. >second one on object’s monitor (This lock will be released when thread exits non static method).Now, let’s see a program to prove our point. Question 47. Suppose you have thread and it is in synchronized method and now can thread enter other static synchronized method from that method? Answer.Yes, here when thread is in synchronized method it must be holding lock on object’s monitor and when it enters static synchronized method it will hold lock on class’s class object as well. So, now thread holds 2 locks (it’s also called nested synchronization)- >first one on object’s monitor. >second one on class’s class object.(This lock will be released when thread exits static method).Now, let’s see a program to prove our point. Question 48. Suppose you have 2 threads (Thread-1 on object1 and Thread-2 on object2). Thread-1 is in synchronized method1(), can Thread-2 enter synchronized method2() at same time? Answer.Yes, here when Thread-1 is in synchronized method1() it must be holding lock on object1’s monitor. Thread-2 will acquire lock on object2’s monitor and enter synchronized method2(). Likewise, Thread-2 even enter synchronized method1() as well which is being executed by Thread-1 (because threads are created on different objects). Now, let’s see a program to prove our point. Question 49. Suppose you have 2 threads (Thread-1 on object1 and Thread-2 on object2). Thread-1 is in static synchronized method1(), can Thread-2 enter static synchronized method2() at same time? Answer.No, it might confuse you a bit that threads are created on different objects. But, not to forgot that multiple objects may exist but there is always one class’s class object lock available. Here, when Thread-1 is in static synchronized method1() it must be holding lock on class class’s object and will release lock on class’s classobject only when it exits static synchronized method1(). So, Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method2(). Likewise, Thread-2 even cannot enter static synchronized method1() which is being executed by Thread-1. Thread-2 will have to wait for Thread-1 to release lock on class’s classobject so that it could enter static synchronized method1(). Now, let’s see a program to prove our point. Question 50. Difference between wait() and wait(long timeout), What are thread states when these method are called? Answer. wait() wait(long timeout) When wait() method is called on object, it causes causes the current thread to wait until another thread invokes the notify() or notifyAll() method for this object. wait(long timeout) - Causes the current thread to wait until either another thread invokes the notify() or notifyAll() methods for this object, or a specified timeout time has elapsed. When wait() is called on object - Thread enters from running to waiting state. It waits for some other thread to call notify so that it could enter runnable state. When wait(1000) is called on object - Thread enters from running to waiting state. Than even if notify() or notifyAll() is not called after timeout time has elapsed thread will go from waiting to runnable state. Question 51. How can you implement your own Thread Pool in java? Answer. What is ThreadPool? ThreadPool is a pool of threads which reuses a fixed number of threads to execute tasks. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. ThreadPool implementation internally uses LinkedBlockingQueue for adding and removing tasks. In this post i will be using LinkedBlockingQueue provide by java Api, you can refer this post for implementing ThreadPool using custom LinkedBlockingQueue. Need/Advantage of ThreadPool? Instead of creating new thread every time for executing tasks, we can create ThreadPool which reuses a fixed number of threads for executing tasks. As threads are reused, performance of our application improves drastically. How ThreadPool works? We will instantiate ThreadPool, in ThreadPool’s constructor nThreads number of threads are created and started. ThreadPool threadPool=new ThreadPool(2); Here 2 threads will be created and started in ThreadPool. Then, threads will enter run() method of ThreadPoolsThread class and will call take() method on taskQueue. If tasks are available thread will execute task by entering run() method of task (As tasks executed always implements Runnable). publicvoid run() { . . . while (true) { . . . Runnable runnable = taskQueue.take(); runnable.run(); . . . } . . . } Else waits for tasks to become available. When tasks are added? When execute() method of ThreadPool is called, it internally calls put() method on taskQueue to add tasks. taskQueue.put(task); Once tasks are available all waiting threads are notified that task is available. Question 52. What is significance of using ThreadLocal? Answer. This question will test your command in multi threading, can you really create some perfect multithreading application or not. ThreadLocal is a class which provides thread-local variables. What is ThreadLocal ? ThreadLocal is a class which provides thread-local variables. Every thread has its own ThreadLocal value that makes ThreadLocal value threadsafe as well. For how long Thread holds ThreadLocal value? Thread holds ThreadLocal value till it hasn’t entered dead state. Can one thread see other thread’s ThreadLocal value? No, thread can see only it’s ThreadLocal value. Are ThreadLocal variables thread safe. Why? Yes, ThreadLocal variables are thread safe. As every thread has its own ThreadLocal value and one thread can’t see other threads ThreadLocal value. Application of ThreadLocal? ThreadLocal are used by many web frameworks for maintaining some context (may be session or request) related value. In any single threaded application, same thread is assigned for every request made to same action, so ThreadLocal values will be available in next request as well. In multi threaded application, different thread is assigned for every request made to same action, so ThreadLocal values will be different for every request. When threads have started at different time they might like to store time at which they have started. So, thread’s start time can be stored in ThreadLocal. Creating ThreadLocal > private ThreadLocal threadLocal = new ThreadLocal(); We will create instance of ThreadLocal. ThreadLocal is a generic class, i will be using String to demonstrate threadLocal. All threads will see same instance of ThreadLocal, but a thread will be able to see value which was set by it only. How thread set value of ThreadLocal > threadLocal.set( new Date().toString()); Thread set value of ThreadLocal by calling set(“”) method on threadLocal. How thread get value of ThreadLocal > threadLocal.get() Thread get value of ThreadLocal by calling get() method on threadLocal. See here for detailed explanation of threadLocal. Question 53. What is busy spin? Answer. What is busy spin? When one thread loops continuously waiting for another thread to signal. Performance point of view - Busy spin is very bad from performance point of view, because one thread keeps on looping continuously ( and consumes CPU) waiting for another thread to signal. Solution to busy spin - We must use sleep() or wait() and notify() method. Using wait() is better option. Why using wait() and notify() is much better option to solve busy spin? Because in case when we use sleep() method, thread will wake up again and again after specified sleep time until boolean variable is true. But, in case of wait() thread will wake up only when when notified by calling notify() or notifyAll(), hence end up consuming CPU in best possible manner. Program - Consumer Producer problem with busy spin > Consumer thread continuously execute (busy spin) in while loop tillproductionInProcess is true. Once producer thread has ended it will make boolean variable productionInProcess false and busy spin will be over. while(productionInProcess){ System.out.println("BUSY SPIN - Consumer waiting for production to get over"); } Question 54. Can a constructor be synchronized? Answer. No, constructor cannot be synchronized. Because constructor is used for instantiating object, when we are in constructor object is under creation. So, until object is not instantiated it does not need any synchronization. Enclosing constructor in synchronized block will generate compilation error. Using synchronized in constructor definition will also show compilation error. COMPILATION ERROR = Illegal modifier for the constructor in type ConstructorSynchronizeTest; only public, protected & private are permitted Though we can use synchronized block inside constructor. Read More about : Constructor in java cannot be synchronized Question 55. Can you find whether thread holds lock on object or not? Answer. holdsLock(object) method can be used to find out whether current thread holds the lock on monitor of specified object. holdsLock(object) method returns true if the current thread holds the lock on monitor of specified object. Question 56. What do you mean by thread starvation? Answer. When thread does not enough CPU for its execution Thread starvation happens. Thread starvation may happen in following scenarios > Low priority threads gets less CPU (time for execution) as compared to high priority threads. Lower priority thread may starve away waiting to get enough CPU to perform calculations. In deadlock two threads waits for each other to release lock holded by them on resources. There both Threads starves away to get CPU. Thread might be waiting indefinitely for lock on object’s monitor (by calling wait() method), because no other thread is calling notify()/notifAll() method on object. In that case, Thread starves away to get CPU. Thread might be waiting indefinitely for lock on object’s monitor (by calling wait() method), but notify() may be repeatedly awakening some other threads. In that case also Thread starves away to get CPU. Question 57. What is addShutdownHook method in java? Answer. addShutdownHook method in java > addShutdownHook method registers a new virtual-machine shutdown hook. A shutdown hook is a initialized but unstarted thread. When JVM starts its shutdown it will start all registered shutdown hooks in some unspecified order and let them run concurrently. When JVM (Java virtual machine) shuts down > When the last non-daemon thread finishes, or when the System.exit is called. Once JVM’s shutdown has begunnew shutdown hook cannot be registered neither previously-registered hook can be de-registered. Any attempt made to do any of these operations causes an IllegalStateException. For more detail with program read : Threads addShutdownHook method in java Question 58. How you can handle uncaught runtime exception generated in run method? Answer. We can use setDefaultUncaughtExceptionHandler method which can handle uncaught unchecked(runtime) exception generated in run() method. What is setDefaultUncaughtExceptionHandler method? setDefaultUncaughtExceptionHandler method sets the default handler which is called when a thread terminates due to an uncaught unchecked(runtime) exception. setDefaultUncaughtExceptionHandler method features > setDefaultUncaughtExceptionHandler method sets the default handler which is called when a thread terminates due to an uncaught unchecked(runtime) exception. setDefaultUncaughtExceptionHandler is a static method method, so we can directly call Thread.setDefaultUncaughtExceptionHandler to set the default handler to handle uncaught unchecked(runtime) exception. It avoids abrupt termination of thread caused by uncaught runtime exceptions. Defining setDefaultUncaughtExceptionHandler method > Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler(){ publicvoid uncaughtException(Thread thread, Throwable throwable) { System.out.println(thread.getName() + " has thrown " + throwable); } }); Question 59. What is ThreadGroup in java, What is default priority of newly created threadGroup, mention some important ThreadGroup methods ? Answer. When program starts JVM creates a ThreadGroup named main. Unless specified, all newly created threads become members of the main thread group. ThreadGroup is initialized with default priority of 10. ThreadGroup important methods > getName() name of ThreadGroup. activeGroupCount() count of active groups in ThreadGroup. activeCount() count of active threads in ThreadGroup. list() list() method has prints ThreadGroups information getMaxPriority() Method returns the maximum priority of ThreadGroup. setMaxPriority(int pri) Sets the maximum priority of ThreadGroup. Question 60. What are thread priorities? Answer. Thread Priority range is from 1 to 10. Where 1 is minimum priority and 10 is maximum priority. Thread class provides variables of final static int type for setting thread priority. /* The minimum priority that a thread can have. */ publicfinalstaticintMIN_PRIORITY= 1; /* The default priority that is assigned to a thread. */ publicfinalstaticintNORM_PRIORITY= 5; /* The maximum priority that a thread can have. */ publicfinalstaticintMAX_PRIORITY= 10; Thread with MAX_PRIORITY is likely to get more CPU as compared to low priority threads. But occasionally low priority thread might get more CPU. Because thread scheduler schedules thread on discretion of implementation and thread behaviour is totally unpredictable. Thread with MIN_PRIORITY is likely to get less CPU as compared to high priority threads. But occasionally high priority thread might less CPU. Because thread scheduler schedules thread on discretion of implementation and thread behaviour is totally unpredictable. setPriority()method is used for Changing the priority of thread. getPriority()method returns the thread’s priority.
May 29, 2015
by Ankit Mittal
· 338,374 Views · 38 Likes
article thumbnail
How To Set Up a Tomcat, Apache and mod_jk Cluster
In this article I will go through a common set-up for a small production environment. A single tier, load balanced application server cluster. Overview A high level overview of what we will be doing. Downloading and installing Apache HTTP server and mod_jk Downloading Tomcat Downloading Java Configuring two local Tomcat servers Clustering the two Tomcat servers Configuring Apache to use mod_jk to forward request to Tomcat Deploying application to Tomcat server that tests our set-up Introduction What is Apache? Apache is an HTTP server. What is mod_jk? It is an Apache module that allows AJP communication between Apache and a back end application server like Tomcat.I am running this on Ubuntu 14.04LTS installed on a dual boot PC with Windows 7. Download Apache2 We are going to use Ubuntu's APT package maintenance system to obtain and install Apache2. sudo apt-get install apache2 This will install in /etc/apache2 Download and install mod_jk The mod_jk module is not included in the Apache2 download so must be obtained and installed separately. The installation requires that the mod_jk module is visible to Apache and configured to ensure that Apache knows where to look for it and what to do with the requests you want to proxy. sudo apt-get install libapache2-mod-jk This will install in /etc/libapache2-mod-jk also two files have been added to the /etc/apache2/mods-available folder. Downloading and installing Tomcat 8 At the time of writing this Tomcat 8 does not have a package in APT so you must download the binaries from the tomcat website.http://tomcat.apache.org/download-80.cgi select the appropriate binary distribution and extract it as follows. tar xvzf apache-tomcat-8.0.5.tar.gz We need two copies of the Tomcat server to be load balanced. I created two directories in the /opt/ location: /opt/tomcat-server1/ and /opt/tomcat-server2/ and copied tomcat into each one. Download and install Java Download Java from APT as follows: apt-get install openjdk-7-jdk and set JAVA_HOME in .bashrc vim ~/.bashrc export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 Configure two local Tomcat servers We will edit only the server.xml of the server2 installation of tomcat. We need to change port numbers to avoid conflicts.We change the following: and comment out the HTTP Connector as we only want the web application to be accessible through the load balancer.Here is my server2 Tomcat server.xml configuration. Configure mod_jk Load balancing is configured in the workers.properties file, located /etc/libapache2-mod-jk/ where workers represent actual or virtual workers.We will define two actual workers and two virtual workers which map to the Tomcat servers. In the worker.list property I have defined two virtual workers: status and loadbalancer, I will refer to these later in the Apache configuration.Workers for each server have been defined using values for the server.xml configuration files. I used the port values for the AJP connectors and I have included an lbfactor that sets the preference that the load balancer will show for that server.Finally we define the virtual workers. The loadbalancer worker is set to type lb and set the workers that represent the Tomcat servers in the balancer_workers properties. The status only needs to be set to type status. worker.list=loadbalancer,status worker.server1.port=8009worker.server1.host=localhostworker.server1.type=ajp13 worker.server2.port=9009worker.server2.host=localhostworker.server2.type=ajp13 worker.server1.lbfactor=1worker.server2.lbfactor=1 worker.loadbalancer.type=lbworker.loadbalancer.balance_workers=server1,server2 worker.status.type=status Ensure that you remove any other worker configuration that are not being used. Configure Apache Web Server to forward requests You will need to add the following to the Apache configurations located in etc/apache2/sites-enabled/000-default.conf JkMount /status status JkMount /* loadbalancer Verify the installation To test that all has been configured correctly we need to deploy an application. A sample application that has been used for years to test such configurations is called the ClusterJSP sample application. You can find it by googling in or from the JBoss site.Now deploy the war to the webapps folder on both servers and start each server using the start-up script /opt/tomcat-server1/bin/startup.sh.Go to http://localhost/clusterjsp/HaJsp.jsp and you should see the page show HttpSession information. Now lets look at the mod_jk status page: http://localhost/status. You will see that this page shows information about the load balancer workers and the workers it is balancing. If everything is working you will see the worker error state show OK or OK/IDLE if they are not currently balancing load. Things to try out Enable sticky sessions: Configure jvmRoute in the server.xml configuration. Further reading Loadbalancing with mod_jk and ApacheWorking with mod_jk Connecting Apache's Web Server to Multiple Instances of Tomcat
May 19, 2015
by Alex Theedom
· 10,787 Views · 1 Like
article thumbnail
A Look at Nanomsg and Scalability Protocols (Why ZeroMQ Shouldn’t Be Your First Choice)
Earlier this month, I explored ZeroMQ and how it proves to be a promising solution for building fast, high-throughput, and scalable distributed systems. Despite lending itself quite well to these types of problems, ZeroMQ is not without its flaws. Its creators have attempted to rectify many of these shortcomings through spiritual successors Crossroads I/O and nanomsg. The now-defunct Crossroads I/O is a proper fork of ZeroMQ with the true intention being to build a viable commercial ecosystem around it. Nanomsg, however, is a reimagining of ZeroMQ—a complete rewrite in C1. It builds upon ZeroMQ’s rock-solid performance characteristics while providing several vital improvements, both internal and external. It also attempts to address many of the strange behaviors that ZeroMQ can often exhibit. Today, I’ll take a look at what differentiates nanomsg from its predecessor and implement a use case for it in the form of service discovery. Nanomsg vs. ZeroMQ A common gripe people have with ZeroMQ is that it doesn’t provide an API for new transport protocols, which essentially limits you to TCP, PGM, IPC, and ITC. Nanomsg addresses this problem by providing a pluggable interface for transports and messaging protocols. This means support for new transports (e.g. WebSockets) and new messaging patterns beyond the standard set of PUB/SUB, REQ/REP, etc. Nanomsg is also fully POSIX-compliant, giving it a cleaner API and better compatibility. No longer are sockets represented as void pointers and tied to a context—simply initialize a new socket and begin using it in one step. With ZeroMQ, the context internally acts as a storage mechanism for global state and, to the user, as a pool of I/O threads. This concept has been completely removed from nanomsg. In addition to POSIX compliance, nanomsg is hoping to be interoperable at the API and protocol levels, which would allow it to be a drop-in replacement for, or otherwise interoperate with, ZeroMQ and other libraries which implement ZMTP/1.0 and ZMTP/2.0. It has yet to reach full parity, however. ZeroMQ has a fundamental flaw in its architecture. Its sockets are not thread-safe. In and of itself, this is not problematic and, in fact, is beneficial in some cases. By isolating each object in its own thread, the need for semaphores and mutexes is removed. Threads don’t touch each other and, instead, concurrency is achieved with message passing. This pattern works well for objects managed by worker threads but breaks down when objects are managed in user threads. If the thread is executing another task, the object is blocked. Nanomsg does away with the one-to-one relationship between objects and threads. Rather than relying on message passing, interactions are modeled as sets of state machines. Consequently, nanomsg sockets are thread-safe. Nanomsg has a number of other internal optimizations aimed at improving memory and CPU efficiency. ZeroMQ uses a simple trie structure to store and match PUB/SUB subscriptions, which performs nicely for sub-10,000 subscriptions but quickly becomes unreasonable for anything beyond that number. Nanomsg uses a space-optimized trie called a radix tree to store subscriptions. Unlike its predecessor, the library also offers a true zero-copy API which greatly improves performance by allowing memory to be copied from machine to machine while completely bypassing the CPU. ZeroMQ implements load balancing using a round-robin algorithm. While it provides equal distribution of work, it has its limitations. Suppose you have two datacenters, one in New York and one in London, and each site hosts instances of “foo” services. Ideally, a request made for foo from New York shouldn’t get routed to the London datacenter and vice versa. With ZeroMQ’s round-robin balancing, this is entirely possible unfortunately. One of the new user-facing features that nanomsg offers is priority routing for outbound traffic. We avoid this latency problem by assigning priority one to foo services hosted in New York for applications also hosted there. Priority two is then assigned to foo services hosted in London, giving us a failover in the event that foos in New York are unavailable. Additionally, nanomsg offers a command-line tool for interfacing with the system called nanocat. This tool lets you send and receive data via nanomsg sockets, which is useful for debugging and health checks. Scalability Protocols Perhaps most interesting is nanomsg’s philosophical departure from ZeroMQ. Instead of acting as a generic networking library, nanomsg intends to provide the “Lego bricks” for building scalable and performant distributed systems by implementing what it refers to as “scalability protocols.” These scalability protocols are communication patterns which are an abstraction on top of the network stack’s transport layer. The protocols are fully separated from each other such that each can embody a well-defined distributed algorithm. The intention, as stated by nanomsg’s author Martin Sustrik, is to have the protocol specifications standardized through the IETF. Nanomsg currently defines six different scalability protocols: PAIR, REQREP, PIPELINE, BUS, PUBSUB, and SURVEY. PAIR (Bidirectional Communication) PAIR implements simple one-to-one, bidirectional communication between two endpoints. Two nodes can send messages back and forth to each other. REQREP (Client Requests, Server Replies) The REQREP protocol defines a pattern for building stateless services to process user requests. A client sends a request, the server receives the request, does some processing, and returns a response. PIPELINE (One-Way Dataflow) PIPELINE provides unidirectional dataflow which is useful for creating load-balanced processing pipelines. A producer node submits work that is distributed among consumer nodes. BUS (Many-to-Many Communication) BUS allows messages sent from each peer to be delivered to every other peer in the group. PUBSUB (Topic Broadcasting) PUBSUB allows publishers to multicast messages to zero or more subscribers. Subscribers, which can connect to multiple publishers, can subscribe to specific topics, allowing them to receive only messages that are relevant to them. SURVEY (Ask Group a Question) The last scalability protocol, and the one in which I will further examine by implementing a use case with, is SURVEY. The SURVEY pattern is similar to PUBSUB in that a message from one node is broadcasted to the entire group, but where it differs is that each node in the group responds to the message. This opens up a wide variety of applications because it allows you to quickly and easily query the state of a large number of systems in one go. The survey respondents must respond within a time window configured by the surveyor. Implementing Service Discovery As I pointed out, the SURVEY protocol has a lot of interesting applications. For example: What data do you have for this record? What price will you offer for this item? Who can handle this request? To continue exploring it, I will implement a basic service-discovery pattern. Service discovery is a pretty simple question that’s well-suited for SURVEY: what services are out there? Our solution will work by periodically submitting the question. As services spin up, they will connect with our service discovery system so they can identify themselves. We can tweak parameters like how often we survey the group to ensure we have an accurate list of services and how long services have to respond. This is great because 1) the discovery system doesn’t need to be aware of what services there are—it just blindly submits the survey—and 2) when a service spins up, it will be discovered and if it dies, it will be “undiscovered.” Here is the ServiceDiscovery class: from collections import defaultdict import random from nanomsg import NanoMsgAPIError from nanomsg import Socket from nanomsg import SURVEYOR from nanomsg import SURVEYOR_DEADLINE class ServiceDiscovery(object): def __init__(self, port, deadline=5000): self.socket = Socket(SURVEYOR) self.port = port self.deadline = deadline self.services = defaultdict(set) def bind(self): self.socket.bind('tcp://*:%s' % self.port) self.socket.set_int_option(SURVEYOR, SURVEYOR_DEADLINE, self.deadline) def discover(self): if not self.socket.is_open(): return self.services self.services = defaultdict(set) self.socket.send('service query') while True: try: response = self.socket.recv() except NanoMsgAPIError: break service, address = response.split('|') self.services[service].add(address) return self.services def resolve(self, service): providers = self.services[service] if not providers: return None return random.choice(tuple(providers)) def close(self): self.socket.close() The discover method submits the survey and then collects the responses. Notice we construct a SURVEYOR socket and set the SURVEYOR_DEADLINE option on it. This deadline is the number of milliseconds from when a survey is submitted to when a response must be received—adjust it accordingly based on your network topology. Once the survey deadline has been reached, a NanoMsgAPIError is raised and we break the loop. The resolve method will take the name of a service and randomly select an available provider from our discovered services. We can then wrap ServiceDiscovery with a daemon that will periodically run discover. import os import time from service_discovery import ServiceDiscovery DEFAULT_PORT = 5555 DEFAULT_DEADLINE = 5000 DEFAULT_INTERVAL = 2000 def start_discovery(port, deadline, interval): discovery = ServiceDiscovery(port, deadline=deadline) discovery.bind() print 'Starting service discovery [port: %s, deadline: %s, interval: %s]' \ % (port, deadline, interval) while True: print discovery.discover() time.sleep(interval / 1000) if __name__ == '__main__': port = int(os.environ.get('PORT', DEFAULT_PORT)) deadline = int(os.environ.get('DEADLINE', DEFAULT_DEADLINE)) interval = int(os.environ.get('INTERVAL', DEFAULT_INTERVAL)) start_discovery(port, deadline, interval) The discovery parameters are configured through environment variables which I inject into a Docker container. Services must connect to the discovery system when they start up. When they receive a survey, they should respond by identifying what service they provide and where the service is located. One such service might look like the following: import os from threading import Thread from nanomsg import REP from nanomsg import RESPONDENT from nanomsg import Socket DEFAULT_DISCOVERY_HOST = 'localhost' DEFAULT_DISCOVERY_PORT = 5555 DEFAULT_SERVICE_NAME = 'foo' DEFAULT_SERVICE_PROTOCOL = 'tcp' DEFAULT_SERVICE_HOST = 'localhost' DEFAULT_SERVICE_PORT = 9000 def register_service(service_name, service_address, discovery_host, discovery_port): socket = Socket(RESPONDENT) socket.connect('tcp://%s:%s' % (discovery_host, discovery_port)) print 'Starting service registration [service: %s %s, discovery: %s:%s]' \ % (service_name, service_address, discovery_host, discovery_port) while True: message = socket.recv() if message == 'service query': socket.send('%s|%s' % (service_name, service_address)) def start_service(service_name, service_protocol, service_port): socket = Socket(REP) socket.bind('%s://*:%s' % (service_protocol, service_port)) print 'Starting service %s' % service_name while True: request = socket.recv() print 'Request: %s' % request socket.send('The answer is 42') if __name__ == '__main__': discovery_host = os.environ.get('DISCOVERY_HOST', DEFAULT_DISCOVERY_HOST) discovery_port = os.environ.get('DISCOVERY_PORT', DEFAULT_DISCOVERY_PORT) service_name = os.environ.get('SERVICE_NAME', DEFAULT_SERVICE_NAME) service_host = os.environ.get('SERVICE_HOST', DEFAULT_SERVICE_HOST) service_port = os.environ.get('SERVICE_PORT', DEFAULT_SERVICE_PORT) service_protocol = os.environ.get('SERVICE_PROTOCOL', DEFAULT_SERVICE_PROTOCOL) service_address = '%s://%s:%s' % (service_protocol, service_host, service_port) Thread(target=register_service, args=(service_name, service_address, discovery_host, discovery_port)).start() start_service(service_name, service_protocol, service_port) Once again, we configure parameters through environment variables set on a container. Note that we connect to the discovery system with a RESPONDENT socket which then responds to service queries with the service name and address. The service itself uses a REP socket that simply responds to any requests with “The answer is 42,” but it could take any number of forms such as HTTP, raw socket, etc. The full code for this example, including Dockerfiles, can be found on GitHub. Nanomsg or ZeroMQ? Based on all the improvements that nanomsg makes on top of ZeroMQ, you might be wondering why you would use the latter at all. Nanomsg is still relatively young. Although it has numerous language bindings, it hasn’t reached the maturity of ZeroMQ which has a thriving development community. ZeroMQ has extensive documentation and other resources to help developers make use of the library, while nanomsg has very little. Doing a quick Google search will give you an idea of the difference (about 500,000 results for ZeroMQ to nanomsg’s 13,500). That said, nanomsg’s improvements and, in particular, its scalability protocols make it very appealing. A lot of the strange behaviors that ZeroMQ exposes have been resolved completely or at least mitigated. It’s actively being developed and is quickly gaining more and more traction. Technically, nanomsg has been in beta since March, but it’s starting to look production-ready if it’s not there already.
May 4, 2015
by Tyler Treat
· 16,002 Views · 1 Like
article thumbnail
Testing the NGINX Load Balancing Efficiency with ApacheBench
providing numerous prominent features and possibilities, jelastic allows you to host applications of any complexity and in such a way, gives your customers exactly what they need. however, when your project becomes highly demanded and visited, you face another problem – the necessity to increase your hardware productivity, as it should be able to handle and rapidly serve all of the incoming users’ requests. adding more resources will temporarily improve the situation, saving your server from the failure, but it won’t solve the root issue. and this results in the need to set up a clustering solution with embedded automatic load balancing. application cluster adjusting is quite easy with jelastic – just add a few more application server instances to your environment via the topology wizard . in addition, you’ll automatically get the nginx-balancer server enabled in front of your project. it will be responsible for the even load distribution among the stated number of app server nodes, performed by virtue of the http load balancing . in such a way, your application performance grows significantly, increasing the number of requests that can be served at one time. as a nice bonus, you decrease the risks of app inaccessibility, since if one server fails, all the rest continue working. in order to prove this scheme is that efficient, we’ll show you how to perform the load balancing testing with the help of apachebench (ab) tool. it provides a number of possibilities for testing the servers’ ability to cope with the increasing and changeable load. though ab was designed for apache installations testing, it can be used to benchmark any http server. so, let’s get started and test it in real time. create an environment and deploy the application 1. log into the jelastic platform and click the create environment button in the upper left corner of the dashboard. 2. the environment topology dialog window will instantly appear. here you can choose the desired programming language, application/web server and database. as we are going to test the apachephp server loading, select it and specify the resource usage limits by means of cloudlet sliders. then, attach the public ip address for this server and type the name of a new environment (e.g. balancer ). click create. 3. in just a minute your environment will appear at the dashboard. 4. once the environment is successfully created, you can deploy your application to it. here we’ll use the default helloworld.zip package, so you just need to deploy it to the desired environment with the corresponding button and confirm the deployment in the opened frame. control point testing to analyze the results you’ll need something to compare them with, so let’s make a control point test, using the created environment with just a single application server node. as it was mentioned above, we’ll use the apachebench (ab) tool for these purposes. it can generate a single-threaded load by sending the stated number of concurrent requests to a server. follow the steps below. 1. apachebench is a part of standard apache source distribution, so if you still don’t have it, run the following command through your terminal (or skip this step if you do). apt-get install apache2-utils detailed information about all the further used ab commands can be found by following this link . 2. enter the next line in the terminal: ab -n 500 -c 10 -g res1.tsv {url_to_your_env} substitute the {url_to_your_env} part with a link to your environment (e.g. http://balancer.jelastic.com/ in our case). in order to get it, click the open in browser button next to your environment and copy the corresponding url from the browser’s address bar. the specified command will send the total amount of 500 requests to the stated environment, which are divided into the packs of 10 concurrent requests at one time. all the results will be stored in the res1.tsv file inside your home folder (or enter the full path to the desired directory if you would like to change the file location). also, you can specify your custom parameters for the abovementioned command if you want. this test may take some time depending on the parameters you’ve set, therefore be patient. 3. the created file with results should look like the image below: change the environment configuration once you’ve got the initial information regarding application performance, it’s time to extend your environment’s topology and adjust it for the further testing. 1. return to the jelastic dashboard and click change environment topology for your balancer environment. 2. within the opened environment topology frame, add more application servers (e.g. one more apache instance) – use the + button in the horizontal scaling wizard section for that. nginx-balancer node will be automatically added to your environment as an entry point of your application. enable public ip for your load balancer and state the resource limits. clickapply to proceed. 3. when all of the required changes are successfully applied, you should disable the sticky sessions for the balancer server. otherwise, all the requests from one ip address will be redirected to the same instance of the application server. therefore, click the config button next to the nginx node. 4. navigate to the conf > nginx-jelastic.conf file. it’s not editable, so copy all its content and paste it to the nginx.conf file (located in the same folder) instead of include /etc/nginx/nginx-jelastic.conf; line (circled at the following image). 5. then, find two mentions of the sticky path parameter in the code (in the default upstream and upstreams list sections) and comment them as it is shown below. note: don’t miss the closing curly braces after those sticky path strings, they should be uncommented. 6. save the changes applied and restart the nginx server. testing balancer and compare results now let’s proceed directly to load balancing testing. 1. switch back to your terminal and run the ab testing again with the same parameters (except the file with results – specify another name for it, e.g. res2.tsv ). ab -n 500 -c 10 -g res2.tsv {url_to_your_env} 2. in order to clarify the obtained results, we’ll use the freely distributed gnuplot graphs utility. install it (if you haven’t done this before) and enter its shell with a gnuplot command. 3. after that, you need to set up the parameters for our future graph: set size 1, 1 set title “benchmark testing” set key left top set grid y set xlabel ‘requests’ set ylabel “response time (ms)” set datafile separator ‘\t’ 4. now you’re ready to compose the graph: plot “/home/res1.tsv” every ::2 using 5 title ‘single server’ with lines, “/home/res2.tsv” every ::2 using 5 title ‘two servers with lb’ with lines this plot command will build 2 graphs (separated with comma in the command body). let’s consider the used parameters in more details: “/home/resn.tsv” represents paths to the files with your testing results every ::2 operator defines that gnuplot will start building from the second row (i.e. the first row with headings will be skipped) using 5 means that the fifth ttime column (the total response time) will be used for graph building title ‘n’ option sets the particular graph name for the easier separation of the test results with lines is used for our graph to be a solid line you’ll get an automatically created and opened image similar to the following: due to the specified options, the red graph shows the performance of a single apacheserver without balancer (control point testing results) and the green one – of two servers with nginx load balancer (the second testing phase results). note: that the received testing results (response time for each sent requests) are shown in the ascending order, i.e. not chronologically. as you can see, while serving the low load, both configurations’ performance is almost the same, but as the number of requests is increasing, the response time for an environment with a single app server instance grows significantly, resulting in serving less requests simultaneously. so, if you are expecting a high load for your application server, increasing the number of its instances in a bundle with a balancing server will be the best way to keep your customers happy. register now and try it out for yourself. enjoy all of the advantages of the jelastic cloud!
May 1, 2015
by Tetiana Markova
· 6,022 Views · 1 Like
article thumbnail
Linux - Simulating Degraded Network Conditions
When testing an application or service, it can be very useful to simulate degraded network conditions. This allows you to see how they might perform for users.
April 30, 2015
by Corey Goldberg
· 5,122 Views
article thumbnail
Why Elasticsearch is Suitable for Application Log Analytics
Handling Application Logs Enterprise application development using Web technologies has been around for a long time. In recent years we have seen a sharp increase in the deployment of such applications. This is partly due to the proliferation of ecommerce sites, social media sites, mobile application supporting sites, as well as the desire of enterprises to have their applications available 24x7. In most cases, such applications cater to huge load and are deployed on cloud infrastructure. Monitoring deployed applications is increasingly becoming a crucial task, as deployed applications are bound to fail, irrespective of the robust techniques used during development. Whenever an application fails, the most common resolution method starts by examining the application log. If the application has implemented logging properly, the logs can reveal the cause of application failure. Examination of log files is usually done by viewing the file using tools like vi, less, more, tail or grep. Another method is to download the file to a Windows system and viewing it using an editor like Notepad++. Engineers usually scan the log information to look for clues that point to the reasons for failure. Once the cause of failure is identified, suitable action is taken for restoring the application and/or service. The Key to Application Log Analytics This process, of logging onto a remote system and viewing logs is tedious. Additionally, many of the tools do not provide support to make the task of issue identification any simpler. Even when using tools like grep (if we know the pattern), we still need to view the logs in order to go through other information that has been logged, such as the log information that precedes the failure point. While it has always been possible to develop applications to parse application logs, the recent renewed interest in application log analytics is due to the acceptance of NoSQL-like technologies and the availability of standard tools to parse application logs. Though relational databases (RDBMS) have for many years provided the facility to store structured data, they are not well-suited for handling log data, as in many cases, the structure of the logged information is not the same across the file. This does not fit well in the rigidly defined world of an RDBMS. In comparison, NoSQL allows document flexibility and documents with different schemas can be stored in the same database / index / store. The ability to convert log data into a well-defined structure, as well as the ability to search, are key to implement a modern log analytics solution. In this document, we cover how Elasticsearch. Elasticsearch can store documents, giving us the benefit of structured storage without the overheads of a database system. The Suitability of Elasticsearch In the following subsections, we share our views as to why Elasticsearch is a suitable data store for an application log analytics solution. Elasticsearch is part of a popular trio of tools, commonly known as ELK. Of these, L stands for Logstash, the log parser; E stands for Elasticsearch, the document store; and K stands for Kibana, the visualization tool. Storing Documents Logstash can be used to parse plain text data into structured text. Once data has some structure, it becomes easy to find information by enabling search on it. While parsing application logs is not a challenge, the challenge has been in storing the data and enabling search on it. Most prior solutions have used an RDBMS for storage, but the varying structure and textual nature of application logs makes it difficult to use an RDBMS table structure to store data. RDBMSs are not geared toward ‘search’. They are geared for maintaining a ‘single value of truth’ for the data, defining relations between the data, ensuring their consistency and so on. Search is also not a strong point for RDBMSs as they use exact matches for values, while Elasticsearch supports exact matches as well as partial matches. It also supports document scoring, which attaches a confidence factor to the documents located. Elasticsearch supports documents in JSON format and uses the NoSQL philosophy for document storage. This has the advantage of allowing a flexible schema for the data. Unlike an RDBMS, Elasticsearch is a search engine at heart and hence is built for the same. Though Elasticsearch uses NoSQL for storing documents, it does not provide robust methods to update stored data. Not supporting updates is a serious disadvantage in most cases. In the case of application logs, not supporting updates actually works in favour of Elasticsearch. In case of machine logs, updates are not really required. Application logs are generated from a debugging perspective – having data handy for debugging purposes in the event of application crash or incorrect execution. They usually record important events from application execution and provide additional information to allow application developers to identify the reasons for failure. Additionally, existing information in application logs is rarely, if ever, updated. New information is continually being written to the logs, with no need to refer to old information. This plays to Elasticsearch’s strength, which is able to ingest and index new information very quickly. Search One of the easiest ways of locating information from large volumes of logs is to perform a search. Elasticsearch is well suited not only to handle search, it also supports huge volume of data, using distributed computing (implemented using Shards). While Kibana is one of the commonly used tools to display and visualize information stored in Elasticsearch, it is more suited to display standard charts like bar chart, column chart and pie chart. If the features provided by Kibana are not enough, we can always use Elasticsearch’s REST API support and it’s Query DSL (Domain-Specific Language), to search for required information. The Query DSL and the result of the query are in JSON format. Though this format makes it easy for applications to parse and process, users would need a friendly user interface to interact with the data. Handling Voluminous Data Elasticsearch supports distributed search out of the box – using the concept of ‘shards’. A shard is a single Lucene instance and is managed by Elasticsearch. Two types of shards, namely ‘primary shard’ and ‘replica shard’ are supported. By default, a document is first indexed on the primary shard and then on the replica shards. The number of primary shards can be specified, to cater to the expected volume. By default, Elasticsearch creates five shards for an index. But, once the number of primary shards is decided, it cannot be changed. A replica shards are copies the primary shard. They are used to handle fail-over and the increase performance. While performance across voluminous data can be handled by sharding, it is important to note that shards, once created for an index, cannot be changed. Thus, the sharding strategy of the data has to be decided in advance, after an assessment of the data and an estimation of its growth. In the case of application logs, the sharding strategy can be based on the application name, the business unit ID, the application OD or the application’s geolocation, just to name a few. Analytics By storing data in a structure, analytics can be enabled on the data. Not only can application perform a simple search, it is also possible to restrict the search for specific terms or over a specified time period. Structured storage also makes it easier to develop reports with well-defined visualizations, which in turn makes it easy to understand the current state of applications. It is also possible to perform various analytics operations like time series analysis using the timestamp and identification of patterns from the data using machine learning techniques (assuming, we have the right kind of data in the logs). Though Elasticsearch does not provide built-in support for analytics, applications can benefit from its fast search capability and also from its ability to handle voluminous data sets. In Closing One of the main hurdles for application logs has been the ability to search for information from the huge volume of data. By parsing application log files using Logstash, we can convert a flat file into structured data. Structured data, once stored in Elasticsearch, is easier to search and locate. Visualizations and business logic for generating alerts and tickets is easier to develop on structured data. Elasticsearch, which stores and searches documents, along with its ability to scale over huge volume of data, is a good candidate for inclusion in an application log analytics solution.
April 22, 2015
by Bipin Patwardhan
· 11,682 Views · 2 Likes
article thumbnail
Patterns of API Virtualization
[This article was written by Matthew Heusser.] When Christopher Alexander wrote A Pattern Language in 1977, he was looking for a more powerful way to describe how towns and buildings were laid out. These patterns would allow architects, builders and planners to work together, to use the same words, mean the same thing, and create systems that were beautiful and worked, instead of more urban sprawl. Twenty years later, Gamma, Helms, Johnson and Vlissdes took the pattern idea and applied it to object-oriented software, which at the time was struggling to figure out how to create windows-based applications. Today the struggle is figuring out how to break software into small components that can be tested independently, and then having those components interact, typically over internet protocols. Raw SQL commands are giving way to service oriented systems that interact through APIs, sometimes all within one company, sometimes outside with Microsoft, Google, Amazon, or other APIs like a manufacturing company or supplier. While I do not claim to be Christopher Alexander or the Gang of Four, I am seeing some patterns emerge – a set of solutions to a defined problem – and would like to share a few of those today. What do you mean API? Alistair Cockburn’s Hexagonal Architecture (below) presents a way to think about APIs. The application we want to develop is in the middle and has a set of adapters to the external world. Those adapters might be an API we expose, like a ‘search’ interface to an online catalog, or the API’s we call, including the database, an email gateway, or the ‘permissions’ service, to see what types of search results we should show to this user. Cockburn’s Hexagonal Architecture gives us two ways to think about APIs: Our own, and the services we call. (Source: http://alistair.cockburn.us/Hexagonal+architecture) That’s a lot of APIs. Let’s explore about some ways to virtualize these services – and why. Automated Build and Continuous Integration Say, for example, you are working on a piece of software to analyze trending terms on social media – such as a customer complaint that is being liked and tweeted. You want companies to find these problems when they start to trend up, then reach out to the customer and solve it, or, perhaps, reach out to say “thank you” and amplify it. Modern build systems, like Jenkins, TFS, and TeamCity can compile, deploy, and even run the system to check for known scenarios. The trouble is those pesky adapters to external systems, like Twitter and Facebook. The software could do its job, but there is no way to know if the application is correct in its guesses about trends and importance. Getting the data from the providers can turn a quick build into a slow process that uses a lot of network traffic. By recording and storing known answers to predictable requests, then simulating the service and playing back known (“canned”) data, API Virtualization allows build systems to do more, with faster, more predictable results. This does not remove the need for end-to-end testing, but it does allow the team to have more confidence with each build. Performance Testing Your Application Like build/deploy systems, performance testing the application (the inside of the hexagon) with live, external services can cause problems. All that extra traffic can cause problems with the actual company network infrastructure; it could cause bandwidth problems at the point of the ISP. Some 3rd Party APIs charge a micro-fee per transaction, or limit bandwidth. Many of them lack a ‘test’ sandbox to develop in, so performance testing could interact with real, production work. Standing up a virtual server to return pre-planned data means you can performance test your application – not the third party – prevent bandwidth throttles, not step on production data, and avoid paying fees intended for real (production) use that is actually being used to test our environment. Avoid Integration Environment Inconsistency A few years ago I worked at a large organization that was wrapping old code in proxy services, so they could be consumed by other teams. Login, add-to-cart, search catalog, create custom catalog, permissions, all of it was possible to access through API calls, most of it as simple as a web URL that returned some text. The problem was the “System Integration Test” environment, or SIT. Every team tested its services in SIT, which meant about a third of the time, something was broken. After finding a bug in the current build, we would track it back to the catalog service, walk over to that team, bring up the issue, and they would say “thanks, we are testing a new build of catalog.” We expected catalog to work in SIT. Anything else meant a waste of someone’s time. Automated tools reporting false errors were even worse. When teams performance tested their services, everything calling the service got slow, if it worked at all. By virtualizing services we could test our application end-to-end against known data, without the troubles of SIT, or having to build additional expensive test-lab-like copies of production. Best of all, creating the virtual services is a snap – just record the live service with a tool and instruct it to play back similar requests. Flip Integration Tests from Virtual To Real for Final Checking All this API virtualization creates a risk that the team will move from test to production and something will be different between the Virtual API and the live one. If the Virtual API server is just returning the same thing product did when we recorded it and we have automated checks in place, we can change our test server to point to the real service and re-run all the automated checks. As long as the source data hasn’t changed and we are reading, not writing, from production, the checks should all pass. If the production API has changed, we will get failures, and they will be easy enough to fix and retest. Simulate Slow or Unresponsive Service In The Middle Of A Long Running Transaction Sometimes you want to test if a server is overloaded or down. Calling Facebook and asking them to turn off their servers is unlikely to work; even just coordinating with the team down the hall could create a lot of overhead. You also might want to test this often – every day or every hour – and manually pulling a plug or coordinating with the Login team every hour might not be realistic. The trick is to bring the service down once and record the exact behavior of the system, then use a virtual server to simulate that behavior, over and over again, every day. That means you’ll get the exact behavior, not a guess, and know exactly how the application under test can deal with it. Early Development of System against an Undeployed API Sometimes the API you are testing against does not exist, even in test. It’s still possible to create a Virt (virtual API) which returns some roughly equivalent data, and makes it possible to move forward on the core application without introducing new risks. Avoid Configuration and Copying Hassles Many companies use a test system that is a copy of production, and then refresh the system periodically. Sometimes, you want test scenarios that do not exist in production, so you have to create them … and lose them during a refresh. The same problem happens with 3rd party APIs, when, for example, a part is discontinued, and you are testing ordering that part, or the sample person you check for insurance coverage leaves the company. If the request for the part of the coverage goes through an API, you can record known good results that don’t change, even after a database refresh – then leave the real, end-to-end testing for an exploratory step that will be lighter, quicker, more accurate, and have more confidence. A Fistful of Techniques Today we discussed a half-dozen common patterns to API virtualization, mostly around testing systems in isolation that consume data through an API, like a 3rd party or an internal service. These ideas are new, and evolving. What are a few of your favorites?
April 9, 2015
by Denis Goodwin
· 4,118 Views
article thumbnail
How to Configure a Simple JBoss Cluster in Domain Mode
Clustering is a very important thing to master for any serious user of an application server. Clustering allows for high availability by making your application available on secondary servers when the primary instance is down or it lets you scale up or out by increasing the server density on the host, or by adding servers on other hosts. It can even help to increase performance with effective load balancing between servers based on their respective hardware. Andy Overton has already covered how to set up a cluster of servers in standalone mode fronted by mod_cluster for load balancing, so in this post I'll cover clustering in domain mode. I won't rehash mod_cluster settings, so this will just cover the set up of a doman controller on one host, and the host controller and server instances on another host. To follow along with this blog, you'll need to download either JBoss EAP 6.x or WildFly. I'll be using WildFly 8.2 on Xubuntu 14.04. I'll be using $WF_HOME to refer to your WildFly home directory. Configuring the Domain Controller The domain controller needs both the domain.xml and host.xml configured. In the $WF_HOME/domain/configuration directory, you'll see that those two files are joined by a host-master.xml and a host-slave.xml. These are preconfigured host.xml files which you can use to give you a head start in making a host.xml for the domain controller (master) and host controller (slave) to use. You can either change the name of the file to be host.xml, so it will get picked up and used by default, or you can specify the host configuration you want to use on the command line by adding the --host-config argument: domain.sh --host-config=host-master.xml Whether you choose to modify the host.xml or the host-master.xml, you need to make sure that the empty element has been added to the section. This is so that when WildFly looks to see which server is the domain controller, it knows to become the domain controller itself. The other change is optional, but recommended. We need to tell the domain controller to bind its management interface to the correct IP address because, by default, it will bind to localhost, so the management communication it needs to do with the remote hosts won't be able to reach the domain controller at all! We can set this address permanently in the host.xml by making sure the inet-address value is set to the right IP, by changing the 127.0.0.1 in the example below to the correct IP: The result of that is that the default bind IP of the management interface is no longer localhost, although you can still override this value by starting JBoss with the variable left of the colon as a -D argument: domain.sh -Djboss.bind.address.management=10.0.0.1 Next, we need to modify the domain.xml file, where we need to define our server groups; essentially just defining the cluster. Each server group is named, so we can reference it later, and references a particular profile which needs to be one of the profiles named and defined in the same XML file. As I mentioned in my previous blog, domain mode has several profiles in the same file (domain.xml) rather than multiple files for each, like standalone mode (standalone.xml, standalone-ha.xml etc.). In the screenshot, there are two server groups defined - "main-server-group" which references the "full" profile, and "other-server-group" which references the "full-ha" profile. These are just the defaults which come with WildFly, so you're free to use them and modify the settings or create your own from scratch. Whichever you choose, it's a good idea to rename your server group to something meaningful, like a description of the workload, or the application name. Configuring the Host Controllers Every host server which you want to be part of the cluster must have the host.xml file configured. We've already configured the host.xml on the domain controller, so now we'll focus on the host controller. Remember, this process can be repeated on any number of hosts, depending on how many servers you want in your server group and their topology. First, we need to make sure that the domain controller and the host controller can communicate, and to do that we need a valid management user. On the domain controller, run the add-user.sh or add-user.bat script. You will need to make sure to: Choose a management user Make sure the user is different than the one you would use to log in to the web console Confirm that the new user will connect one AS process to another AS process Make a note of the secret value (this is very important!) You will find that you get prompts similar to the following: mike@mike-C2B2:~$ /opt/wildfly/wildfly-8.2.0.Final/bin/add-user.sh What type of user do you wish to add? a) Management User (mgmt-users.properties) b) Application User (application-users.properties) (a): a Enter the details of the new user to add. Using realm 'ManagementRealm' as discovered from the existing property files. Username : mgmt Password recommendations are listed below. To modify these restrictions edit the add-user.properties configuration file. - The password should not be one of the following restricted values {root, admin, administrator} - The password should contain at least 8 characters, 1 alphabetic character(s), 1 digit(s), 1 non-alphanumeric symbol(s) - The password should be different from the username Password : Re-enter Password : What groups do you want this user to belong to? (Please enter a comma separated list, or leave blank for none)[ ]: About to add user 'mgmt' for realm 'ManagementRealm' Is this correct yes/no? yes Added user 'mgmt' to file '/opt/wildfly/wildfly-8.2.0.Final/standalone/configuration/mgmt-users.properties' Added user 'mgmt' to file '/opt/wildfly/wildfly-8.2.0.Final/domain/configuration/mgmt-users.properties' Added user 'mgmt' with groups to file '/opt/wildfly/wildfly-8.2.0.Final/standalone/configuration/mgmt-groups.properties' Added user 'mgmt' with groups to file '/opt/wildfly/wildfly-8.2.0.Final/domain/configuration/mgmt-groups.properties' Is this new user going to be used for one AS process to connect to another AS process? e.g. for a slave host controller connecting to the master or for a Remoting connection for server to server EJB calls. yes/no? yes To represent the user add the following to the server-identities definition Once we have the secret value for our management user, we can add it to the host.xml file. I'm choosing to modify the host-slave.xml file, since much of the configuration I need is done for me: Next, we need to tell the host controller where to look for the domain controller. We set this to for the domain controller's host.xml file, but in the host-slave.xml we have an example tag filled out for us. All we need to do is add the domain controller's IP or hostname exactly as we did for the management bind address earlier. So our host-slave.xml should go from this: to this: This way, like with the management interface on the domain controller, the default address will be 10.0.0.1, but it can also be overridden on the command line if needed. Once we've sorted the communication out, we need to tell the host controller to actually start some server instances! At the bottom of the host-slave.xml file, there are two predefined servers to use: These are already configured to become members of the two server groups configured in the domain.xml. Note that the second server has to have a port offset. Despite it being in a different server group, it's still going to run on the same host and will attempt to bind to the same ports as the first server unless we tell it not to! We would also need to do the same thing if we added other server instances. Optionally, we can make things a little easier for ourselves when managing a lot of servers on a lot of hosts. We can give each server instance its own unique name, but we can also name the host by adding a name attribute to the parent tag, changing it from: to So both in the logs and in the admin console, you should see this host controller referred to as "host1". Now, if you wanted to name your server instances the same across hosts, you'll be able to tell which is which! If all you wanted was to configure a single domain controller and a single host controller, then that's all we need to do to get them speaking to each other. You can then carry on and configure mod_cluster and Apache to forward requests on to the right server, or just deploy your applications and connect to them directly.
April 3, 2015
by Mike Croft
· 23,538 Views
article thumbnail
To Shard, or Not to Shard
When I talk with customers about sharding decisions I often start by telling the following true story… A couple of years ago, a customer came to me looking for advice on how to shard his system. He told me he was already convinced he needed to do that since he read that some smart people at MySQL giants like Facebook and Twitter were sharding—so naturally this was something he should be doing, too. I paused for a moment and then I asked him what the size of his database was. “10GB,” he said. I nodded and asked if he handles many queries or if they were very complicated. “No,” he said. “Just a few hundred queries per second, and they have not been loading down the system by more than a few percent.” I asked him whether he was expecting exponential growth in the near future—looking to double every week or something like that. “No, our load and data size grew about 7 percent last year and we expect about the same growth this year and for the foreseeable future.” My recommendation to him was not to waste time and effort on sharding because it is just not needed in his company’s case. Before you decide how to shard, you’d best understand whether or not you really need to shard to begin with. Yes, on the extremely large-scale side of database demands, sharding is the only game in town. And not just for MySQL, but for pretty much any technology out there. Yet thanks to emerging technologies there is an increasing amount of applications that can run databases without sharding. Today we can easily run with terabytes of data per MySQL instance and serve tens of thousands of queries in many OLTP environments. This allows organizations to build very large applications without needing to shard. And keep this in mind: Sharding is a pain under all circumstances. Even if you have sharding provided out of the box by the database system, it is a pain because it introduces more components and complexity. Creating good distributed query execution plans is a very complicated task that needs to take network topology and load into account in addition to the data distribution and load of individual nodes. Before you decide if you need to shard, you should look at alternatives to scale your application. In the MySQL world, the solutions are typically as follows: Alternatives to Sharding Functional Partitioning: In many environments a single MySQL instance becomes a dumping ground for all kinds of databases—you might end up having your main application share a database instance with Drupal, which powers your website, with WordPress, which powers your blog, and with vBulletin, which powers your forums. Splitting those pieces into different database instances is something you should consider before you look into sharding. Custom-made systems will often have many applications using different data sets that can be easily split out. Replication: Many applications are read-heavy, so scaling reads becomes the issue earlier than it does with scaling writes. Replication is a great solution for this. MySQL’s built-in replication is very robust, though due to its asynchronous nature it adds complexity to the application. The developer must decide which of the reads can be done from the replica servers and which can’t, because you must be absolutely certain that you’re reading the most recent, actual data. This is the reason that alternative, synchronous replication technologies for MySQL like Percona XtraDB Cluster, are gaining popularity: They provide single database-like behavior from the cluster in most cases. Caching and Queueing: Caching is a great technology for reducing the amount of reads that hit the database. There are many applications that have reduced read load on the database by 80-95% using this technology. Queueing, in contrast, optimizes writes. It does this by merging multiple write operations together so they hit the database efficiently. Most large-scale applications should rely heavily on both of these technologies. Memcached and Redis are two popular caching technologies in the MySQL space. For queueing, the most popular technologies are ActiveMQ and RabbitMQ [1]. Supplemental Technologies: MySQL is great at many things but not at everything. If you’re looking for high-performance full-text search, consider ElasticSearch, Sphinx, or Lucene. If you’re looking at large-scale data analytics, a Hadoop-based infrastructure or Vertica might work well for you. You should let MySQL handle the things it is good at, and leave the rest to supporting tools. Optimizations to Make Before Sharding Scaling isn’t just about architecture either. You also need to make sure your system is reasonably optimized. Many people decide sharding is inevitable for them even though there are much easier and more cost-effective ways to get the performance and scale they are looking for. All of which, I might add, are also going to be valuable if sharding is indeed eventually needed. Hardware: Are you using the right hardware? I’ve seen many people looking into sharding when in fact simply purchasing decent hardware would solve their problems for years to come. Make sure you have plenty of memory and high-performance flash storage if you’re working with a large database. In many cases it can transform your system so much it will look like magic. MySQL version and Configuration: Use a recent MySQL version. By that I mean the latest GA version (MySQL 5.6 at the time of this article’s publication). Percona Server, which is free, often offers additional performance improvements for demanding workloads. Use the most recent operating system too, especially if you’re using modern hardware. Finally, make sure MySQL is configured properly. The difference in MySQL performance between poorly configured MySQL and well-tuned MySQL can be 10x or more. Schema and Queries: The same application logic can be expressed using a variety of schema and queries. I’ve seen a lot of similar applications approaching things differently, and the difference in the performance between an optimal approach and a poor one (but still used in production) can be 100x or more. Many of the changes can be retrofitted to existing schema—such as minor query changes and changes to the index structure—however, if your schema doesn’t fit your application needs well, then you might be looking at a complete redesign. So it is a good idea to think things through early. When to Shard So when should you start thinking about sharding? Basically, if none of the measures listed above have given you the performance you need, it might be time to consider sharding. Sharding does have the advantage of allowing you to potentially use lower-cost hardware or cheaper cloud instances. Most developers are using agile development methods these days and there is a common term, “Architectural Runway,” which defines how far the application can go with its current architecture. If you’ve already found success using replication in particular, it might be a bad decision to add sharding because it will force your developers to deal with the complexity of sharding and asynchronous replication. However, replication is still typically used to achieve high availability even if you’re already sharding, but in this case it’s not for scaling reads. If you’ve come to the point where you’re sure you need to shard, here are some of the questions you need to ask about how you’ll implement your sharding strategy: Shard Level: At which level should we shard? It does not have to be at the database level. Many applications, SaaS in particular, often “shard” on higher levels, deploying multiple copies of their full stack to offer complete isolation for availability, performance, security etc. In many large scale applications you will see multiple copies of a full stack deployed, each having its own sharded MySQL environment. Shard Key: How do we shard? In many cases the choice depends on whether you’re authenticating for user accounts or your organization, but in other cases it is not so obvious. When making a sharding choice, you need to think about two things: 1) as many data access points as possible should go into a single shard, because cross-shard access is expensive if supported at all, and 2) making sure such sharding does not produce a shard that is too large to handle either in terms of data size or traffic. For example, sharding by country is a poor idea because the requirements to handle Belgium traffic won’t be the same for the United States or China, which require a lot more resources. Shard by Schema or Instance: What is the unit of your shard? The typical choices are MySQL instance or database (schema). I like the shard = database approach, which doesn’t limit you to a single MySQL instance per physical box. That way you do not have to run too many MySQL instances, but you can run more than one if the application works better that way. Shard Unit: If you shard by a single MySQL server, you will run into a problem with high availability very soon. When you have 100 MySQL servers there are roughly 100 more chances for one of them to crash compared with having only one, so ensuring there is a high availability solution becomes critical. Instead of sharing across MySQL servers you will usually be sharding across “Replication Clusters,” such as one MySQL primary node and one or several replica or PXC (Percona XtraDB Cluster) nodes. Shard Technology: What technology can you use to assist you with sharding? Within the MySQL world there is no standard sharding technology as of yet that everyone uses. Most of the large web properties have implemented something in-house for their sharding needs, and some have released their solutions as open source projects. One example is Vitess, contributed by Google, and another is JetPants, contributed by Tumblr. Rolling out your own simple sharding framework might look easy for some developers until you have to deal with operational issues like balancing the shards, resharding, etc., on a large scale. There are a number of purpose-built technologies that can help you with sharding if this doesn’t sound like something your team can manage. Sharding Technologies Here are technologies that you should consider: MySQL Fabric: This is the sharding technology being developed by the MySQL team at Oracle. MySQL Fabric is GA, but its functionality right now is rather limited, especially in terms of their support for multi-sharded queries. Given more time however, it has the potential to become the standard sharding technology for MySQL. Tesora: Tesora has a proxy-based solution for MySQL sharding that became open source some time ago. I would be especially looking at Tesora if you’re also looking at deploying OpenStack, as they’ve invested a lot into the integration. ScaleArc: ScaleArc is a commercial database proxy solution that can do caching, filtering, routing, and sharding. It is a pretty mature solution that handles multiple database technologies and not just MySQL. ScaleBase: ScaleBase is a sharding solution designed specifically for MySQL and the cloud, which similarly to MySQL, operates at the proxy level. There are many technologies in the MySQL space that can help you scale your application without sharding. If you’re going to build the next “Facebook,” however, you will surely need to shard, and there are a number of technologies that can help you do it as painlessly as possible. Large-scale applications on large-scale databases will always introduce complexity, which makes them more complicated to develop against and manage. Success comes with cost. [1] http://dzone.com/research/guide-to-enterprise-integration Peter Zaitsev co-founded Percona in 2006, assuming the role of CEO. Percona helps companies of all sizes maximize their success with MySQL. Peter enjoys mixing business leadership with hands on technical expertise. Peter is also the co-author of O’Reilly’s High Performance MySQL, one of the most popular books on MySQL performance.
April 2, 2015
by Peter Zaitsev
· 20,870 Views · 1 Like
article thumbnail
How CAS (Compare And Swap) in Java works
Before we dig into CAS (Compare And Swap) strategy and how is it used by atomic constructs like AtomicInteger, first consider this code: public class MyApp { private volatile int count = 0; public void upateVisitors() { ++count; //increment the visitors count } } This sample code is tracking the count of visitors to the application. Is there anything wrong with this code? What will happen if multiple threads try to update count? Actually the problem is simply marking count as volatile does not guarantee atomicity and ++count is not an atomic operations. To read more check this. Can we solve this problem if we mark the method itself synchronized as shown below: public class MyApp { private int count = 0; public synchronized void upateVisitors() { ++count; //increment the visitors count } } Will this work? If yes then what changes have we made actually? Does this code guarantee atomicity? Yes. Does this code guarantee visibility? Yes. Then what is the problem? It makes use of locking and that introduces lot of delay and overhead. Check this article. This is very expensive way of making things work. To overcome these problems atomic constructs were introduced. If we make use of an AtomicInteger to track the count it will work. public class MyApp { private AtomicInteger count = new AtomicInteger(0); public void upateVisitors() { count.incrementAndGet(); //increment the visitors count } } The classes that support atomic operations e.g. AtomicInteger, AtomicLong etc. makes use of CAS. CAS does not make use of locking rather it is very optimistic in nature. It follows these steps: Compare the value of the primitive to the value we have got in hand. If the values do not match it means some thread in between has changed the value. Else it will go ahead and swap the value with new value. Check the following code in AtomicLong class: public final long incrementAndGet() { for (;;) { long current = get(); long next = current + 1; if (compareAndSet(current, next)) return next; } } In JDK 8 the above code has been changed to a single intrinsic: public final long incrementAndGet() { return unsafe.getAndAddLong(this, valueOffset, 1L) + 1L; } What advantage this single intrinsic have? Actually this single line is JVM intrinsic which is translated by JIT into an optimized instruction sequence. In case of x86 architecture it is just a single CPU instruction LOCK XADD which might yield better performance than classic load CAS loop. Now think about the possibility when we have high contention and a number of threads want to update the same atomic variable. In that case there is a possibility that locking will outperform the atomic variables but in realistic contention levels atomic variables outperform lock. There is one more construct introduced in Java 8, LongAdder. As per the documentation: This class is usually preferable to AtomicLong when multiple threads update a common sum that is used for purposes such as collecting statistics, not for fine-grained synchronization control. Under low update contention, the two classes have similar characteristics. But under high contention, expected throughput of this class is significantly higher, at the expense of higher space consumption. So LongAdder is not always a replacement for AtomicLong. We need to consider the following aspects: When no contention is present AtomicLong performs better. LongAdder will allocate Cells (a final class declared in abstract class Striped64) to avoid contention which consumes memory. So in case we have a tight memory budget we should prefer AtomicLong. That's all folks. Hope you enjoyed it.
April 1, 2015
by Akhil Mittal
· 71,068 Views · 2 Likes
article thumbnail
Retry-After HTTP Header in Practice
Retry-After is a lesser known HTTP response header.
February 20, 2015
by Tomasz Nurkiewicz
· 16,788 Views
article thumbnail
Code Coverage for Embedded Target with Eclipse, gcc and gcov
The great thing with open source tools like Eclipse and GNU (gcc, gdb) is that there is a wealth of excellent tools: one thing I had in mind to explore for a while is how to generate code coverage of my embedded application. Yes, GNU and Eclipse come with code profiling and code coverage tools, all for free! The only downside seems to be that these tools seem to be rarely used for embedded targets. Maybe that knowledge is not widely available? So here is my attempt to change this :-). Or: How cool is it to see in Eclipse how many times a line in my sources has been executed? Line Coverage in Eclipse And best of all, it does not stop here…. Coverage with Eclipse To see how much percentage of my files and functions are covered? gcov in Eclipse Or even to show the data with charts? Coverage Bar Graph View Outline In this tutorial I’m using a Freescale FRDM-K64F board: this board has ARM Cortex-M4F on it, with 1 MByte FLASH and 256 KByte of RAM. The approach used in this tutorial can be used with any embedded target, as long there is enough RAM to store the coverage data on the target. I’m using Eclipse Kepler with the ARM Launchpad GNU tools (q3 2014 release), but with small modifications any Eclipse version or GNU toolchain could be used. To generate the Code Coverage information, I’m using gcov. Freescale FRDM-K64F Board Generating Code Coverage Information with gcov gcov is an open source program which can generate code coverage information. It tells me how often each line of a program is executed. This is important for testing, as that way I can know which parts of my application actually has been executed by the testing procedures. Gcov can be used as well for profiling, but in this post I will use it to generate coverage information only. The general flow to generate code coverage is: Instrument code: Compile the application files with a special option. This will add (hidden) code and hooks which records how many times a piece of code is executed. Generate Instrumentation Information: as part of the previous steps, the compiler generates basic block and line information. This information is stored on the host as *.gcno (Gnu Coverage Notes Object?) files. Run the application: While the application is running on the target, the instrumented code will record how many the lines or blocks in the application are executed. This information is stored on the target (in RAM). Dump the recorded information: At application exit (or at any time), the recorded information needs to be stored and sent to the host. By default gcov stores information in files. As a file system might not be alway available, other methods can be used (serial connection, USB, ftp, …) to send and store the information. In this tutorial I show how the debugger can be used for this. The information is stored as *.gcda (Gnu Coverage Data Analysis?) files. Generate the reports and visualize them with gcov. General gcov Flow gcc does the instrumentation and provides the library for code coverage, while gcov is the utility to analyze the generated data. Coverage: Compiler and Linker Options To generate the *.gcno files, the following option has to be added for each file which should generate coverage information: -fprofile-arcs -ftest-coverage :idea: There is as well the ‘–coverage’ option (which is a shortcut option) which can be used both for the compiler and linker. But I prefer the ‘full’ options so I know what is behind the options. -fprofile-arcs Compiler Option The option -fprofile-arcs adds code to the program flow to so execution of source code lines are counted. It does with instrumenting the program flow arcs. From https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html: -fprofile-arcs Add code so that program flow arcs are instrumented. During execution the program records how many times each branch and call is executed and how many times it is taken or returns. When the compiled program exits it saves this data to a file called auxname.gcda for each source file. The data may be used for profile-directed optimizations (-fbranch-probabilities), or for test coverage analysis (-ftest-coverage). Each object file’s auxname is generated from the name of the output file, if explicitly specified and it is not the final executable, otherwise it is the basename of the source file. In both cases any suffix is removed (e.g. foo.gcda for input file dir/foo.c, or dir/foo.gcda for output file specified as -o dir/foo.o). See Cross-profiling. If you are not familiar with compiler technology or graph theory: An ‘Arc‘ (alternatively ‘edge’ or ‘branch’) is a directed link between a pair ‘Basic Blocks‘. A Basic is a sequence of code which has no branching in it (it is executed in a single sequence). For example if you have the following code: k = 0; if (i==10) { i += j; j++; } else { foo(); } bar(); Then this consists of the following four basic blocks: Basic Blocks The ‘Arcs’ are the directed edges (arrows) of the control flow. It is important to understand that not every line of the source gets instrumented, but only the arcs: This means that the instrumentation overhead (code size and data) depends how ‘complicated’ the program flow is, and not how many lines the source file has. However, there is an important aspect to know about gcov: it provides ‘condition coverage‘ if a full expression evaluates to TRUE or FALSE. Consider the following case: if (i==0 || j>=20) { In other words: I get coverage how many times the ‘if’ has been executed, but *not* how many times ‘i==0′ or ‘j>=20′ (which would be ‘decision coverage‘, which is not provided here). See http://www.bullseye.com/coverage.html for all the details. -ftest-coverage Compiler Option The second option for the compiler is -ftest-coverage (from https://gcc.gnu.org/onlinedocs/gcc-3.4.5/gcc/Debugging-Options.html): -ftest-coverage Produce a notes file that the gcov code-coverage utility (see gcov—a Test Coverage Program) can use to show program coverage. Each source file’s note file is called auxname.gcno. Refer to the -fprofile-arcs option above for a description of auxname and instructions on how to generate test coverage data. Coverage data will match the source files more closely, if you do not optimize. So this option generates the *.gcno file for each source file I decided to instrument: gcno file generated This file is needed later to visualize the data with gcov. More about this later. Adding Compiler Options So with this knowledge, I need to add -fprofile-arcs -ftest-coverage as compiler option to every file I want to profile. It is not necessary profile the full application: to save ROM and RAM and resources, I can add this option only to the files needed. Actually as a starter, I recommend to instrument a single source file only at the beginning. For this I select the properties (context menu) of my file Test.c I add the options in ‘other compiler flags': Coverage Added to Compilation File -fprofile-arcs Linker Option Profiling not only needs a compiler option: I need to tell the linker that it needs to link with the profiler library. For this I add -fprofile-arcs to the linker options: -fprofile-arcs Linker Option Coverage Stubs Depending on your library settings, you might now get a lot of unresolved symbol linker errors. This is because by default the profiling library assumes to write the profiling information to a file system. However, most file systems do *not* have a file system. To overcome this, I add a stubs for all the needed functions. I have them added with a file to my project (see latest version of that file on GitHub): /* * coverage_stubs.c * * These stubs are needed to generate coverage from an embedded target. */ #include #include #include #include #include #include #include "UTIL1.h" #include "coverage_stubs.h" /* prototype */ void gcov_exit(void); /* call the coverage initializers if not done by startup code */ void static_init(void) { void (**p)(void); extern uint32_t __init_array_start, __init_array_end; /* linker defined symbols, array of function pointers */ uint32_t beg = (uint32_t)&__init_array_start; uint32_t end = (uint32_t)&__init_array_end; while(begst_mode = S_IFCHR; return 0; } int _getpid(void) { return 1; } int _isatty(int file) { switch (file) { case STDOUT_FILENO: case STDERR_FILENO: case STDIN_FILENO: return 1; default: errno = EBADF; return 0; } } int _kill(int pid, int sig) { (void)pid; (void)sig; errno = EINVAL; return (-1); } int _lseek(int file, int ptr, int dir) { (void)file; (void)ptr; (void)dir; return 0; /* return offset in file */ } #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wreturn-type" __attribute__((naked)) static unsigned int get_stackpointer(void) { __asm volatile ( "mrs r0, msp \r\n" "bx lr \r\n" ); } #pragma GCC diagnostic pop void *_sbrk(int incr) { extern char __HeapLimit; /* Defined by the linker */ static char *heap_end = 0; char *prev_heap_end; char *stack; if (heap_end==0) { heap_end = &__HeapLimit; } prev_heap_end = heap_end; stack = (char*)get_stackpointer(); if (heap_end+incr > stack) { _write (STDERR_FILENO, "Heap and stack collision\n", 25); errno = ENOMEM; return (void *)-1; } heap_end += incr; return (void *)prev_heap_end; } int _read(int file, char *ptr, int len) { (void)file; (void)ptr; (void)len; return 0; /* zero means end of file */ } :idea: In this code I’m using the UTIL1 (Utility) Processor Expert component, available on SourceForge. If you do not want/need this, you can remove the lines with UTIL1. - Coverage Stubs File in Project Coverage Constructors There is one important thing to mention: the coverage data structures need to be initialized, similar to constructors for C++. Depending on your startup code, this might *not* be done automatically. Check your linker .map file for some _GLOBAL__ symbols: .text._GLOBAL__sub_I_65535_0_TEST_Test 0x0000395c 0x10 ./Sources/Test.o Such a symbol should exist for every source file which has been instrumented with coverage information. These are functions which need to be called as part of the startup code. Set a breakpoint in your code at the given address to check if it gets called. If not, you need to call it yourself. :!: Typically I use the linker option ‘-nostartfiles’), and I have my startup code. In that case, these constructors are not called by default, so I need to do myself. See http://stackoverflow.com/questions/6343348/global-constructor-call-not-in-init-array-section In my linker file I have this: .init_array : { PROVIDE_HIDDEN (__init_array_start = .); KEEP (*(SORT(.init_array.*))) KEEP (*(.init_array*)) PROVIDE_HIDDEN (__init_array_end = .); } > m_text This means that there is a list of constructor function pointers put together between __init_array_start and __init_array_end. So all what I need is to iterate through this array and call the function pointers: /* call the coverage initializers if not done by startup code */ void static_init(void) { void (**p)(void); extern uint32_t __init_array_start, __init_array_end; /* linker defined symbols, array of function pointers */ uint32_t beg = (uint32_t)&__init_array_start; uint32_t end = (uint32_t)&__init_array_end; while(beg stack) { _write (STDERR_FILENO, "Heap and stack collision\n", 25); errno = ENOMEM; return (void *)-1; } heap_end += incr; return (void *)prev_heap_end; } :!: It might be that several kBytes of heap are needed. So if you are running in a memory constraint system, be sure that you have enough RAM available. The above implementation assumes that I have space between my heap end and the stack area. :!: If your memory mapping/linker file is different, of course you will need to change that _sbrk() implementation. Compiling and Building Now the application should compile and link without errors.Check that the .gcno files are generated: :idea: You might need to refresh the folder in Eclipse. - .gcno files generated In the next steps I’m showing how to get the coverage data as *.gcda files to the host using gdb. Using Debugger to get the Coverage Data The coverage data gets dumped when _exit() gets called by the application. Alternatively I could call gcov_exit() or __gcov_flush() any time. What it then does is Open the *.gcda file with _open() for every instrumented source file. Write the data to the file with _write(). So I can set a breakpoint in the debugger to both _open() and _write() and have all the data I need :-) With _open() I get the file name, and I store it in a global pointer so I can reference it in _write(): static const unsigned char *fileName; /* file name used for _open() */ int _open (const char *ptr, int mode) { (void)mode; fileName = (const unsigned char*)ptr; /* store file name for _write() */ return 0; } In _write() I get a pointer to the data and the length of the data. Here I can dump the data to a file using the gdb command: dump binary memory I could use a calculator to calculate the memory dump range, but it is much easier if I let the program generate the command line for gdb :-): int _write(int file, char *ptr, int len) { static unsigned char gdb_cmd[128]; /* command line which can be used for gdb */ (void)file; /* construct gdb command string */ UTIL1_strcpy(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)"dump binary memory "); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), fileName); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)" 0x"); UTIL1_strcatNum32Hex(gdb_cmd, sizeof(gdb_cmd), (uint32_t)ptr); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)" 0x"); UTIL1_strcatNum32Hex(gdb_cmd, sizeof(gdb_cmd), (uint32_t)(ptr+len)); return 0; } That way I can copy the string in the gdb debugger: Generated GDB Memory Dump Command That command gets pasted and executed in the gdb console: gdb command line After execution of the program, the *.gcda file gets created (refresh might be necessary to show it up): gcda file created Repeat this for all instrumented files as necessary. Showing Coverage Information To show the coverage information, I need the *.gcda, the *.gcno plus the .elf file. :idea: Use Refresh if not all files are shown in the Project Explorer view Files Ready to Show Coverage Information Then double-click on the gcda file to show coverage results: Double Click on gcda File Press OK, and it opens the gcov view. Double click on file in that view to show the details: gcov Views Use the chart icon to create a chart view: Chart view Bar Graph View Video of Steps to Create and Use Coverage The following video summarizes the steps needed: Data and Code Overhead Instrumenting code to generate coverage information means that it is an intrusive method: it impacts the application execution speed, and needs extra RAM and ROM. How much heavily depends on the complexity of the control flow and on the number of arcs. Higher compiler optimizations would reduce the code size footprint, however optimizations are not recommended for coverage sessions, as this might make the job of the coverage much harder. I made a quick comparison using my test application. I used the ‘size’ GNU command (see “Printing Code Size Information in Eclipse”). Without coverage enabled, the application footprint is: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 6360 1112 5248 12720 31b0 FRDM-K64F_Coverage.elf With coverage enabled only for Test.c gave: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 39564 2376 9640 51580 c97c FRDM-K64F_Coverage.elf Adding main.c to generate coverage gives: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 39772 2468 9700 51940 cae4 FRDM-K64F_Coverage.elf So indeed there is some initial add-up because of the coverage library, but afterwards adding more source files does not add up much. Summary It took me a while and reading many articles and papers to get code coverage implemented for an embedded target. Clearly, code coverage is easier if I have a file system and plenty of resources available. But I’m now able to retrieve coverage information from a rather small embedded system using the debugger to dump the data to the host. It is not practical for large sets of files, but at least a starting point :-). I have committed my Eclipse Kepler/Launchpad project I used in this tutorial on GitHub. Ideas I have in my mind: Instead using the debugger/gdb, use FatFS and SD card to store the data Exploring how to use profiling Combining multiple coverage runs Happy Covering :-) Links: Blog article who helped me to explore gcov for embedded targets: http://simply-embedded.blogspot.ch/2013/08/code-coverage-introduction.html Paper about using gcov for Embedded Systems: http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/gcov-on-an-embedded-system.pdf Article about coverage options for GNU compiler and linker: http://bobah.net/d4d/tools/code-coverage-with-gcov How to call static constructor methods manually: http://stackoverflow.com/questions/6343348/global-constructor-call-not-in-init-array-section Article about using gcov with lcov: https://qiaomuf.wordpress.com/2011/05/26/use-gcov-and-lcov-to-know-your-test-coverage/ Explanation of different coverage methods and terminology: http://www.bullseye.com/coverage.html
January 28, 2015
by Erich Styger
· 15,694 Views
article thumbnail
An Impatient New User's Introduction to API Management with JBoss apiman 1.0
API Management? Did you say “API Management?” Software application development models are evolutionary things. New technologies are always being created and require new approaches. It’s frequently the case today, that a service oriented architecture (SOA) model is used and that the end product is a software service that can be used by applications. The explosion in growth of mobile devices has only accelerated this trend. Every new mobile phone sold is another platform onto which applications are deployed. These applications are often built from services provided from multiple sources. The applications often consume these services through their APIs. OK, that’s all interesting, but why does this matter? Here’s why: If you are providing a service, you’d probably like to receive payment when it’s used by an application. For example, let’s say that you’ve spent months creating a new service that provides incredibly accurate and timely driving directions. You can imagine every mobile phone GPS app making use of your service someday. That is, however, assuming that you can find a way to enforce a contract on consumers of the API and provide them with a service level agreement (SLA). Also, you have to find a way to actually track consumers’ use of the API so that you can actually enforce that SLA. Finally, you have to have the means to update a service and publish new versions of services. Likewise, if you are consuming a service, for example, if you want to build the killer app that will use that cool new mapping service, you have to have the means to find the API, identify the API’s endpoint, and register your usage of the API with its provider. The approach that is followed to fulfill both service providers’ and consumers’ needs is...API Management. JBoss apiman 1.0 apiman is JBoss’ open source API Management system. apiman fulfills service API providers’ and consumers’ needs by implementing: API Manager - The API Manager provides an easy way for API/service providers to use a web UI to define service contracts for their APIs, apply these contracts across multiple APIs, and control role-based user access and API versioning. These contracts can govern access to services and limits on the rate at which consumers can access services. The same UI enables API consumers to easily locate and access APIs. API Gateway - The gateway applies the service contract policies of API Management by enforcing at runtime the rules defined in the contracts and tracking the service API consumers’ use of the APIs for every request made to the services. The way that the API Gateway works is that the consumer of the service accesses the service through a URL that designates the API Gateway as a proxy for the service. If the policies defined to govern access to the service (see a later section in this post for a discussion of apiman polices), the API Gateway then proxies requests to the service’s backend API implementation. The best way to understand API Management with apiman is to see it in action. In this post, we’ll install apiman 1.0, configure an API with contracts through the API Manager, and watch the API Gateway control access to the API and track its use. Prerequisites We don’t need very much to run apiman out of the box. Before we install apiman, you’ll have to have Java (version 1.7 or newer) installed on your system. You’ll also need to git and maven installed to be able to build the example service that we’ll use. A note on software versions: In this post we’ll use the latest available version of apiman as of December 2014. As if this writing, version 1.0 of apiman was just released (December 2014). Depending on the versions of software that you use, some screen displays may look a bit different. Getting apiman Like all JBoss software, installation of apiman is simple. First, you will need an application server on which to install and run apiman. We’ll use the open source JBoss WildFly server release 8.2 (http://www.wildfly.org/). To make things easier, apiman includes a pointer to JBoss WildFly on its download page here: http://www.apiman.io/latest/download.html To install WildFly, simply download http://download.jboss.org/wildfly/8.2.0.Final/wildfly-8.2.0.Final.zip and unzip the file into the directory in which you want to run the sever. Then, download the apiman 1.0 WildFly overlay zip file inside the directory that was created when you un-zipped the WildFly download. The apiman 1.0 WildFly overlay zip file is available here: http://downloads.jboss.org/overlord/apiman/1.0.0.Final/apiman-distro-wildfly8-1.0.0.Final-overlay.zip The commands that you will execute will look something like this: mkdir apiman cd apiman unzip wildfly-8.2.0.Final.zip unzip -o apiman-distro-wildfly8-1.0.0.Final-overlay.zip -d wildfly-8.2.0.Final Then, to start the server, execute these commands: cd wildfly-8.2.0.Final ./bin/standalone.sh -c standalone-apiman.xml The server will write logging messages to the screen. When you see some messages that look like this, you’ll know that the server is up and running with apiman installed: 13:57:03,229 INFO [org.jboss.as.server] (ServerService Thread Pool -- 29) JBAS018559: Deployed "apiman-ds.xml" (runtime-name : "apiman-ds.xml") 13:57:03,261 INFO [org.jboss.as] (Controller Boot Thread) JBAS015961: Http management interface listening on http://127.0.0.1:9990/management 13:57:03,262 INFO [org.jboss.as] (Controller Boot Thread) JBAS015951: Admin console listening on http://127.0.0.1:9990 13:57:03,262 INFO [org.jboss.as] (Controller Boot Thread) JBAS015874: WildFly 8.2.0.Final "Tweek" started in 5518ms - Started 754 of 858 services (171 services are lazy, passive or on-demand) If this were a production server, the first thing that we’d do is to change the OOTB default admin username and/or password. apiman is configured by default to use JBoss KeyCloak (http://keycloak.jboss.org/) for password security. Also, the default database used by apiman to store contract and service information is the H2 database. For a production server, you’d want to reconfigure this to use a production database. Note: apiman includes DDLs for both MySQL and PostgreSQL. For the purposes of our demo, we’ll keep things simple and use the default configuration. To access apiman’s API Manager UI, go to: http://localhost:8080/apiman-manager, and log in. The admin user account that we’ll use has a username of “admin” and a password of “admin123!” You should see a screen that looks like this: Before we start using apiman, let’s take a look at how apiman defines how services and the meta data on which they depend are organized. Policies, Plans, and Organizations apiman uses a hierarchical data model that consists of these elements: Polices, Plans, and Organizations: Policies Policies are at the lowest level of the data model, and they are the basis on which the higher level elements of the data model are built. A policy defines an action that is performed by the API Gateway at runtime. Everything defined in the API Manager UI is there to enable apiman to apply policies to requests made to services. When a request to a service is made, apiman creates a chain of policies to be applied to that request. apiman policy chains define a specific sequence order in which the policies defined in the API Manager UI are applied to service requests. The sequence in which incoming service requests have policies applied is: First, at the application level. In apiman, an application is contracted to use one or more services. Second, at the plan level. In apiman, policies are organized into groups called plans. (We’ll discuss plans in the next section of this post.) Third, at the individual service level. What happens is that when a service request is received by the API Gateway at runtime, the policy chain is applied in the order of application, plan, and service. If no failures, such as a rate counter being exceeded, occur, the API Gateway sends the request to the service’s backend API implementation. As we mentioned earlier in this post, the API Gateway acts as a proxy for the service: Next, when the API Gateway receives a response from the service’s backend implementation, the policy chain is applied again, but this time in the reverse order. The service policies are applied first, then the plan policies, and finally the application policies. If no failures occur, then the service response is sent back to the consumer of the service. By applying the policy chain twice, both for the originating incoming request and the resulting response, apiman allows policy implementations two opportunities to provide management functionality during the lifecycle. The following diagram illustrates this two-way approach to applying policies: Plans In apiman, a “plan” is a set policies that together define the level of service that apiman provides for service. Plans enable apiman users to define multiple different levels of service for their APIs, based on policies. It’s common to define different plans for the same service, where the differences depend on configuration options. For example, a group or company may offer both a “gold” and “silver” plan for the same service. The gold plan may be more expensive than the silver plan, but it may offer a higher level of service requests in a given (and configurable) time period. Organizations The “organization” is at top level of the apiman data model. An organization contains and manages all elements used by a company, university, group inside a company, etc. for API management with apiman. All plans, services, applications, and users for a group are defined in an apiman organization. In this way, an organization acts as a container of other elements. Users must be associated with an organization before they can use apiman to create or consume services. apiman implements role-based access controls for users. The role assigned to a user defines the actions that a user can perform and the elements that a user can manage. Before we can define a service, the policies that govern how it is accessed, the users who will be able to access, and the organizations that will create and consume it, we need a service and a client to access that service. Luckily, creating the service and deploying it to our WildFly server, and accessing it through a client are easy. Getting and Building and Deploying the Example Service The source code for the example service is contained in a git repo (http://git-scm.com) hosted at github (https://github.com/apiman). To download a copy of the example service, navigate to the directory in which you want to build the service and execute this git command: git clone [email protected]:apiman/apiman-quickstarts.git As the source code is downloading, you'll see output that looks like this: git clone [email protected]:apiman/apiman-quickstarts.git Initialized empty Git repository in /tmp/tmp/apiman-quickstarts/.git/ remote: Counting objects: 104, done. remote: Total 104 (delta 0), reused 0 (delta 0) Receiving objects: 100% (104/104), 18.16 KiB, done. Resolving deltas: 100% (40/40), done. And, after the download is complete, you'll see a populated directory tree that looks like this: └── apiman-quickstarts ├── echo-service │ ├── pom.xml │ ├── README.md │ └── src │ └── main │ ├── java │ │ └── io │ │ └── apiman │ │ └── quickstarts │ │ └── echo │ │ ├── EchoResponse.java │ │ └── EchoServlet.java │ └── webapp │ └── WEB-INF │ ├── jboss-web.xml │ └── web.xml ├── LICENSE ├── pom.xml ├── README.md ├── release.sh └── src └── main └── assembly └── dist.xml As we mentioned earlier in the post, the example service is very simple. The only action that the service performs is to echo back in responses the meta data in the REST (http://en.wikipedia.org/wiki/Representational_state_transfer) requests that it receives. Maven is used to build the service. To build the service into a deployable .war file, navigate to the directory into which you downloaded the service example: cd apiman-quickstarts/echo-service And then execute this maven command: mvn package As the service is being built into a .war file, you'll see output that looks like this: [INFO] Scanning for projects... [INFO] [INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1 [INFO] [INFO] ------------------------------------------------------------------------ [INFO] Building apiman-quickstarts-echo-service 1.0.1-SNAPSHOT [INFO] ------------------------------------------------------------------------ [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ apiman-quickstarts-echo-service --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /jboss/local/redhat_git/apiman-quickstarts/echo-service/src/main/resources [INFO] [INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ apiman-quickstarts-echo-service --- [INFO] Compiling 2 source files to /jboss/local/redhat_git/apiman-quickstarts/echo-service/target/classes [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ apiman-quickstarts-echo-service --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /jboss/local/redhat_git/apiman-quickstarts/echo-service/src/test/resources [INFO] [INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ apiman-quickstarts-echo-service --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ apiman-quickstarts-echo-service --- [INFO] No tests to run. [INFO] [INFO] --- maven-war-plugin:2.2:war (default-war) @ apiman-quickstarts-echo-service --- [INFO] Packaging webapp [INFO] Assembling webapp in [/jboss/local/redhat_git/apiman-quickstarts/echo-service/target/apiman-quickstarts-echo-service-1.0.1-SNAPSHOT] [INFO] Processing war project [INFO] Copying webapp resources [/jboss/local/redhat_git/apiman-quickstarts/echo-service/src/main/webapp] [INFO] Webapp assembled in [23 msecs] [INFO] Building war: /jboss/local/redhat_git/apiman-quickstarts/echo-service/target/apiman-quickstarts-echo-service-1.0.1-SNAPSHOT.war [INFO] WEB-INF/web.xml already added, skipping [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 1.184 s [INFO] Finished at: 2014-12-26T16:11:19-05:00 [INFO] Final Memory: 14M/295M [INFO] ------------------------------------------------------------------------ If you look closely, near the end of the output, you'll see the location of the .war file: /jboss/local/redhat_git/apiman-quickstarts/echo-service/target/apiman-quickstarts-echo-service-1.0.1-SNAPSHOT.war To deploy the service, we can copy the .war file to our WildFly server's "deployments" directory. After you copy the service's .war file to the deployments directory, you'll see output like this generated by the WildFly server: 16:54:44,313 INFO [org.jboss.as.server.deployment] (MSC service thread 1-7) JBAS015876: Starting deployment of "apiman-quickstarts-echo-service-1.0.1-SNAPSHOT.war" (runtime-name: "apiman-quickstarts-echo-service-1.0.1-SNAPSHOT.war") 16:54:44,397 INFO [org.wildfly.extension.undertow] (MSC service thread 1-16) JBAS017534: Registered web context: /apiman-echo 16:54:44,455 INFO [org.jboss.as.server] (DeploymentScanner-threads - 1) JBAS018559: Deployed "apiman-quickstarts-echo-service-1.0.1-SNAPSHOT.war" (runtime-name : "apiman-quickstarts-echo-service-1.0.1-SNAPSHOT.war") Make special note of this line of output: 16:54:44,397 INFO [org.wildfly.extension.undertow] (MSC service thread 1-16) JBAS017534: Registered web context: /apiman-echo This output indicates that the URL of the deployed example service is: [a href="http://localhost:8080/apiman-echo" style="text-decoration: none;"]http://localhost:8080/apiman-echo Remember, however, that this is the URL of the deployed example service if we access it directly. We'll refer to this as the "unmanaged service" as we are able to connect to the service directly, without going through the API Gateway. The URL to access the service through the API Gateway ("the managed service") at runtime will be different. Now that our example service is installed, it’s time to install and configure our client to access the server. Accessing the Example Service Through a Client There are a lot of options available when it comes to what we can use for a client to access our service. We’ll keep the client simple so that we can keep our focus on apiman and simply install a REST client into the FireFox browser. The REST Client FireFox add-on (http://restclient.net/) is available here: https://addons.mozilla.org/en-US/firefox/addon/restclient/versions/2.0.3 After you install the client into FireFox, you can access the deployed service using the URL that we just defined. If you execute a GET command, you’ll see output that looks like this: Now that our example service is built, deployed and running, it’s time to create the organizations for the service provider and the service consumer. The differences between the requirements of the two organizations will be evident in their apiman configuration properties. Creating Users for the Service Provider and Consumer Before we create the organizations, we have to create a user for each organization. We'll start by creating the service provider user. To do this, logout from the admin account in the API Manager UI. The login dialog will then be displayed. Select the "New user" Option and register the service provider user: Then, logout and repeat the process to register a new application developer user too: Now that the new users are registered we can create the organizations. Creating the Service Producer Organization To create the service producer organization, log back into the API Manager UI as the servprov user and select “Create a new Organization”: Select a name and description for the organization, and press “Create Organization”: And, here’s our organization: Note that in a production environment, users would request membership in an organization. The approval process for accepting new members into an organization would follow the organization's workflow, but this would be handled outside of the API Manager. For the purposes of our demonstration, we'll keep things simple. Configuring the Service, its Policies, and Plans To configure the service, we’ll first create a plan to contain the policies that we want applied by the API Gateway at runtime when requests to the service are made. To create a new plan, select the “Plans” tab. We’ll create a “gold” plan: Once the plan is created, we will add policies to it: apiman provides several OOTB policies. Since we want to be able to demonstrate a policy being applied, we’ll select a Rate Limiting Policy, and set its limit to a very low level. If our service receives more than 10 requests in a day, the policy should block all subsequent requests. So much for a “gold” level of service! After we create the policy and add it to the plan, we have to lock the plan: And, here is the finished, and locked plan: At this point, additional plans can be defined for the service. We’ll also create a “silver” plan, that will offer a lower level of service (i.e., a request rate limit lower than 10 per day) than the gold plan. Since the process to create this silver plan is identical to that of the gold plan, we’ll skip the screenshots. Now that the two plans are complete and locked, it’s time to define the service. We’ll give the service an appropriate name, so that providers and consumers alike will be able to run a query in the API Manager to find it. After the service is defined, we have to define its implementation. In the context of the API Manager, the API Endpoint is the service’s direct URL. Remember that the API Gateway will act as a proxy for the service, so it must know the service’s actual URL. In the case of our example service, the URL is: http://localhost:8080/apiman-echo The plans tab shows which plans are available to be applied to the service: Let’s make our service more secure by adding an authentication policy that will require users to login before they can access the service. Select the Policies tab, and then define a simple authentication policy. Remember the user name and password that you define here as we’ll need them later on when send requests to the service. After the authentication policy is added, we can publish the service to the API Gateway: And, here it is, the published service: OK, that finishes the definition of the service provider organization and the publication of the service. Next, we'll switch over to the service consumer side and create the service consumer organization and register an application to connect to the managed service through the proxy of the API Gateway. The Service Consumer Organization We'll repeat the process that we used to create the application development organization. Log in to the API Manager UI as the “appdev” user and create the organization: Unlike the process we used when we created the elements used by the service provider, the first step that we’ll take is to create a new application and then search for the service to be used by the application: Searching for the service is easy, as we were careful to set the service name to something memorable: Select the service name, and then specify the plan to be used. We’ll splurge and use the gold plan: Next, select “create contract” for the plan: Then, agree to the contract terms (which seem to be written in a strange form of Latin in the apiman 1.0 release): The last step is to register the application with the API Gateway so that the gateway can act as a proxy for the service: Congratulations! All the steps necessary to provide and consume the service are complete! There’s just one more step that we have to take in order for clients to be able access the service through the API Gateway. Remember the URL that we used to access the unmanaged service directly? Well, forget it. In order to access the managed service through the API Gateway acting as a proxy for other service we have to obtain the managed service's URL. In the API Manager UI, head on over to the "APIs" tab for the application, click on the the “>” character to the left of the service name. This will expose the API Key and the service’s HTTP endpoint in the API Gateway: In order to be able access the service through the API Gateway, we have to provide the API Key with each request. The API Key can be provided either through an HTTP Header (X-API-Key) or a URL query parameter. Luckily, the API Manager UI does the latter for us. Select the icon to the right of the HTTP Endpoint and this dialog is displayed: Copy the URL into the clipboard. We’ll need to enter this into the client in a bit. The combined API Key and HTTP endpoint should look something like this: http://localhost:8080/apiman-gateway/ACMEServices/echo/1.0?apikey=c374c202-d4b3-4442-b9e4-c6654f406e3d Accessing the Managed Service Through the apiman API Gateway, Watching the Policies at Runtime Thanks for hanging in there! The set up is done. Now, we can fire up the client and watch the policies in action as they are applied at runtime by the API Gateway, for example: Open the client, and enter the URL for the managed service (http://localhost:8080/apiman-gateway/ACMEServices/echo/1.0?apikey=c374c202-d4b3-4442-b9e4-c6654f406e3d) What happens first is that the authentication policy is applied and a login dialog is then displayed: Enter the username and password (user1/password) that we defined when we created the authentication policy to access the service. The fact that you are seeing this dialog confirms that you are accessing the managed service and are not accessing the service directly. When you send a GET request to the service, you should see a successful response: So far so good. Now, send 10 more requests and you will see a response that looks like this as the gold plan rate limit is exceeded: And there it is. Your gold plan has been exceeded. Maybe next time you’ll spend a little more and get the platinum plan! ;-) Wrap-up Let’s recap what we just accomplished in this demo: We installed apiman 1.0 onto a WildFly server instance. We used git to download and maven to build a sample REST client. As a service provider, we created an organization, defined policies based on service use limit rates and user authentication, and a plan, and assigned them to a service. As a service consumer, we searched for and found that service, and assigned it to an application. As a client, we accessed the service and observed how the API Gateway managed the service. And, if you note, in the process of doing all this, the only code that we had to write or build was for the client. We were able to fully configure the service, policies, plans, and the application in the API Manager UI. What’s Next? In this post, we’ve only scratched the surface of API Management with apiman. To learn more about apiman, you can explore its website here: http://www.apiman.io/ Join the project mailing list here: https://lists.jboss.org/mailman/listinfo/apiman-user And, better still, get involved! Contribute bug reports or feature requests. Write about your own experiences with apiman. Download the apiman source code, take a look around, and contribute your own additions. apiman 1.0 was just released, there’s no better time to join in and contribute! Acknowledgements The author would like to acknowledge Eric Wittmann for his (never impatient) review comments and suggestions on writing this post! Downloads Used in this Article REST Client (http://restclient.net/) FireFox Add-On - https://addons.mozilla.org/en-US/firefox/addon/restclient/versions/2.0.3 Echo service source code - https://github.com/EricWittmann/apiman-quickstarts apiman 1.0 - http://downloads.jboss.org/overlord/apiman/1.0.0.Final/apiman-distro-wildfly8-1.0.0.Final-overlay.zip WildFly 8.2.0 - http://download.jboss.org/wildfly/8.2.0.Final/wildfly-8.2.0.Final.zip Git - http://git-scm.com Maven - http://maven.apache.org References http://www.apiman.io/ apiman tutorial videos - https://vimeo.com/user34396826 http://www.softwareag.com/blog/reality_check/index.php/soa-what/what-is-api-management/ http://keycloak.jboss.org/
January 9, 2015
by Len DiMaggio
· 13,306 Views
article thumbnail
Need Micro Caching? Memoization to the Rescue
Caching solves wide sort of performance problems. There are many ways to integrate caching into our applications. For example when we use Spring there is easy to use @Cacheable support. Quite easy but we still have to configure cache manager, cache regions, etc. Sometimes it's unfortunately like taking a sledgehammer to crack a nut. So what can we do to "go lighter"? There is a technique called memoization. Technically it's as easy as pie but true genius lies in simplicity. Model solution looks as follows: public Foo getValue() { if (storedValue == null) { storedValue = retrieveFoo(); } return storedValue; } As you can see there is no problem in implementing it manually, but as long as we remember about DRY rule we can use already implemented solutions which additionally provides thread safety. Pretty good idea is to use Guava library. // create Supplier memoizer = Suppliers.memoize(this::retrieveFoo); // and use Foo variable = memoizer.get(); Sometimes it's enough but what can we do if we need to specify TTL for our value? We have to store (cache) retrieved value only for few seconds and after exceeding defined duration get this value one more time? One more time we can use functionality provided by Guava. Supplier memoizer = Suppliers.memoizeWithExpiration(this::retrieveFoo, 5, TimeUnit.SECONDS); The above line builds memoizer with TTL = 5 seconds. As you can see - simple... but powerful :)
January 6, 2015
by Jakub Kubrynski
· 17,753 Views
article thumbnail
Remote JMX access to WildFly (or JBoss AS7) using JConsole
One of the goals of JBoss AS7 was to make it much more secure by default, when compared to previous versions. One of the areas which was directly impacted by this goal was that you could no longer expect the server to expose some service on a port and get access to it without any authentication/authorization. Remember that in previous versions of JBoss AS you could access the JNDI port, the JMX port without any authentication/authorization, as long as those ports were opened for communication remotely. Finer grained authorizations on such ports for communications, in JBoss AS7, allows the server to control who gets to invoke operations over that port. Of course, this is not just limited to JBoss AS7 but continues to be the goal in WildFly (which is the rename of JBoss Application Server). In fact, WildFly has gone one step further and now has the feature of "one single port" for all communication. JMX communication in JBoss AS7 and WildFly With that background, we'll now focus on JMX communication in JBoss AS7 and WildFly. I'll use WildFly (8.2.0 Final) as a reference for the rest of this article, but the same details apply (with minor changes) to other major versions of JBoss AS7 and WildFly, that have been released till date. WildFly server is composed of "subsystems", each of which expose a particular set of functionality. For example, there's the EE subsystem which supports the Java EE feature set. Then there's the Undertow subsystem which supports web/HTTP server functionality. Similarly, there's a JMX subsystem which exposes the JMX feature set on the server. As you all are aware, I'm sure, JMX service is standardly used for monitoring and even managing Java servers and this includes managing the servers remotely. The JMX subsystem in WildFly allows remote access to the JMX service and port 9990 is what is used for that remote JMX communication. JConsole for remote JMX access against JBoss AS7 and WildFly Java (JDK) comes bundled with the JConsole tool which allows connecting to local or remote Java runtimes which expose the JMX service. The tool is easy to use, all you have to do is run the jconsole command it will show up a graphical menu listing any local Java processes and also an option to specify a remote URL to connect to a remote process: # Start the JConsole $JAVA_HOME/bin/jconsole Let's assume that you have started WildFly standalone server, locally. Now when you start the jconsole, you'll notice that the WildFly Java process is listed in the local running processes to which you can connect to. When you select the WildFly Java instance, you'll be auto connected to it and you'll notice MBeans that are exposed by the server. However, in the context of this article, this "local process" mode in JConsole isn't what we are interested in. Let's use the "Remote process" option in that JConsole menu which allows you to specify the remote URL to connect to the Java runtime and username and password to use to connect to that instance. Even though our WildFly server is running locally, we can use this "Remote process" option to try and connect to it. So let's try it out. Before that though, let's consider a the following few points: Remember that the JMX subsystem in WildFly allows remote access on port 9990 For remote access to JMX, the URL is of the format - service:jmx:[vendor-specific-protocol]://[host]:[port]. The vendor specific protocol is the interesting bit here. In the case of WildFly that vendor-specific-protocol is http-remoting-jmx. Remember that WildFly is secure by default which means that just because the JMX subsystem exposes 9990 port for remote communication, it doesn't mean it's open for communication to anyone. In order to be allowed to communicate over this port, the caller client is expected to be authenticated and authorized. This is backed by the "ManagementRealm" in WildFly. Users authenticated and authorized against this realm are allowed access to that port. Keeping those points in mind, let's first create a user in the Management Realm. This can be done using the add-user command line script (which is present in JBOSS_HOME/bin folder). I won't go into the details of that since there's enough documentation for that. Let's just assume that I created a user named "wflyadmin" with an appropriate password in the Management Realm. To verify that the user has been properly created, in the right realm, let's access the WildFly admin console at the URL http://localhost:9990/console. You'll be asked for username and password for access. Use the same username and password of the newly created user. If the login works, then you are good. If not, then make sure you have done things right while adding the new user (as I said I won't go into the details of adding a new user since it's going to just stretch this article unnecessarily long). So at this point we have created a user named "wflyadmin" belonging to ManagementRealm. We'll be using this same user account for accessing the JMX service on WildFly, through JConsole. So let's now bring up the jconsole as usual: $JAVA_HOME/bin/jconsole On the JConsole menu let's again select the "Remote process" option and use the following URL in the URL text box: service:jmx:http-remoting-jmx://localhost:9990 Note: For JBoss AS 7.x and JBoss EAP 6.x, the vendor specific protocol is remoting-jmx and the port for communication is 9999. So the URL will be service:jmx:remoting-jmx://localhost:9999 In the username and password textboxes, use the same user/pass that you newly created. Finally, click on Connect. What do you see? It doesn't work! The connection fails. So what went wrong? Why isn't the JConsole remote access to WildFly not working? You did all the obvious things necessary to access the WildFly JMX service remotely but you keep seeing that JConsole can't connect to it. What could be the reason? Remember, in one of those points earlier, I noted that the "vendor specific protocol" is an interesting bit? We use http-remoting-jmx and that protocol internally relies on certain WildFly/JBoss specific libraries, primarily for remote communication and authentication and authorization. These libraries are WildFly server specific and hence aren't part of the standard Java runtime environment. When you start jconsole, it uses a standard classpath which just has the relevant libraries that are part of the JDK/JRE. To solve this problem, what you need to do is bring in the WildFly server specific libraries into the classpath of JConsole. Before looking into how to do that, let's see which are the WildFly specific libraries that are needed. All the necessary classes for this to work are part of the jboss-cli-client.jar which is present in JBOSS_HOME/bin/client/ folder. So all we need to do in include this jar in the classpath of the jconsole tool. To do that we use the -J option of jconsole tool which allows passing parameters to the Java runtime of jconsole. The command to do that is: $JAVA_HOME/bin/jconsole -J-Djava.class.path=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/jconsole.jar:/opt/wildfly-8.2.0.Final/bin/client/jboss-cli-client.jar (Note that for Windows the classpath separator is the semi-colon character instead of the colon) Note, the server specific jar for JBoss AS 7.x and JBoss EAP 6.x is named jboss-client.jar and is present at the same JBOSS_HOME/bin/client directory location. So we are passing -Djava.class.path as the parameter to the jconsole Java runtime, using the -J option. Notice that we have specified more than just our server specific jar in that classpath. That's because, using the -Djava.class.path is expected to contain the complete classpath. We are including the jars from the Java JDK lib folder that are necessary for JConsole and also our server specific jar in that classpath. Running that command should bring up JConsole as usual and let's go ahead and select the "Remote process" option and specify the same URL as before: service:jmx:http-remoting-jmx://localhost:9990 and the same username and password as before and click Connect. This time you should be able to connect and should start seeing the MBeans and others services exposed over JMX. How about providing a script which does this necessary classpath setup? Since it's a common thing to try and use JConsole for remote access against WildFly, it's reasonable to expect to have a script which sets up the classpath (as above) and you could then just use that script. That's why WildFly ships such a script. It's in the JBOSS_HOME/bin folder and is called jconsole.sh (and jconsole.bat for Windows). This is just a wrapper script which internally invokes the jconsole tool present in Java JDK, after setting up the classpath appropriately. All you have to do is run: $JBOSS_HOME/bin/jconsole.sh What about using JConsole from a really remote machine, against WildFly? So far we were using the jconsole tool that was present on the same machine as the WildFly instance, which meant that we have filesystem access to the WildFly server specific jars present in the WildFly installation directory on the filesystem. This allowed us to setup the classpath for jconsole to point to the jar on the local filesystem? What if you wanted to run jconsole from a remote machine against a WildFly server which is installed and running on a different machine. In that case, your remote client machine won't be having filesystem access to the WildFly installation directory. So to get jconsole running in such a scenario, you will have to copy over the JBOSS_HOME/bin/jboss-cli-client.jar to your remote client machine, to a directory of your choice and then setup the classpath for jconsole tool as explained earlier and point it to that jar location. That should get you access to JMX services of WildFly from jconsole on a remote machine. More questions? If you still have problems getting this to work or have other questions, please start a discussion in the JBoss community forums here https://developer.jboss.org/en/wildfly/content.
January 5, 2015
by Jaikiran Pai
· 62,243 Views · 3 Likes
article thumbnail
Deploying Static Content on JBoss Server
Learn how to deploy static content on a JBoss server.
December 31, 2014
by Ravi Isnab
· 26,278 Views · 1 Like
article thumbnail
Configuring RBAC in JBoss EAP and Wildfly - Part One
In this blog post I will look into the basics of configuring Role Based Access Control (RBAC) in EAP and Wildfly. RBAC was introduced in EAP 6.2 and WildFly 8 so you will need either of those if you wish to use RBAC. For the purposes of this blog I will be using the following: OS - Ubuntu 14 Java - 1.7.0_67 JBoss - EAP 6.3 Although I'm using EAP these instructions should work just the same on Wildfly. What is RBAC? Role Based Access Control is designed to restrict system access by specifying permissions for management users. Each user with management access is given a role and that role defines what they can and cannot access. In EAP 6.2+ and Wildfly 8+ there are seven predefined roles each of which has different permissions. Details on each of the roles can be found here: https://access.redhat.com/documentation/en-US/JBoss_Enterprise_Application_Platform/6.2/html/Security_Guide/Supported_Roles.html In order to authenticate users one of the three standard authentication providers must be used. These are: Local User - The local user is automatically added as a SuperUser so a user on the server machine has full access. This user should be removed in a production system and access locked down to named users. Username/Password - using either the mgmt-users.properties file, or an LDAP server. Client Certificate - using a trust store For the purposes of this blog and to keep things simple we will use username/passwords and the mgmt-users.properties file Why do we need RBAC? The easiest way to show this is through a practical demo. Configuration can be done either via the Management Console or via the Command Line Interface (CLI). However, only a limited set of tasks can be done via the management console whereas all tasks are available via the CLI. Therefore, for the purposes of this blog I will be doing all configuration via the CLI. In our test scenario we have 4 users: Andy - This user is the main sys-admin and therefore we want him to be able to access everything. Bob - This user is a lead developer and therefore will need to be able to deploy apps and make changes to certain application resources. Clare & Dave - These users are standard developers and will need to be able to view application resources but should not be able to make changes. First of all we will set up a number of users. In order to do so we will use the add-user.sh script which can be found in: /bin Create the following users in the stated groups. (Enter No for the final question for all users) Andy - no group Bob - lead-developers Clare - standard-developers Dave - standard-developers In /domain/configuration you will find a file called mgmt-users.properties. At the bottom of this file you will see a list of the users we've created similar to this: Andy=82153e0297590cceb14e7620ccd3b6ed Bob=06a61e836d9d2d5be98517b468ab72cc Clare=63a8ff615a122c56b1d47fc098ff5124 Dave=2df8d1e02e7f3d13dcea7f4b022d0165 In the same directory you will find a a file called mgmt-groups.properties, at the bottom of this file you will see a list of users and the groups they are in, like so: Andy= Bob=lead-developers Clare=developers Dave=developers Now point a browser at http://localhost:9990 and log in as the user Dave. Navigate around and you will see you have full access to everything. This is precisely why RBAC is needed! Allowing all users to not only access the management console but to be able to access and alter anything is a recipe for disaster and guaranteed to cause issues further down the line. Often users don't understand the implications of the changes they have made, it may just be a quick fix to resolve an immediate issue but it may have long term consequences that are not noticed until much further down the line when the changes that were made have been forgotten about or are not documented. As someone who works in support we see these kind of issues on a regular basis and they can be difficult to track down with no audit trail and users not realising that the minor change they made to one part of the system is now causing a major issue in some other part of the system. OK, so we now have our users set up but at the moment they have full access to everything. Next up we will configure these users and assign them to roles. First of all start up the CLI. Run the following command: /bin/jboss-cli.sh -c Change directory to the authorisation node cd /core-service=management/access=authorization Running the following command lists the current role names and the standard role names along with two other attributes ls -l The two we are interested in here are permission-combination-policy and provider. The permission-combination-policy defines how permissions are determined if a user is assigned more than one role. The default setting is permissive. This means that if a user is assigned to any role that allows a particular action then the user can perform that action. The opposite of this is rejecting. This means that if a user is assigned to multiple roles then all those roles must permit an action for a user to be able to perform that action. The other attribute of interest here is provider. This can be set to either simple (which is the default) or rbac. In simple mode all management users can access everything and make changes, as we have seen. In rbac mode users are assigned roles and each of those roles has difference privileges. Switching on RBAC OK, lets turn on RBAC... Run the following commands to turn on RBAC cd /core-service=management/access=authorization :write-attribute(name=provider, value=rbac) Restart JBoss Now point a browser at http://localhost:9990 and try to log in as the user Andy (who should be able to access everything). You should see the following message : Insufficient privileges to access this interface. This is because at the moment the user Andy isn't mapped to any role. Let's fix that now: If you look in domain.xml in the management element you will see the following: This shows that at the moment only the local user is mapped to the SuperUser role. Mapping users and groups to roles We need to map our users to the relevant roles to allow them access. In order to do this we need the following command: role-mapping=ROLENAME/include=ALIAS:add(name=USERNAME, type=USER) Where rolename is one of the pre-configured roles, alias is a unique name for the mapping and user is the name of the user to map. So, lets map the user Andy to the SuperUser role. ./role-mapping=SuperUser/include=user-Andy:add(name=Andy, type=USER) In domain.xml you will see that our user has been added to the SuperUser role: Now point a browser at http://localhost:9990 you should now be able to log in as the user Andy and have full access to everything. Next we need to add mappings for the other roles we want to use. ./role-mapping=Deployer:add ./role-mapping=Monitor:add Now we need to give role mappings to all our other users. As we have them in groups we can assign the groups to roles, rather than mapping by user. The command is basically the same as for a user but the type is GROUP rather than user. Here we are mapping lead developers to the Deployer role and standard developers to the Monitor role. ./role-mapping=Deployer/include=group-lead-devs:add(name=lead-developers, type=GROUP) ./role-mapping=Monitor/include=group-standard-devs:add(name=developers, type=GROUP) If you look in domain.xml you should now see the following showing that the user Andy is mapped to the SuperUser role and the two groups are mapped to the Deployer and Monitor roles. You can also view the role mappings in the admin console. Click on the Administration tab. Expand the Access Control item on the left and select Role Assignment. Select the Users tab - this shows users that are mapped to roles. Select the Groups tab and you will see the mapping between groups and roles. Log in as the different users and see the differences between what you can and can't access. Conclusion So, that's it for Part One. We have switched on RBAC, set up a number of users and groups and mapped those users and groups to particular roles to give them different levels of access. In Part Two of this blog I will look at constraints which allow more fine grained permission setting, scoped roles which allow you to set permissions on individual servers and audit logging which allows you to see who is accessing the management console and see what changes they are making.
December 9, 2014
by Andy Overton
· 11,358 Views
  • Previous
  • ...
  • 121
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×