Microservices Resources

The Latest Microservices Topics

This post was originally written by Vlad Lesin for the MySQL Performance Blog. Percona Server 5.6.11-60.3 introduces a new “log archiving” feature. Percona XtraBackup 2.1.5 supports “apply archived logs.” What does it mean and how it can be used? Percona products propose three kinds of incremental backups. The first is full scan of data files and comparison the data with backup data to find some delta. This approach provides a history of changes and saves disk space by storing only data deltas. But the disadvantage is a full-data file scan that adds load to the disk subsystem. The second kind of incremental backup avoids extra disk load during data file scans. The idea is in reading only changed data pages. The information about what specific pages were changed is provided by the server itself which writes files with the information during work. It’s a good alternative but changed-pages tracking adds some small load. And Percona XtraBackup’s delta reading leads to non-sequential disk io. This is good alternative but there is one more option. The Innodb engine has a data log. It writes all operations which modify database pages to log files. This log is used in the case of unexpected server terminating to recover data. The Innodb log consists of the several log files which are filled sequentially in circular. The idea is to save those files somewhere and apply all modifications from archived logs to backup data files. The disadvantage of this approach is in using extra disk space. The advantage is there is no need to do an “explicit” backup on the host server. A simple script could sit and wait for logs to appear then scp/netcat them over to another machine. But why not use good-old replication? Maybe replication does not have such performance as logs recovering but it is more controlled and well-known. Archived logs allows you to do any number of things with them from just storing them to doing periodic log applying. You can not recover from a ‘DROP TABLE’, etc with replication. But with this framework one could maintain the idea of “point in time” backups. So the “archived logs” feature is one more option to organize incremental backups. It is not widely used as it was issued not so far and there is not A good understanding of how it works and how it can be used. We are open to any suggestions about its suggest improvements and use cases. The subject of this post is to describe how it works in depth. As log archiving is closely tied with innodb redo logs the internals of redo logs will be covered too. This post would be useful not only for DBA but also for Software Engineers because not only common principles are considered but the specific code too, and knowledge from this post can be used for further MySQL code exploring and patching. What is the innodb log and how it is written? Let’s remember what are innodb logs, why they are written, what they are used for. The Innodb engine has buffer pool. This is a cache of database pages. Any changes are done on page in buffer pool, then page is considered as “dirty,” which means it must be flushed, and pushed to the flush list which is processed periodically by special thread. If pages are not flushed to disk and server is terminated unexpectedly the changes will be lost. To avoid this innodb writes changes to redo log and recover data from redo log during start. This technique allows to delay buffer pool pages flushing. It can increase performance because several changes of one page can be accumulated in memory and then flushed by one io. Except that flushed pages can be grouped to decrease the number of non-sequential io’s. But the down-side of this approach is time for data recovering. Let’s consider how this log is stored, generated and used for data recovering. Log files Redo log consists of a several log files which are treated as a circular buffer. The number and the size of log files can be configured. Each log file has a header. The description of this header can be found in “storage/innobase/include/log0log.h” by “LOG_GROUP_ID” keyword. Each log file contains log records. Redo log records are written sequentially by log blocks of OS_FILE_LOG_BLOCK_SIZE size which is equal to 512 bytes by default and can be changed with innodb option. Each record has its LSN. LSN is a “Log Sequence Number” – the number of bytes written to log from the log creation to the certain log record. Each block consists of header, trailer and log records. Log blocks Let’s consider log block header. The first 4 bytes of the header is log block number. The block number is very similar as LSN but LSN is measured in bytes and block number is measured by OS_FILE_LOG_BLOCK_SIZE. Here is the simple formula how LSN is converted to block number: return(((ulint) (lsn / OS_FILE_LOG_BLOCK_SIZE) & 0x3FFFFFFFUL) + 1); This formula can be found in log_block_convert_lsn_to_no() function. The next two bytes is the number of bytes used in the block. The next two bytes is the offset of the first MTR log record in this block. What is MTR will be described below. Currently it can be considered as a synonym of bunch of log records which are gathered together as a description of some logical operation. For example it can be a group of log records for inserting new row to some table. This field is used when there are records of several MTR’s in one block. The next four bytes is a checkpoint number. The trailer is four bytes of log block checksum. The above description can be found in “storage/innobase/include/log0log.h” by “LOG_BLOCK_HDR_NO” keyword. Before writing to disk log blocks must be somehow formed and stored. And the question is: How log blocks are stored in memory and on disk? Where log blocks are stored before flushing to disk and how they are written and flushed? Global log object and log buffer The answer to the first part of the question is log buffer. Server holds very important global object log_sys in memory. It contains a lot of useful information about logging state. Log buffer is pointed by log_sys->buf pointer which is initialized in log_init(). I would highlight the following log_sys fields that are used for work with log buffer and flushing: log_sys->buf_size – the size of log buffer, can be set with innodb-log-buffer-size variable, the default value is 8M; log_sys->buf_free – the offset from which the next log record will be written; log_sys->max_buf_free – if log_sys->buf_free is greater then this value log buffer must be flushed, see log_free_check(); log_sys->buf_next_to_write – the offset of the next log record to write to disk; log_sys->write_lsn – the LSN up to which log is written; log_sys->flushed_to_disk_lsn – the LSN up to which log is written and flushed; log_sys->lsn – the last LSN in log buffer; So log_sys->buf_next_to_write is between 0 and log_sys->buf_free, log_sys->write_lsn is equal or less log_sys->lsn, log_sys->flushed_to_disk_lsn is less or equal to log_sys->write_lsn. The relationships for those fields can be easily traced with debugged by setting up watchpoints. Ok, we have log buffer, but how do log records come to this buffer? Where log records come from? Innodb has special objects that allow you to gather redo log records for some operations in one bunch before writing them to log buffer. These objects are called “mini-transactions” and corresponding functions and data types have “mtr” prefix in the code. The objects itself are described in mtr_t “c” structure. The most interesting fields of this structure are the following: mtr_t::log – contains log records for the mini-transaction, mtr_t::memo – contains pointers to pages which are changed or locked by the mini-transaction, it is used to push pages to flush list and release locks after logs records are copied to log buffer in mtr_commit() (see mtr_memo_pop_all() called in mtr_commit()). mtr_start() function initializes an object of mtr_t type and mtr_commit() writes log records from mtr_t::log to log_sys->buf + log_sys->buf_free. So the typical sequence of any operation which changes data is the following: mtr_start(); // initialize mtr object some_ops... // operations on data which are logged in mtr_t::log mtr_commit(); // write logged operations from mtr_t::log to log buffer log_sys->buf page_cur_insert_rec_write_log() is a good example of how mtr records can be written and mtr::memo can be filled. The low-level function which writes data to log buffer is log_write_low(). This function is invoked inside of mtr_commit() and not only copy the log records from mtr_t object to log buffer log_sys->buf but also creates a new log blocks inside of log_sys->buf, fills their header, trailer, calculates checksum. So log buffer contains log blocks which are sequentially filled with log records which are grouped in “mini-transactions” which logically can be treated as some logical operation over data which consists of a sequence of mini-operations(log records). As log records are written sequentially in log buffer one mini-transaction and even one log record can be written in two neighbour blocks. That is why the header field which would contain the offset of the first MTR in the block is necessary to calculate the point from which log records parsing can be started. This field was described in 2.2. So we have a buffer of log blocks in a memory. How is data from this buffer written to disk? The mysql documentation says that this depends on innodb_flush_log_at_trx_commit option. There can be three cases depending on the value of this option. Let’s consider each of them. Writing log buffer to disk: innodb_flush_log_at_trx_commit is 1 or 2. The first two cases is when innodb_flush_log_at_trx_commit is 1 or 2. In these cases flush log records are written for 2 and flushed for 1 on each transaction commit. If innodb_flush_log_at_trx_commit is 2 log records are flushed periodically by special thread which will be considered later. The low-level function which writes log records from buffer to file is log_group_write_buf(). But in the most cases it is not called directly but it is called from more high level log_write_up_to(). For the current case the calling stack is the following: (trx_commit_in_memory() or trx_commit_complete_for_mysql() or trx_prepare() e.t.c)-> trx_flush_log_if_needed()-> trx_flush_log_if_needed_low()-> log_write_up_to()-> log_group_write_buf(). It is quite easy to find the higher levels of calling stack, just set up breakpoint on log_group_write_buf() and execute any sql query that modifies innodb data. For example for the simple “insert” sql query the higher levels of calling stack are the following: mysql_execute_command()-> trans_commit_stmt()-> ha_commit_trans()-> TC_LOG_DUMMY::commit()-> ha_commit_low()-> innobase_commit()-> trx_commit_complete_for_mysql()-> trx_flush_log_if_needed()-> ... . log_io_complete() callback is invoked when i/o is finished for log files (see fil_aio_wait()). log_io_complete() flushes log files if this is not forbidden by innodb_flush_method or innodb_flush_log_at_trx_commit options. Writing log buffer to disk: innodb_flush_log_at_trx_commit is equal to 0 The third case is when innodb_flush_log_at_trx_commit is equal to 0. For this case log buffer is NOT written to disk on transaction commit, it is written and flushed periodically by separate thread “srv_master_thread”. If innodb_flush_log_at_trx_commit = 0 log files are flushed in the same thread by the same calls. The calling stack is the following: srv_master_thread()-> (srv_master_do_active_tasks() or srv_master_do_idle_tasks() or srv_master_do_shutdown_tasks())-> srv_sync_log_buffer_in_background()-> log_buffer_sync_in_background()->log_write_up_to()->... . Special cases for logs flushing While log_io_complete() do flushing depending on innodb_flush_log_at_trx_commit value among others log_write_up_to() has it’s own flushing criteria. This is flush_to_disk function argument. So it is possible to force log files flushing even if innodb_flush_log_at_trx_commit = 0. Here are examples of such cases: 1) buf_flush_write_block_low() Each page contains information about the last applied LSN(buf_flush_write_block_low::newest_modification), each log record is a description of change on certain page. Imagine we flushed some changed pages but log records for these pages were not flushed and server goes down. After starting the server some pages will have the newest modifications, but some of them were not flushed and the correspondent log records are lost too. We will have inconsistent database in this case. That is why log records must be flushed before the pages they refer. 2) srv_sync_log_buffer_in_background() As it was described above this function is called periodically by special thread and forces flushing. 3) log_checkpoint() When checkpoint is made log files must be reliably flushed. 4) The special handlerton innobase_flush_logs() which can be called through ha_flush_logs() from mysql server. For example ha_flush_logs() is called from MYSQL_BIN_LOG::reset_logs() when “RESET MASTER” or “RESET SLAVE” are executed. 5) srv_master_do_shutdown_tasks() – on shutdown, ha_innobase::create() – on table creating, ha_innobase::delete_table() – on table removing, innobase_drop_database() – on all database tables removing, innobase_rename_table() – on table rename e.t.c If log files are treated as circular buffer what happens when the buffer is overflown? Briefly. Innodb has a mechanism which allows you to avoid overflowing. It is called “checkpoints.” The checkpoint is a state when log files are synchronized with data files. In this case there is no need to keep the history of changes before checkpoint because all pages with the last modifications LSN less or equal to checkpoint LSN are flushed and the log files space from the last written LSN to the last checkpoint LSN can be reused. We will not describe a checkpoint process here because it is a separate interesting subject. The only thing we need to know is when checkpoint happens all pages with modification LSN less or equal to checkpoint LSN are reliably flushed. How archived logs are written by server. So the log contains information about page changes. But as we said, log files are the circular buffer. This means that they occupy fixed disk size and the oldest records can be rewritten by the newest ones as there are points when data files are synchronized with log files called checkpoints and there is no need to store the previous history of log records to guarantee database consistency. The idea is to save somewhere all log records to have the possibility of applying them to backuped data to have some kind if incremental backup. For example if we want to have an archive of log records. As log consists of log files it is reasonable to store log records in such files too, and these files are called “archived logs.” Archived log files are written to the directory which can be set with special innodb option. Each file has the same size as innodb log size and the suffix of each archived file is the LSN from which it is started. As well as log writing system log archiving system stores its data in global log_sys object. Here are the most valuable fields in log_sys from my point of view: log_sys->archive_buf, log_sys->archive_buf_size – logs archive buffer and its size, log records are copied from log buffer log_sys->buf to this buffer before writing to disk; log_sys->archiving_phase – the current phase of log archiving: LOG_ARCHIVE_READ when log records are being copied from log_sys->buf to log_sys->archive_buf, LOG_ARCHIVE_WRITE when log_sys->archive_buf is being written to disk; log_sys->archived_lsn – the LSN to which log files are written; log_sys->next_archived_lsn – the LSN to which write operations was invoked but not yet finished; log_sys->max_archived_lsn_age – the maximum difference between log_sys->lsn and log_sys->archived_lsn, if this difference exceeds the log are being archived synchronously, i.e. the difference is decreased; log_sys->archive_lock – this is rw-lock which is used for synchronizing LOG_ARCHIVE_WRITE and LOG_ARCHIVE_READ phases, it is x-locked on LOG_ARCHIVE_WRITE phase. So how is data copied from log_sys->buf to log_sys->archived_buf? log_archive_do() is used for this. It is not only set the proper state for archived log fields in log_sys but also invokes log_group_read_log_seg() with corresponding arguments which not only copy data from log buffer to archived log buffer but also invokes asynchronous write operation for archived log buffer. log_archive_do() can wait until io operations are finished using log_sys->archive_lock if corresponding function parameter is set. The main question is on what circumstances log_archive_do() is invoked, i.e. when log records are being written to archived log files. The first call stack is the following: log_free_check()-> log_check_margins()-> log_archive_margin()-> log_archive_do(). Here is text of log_free_check() with comments: /*********************************************************************// Checks if there is need for a log buffer flush or a new checkpoint, and does this if yes. Any database operation should call this when it has modified more than about 4 pages. NOTE that this function may only be called when the OS thread owns no synchronization objects except the dictionary mutex. */ UNIV_INLINE void log_free_check(void) /*================*/ { #ifdef UNIV_SYNC_DEBUG ut_ad(sync_thread_levels_empty_except_dict()); #endif /* UNIV_SYNC_DEBUG */ if (log_sys->check_flush_or_checkpoint) { log_check_margins(); } } log_sys->check_flush_or_checkpoint is set when there is no enough free space in log buffer or it is time to do checkpoint or any other bound case. log_archive_margin() is invoked only if the limit if the difference between log_sys->lsn and log_sys->archived_lsn is exceeded. Let’s refer to this difference as archived lsn age. One more call log_archive_do() is from log_open() when archived lsn age exceeds some limit. log_open() is called on each mtr_commit(). And for this case archived logs are written synchronously. The next synchronous call is from log_archive_all() during shutdown. Summarizing all above archived logs begins to be written when the log buffer is full enough to be written or when checkpoint happens or when the server is in the process of shut down. And there is no any delay between writing to archive log buffer and writing to disk. I mean there is no way to say that archived logs must be written once a second as it is possible for redo logs with innodb_flush_log_at_trx_commit = 0. As soon as data is copied to the buffer the write operation is invoked immediately for this buffer. Archived log buffer is not filled on each mtr_commit() so it does not slow down the usual logging process. The exception is when there are a lot of io operations what can be the reason of archive log age is too big. The result of big archive log age is the synchronous archived logs writing during mtr_commit(). Memory to memory copying is quite fast operation that is why the data is copied to archived log buffer and is written to disk asynchronously minimizing delays which can be caused by logs archiving. PS: Here is another call stack for writing archived log buffer to archived log files: log_io_complete()->log_io_complete_archive()->log_archive_check_completion_low()->log_archive_groups(). I propose to explore this stack yourself. Logs recovery process, how it is started and works inside. Archived logs applying. So we discovered how innodb redo logging works, and how redo logs are archived. And the last uncovered thing is how recovery works and how archived logs are applied. These two processes are very similar – that is why they are discussed in one section of this post. The story begins with innobase_start_or_create_for_mysql() which is invoked from innobase_init(). The following trident in innobase_start_or_create_for_mysql() can be used to search the relevant code: if (create_new_db) { ... } else if (srv_archive_recovery) { ... } else { ... } The second condition and the last one is the place from which archived logs applying and innodb logs recovery processes correspondingly start. These two blocks wrap two pairs of functions: recv_recovery_from_archive_start() recv_recovery_from_archive_finish() and recv_recovery_from_checkpoint_start() recv_recovery_from_checkpoint_finish() And all the magic happens in these pairs. As well as global log_sys object for redo logging there is global recv_sys object for innodb recovery and archived logs applying. It is created and initialized in recv_sys_create() and recv_sys_init() functions correspondingly. The following fields if recv_sys object are the most important from my point: recv_sys->limit_lsn – the LSN up to which recovery should be made, this value is initialized with the maximum value of uint64_t(see #define LSN_MAX) for the recovery process and with certain value which is passed as an argument of recv_recovery_from_archive_start() function and can be set via xtrabackup option for log applying; recv_sys->parse_start_lsn – the LSN from which logs parsing is started, for the the logs recovery this value equals to the last checkpoint LSN, for logs applying this is last applied LSN; recv_sys->scanned_lsn – the LSN up to which log files are scanned; recv_sys->recovered_lsn – the LSN up to which log records are applied, this value <= recv_sys->scanned_lsn; The first thing that must be done for starting recovery process is to find out the point in log files where the recovery must be started from. This is the last checkpoint LSN. recv_find_max_checkpoint() proceed this. As we can see in log_group_checkpoint() the following code writes checkpoint info into two places in the first log file depending on the checkpoint number: /* We alternate the physical place of the checkpoint info in the first log file */ if ((log_sys->next_checkpoint_no & 1) == 0) { write_offset = LOG_CHECKPOINT_1; } else { write_offset = LOG_CHECKPOINT_2; } So recv_find_max_checkpoint() reads checkpoint info from both places and selects the latest checkpoint. The same idea is applied for logs, too, but the last applied LSN instead of last checkpoint LSN must be found. Here is the call stack for reading last applied LSN: innobase_start_or_create_for_mysql()-> open_or_create_data_files()-> fil_read_first_page(). The last applied LSN is stored in the first page of data files in (min|max)_flushed_lsn fields(see FIL_PAGE_FILE_FLUSH_LSN offset). These values are written in fil_write_flushed_lsn_to_data_files() function on server shutdown. So the main difference between logs applying and recovery process at this stage is the manner of calculating LSN from which log records will be read. For logs applying the last flushed LSN is used but for recovery process it is the last checkpoint LSN. Why does this difference take place? Logs can be applied periodically. Assume we gather archived logs and apply them once an hour to have fresh backup. After applying the previous bunch of log files there can be unfinished transactions. For the recovery process any unfinished transactions are rolled back to have consistent db state at server starting. But for the logs applying process there is no need to roll back them because any unfinished transactions can be finished during the next logs applying. After calculating the start LSN the sequence of actions is the same for both recovering and applying. The next step is reading and parsing log records. See recv_group_scan_log_recs() which is invoked from recv_recovery_from_checkpoint_start_func() for logs recovering and recv_recovery_from_archive_start()->log_group_recover_from_archive_file() for logs applying. The first we read log records to some buffer and then invoke recv_scan_log_recs() to parse them. recv_scan_log_recs() checks each log block on consistency(checksum + comparing the log block number written in log block with log block number calculated from log block LSN) and other edge cases and copy it to parsing buffer recv_sys->buf with recv_sys_add_to_parsing_buf() function. The parsing buffer is then parsed by recv_parse_log_recs(). Log records are stored in hash table recv_sys->addr_hash. The key for this hash table is calculated basing on space id and page number pair. This pair refers to the page to which log records must be applied. The value of the hash table is object of recv_addr_t type. recv_addr_t type contains rec_list field which is the list of log records for applying to the (space id, page num) page (see recv_add_to_hash_table(). After parsing and storing log record in hash table recv_sys->addr_hash log records are applied. The function which is responsible for log records applying is recv_apply_hashed_log_recs(). It is invoked from recv_scan_log_recs() if there is no enough memory to store log records and at the end of recovering/applying process. For each element of recv_sys->addr_hash, i.e. for each DB page which must be changed with log records recv_recover_page() is invoked. It can be invoked as from recv_apply_hashed_log_recs() in the case if page is already in buffer pool of from buf_page_io_complete() on io completion, i.e. just after page was read from storage. Applying log records on page read completion is necessary and very convenient. Assume log records have not yet applied as we had enough memory to store the whole recovery log records. But we want for example to boot DB dictionary. I this case any records that concern to the pages of the dictionary will be applied to those pages just after reading them from storage to buffer pool. The function which applies log records to the certain page is recv_recover_page_func(). It gets the list of log records for the certain page from recv_sys->addr_hash hash table, for each element of this list it compares the lsn of last page changes with the LSN of the record, and if the former is greater the later it applies log record to the page. After applying all log records from archived logs xtrabackup writes last applied LSN to (min|max)_flushed LSN fields of each data file and finishes execution. The logs recovery process rollbacks all unfinished transactions unless this is forbidden with innodb-force-recovery parameter. Conclusion We covered the processes of redo logs writing and recovery in depth. These are very important processes as they provide data consistency on crashes. These two processes became a base for logs archiving and applying features. As log records can describe any data changes the idea is to store these records somewhere and then apply them to backups for organizing some kind of incremental backup. The features were implemented a short time ago and currently they are not widely used. So if you have something to say about them you are welcome to comment for discussion.

April 16, 2014

by Peter Zaitsev

· 6,211 Views

Circuit Breaker Pattern in Apache Camel

Camel is very often used in distributed environments for accessing remote resources. Remote services may fail for various reasons and periods. For services that are temporarily unavailable and recoverable after short period of time, a retry strategy may help. But some services can fail or hang for longer period of time making the calling application unresponsive and slow. A good strategy to prevent from cascading failures and exhaustion of critical resources is the Circuit Breaker pattern described by Michael Nygard in the Release It! book. Circuit Breaker is a stateful pattern that wraps the failure-prone resource and monitors for errors. Initially the Circuit Breaker is in closed state and passes all calls to the wrapped resource. When the failures reaches a certain threshold, the circuit moves to open state where it returns error to the caller without actually calling the wrapped resource. This prevents from overloading the already failing resource. While at this state, we need a mechanism to detect whether the failures are over and start calling the protected resource. This is where the third state called half-open comes into play. This state is reached after a certain time following the last failure. At this state, the calls are passed through to the protected resource, but the result of the call is important. If the call is successful, it is assumed that the protected resource has recovered and the circuit is moved into closed state, and if the call fails, the timeout is reset, and the circuit is moved back to open state where all calls are rejected. Here is the state diagram of Circuit Breaker from Martin Fowler's post: How Circuit Breaker is implemented in Camel? Circuit Breaker is available in the latest snapshot version of Camel as a Load balancer policy. Camel Load Balancer already has policies for Round Robin, Random, Failover, etc. and now also CircuiBreaker policy. Here is an example load balancer that uses Circuit Breaker policy with threshold of 2 errors and halfOpenAfter timeout of 1 second. Notice also that this policy applies only to errors caused by MyCustomException. new RouteBuilder() { public void configure() { from("direct:start").loadBalance() .circuitBreaker(2, 1000L, MyCustomException.class) .to("mock:result"); } }; And here is the same example using Spring XML DSL: MyCustomException

April 16, 2014

by Bilgin Ibryam

· 18,436 Views · 1 Like

A Docker ‘Hello World' With Mono

Docker is a lightweight virtualization technology for Linux that promises to revolutionize the deployment and management of distributed applications. Rather than requiring a complete operating system, like a traditional virtual machine, Docker is built on top of Linux containers, a feature of the Linux kernel, that allows light-weight Docker containers to share a common kernel while isolating applications and their dependencies. There’s a very good Docker SlideShare presentation here that explains the philosophy behind Docker using the analogy of standardized shipping containers. Interesting that the standard shipping container has done more to create our global economy than all the free-trade treaties and international agreements put together. A Docker image is built from a script, called a ‘Dockerfile’. Each Dockerfile starts by declaring a parent image. This is very cool, because it means that you can build up your infrastructure from a layer of images, starting with general, platform images and then layering successively more application specific images on top. I’m going to demonstrate this by first building an image that provides a Mono development environment, and then creating a simple ‘Hello World’ console application image that runs on top of it. Because the Dockerfiles are simple text files, you can keep them under source control and version your environment and dependencies alongside the actual source code of your software. This is a game changer for the deployment and management of distributed systems. Imagine developing an upgrade to your software that includes new versions of its dependencies, including pieces that we’ve traditionally considered the realm of the environment, and not something that you would normally put in your source repository, like the Mono version that the software runs on for example. You can script all these changes in your Dockerfile, test the new container on your local machine, then simply move the image to test and then production. The possibilities for vastly simplified deployment workflows are obvious. Docker brings concerns that were previously the responsibility of an organization’s operations department and makes them a first class part of the software development lifecycle. Now your infrastructure can be maintained as source code, built as part of your CI cycle and continuously deployed, just like the software that runs inside it. Docker also provides docker index, an online repository of docker images. Anyone can create an image and add it to the index and there are already images for almost any piece of infrastructure you can imagine. Say you want to use RabbitMQ, all you have to do is grab a handy RabbitMQ images such as https://index.docker.io/u/tutum/rabbitmq/ and run it like this: docker run -d -p 5672:5672 -p 55672:55672 tutum/rabbitmq The –p flag maps ports between the image and the host. Let’s look at an example. I’m going to show you how to create a docker image for the Mono development environment and have it built and hosted on the docker index. Then I’m going to build a local docker image for a simple ‘hello world’ console application that I can run on my Ubuntu box. First we need to create a Docker file for our Mono environment. I’m going to use the Mono debian packages from directhex. These are maintained by the official Debian/Ubuntu Mono team and are the recommended way of installing the latest Mono versions on Ubuntu. Here’s the Dockerfile: #DOCKER-VERSION 0.9.1 # #VERSION 0.1 # # monoxide mono-devel package on Ubuntu 13.10 FROM ubuntu:13.10 MAINTAINER Mike Hadlow RUN sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q software-properties-common RUN sudo add-apt-repository ppa:directhex/monoxide -y RUN sudo apt-get update RUN sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q mono-devel Notice the first line (after the comments) that reads, ‘FROM ubuntu:13.10’. This specifies the parent image for this Dockerfile. This is the official docker Ubuntu image from the index. When I build this Dockerfile, that image will be automatically downloaded and used as the starting point for my image. But I don’t want to build this image locally. Docker provide a build server linked to the docker index. All you have to do is create a public GitHub repository containing your dockerfile, then link the repository to your profile on docker index. You can read the documentation for the details. The GitHub repository for my Mono image is at https://github.com/mikehadlow/ubuntu-monoxide-mono-devel. Notice how the Docker file is in the root of the repository. That’s the default location, but you can have multiple files in sub-directories if you want to support many images from a single repository. Now any time I push a change of my Dockerfile to GitHub, the docker build system will automatically build the image and update the docker index. You can see image listed here:https://index.docker.io/u/mikehadlow/ubuntu-monoxide-mono-devel/ I can now grab my image and run it interactively like this: $ sudo docker pull mikehadlow/ubuntu-monoxide-mono-devel Pulling repository mikehadlow/ubuntu-monoxide-mono-devel f259e029fcdd: Download complete 511136ea3c5a: Download complete 1c7f181e78b9: Download complete 9f676bd305a4: Download complete ce647670fde1: Download complete d6c54574173f: Download complete 6bcad8583de3: Download complete e82d34a742ff: Download complete $ sudo docker run -i mikehadlow/ubuntu-monoxide-mono-devel /bin/bash mono --version Mono JIT compiler version 3.2.8 (Debian 3.2.8+dfsg-1~pre1) Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com TLS: __thread SIGSEGV: altstack Notifications: epoll Architecture: amd64 Disabled: none Misc: softdebug LLVM: supported, not enabled. GC: sgen exit Next let’s create a new local Dockerfile that compiles a simple ‘hello world’ program, and then runs it when we run the image. You can follow along with these steps. All you need is a Ubuntu machine with Docker installed. First here’s our ‘hello world’, save this code in a file named hello.cs: using System; namespace Mike.MonoTest { public class Program { public static void Main() { Console.WriteLine("Hello World"); } } } Next we’ll create our Dockerfile. Copy this code into a file called ‘Dockerfile’: #DOCKER-VERSION 0.9.1 FROM mikehadlow/ubuntu-monoxide-mono-devel ADD . /src RUN mcs /src/hello.cs CMD ["mono", "/src/hello.exe"] Once again, notice the ‘FROM’ line. This time we’re telling Docker to start with our mono image. The next line ‘ADD . /src’, tells Docker to copy the contents of the current directory (the one containing our Dockerfile) into a root directory named ‘src’ in the container. Now our hello.cs file is at /src/hello.cs in the container, so we can compile it with the mono C# compiler, mcs, which is the line ‘RUN mcs /src/hello.cs’. Now we will have the executable, hello.exe, in the src directory. The line ‘CMD [“mono”, “/src/hello.exe”]’ tells Docker what we want to happen when the container is run: just execute our hello.exe program. As an aside, this exercise highlights some questions around what best practice should be with Docker. We could have done this in several different ways. Should we build our software independently of the Docker build in some CI environment, or does it make sense to do it this way, with the Docker build as a step in our CI process? Do we want to rebuild our container for every commit to our software, or do we want the running container to pull the latest from our build output? Initially I’m quite attracted to the idea of building the image as part of the CI but I expect that we’ll have to wait a while for best practice to evolve. Anyway, for now let’s manually build our image: $ sudo docker build -t hello . Uploading context 1.684 MB Uploading context Step 0 : FROM mikehadlow/ubuntu-monoxide-mono-devel ---> f259e029fcdd Step 1 : ADD . /src ---> 6075dee41003 Step 2 : RUN mcs /src/hello.cs ---> Running in 60a3582ab6a3 ---> 0e102c1e4f26 Step 3 : CMD ["mono", "/src/hello.exe"] ---> Running in 3f75e540219a ---> 1150949428b2 Successfully built 1150949428b2 Removing intermediate container 88d2d28f12ab Removing intermediate container 60a3582ab6a3 Removing intermediate container 3f75e540219a You can see Docker executing each build step in turn and storing the intermediate result until the final image is created. Because we used the tag (-t) option and named our image ‘hello’, we can see it when we list all the docker images: $ sudo docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE hello latest 1150949428b2 10 seconds ago 396.4 MB mikehadlow/ubuntu-monoxide-mono-devel latest f259e029fcdd 24 hours ago 394.7 MB ubuntu 13.10 9f676bd305a4 8 weeks ago 178 MB ubuntu saucy 9f676bd305a4 8 weeks ago 178 MB ... Now let’s run our image. The first time we do this Docker will create a container and run it. Each subsequent run will reuse that container: $ sudo docker run hello Hello World And that’s it. Imagine that instead of our little hello.exe, this image contained our web application, or maybe a service in some distributed software. In order to deploy it, we’d simply ask Docker to run it on any server we like; development, test, production, or on many servers in a web farm. This is an incredibly powerful way of doing consistent repeatable deployments. To reiterate, I think Docker is a game changer for large server side software. It’s one of the most exciting developments to have emerged this year and definitely worth your time to check out.

April 3, 2014

by Mike Hadlow

· 11,309 Views

Docker: Bulk Remove Images and Containers

I’ve just started looking at Docker. It’s a cool new technology that has the potential to make the management and deployment of distributed applications a great deal easier. I’d very much recommend checking it out. I’m especially interested in using it to deploy Mono applications because it promises to remove the hassle of deploying and maintaining the mono runtime on a multitude of Linux servers. I’ve been playing around creating new images and containers and debugging my Dockerfile, and I’ve wound up with lots of temporary containers and images. It’s really tedious repeatedly running ‘docker rm’ and ‘docker rmi’, so I’ve knocked up a couple of bash commands to bulk delete images and containers. Delete all containers: sudo docker ps -a -q | xargs -n 1 -I {} sudo docker rm {} Delete all un-tagged (or intermediate) images: sudo docker rmi $( sudo docker images | grep '' | tr -s ' ' | cut -d ' ' -f 3)

April 2, 2014

by Mike Hadlow

· 14,676 Views

Spring-boot and Scala

There is actually nothing very special about writing a Spring-boot web application purely using Scala, it just works! In this blog entry, I will slowly transform a Java based Spring-boot application completely to Scala - the Java based sample is available at this github location - https://github.com/bijukunjummen/spring-boot-mvc-test To start with, I had the option of going with either a maven based build or gradle based build - I opted to go with a gradle based build as gradle has a greatscala plugin, so for scala support the only changes to a build.gradle build script is the following: ... apply plugin: 'scala' ... jar { baseName = 'spring-boot-scala-web' version = '0.1.0' } dependencies { ... compile 'org.scala-lang:scala-library:2.10.2' ... } Essentially adding in the scala plugin and specifying the version of the scala-library. Now, I have one entity, a Hotel class, it transforms to the following with Scala: package mvctest.domain .... @Entity class Hotel { @Id @GeneratedValue @BeanProperty var id: Long = _ @BeanProperty var name: String = _ @BeanProperty var address: String = _ @BeanProperty var zip: String = _ } Every property is annotated with @BeanProperty annotation to instruct scala to generate the Java bean based getter and setter on the variables. With the entity in place a Spring-data repository for CRUD operations on this entity transforms from: import mvctest.domain.Hotel; import org.springframework.data.repository.CrudRepository; public interface HotelRepository extends CrudRepository { } to the following in Scala: import org.springframework.data.repository.CrudRepository import mvctest.domain.Hotel import java.lang.Long trait HotelRepository extends CrudRepository[Hotel, Long] And the Scala based controller which uses this repository to list the Hotels - vi... import org.springframework.web.bind.annotation.RequestMapping import org.springframework.stereotype.Controller import mvctest.service.HotelRepository import org.springframework.beans.factory.annotation.Autowired import org.springframework.ui.Model @Controller @RequestMapping(Array("/hotels")) class HotelController @Autowired() (private val hotelRepository: HotelRepository) { @RequestMapping(Array("/list")) def list(model: Model) = { val hotels = hotelRepository.findAll() model.addAttribute("hotels", hotels) "hotels/list" } } Here the constructor autowiring of the HotelRepository just works!. Do note the slightly awkward way of specifying the @Autowired annotation for constructor based injection. Finally, Spring-boot based application requires a main class to bootstrap the entire application, where this bootstrap class looks like this with Java: @Configuration @EnableAutoConfiguration @ComponentScan public class SampleWebApplication { public static void main(String[] args) { SpringApplication.run(SampleWebApplication.class, args); } } In scala, though I needed to provide two classes, one to specify the annotation and other to bootstrap the application, there may be better way to do this(blame it on my lack of Scala depth!) - package mvctest import org.springframework.context.annotation.Configuration import org.springframework.boot.autoconfigure.EnableAutoConfiguration import org.springframework.context.annotation.ComponentScan import org.springframework.boot.SpringApplication @Configuration @EnableAutoConfiguration @ComponentScan class SampleConfig object SampleWebApplication extends App { SpringApplication.run(classOf[SampleConfig]); } and that's it, with this set-up the entire application just works, the application can be started up with the following: ./gradlew build && java -jar build/libs/spring-boot-scala-web-0.1.0.jar and the sample endpoint listing the hotels accessed at this url: http://localhost:8080/hotels/list I have the entire git project available at this github location: https://github.com/bijukunjummen/spring-boot-scala-web In conclusion, Scala can be considered a first class citizen for a Spring-boot based application and there is no special configuration required to get a Scala based Spring-boot application to work. It just works!

April 2, 2014

by Biju Kunjummen

· 71,132 Views · 11 Likes

Distributed Counters Feature Design

this is another experiment with longer posts. previously, i used the time series example as the bed on which to test some ideas regarding feature design, to explain how we work and in general work out the rough patches along the way. i should probably note that these posts are purely fiction at this point. we have no plans to include a time series feature in ravendb at this time. i am trying to work out some thoughts in the open and get your feedback. at any rate, yesterday we had a request for cassandra style counters at the mailing list. and as long as i am doing feature design series, i thought that i could talk about how i would go about implementing this. again, consider this fiction, i have no plans of implementing this at this time. the essence of what we want is to be able to… count stuff. efficiently, in a distributed manner, with optional support for cross data center replication. very roughly, the idea is to have “sub counters”, unique for every node in the system. whenever you increment the value, we log this to our own sub counter, and then replicate it out. whenever you read it, we just sum all the data we have from all the sub counters. let us outline the various parts of the solution in the same order as the one i used for time series. storage a counter is just a named 64 bits signed integer. a counter name can be any string up to 128 printable characters. the external interface of the storage would look like this: 1: public struct counterincrement 2: { 3: public string name; 4: public long change; 5: } 6: 7: public struct counter 8: { 9: public string name; 10: public string source; 11: public long value; 12: } 13: 14: public interface icounterstorage 15: { 16: void localincrementbatch(counterincrement[] batch); 17: 18: counter[] read(string name); 19: 20: void replicatedupdates(counter[] updates); 21: } as you can see, this gives us very simple interface for the storage. we can either change the data locally (which modify our own storage) or we can get an update from a replica about its changes. there really isn’t much more to it, to be fair. the localincrementbatch() increment a local value, and read() will return all the values for a counter. there is a little bit of trickery involved in how exactly one would store the counter values. for now, i think we’ll store each counter as two step values. we’ll have a tree of multi tree values that will carry each value from each source. that means that a counter will take roughly 4kb or so. this is easy to work with and nicely fit the model voron uses internally. note that we’ll outline additional requirement for storage (searching for counter by prefix, iterating over counters, addresses of other servers, stats, etc) below. i’m not showing them here because they aren’t the major issue yet. over the wire skipping out on any optimizations that might be required, we will expose the following endpoints: get /counters/read?id=users/1/visits&users/1/posts <—will return json response with all the relevant values (already summed up). { “users/1/visits”: 43, “users/1/posts”: 3 } get /counters/read?id=users/1/visits&users/1/1/posts&raw=true <—will return json response with all the relevant values, per source. { “users/1/visits”: {“rvn1”: 21, “rvn2”: 22 } , “users/1/posts”: { “rvn1”: 2, “rvn3”: 1 } } post /counters/increment <– allows to increment counters. the request is a json array of the counter name and the change. for a real system, you’ll probably need a lot more stuff, metrics, stats, etc. but this is the high level design, so this would be enough. note that we are skipping the high performance stream based writes we outlined for time series. we’ll probably won’t need them, so that doesn’t matter, but they are an option if we need them. system behavior this is where it is really not interesting, there is very little behavior here, actually. we only have to read the data from the storage, sum it up, and send it to the user. hardly what i’ll call business logic. client api the client api will probably look something like this: 1: counters.increment("users/1/posts"); 2: counters.increment("users/1/visits", 4); 3: 4: using(var batch = counters.batch()) 5: { 6: batch.increment("users/1/posts"); 7: batch.increment("users/1/visits",5); 8: batch.submit(); 9: } note that we’re offering both batch and single api. we’ll likely also want to offer a fire & forget style, which will be able to offer even better performance (because they could do batching across more than a single thread), but that is out of scope for now. for simplicity sake, we are going to have the client just a container for all of endpoints that it knows about. the container would be responsible for… updating the client visible topology, selecting the best server to use at any given point, etc. user interface there isn’t much to it. just show a list of counter values in a list. allow to search by prefix, allow to dive into a particular counter and read its raw values, but that is about it. oh, and allow to delete a counter. deleting data honestly, i really hate deletes. they are very expensive to handle properly the moment you have more than a single node. in this case, there is an inherent race condition between a delete going out and another node getting an increment. and then there is the issue of what happens if you had a node down when you did the delete, etc. this just sucks. deletion are handled normally, (with the race condition caveat, obviously), and i’ll discuss how we replicate them in a bit. high availability / scale out by definition, we actually don’t want to have storage replication here. either log shipping or consensus based. we actually do want to have different values, because we are going to be modifying things independently on many servers. that means that we need to do replication at the database level. and that leads to some interesting questions. again, the hard part here is the deletes. actually, the really hard part is what we are going to do with the new server problem. the new server problem dictates how we are going to bring a new server into the cluster. if we could fix the size of the cluster, that would make things a lot easier. however, we are actually interested in being able to dynamically grow the cluster size. therefor, there are only two real ways to do it: add a new empty node to the cluster, and have it be filled from all the other servers. add a new node by backing up an existing node, and restoring as a new node. ravendb, for example, follows the first option. but it means that in needs to track a lot more information. the second option is actually a lot simpler, because we don’t need to care about keeping around old data. however, this means that the process of bringing up a new server would now be: update all nodes in the cluster with the new node address (node isn’t up yet, replication to it will fail and be queued). backup an existing node and restore at the new node. start the new node. the order of steps is quite important. and it would be easy to get it wrong. also, on large systems, backup & restore can take a long time. operationally speaking, i would much rather just be able to do something like, bring a new node into the cluster in “silent” mode. that is, it would get information from all the other nodes, and i can “flip the switch” and make it visible to clients at any point in time. that is how you do it with ravendb, and it is an incredibly powerful system, when used properly. that means that for all intents and purposes, we don’t do real deletes. what we’ll actually do is replace the counter value with delete marker. this turns deletes into a much simple “just another write”. it has the sad implication of not free disk space on deletes, but deletes tend to be rare, and it is usually fine to add a “purge” admin option that can be run on as needed basis. but that brings us to an interesting issue, how do we actually handle replication. the topology map to simplify things, we are going to go with one way replication from a node to another. that allows complex topologies like master-master, cluster-cluster, replication chain, etc. but in the end, this is all about a single node replication to another. the first question to ask is, are we going to replicate just our local changes, or are we going to have to replicate external changes as well? the problem with replicating external changes is that you may have the following topology: now, server a got a value and sent it to server b. server b then forwarded it to server c. however, at that point, we also have a the value from server a replicated directly to server c. which value is it supposed to pick? and what about a scenario where you have more complex topology? in general, because in this type of system, we can have any node accept writes, and we actually desire this to be the case , we don’t want this behavior. we want to only replicate local data, not all the data. of course, that leads to an annoying question, what happens if we have a 3 node cluster, and one node fails catastrophically. we can bring a new node in, and the other two nodes will be able to fill in their values via replication, but what about the node that is down? the data isn’t gone, it is still right there in the other two nodes, but we need a way to pull it out. therefor, i think that the best option would be to say that nodes only replicate their local state, except in the case of a new node. a new node will be told the address of an existing node in the cluster, at which point it will: register itself in all the nodes in the cluster (discoverable from the existing node). this assumes a standard two way replication link between all servers, if this isn’t the case, the operators would have the responsibility to setup the actual replication semantics on their own. new node now starts getting updates from all the nodes in the cluster. it keeps them in a log for now, not doing anything yet. ask that node for a complete update of all of its current state. when it has all the complete state of the existing node, it replays all of the remembered logs that it didn’t have a chance to apply yet. then it announces that it is in a valid state to start accepting client connections. note that this process is likely to be very sensitive to high data volumes. that is why you’ll usually want to select a backup node to read from, and that decision is an ops decision. you’ll also want to be able to report extensively on the current status of the node, since this can take a while, and ops will be watching this very closely. server name a node requires a unique name. we can use guids, but those aren’t readable, so we can use machine name + port, but those can change. ideally, we can require the user to set us up with a unique name. that is important for readability and for being able to alter see all the values we have in all the nodes. it is important that names are never repeated, so we’ll probably have a guid there anyway, just to be on the safe side. actual replication semantics since we have the new server problem down to an automated process, we can choose the drastically simpler model of just having an internal queue per each replication destination. whenever we make a change, we also make a note of that in the queue for that destination, then we start an async replication process to that server, sending all of our updates there. it is always safe to overwrite data using replication, because we are overwriting our own data, never anyone else. and… that is about it, actually. there are probably a lot of details that i am missing / would discover if we were to actually implement this. but i think that this is a pretty good idea about what this feature is about.

March 25, 2014

by Oren Eini

· 12,649 Views · 1 Like

How to Use NodeManager to Control WebLogic Servers

In my previous post, you have seen how we can start a WebLogic admin and multiple managed servers. One downside with that instruction is that those processes will start in foreground and the STDOUT are printed on terminal. If you intended to run these severs as background services, you might want to try the WebLogic node manager wlscontrol.sh tool. I will show you how you can get Node Manager started here. The easiest way is still to create the domain directory with the admin server running temporary and then create all your servers through the /console application as described in last post. Once you have these created, then you may shut down all these processes and start it with Node Manager. 1. cd $WL_HOME/server/bin && startNodeManager.sh & 3. $WL_HOME/common/bin/wlscontrol.sh -d mydomain -r $HOME/domains/mydomain -c -f startWebLogic.sh -s myserver START 4. $WL_HOME/common/bin/wlscontrol.sh -d mydomain -r $HOME/domains/mydomain -c -f startManagedWebLogic.sh -s appserver1 START The first step above is to start and run your Node Manager. It is recommended you run this as full daemon service so even OS reboot can restart itself. But for this demo purpose, you can just run it and send to background. Using the Node Manager we can then start the admin in step 2, and then to start the managed server on step 3. The NodeManager can start not only just the WebLogic server for you, but it can also monitor them and automatically restart them if they were terminated for any reasons. If you want to shutdown the server manually, you may use this command using Node Manager as well: $WL_HOME/common/bin/wlscontrol.sh -d mydomain -s appserver1 KILL The Node Manager can also be used to start servers remotely through SSH on multiple machines. Using this tool effectively can help managing your servers across your network. You may read more details here: http://docs.oracle.com/cd/E23943_01/web.1111/e13740/toc.htm TIPS1: If there is problem when starting server, you may wnat to look into the log files. One log file is the/servers//logs/.out of the server you trying to start. Or you can look into the Node Manager log itself at $WL_HOME/common/nodemanager/nodemanager.log TIPS2: You add startup JVM arguments to each server starting with Node Manager. You need to create a file under /servers//data/nodemanager/startup.properties and add this key value pair:Arguments = -Dmyapp=/foo/bar TIPS3: If you want to explore Windows version of NodeManager, you may want to start NodeManager without native library to save yourself some trouble. Try adding NativeVersionEnabled=false to$WL_HOME/common/nodemanager/nodemanager.properties file.

March 24, 2014

by Zemian Deng

· 14,287 Views

As well as being key value store, Redis offers a publish subscribe messaging implementation. This post will describe a simple scenario, using Spring Data Redis, of adding a message domain object to a repository via a REST call, publishing that message to a channel, subscribers to that channel receiving that message who as a result set any long polling deferred results with the message. The two key classes in the Redis publish subscribe mechanism are the RedisTemplate class and the RedisMessageListenerContainer class. The RedisTemplate contains the JedisConnectionFactory which holds the Redis connection details and as well as the methods to manipulate the key value stores, there’s a publish method calledconvertAndSend. This method takes two arguments. The first being the channel name of where the messages need to be published to and the second being the object to be sent. In this example, the publishing of the message is done after the Message is persisted via an aspect. @Aspect @Component public class MessageAspect extends AbstractRedisAspect { private static final Logger LOGGER = LoggerFactory .getLogger(MessageAspect.class); @Value("${messaging.redis.channel.messages}") private String channelName; @After("execution(* com.city81.redisPubSub.repository.MessageDao.save(..))") public void interceptMessage(JoinPoint joinPoint) { Message message = (Message) joinPoint.getArgs()[0]; // this publishes the message this.redisTemplate.convertAndSend(channelName, message); } } The RedisMessageListenerContainer, as well as holding the JedisConnectionFactory, holds a map of message listeners where the key is a message listener instance and the value the channel. The message listener instance references a class which implements the onMessage method of theMessageListener interface. When a message is published, those subscribers who are listening to that channel will then receive the published message via the onMessage method. The published message contains the serialised object that was sent in the body of the Redis Message and needs to be deserialised and cast to the original object. public void onMessage( org.springframework.data.redis.connection.Message redisMessage, byte[] pattern) { Message message = (Message) SerializationUtils.deserialize(redisMessage.getBody()); // set the deferred results for the user for (DeferredResult deferredResult : this.messageDeferredResultList) { deferredResult.setResult(message); } } The DeferredResult list is populated by calls to the REST service's getNewMessage method. This will in turn, in the MessageManager, create a DeferredResult object, add it to the list and return the object to the client. public DeferredResult getNewMessage() throws Exception { final DeferredResult deferredResult = new DeferredResult(deferredResultTimeout); deferredResult.onCompletion(new Runnable() { public void run() { messageDeferredResultList.remove(deferredResult); } }); deferredResult.onTimeout(new Runnable() { public void run() { messageDeferredResultList.remove(deferredResult); } }); messageDeferredResultList.add(deferredResult); return deferredResult; } The GitHub repo for this example contains two simple HTML pages, one which starts a long poll request and another which adds a message. These will call the below REST web service. @Controller @RequestMapping("/messages") public class MessageAPIController { @Inject private MessageManager messageManager; // // ADD A MESSAGE // @RequestMapping(value = "/add", method = RequestMethod.POST, produces = "application/json") @ResponseBody public Message addMessage( @RequestParam(required = true) String text) throws Exception { return messageManager.addMessage(text); } // // LONG POLLING // @RequestMapping(value = "/watch", method = RequestMethod.GET, produces = "application/json") @ResponseBody public DeferredResult getNewMessage() throws Exception { return messageManager.getNewMessage(); } } A further enhancement to the above to ensure messages aren't missed in between long polling requests would be to store the messages in Redis in a sorted set with the score being the message's creation timestamp. The Redis publish mechanism could then be used to tell the subscriber that there are new messages in Redis and it could then retrieve them based on the time of the last request, and return a collection of messages back to the client in the DeferredResult object.

March 17, 2014

by Geraint Jones

· 16,642 Views

Getting Started with Avro: Part 2

In the previous post we used avro-tools commands to serialize and deserialize data. In this post we post we will use Avro Java API for achieving the same. We will use same sample data and schema from our previous post. The java code for serializing and deserializing data without generating the code for schema is given below: package com.rishav.avro; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.io.InputStream; import java.util.Iterator; import java.util.LinkedHashMap; import org.apache.avro.Schema; import org.apache.avro.file.DataFileReader; import org.apache.avro.file.DataFileWriter; import org.apache.avro.generic.GenericData; import org.apache.avro.generic.GenericDatumReader; import org.apache.avro.generic.GenericDatumWriter; import org.apache.avro.generic.GenericRecord; import org.apache.avro.io.BinaryDecoder; import org.apache.avro.io.DatumReader; import org.apache.avro.io.DatumWriter; import org.codehaus.jackson.JsonFactory; import org.codehaus.jackson.JsonParseException; import org.codehaus.jackson.JsonProcessingException; import org.codehaus.jackson.map.ObjectMapper; import org.json.simple.JSONObject; public class AvroExampleWithoutCodeGeneration { public void serialize() throws JsonParseException, JsonProcessingException, IOException { InputStream in = new FileInputStream("resources/StudentActivity.json"); // create a schema Schema schema = new Schema.Parser().parse(new File("resources/StudentActivity.avsc")); // create a record to hold json GenericRecord AvroRec = new GenericData.Record(schema); // create a record to hold course_details GenericRecord CourseRec = new GenericData.Record(schema.getField("course_details").schema()); // this file will have AVro output data File AvroFile = new File("resources/StudentActivity.avro"); // Create a writer to serialize the record DatumWriter datumWriter = new GenericDatumWriter(schema); DataFileWriter dataFileWriter = new DataFileWriter(datumWriter); dataFileWriter.create(schema, AvroFile); // iterate over JSONs present in input file and write to Avro output file for (Iterator it = new ObjectMapper().readValues( new JsonFactory().createJsonParser(in), JSONObject.class); it.hasNext();) { JSONObject JsonRec = (JSONObject) it.next(); AvroRec.put("id", JsonRec.get("id")); AvroRec.put("student_id", JsonRec.get("student_id")); AvroRec.put("university_id", JsonRec.get("university_id")); LinkedHashMap CourseDetails = (LinkedHashMap) JsonRec.get("course_details"); CourseRec.put("course_id", CourseDetails.get("course_id")); CourseRec.put("enroll_date", CourseDetails.get("enroll_date")); CourseRec.put("verb", CourseDetails.get("verb")); CourseRec.put("result_score", CourseDetails.get("result_score")); AvroRec.put("course_details", CourseRec); dataFileWriter.append(AvroRec); } // end of for loop in.close(); dataFileWriter.close(); } // end of serialize method public void deserialize () throws IOException { // create a schema Schema schema = new Schema.Parser().parse(new File("resources/StudentActivity.avsc")); // create a record using schema GenericRecord AvroRec = new GenericData.Record(schema); File AvroFile = new File("resources/StudentActivity.avro"); DatumReader datumReader = new GenericDatumReader(schema); DataFileReader dataFileReader = new DataFileReader(AvroFile, datumReader); System.out.println("Deserialized data is :"); while (dataFileReader.hasNext()) { AvroRec = dataFileReader.next(AvroRec); System.out.println(AvroRec); } } public static void main(String[] args) throws JsonParseException, JsonProcessingException, IOException { AvroExampleWithoutCodeGeneration AvroEx = new AvroExampleWithoutCodeGeneration(); AvroEx.serialize(); AvroEx.deserialize(); } } For generating the schema java code from Avro json schema we can use avro-tools jar. The command for same is given below: java -jar avro-tools-1.7.5.jar compile schema StudentActivity.avsc Output path can be source folder for the project or we can add the generated java class files to Eclipse IDE manually. The java code for serializing and deserializing data with generating the code for schema is similar to above code except that in previous code we were assiging values to a GenericRecord and in this one we are assigning values to the generated Avro object: package com.rishav.avro; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.io.InputStream; import java.util.Iterator; import java.util.LinkedHashMap; import org.apache.avro.Schema; import org.apache.avro.file.DataFileReader; import org.apache.avro.file.DataFileWriter; import org.apache.avro.generic.GenericData; import org.apache.avro.generic.GenericDatumReader; import org.apache.avro.generic.GenericDatumWriter; import org.apache.avro.generic.GenericRecord; import org.apache.avro.io.DatumReader; import org.apache.avro.io.DatumWriter; import org.codehaus.jackson.JsonFactory; import org.codehaus.jackson.JsonParseException; import org.codehaus.jackson.JsonProcessingException; import org.codehaus.jackson.map.ObjectMapper; import org.json.simple.JSONObject; public class AvroExampleWithCodeGeneration { public void serialize() throws JsonParseException, JsonProcessingException, IOException { InputStream in = new FileInputStream("resources/StudentActivity.json"); // create a schema Schema schema = new Schema.Parser().parse(new File("resources/StudentActivity.avsc")); // create an object to hold json record StudentActivity sa = new StudentActivity(); // create an object to hold course_details Activity a = new Activity(); // this file will have AVro output data File AvroFile = new File("resources/StudentActivity.avro"); // Create a writer to serialize the record DatumWriter datumWriter = new GenericDatumWriter(schema); DataFileWriter dataFileWriter = new DataFileWriter(datumWriter); dataFileWriter.create(schema, AvroFile); // iterate over JSONs present in input file and write to Avro output file for (Iterator it = new ObjectMapper().readValues( new JsonFactory().createJsonParser(in), JSONObject.class); it.hasNext();) { JSONObject JsonRec = (JSONObject) it.next(); sa.setId((CharSequence) JsonRec.get("id")); sa.setStudentId((Integer) JsonRec.get("student_id")); sa.setUniversityId((Integer) JsonRec.get("university_id")); LinkedHashMap CourseDetails = (LinkedHashMap) JsonRec.get("course_details"); a.setCourseId((Integer) CourseDetails.get("course_id")); a.setEnrollDate((CharSequence) CourseDetails.get("enroll_date")); a.setVerb((CharSequence) CourseDetails.get("verb")); a.setResultScore((Double) CourseDetails.get("result_score")); sa.setCourseDetails(a); dataFileWriter.append(sa); } // end of for loop in.close(); dataFileWriter.close(); } // end of serialize method public void deserialize () throws IOException { // create a schema Schema schema = new Schema.Parser().parse(new File("resources/StudentActivity.avsc")); // create a record using schema GenericRecord AvroRec = new GenericData.Record(schema); File AvroFile = new File("resources/StudentActivity.avro"); DatumReader datumReader = new GenericDatumReader(schema); DataFileReader dataFileReader = new DataFileReader(AvroFile, datumReader); System.out.println("Deserialized data is :"); while (dataFileReader.hasNext()) { AvroRec = dataFileReader.next(AvroRec); System.out.println(AvroRec); } } public static void main(String[] args) throws JsonParseException, JsonProcessingException, IOException { AvroExampleWithoutCodeGeneration AvroEx = new AvroExampleWithoutCodeGeneration(); AvroEx.serialize(); AvroEx.deserialize(); } } In next post we will see how Avro deals with schema evolution.

March 17, 2014

by Rishav Rohit

· 41,030 Views · 2 Likes

3 Reasons to Choose Vert.x

Vert.x is a lightweight, high performance application platform for the JVM Modern web applications and the rise of mobile clients redefined what is expected from a web server. Node.js was the first technology that recognized the paradigm shift and offered a solution. The application platform Vert.x takes some of the innovations from Node.js and makes them available on the JVM, combining fresh ideas with one of the most sophisticated and fastest runtime environments available. Vert.x comes with a set of exciting features that make it interesting for anybody developing web applications. Non-blocking, event driven runtime Vert.x provides a non-blocking, event-driven runtime. If a server has to do a task that requires waiting for a response (e.g. requesting data from a database) there are two possibilities how this can be implemented: blocking and non-blocking. The traditional approach is a synchronous or blocking call. The program flow pauses and waits for the answer to return. To be able to handle more than one request in parallel, the server would execute each request in a different thread. The advantage is a relatively simple programming model, but the downside is a significant amount of overhead if the number of threads becomes large. The second solution is a non-blocking call. Instead of waiting for the answer, the caller continues execution, but provides a callback that will be executed once data arrives. This approach requires a (slightly) more complex programming model, but has a lot less overhead. In general a non-blocking approach results in much better performance when a large number of requests need to be served in parallel. Simple to use concurrency and scalability A Vert.x application consists of loosely coupled components, which can be rearranged to match increasing performance requirements Vert.x applications are written using an Actor-like concurrency model. An application consists of several components, the so-called Verticles, which run independently. A Verticle runs single-threaded and communicates with other Verticles by exchanging messages on the global event-bus. Because they do not share state, Verticles can run in parallel. The result is an easy to use approach for writing multi-threaded applications.You can create several Verticles which are responsible for the same task and the runtime will distribute the workload among them, which means you can take full advantage of all CPU cores without much effort. Verticles can also be distributed between several machines. This will be transparent to the application code. The Verticles use the same mechanisms to communicate as if they would run on the same machine. This makes it extremely easy to scale your application. Vert.x supports the most popular languages on the JVM. Support for Scala and Clojure is on the way. Polyglot Unlike many other application platforms, Vert.x is polyglot. Applications can be written in several languages. It is even possible to use different languages in the same application. At this point Java, Python, Groovy, Ruby, and JavaScript can be used and support for Scala and Clojure is on the way. Conclusion Vert.x is a relatively young platform and subsequently the ecosystem is not as rich as that of the more established platforms. Nevertheless for the most common tasks, there are extensions available.The advantages of Vert.x are astonishing. Its non-blocking, event-driven nature is extremely well-suited for modern web applications. Vert.x makes it easy to write concurrent applications that scale effortless from a single low-end machine to a cluster with several high-end servers. Add the fact that you can use most popular languages for the JVM and you have a web developers dream come true.

March 11, 2014

by Michael Heinrichs

· 29,232 Views · 7 Likes

Spring Boot & JavaConfig integration

Java EE in general and Context and Dependency Injection has been part of the Vaadin ecosystem since ages. Recently, Spring Vaadin is a joint effort of the Vaadin and the Spring teams to bring the Spring framework into the Vaadin ecosystem, lead by Petter Holmström for Vaadin and Josh Long for Pivotal. Integration is based on the Spring Boot project - and its sub-modules, that aims to ease creating new Spring web projects. This article assumes the reader is familiar enough with Spring Boot. If not the case, please take some time to get to understand basic notions about the library. Note that at the time of this writing, there's no release for Spring Vaadin. You'll need to clone the project and build it yourself. The first step is to create the UI. In order to display usage of Spring's Dependency Injection, it should use a service dependency. Let's injection the UI through Constructor Injection to favor immutability. The only addition to a standard UI is to annotate it with org.vaadin.spring.@VaadinUI. @VaadinUI public class VaadinSpringExampleUi extends UI { private HelloService helloService; public VaadinSpringExampleUi(HelloService helloService) { this.helloService = helloService; } @Override protected void init(VaadinRequest vaadinRequest) { String hello = helloService.sayHello(); setContent(new Label(hello)); } } The second step is standard Spring Java configuration. Let's create two configuration classes, one for the main context and the other for the web one. Two thing of note: The method instantiating the previous UI has to be annotated with org.vaadin.spring.@UIScope in addition to standard Spring org.springframework.context.annotation.@Bean to bind the bean lifecycle to the new scope provided by the Spring Vaadin library. At the time of this writing, a RequestContextListener bean must be provided. In order to be compliant with future versions of the library, it's a good practice to annotate the instantiating method with @ConditionalOnMissingBean(RequestContextListener.class). @Configuration public class MainConfig { @Bean public HelloService helloService() { return new HelloService(); } } @Configuration public class WebConfig extends MainConfig { @Bean @ConditionalOnMissingBean(RequestContextListener.class) public RequestContextListener requestContextListener() { return new RequestContextListener(); } @Bean @UIScope public VaadinSpringExampleUi exampleUi() { return new VaadinSpringExampleUi(helloService()); } } The final step is to create a dedicated WebApplicationInitializer. Spring Boot already offers a concrete implementation, we just need to reference our previous configuration classes as well as those provided by Spring Vaadin, namely VaadinAutoConfiguration and VaadinConfiguration. public class ApplicationInitializer extends SpringBootServletInitializer { @Override protected SpringApplicationBuilder configure(SpringApplicationBuilder application) { return application.showBanner(false) .sources(MainConfig.class) .sources(VaadinAutoConfiguration.class, VaadinConfiguration.class) .sources(WebConfig.class); } } At this point, we demonstrated a working Spring Vaadin sample application. Code for this article can be browsed and forked on Github.

March 10, 2014

by Nicolas Fränkel

· 13,544 Views

XML to Avro Conversion

We all know what XML is right? Just in case not, no problem here is what it is all about. 5 Now, what the computer really needs is the number five and some context around it. In XML you (human and computer) can see how it represents context to five. Now lets say instead you have a business XML document like FPML 32.00 150000 1.00 EUR 405000 2001-07-17Z NONE EUR 2.70 ISDA2002 ISDA2002Equity TODO GBEN Party A Party B That is a lot of extra unnecessary data points. Now lets look at this using Apache Avro. With Avro, the context and the values are separated. This means the schema/structure of what the information is does not get stored or streamed over and over and over and over (and over) again. The Avro schema is hashed. So the data structure only holds the value and the computer understands the fingerprint (the hash) of the schema and can retrieve the schema using the fingerprint. 0x d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592 This type of implementation is pretty typical in the data space. When you do this you can reduce your data between 20%-80%. When I tell folks this they immediately ask, “why such a large gap of unknowns”. The answer is because not every XML is created the same. But that is the problem because you are duplicating the information the computer needs to understand the data. XML is nice for humans to read, sure … but that is not optimized for the computer. Here is a converter we are working on https://github.com/stealthly/xml-avro to help get folks off of XML and onto lower cost, open source systems. This allows you to keep parts of your systems (specifically the domain business code) using the XML and not having to be changed (risk mitigation) but store and stream the data with less overhead (optimize budget).

March 7, 2014

by Joe Stein

· 27,202 Views

Convert CSV Data to Avro Data

In one of my previous posts I explained how we can convert json data to avro data and vice versa using avro tools command line option. Today I was trying to see what options we have for converting csv data to avro format, as of now we don't have any avro tool option to accomplish this . Now, we can either write our own java program (MapReduce program or a simple java program) or we can use various SerDe's available with Hive to do this quickly and without writing any code :) To convert csv data to Avro data using Hive we need to follow the steps below: Create a Hive table stored as textfile and specify your csv delimiter also. Load csv file to above table using "load data" command. Create another Hive table using AvroSerDe. Insert data from former table to new Avro Hive table using "insert overwrite" command. To demonstrate this I will use use the data below (student.csv): 0,38,91 0,65,28 0,78,16 1,34,96 1,78,14 1,11,43 Now execute below queries in Hive: --1. Create a Hive table stored as textfile USE test; CREATE TABLE csv_table ( student_id INT, subject_id INT, marks INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; --2. Load csv_table with student.csv data LOAD DATA LOCAL INPATH "/path/to/student.csv" OVERWRITE INTO TABLE test.csv_table; --3. Create another Hive table using AvroSerDe CREATE TABLE avro_table ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.literal'='{ "namespace": "com.rishav.avro", "name": "student_marks", "type": "record", "fields": [ { "name":"student_id","type":"int"}, { "name":"subject_id","type":"int"}, { "name":"marks","type":"int"}] }'); --4. Load avro_table with data from csv_table INSERT OVERWRITE TABLE avro_table SELECT student_id, subject_id, marks FROM csv_table; Now you can get data in Avro format from Hive warehouse folder. To dump this file to local file system use below command: hadoop fs -cat /path/to/warehouse/test.db/avro_table/* > student.avro If you want to get json data from this avro file you can use avro tools command: java -jar avro-tools-1.7.5.jar tojson student.avro > student.json So we can easily convert csv to avro and csv to json also by just writing 4 HQLs.

March 5, 2014

by Rishav Rohit

· 39,713 Views · 1 Like

Step-by-Step: Live Migrate Multiple (Clustered) VMs in One Line of PowerShell - Revisited

A while back, I wrote an article showing how to Live Migrate Your VMs in One Line of Powershell between non-clustered Windows Server 2012 Hyper-V hosts using Shared Nothing Live Migration. Since then, I’ve been asked a few times for how this type of parallel Live Migration would be performed for highly available virtual machines between Hyper-V hosts within a cluster. In this article, we’ll walk through the steps of doing exactly that … via Windows PowerShell on Windows Server 2012 or 2012 R2 or our FREE Hyper-V Server 2012 R2 bare-metal, enterprise-grade hypervisor in a clustered configuration. Wait! Do I need PowerShell to Live Migrate multiple VMs within a Cluster? Well, actually … No. You could certainly use the Failover Cluster Manager GUI tool to select multiple highly available virtual machines, right-click and select Move | Live Migration … Failover Cluster Manager – Performing Multi-VM Live Migration But, you may wish to script this process for other reasons … perhaps to efficiently drain all VM’s from a host as part of a maintenance script that will be performing other tasks. Can I use the same PowerShell cmdlets for Live Migrating within a Cluster? Well, actually … No again. When VMs are made highly available resources within a cluster, they’re managed as cluster group resources instead of being standalone VM resources. As a result, we have a different set of Cluster-aware PowerShell cmdlets that we use when managing these cluster groups. To perform a scripted multi-VM Live Migration, we’ll be leveraging three of these cmdlets: Get-ClusterNode, Get-ClusterGroup and Move-ClusterVirtualMachineRole Now, let’s see that one line of PowerShell! Before getting to the point of actually performing the multi-VM Live Migration in a single PowerShell command line, we first need to setup a few variables to handle the "what" and "where" of moving these VMs. First, let’s specify the name of the cluster with which we’ll be working. We’ll store it in a $clusterName variable. $clusterName = read-host -Prompt "Cluster name" Next, we’ll need to select the cluster node to which we’ll be Live Migrating the VMs. Lets use the Get-ClusterNode and Out-GridView cmdlets together to prompt for the cluster node and store the value in a $targetClusterNode variable. $targetClusterNode = Get-ClusterNode -Cluster $clusterName | Out-GridView -Title "Select Target Cluster Node" ` -OutputMode Single And then, we’ll need to create a list of all the VMs currently running in the cluster. We can use the Get-ClusterGroup cmdlet to retrieve this list. Below, we have an example where we are combining this cmdlet with a Where-Object cmdlet to return only the virtual machine cluster groups that are running on any node except the selected target cluster node. After all, it really doesn’t make any sense to Live Migrate a VM to the same node on which it’s currently running! $haVMs = Get-ClusterGroup -Cluster $clusterName | Where-Object {($_.GroupType -eq "VirtualMachine") ` -and ($_.OwnerNode -ne $targetClusterNode.Name)} We’ve stored the resulting list of VMs in a $haVMs variable. Ready to Live Migrate! OK … Now we have all of our variables defined for the cluster, the target cluster node and the list of VMs from which to choose. Here’s our single line of PowerShell to do the magic … $haVMs | Out-GridView -Title "Select VMs to Move" –PassThru | Move-ClusterVirtualMachineRole -MigrationType Live ` -Node $targetClusterNode.Name -Wait 0 Proceed with care: Keep in mind that your target cluster node will need to have sufficient available resources to run the VM's that you select for Live Migration. Of course, it's best to initially test tasks like this in your lab environment first. Here’s what is happening in this single PowerShell command line: We’re passing the list of VMs stored in the $haVMs variable to the Out-GridView cmdlet. Out-GridView prompts for which VMs to Live Migrate and then passes the selected VMs down the PowerShell object pipeline to the Move-ClusterVirtualMachineRole cmdlet. This cmdlet initiates the Live Migration for each selected VM, and because it’s using a –Wait 0 parameter, it initiates each Live Migration one-after-another without waiting for the prior task to finish. As a result, all of the selected VMs will Live Migrate in parallel, up to the maximum number of concurrent Live Migrations that you’ve configured on these cluster nodes. The VMs selected beyond this maximum will simply queue up and wait their turn. Unlike some competing hypervisors, Hyper-V doesn't impose an artificial hard-coded limit on how many VMs for you can Live Migrate concurrently. Instead, it's up to you to set the maximum to a sensible value based on your hardware and network capacity. Do you have your own PowerShell automation ideas for Hyper-V? Feel free to share your ideas in the Comments section below. See you in the Clouds! - Keith

March 3, 2014

by Keith Mayer

· 10,722 Views

Brief comparison of BDD frameworks

JDave, Concordion, Easyb, JBehave, Cucumber are all compared here briefly for your convenience.

February 24, 2014

by Sebastian Laskawiec

· 129,923 Views · 16 Likes

To ServiceMix or Not to ServiceMix

This morning an interesting topic was posted to the Apache ServiceMix user forum, asking the question: To ServiceMix or not ServiceMix. In my mind the short answer is: NO Guillaume Nodet one of the key architects and long time committer on Apache ServiceMix already had his mind set 3 years ago when he wrong this blog post - Thoughts about ServiceMix. What has happened on the ServiceMix project was that the ServiceMix kernel was pulled out of ServiceMix into its own project - Apache Karaf. That happened in spring 2009, which Guillaume also blogged about. So is all that bad? No its IMHO all great. In fact having the kernel as a separate project, and Camel and CXF as the integration and WS/RS frameworks, would allow the ServiceMix team to focus on building the ESB that truly had value-add. But that did not happen. ServiceMix did not create a cross product security model, web console, audit and trace tooling, clustering, governance, service registry, and much more that people were looking for in an ESB (or related to a SOA suite). There were only small pieces of it, but never really baked well into the project. That said its not too late. I think the ServiceMix project is dying, but if a lot of people in the community step up, and contribute and work on these things, then it can bring value to some users. But I seriously doubt this will happen. PS: 6 years ago I was working as a consultant and looked at the next integration platform for a major Danish organization, and we looked at ServiceMix back then and dismissed it due its JBI nature, and the new OSGi based architecture was only just started. And frankly it has taken a long long time to mature Apache Karaf / Felix / Aries and the other pieces in OSGi to what they are today to offer a stable and sound platform for users to build their integration applications. That was not the case 4-6 years ago. Okay No to ServiceMix - what are my options then? So what should use you instead of ServiceMix? Well in my mind you have at least these two options. 1) Use Apache Karaf and add the pieces you need, such as Camel, CXF, ActiveMQ and build your own ESB. These individual projects have regular releases, and you can upgrade as you need. The ServiceMix project only has the JBI components in additional, that you should NOT use. Only legacy users that got on the old ServiceMix 3.x wagon may need to use this in a graceful upgrade from JBI to Karaf based containers. 2) Take a look at fabric8. IMHO fabric8 is all that value-add the ServiceMix project did not create, and a lot more. James Strachan, just blogged today about some of his thoughts on fabric8, JBoss Fuse, and Karaf. I encourage you to take a read. For example he talks about how fabric becomes poly container, so you have a much wider choice of which containers/JVM to run your integration applications. OSGi is no longer a requirement. (IMHO that is very very existing and potentially a changer). I encourage you to check out fabric8 web-site, and also read the overview and motivation sections of the documentation. And then check out some of the videos. After the upcoming JBoss Fuse 6.1 release, the Fuse team at Red Hat will have more time and focus to bring the documentation at fabric8 up to date covering all the functionality we have (there is a lot more), and as well bring out a 1.0 community released using pure community releases. This gives end users a 100% free to use out of the box release. And users looking for a commercial release can then use JBoss Fuse. Best of both worlds. Summary Okay back to the question - to ServiceMix or not. Then NO. Innovation happens outside ServiceMix, and also more and more outside Apache. If you have thoughts then you can share those in comments to this blog, or better yet, get involved in the discussion forum at the ServiceMix user forum. PPS: The thoughts on this blog is mine alone, and are not any official words from my employer.

February 12, 2014

by Claus Ibsen

· 16,970 Views

Couchbase .NET SDK 2.0 Development Series: Part 1-1: Server Configuration

This article was originally written by Jeff Morris In the introduction to this series, I discussed some of the motivation for rewriting .NET SDK, the goals, objectives and the major features of the upcoming 2.0 release, and we examined the high-level architecture (10,000 feet view) of a Couchbase Server Client SDK. In this post we will go over the design and development of one of the core configuration components of a Couchbase SDK: Server Configuration. Introduction A Couchbase SDK client requires configuration from two sources: the Client Configuration, which defines the IP of the cluster to connect to, number of connections to use and other important information regarding how the client will interact with the cluster, and the Server Configuration, which defines the current state of the cluster (e.g. number of nodes, buckets that are available, etc.), thus driving the internal state of a client (Cluster Map) This post will only discuss the Server Configuration aspects and will largely revolve around implementing several well-defined interfaces or contracts. HTTP Streaming Configuration Currently, most clients use a “bootstrapping” technique via client configuration and a “Streaming Configuration” exposed by the Couchbase REST API. This is supported by versions of Couchbase from 2.2 and back. The usual approach is as follows: Within the “uris” element of a Client Configuration (semantics very per client), a URL is defined for which to start the bootstrapping process: http://[SERVER]:8091/pools The response is then parsed and the a request is made to get the buckets configuration: http://[SERVER]:8091/pools/default?uuid=[UUID] This response is parsed and another request is made to get streaming URL from: http://[SERVER]:8091/pools/default/buckets?v=[VERSION]&uuid=[UUID] Finally, the streaming URL connection is made which is long-lived and raises events in the client with respect to changes in the cluster: http://[SERVER]:8091/pools/default/bucketsStreaming/default?bucket_uuid=[UUID] The client will then change its internal state to match that of the current server configuration. There are some problems with this approach, among others: The “streaming URL” is resource intensive to create and maintain (mainly memory) on the server-side During a rebalance or failover situation, the cluster configuration may change many, many times. Each time this happens the client must tear down all of its resources (socket connections, VBucket mappings) and build its state up again and again, which can leads to reduced throughput, latency, higher than expected memory and CPU usage, and so on and so forth… Operations that are in-flight may be terminated and then re-tried on a new config state – it’s as if the “carpet has been pulled out from underneath them”. Responding to NOT_MY_VBUCKET responses are handled in-efficiently by simple trying the next node in the list – there is no information to help the client in which node to re-direct the operation to. A New Model for Configuration Management: CCCP While the streaming HTTP “bootstrapping” approach has worked reasonably well for most clients, the downsides have begun to outweigh the plusses, thus a new model for updating client configuration has been defined is available starting with the 2.5 version of the Couchbase Server: Client Cluster Configuration Publication or “CCCP”. CCCP introduces a new operation to be used before or after authentication to request configuration as well as a mechanism for returning configuration information when a NOT_MY_VBUCKET response is returned for a failed operation. In this case CCCP supporting SDK, the client will react by using the configuration to update itself before resending the operation. Note that a NOT_MY_VBUCKET is the standard response that is returned by the cluster when the cluster itself has changed (during a rebalance or failover scenario for example) and the client has not yet “synched” up and is using a stale configuration, resulting in an invalid key mapping. Whereas the “bootstrapping” approach is somewhat of a “pull” type operation, CCCP is either “push” or “pull” depending upon whether the request was initiated by the client (via an explicit CMD_GET_CLUSTER_CONFIG operation) or by the server itself (via a NOT_MY_VBUCKET response to an operation). We will go over CCCP in more detail in a later post. File Based Configuration One other semi-supported configuration option exists: file based configuration. File based configuration is primarily useful for testing and development and we will provide an implementation in the test projects to remove some of the dependencies that are difficult to replicate and or cause false positives when running the test suite. Structural Architecture View Internally the Server Configuration component of the client is a provider based model, in which multiple implementations of a configuration provider can be configured in the client and then a strategy can be chosen to determine which provider should be used. The default is a simple linear, fallback approach where the first configured provider is used and then if it fails the next provider in sequence will take its place. Here is a diagram showing the main actor objects and the relationships with some of other key objects within the client which will be discussed in subsequent posts: A description of each follows: ConfigurationProvider: a source which shall yield a new ConfigInfo. It’s the responsibility of the provider to provide the mechanism for fetching the configuration from its source. ConfigurationInformation: the configuration info contains a list of possible nodes and the VBucket map informing clients about which servers within said nodes a given key should be forwarded to. ConfigurationManager: bridge between the client and the providers and the strategy taken to determine which provider to use and what retry logic to apply. A more detailed document of this architecture can be found here. Please note that this, like all development, is an evolutionary process, so expect some changes and revisions over time. Conclusion and Next Steps This post discussed the history (HTTP Streaming) and the future (CCCP) of Couchbase SDK Server Configuration Management. In the next post we will go into detail the implementation of the HTTP Streaming configuration provider which is required for clients targeting pre-2.5 versions of the Couchbase Server.

February 7, 2014

by Don Pinto

· 3,789 Views

Java: Handling a RuntimeException in a Runnable

At the end of last year I was playing around with running scheduled tasks to monitor a Neo4j cluster and one of the problems I ran into was that the monitoring would sometimes exit. I eventually realised that this was because a RuntimeException was being thrown inside the Runnable method and I wasn’t handling it. The following code demonstrates the problem: import java.util.ArrayList; import java.util.List; import java.util.concurrent.*; public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } If we run that code we’ll see the RuntimeException but the executor won’t exit because the thread died without informing it: Exception in thread "main" pool-1-thread-1 -> 1391212558074 java.util.concurrent.ExecutionException: java.lang.RuntimeException: game over at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at RunnableBlog.main(RunnableBlog.java:11) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Caused by: java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) At the time I ended up adding a try catch block and printing the exception like so: public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { try { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } catch (RuntimeException e) { e.printStackTrace(); } } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } This allows the exception to be recognised and as far as I can tell means that the thread executing the Runnable doesn’t die. java.lang.RuntimeException: game over pool-1-thread-1 -> 1391212651955 at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) pool-1-thread-1 -> 1391212652956 java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) pool-1-thread-1 -> 1391212653955 java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) This worked well and allowed me to keep monitoring the cluster. However, I recently started reading ‘Java Concurrency in Practice‘ (only 6 years after I bought it!) and realised that this might not be the proper way of handling the RuntimeException. public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { try { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } catch (RuntimeException e) { Thread t = Thread.currentThread(); t.getUncaughtExceptionHandler().uncaughtException(t, e); } } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } I don’t see much difference between the two approaches so it’d be great if someone could explain to me why this approach is better than my previous one of catching the exception and printing the stack trace.

February 6, 2014

by Mark Needham

· 19,657 Views

How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1

Learn how to set up a four node Hadoop cluster using AWS EC2, PuTTy(gen), and WinSCP.

January 23, 2014

by Hardik Pandya

· 136,010 Views · 3 Likes

Spring IDE and the Spring Tool Suite - Using Spring in Eclipse

Get started with Spring IDE and the Spring Tool Suite – a set of plugins to simplify the development of Spring-based applications in Eclipse.

January 10, 2014

by James Sugrue

· 700,053 Views · 9 Likes