DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Cloud Architecture Topics

article thumbnail
Understanding the Cloud Foundry Java Buildpack Code with Tomcat Example
Cloudfoundry's java buildpack is supporting some popular jvm based applications. This article is oriented to the audiences already with experience of cloudfoundry/heroku buildpack who want to have more understanding of how buildpack and cloudfoundry works internally. cf push app -p app.war -b build-pack-url The above command demonstrates the usage of pushing a war file to cloudfoundry by using a custom buildpack (E.g. https://github.com/cloudfoundry/java-buildpack). However, what exactly happens inside, or how cloudfoundry bootstrap the war file with tomcat? There are three contracts phase that bridge communication between buildpack and cloudfoundry. The three phases are detect, compile and release, which are three ruby shell scripts: Java buildpack has multiple sub components, while each of them has all of these three phases (E.g. tomcat is one of the sub components, while it contained another layer of sub components). Detect Phase: detect phase is to check whether a particular buildpack/component applies to the deployed application. Take the war file example, tomcat applies only when https://github.com/cloudfoundry/java-buildpack/blob/master/lib/java_buildpack/container/tomcat.rb is true: def supports? web_inf? && !JavaBuildpack::Util::JavaMainUtils.main_class(@application) end The above code means, the tomcat applies when the application has a WEB-INF folder andthisisnot a main class bootstrapped application. Compile Phase: Compile phase would be the major/comprehensive work for a customized buildpack, while it is trying to build a file system on a lxc container. Take the example of our war application and tomcat example. In https://github.com/cloudfoundry/java-buildpack/blob/master/lib/java_buildpack/container/tomcat/tomcat_instance.rb def compile download(@version, @uri) { |file| expand file } link_to(@application.root.children, root) @droplet.additional_libraries << tomcat_datasource_jar if tomcat_datasource_jar.exist? @droplet.additional_libraries.link_to web_inf_lib end def expand(file) with_timing "Expanding Tomcat to #{@droplet.sandbox.relative_path_from(@droplet.root)}" do FileUtils.mkdir_p @droplet.sandbox shell "tar xzf #{file.path} -C #{@droplet.sandbox} --strip 1 --exclude webapps 2>&1" @droplet.copy_resources end The above code is all about preparing the tomcat and link the application files, so the application files will be available for the tomcat classpath. Before going to the code, we have to understand the working directory when the above code executes: . => working directory .app => @application, contains the extracted war archive .buildpack/tomcat => @droplet.sandbox .buildpack/jdk .buildpack/other needed components Inside compile method: download method will download tomcat binary file (specified here: https://github.com/cloudfoundry/java-buildpack/blob/master/config/tomcat.yml), and then extract the archive file to @droplet.sandbox directory. Then copy the resources folder's files to https://github.com/cloudfoundry/java-buildpack/tree/master/resources/tomcat/conf to @droplet.sandbox/conf Symlink the @droplet.sandbox/webapps/ROOT to .app/ Symlink additional libraries (comes from other component rather than application) to the WEB-INF/lib Note: All the symlinks use relative path, since when the container deployed to DEA, the absolute paths would be different. RELEASE PHASE: Release phase is to setup instructions of how to start tomcat. Look at the code in :https://github.com/cloudfoundry/java-buildpack/blob/master/lib/java_buildpack/container/tomcat.rb def command @droplet.java_opts.add_system_property 'http.port', '$PORT' [ @droplet.java_home.as_env_var, @droplet.java_opts.as_env_var, "$PWD/#{(@droplet.sandbox + 'bin/catalina.sh').relative_path_from(@droplet.root)}", 'run' ].flatten.compact.join(' ') end The above code does: Add java system properties http.port (referenced in tomcat server.xml) with environment properties ($PORT), this is the port on the DEA bridging to the lxc container already setup when the container was provisioned. instruction of how to run the tomcat Eg. "./bin/catalina.sh run"
May 9, 2014
by Shaozhen Ding
· 23,190 Views · 1 Like
article thumbnail
Java EE: The Basics
wanted to go through some of the basic tenets, the technical terminology related to java ee. for many people, java ee/j2ee still mean servlets, jsps or maybe struts at best. no offence or pun intended! this is not a java ee 'bible' by any means. i am not capable enough of writing such a thing! so let us line up the 'keywords' related to java ee and then look at them one by one java ee java ee apis (specifications) containers services multitiered applications components let's try to elaborate on the above mentioned points. ok. so what is java ee? 'ee' stands for enterprise edition. that essentially makes java ee - java enterprise edition. if i had to summarize java ee in a couple of sentences, it would go something like this "java ee is a platform which defines 'standard specifications/apis' which are then implemented by vendors and used for development of enterprise (distributed, 'multi-tired', robust) 'applications'. these applications are composed of modules or 'components' which use java ee 'containers' as their run-time infrastructure." what is this 'standardized platform' based upon? what does it constitute? the platform revolves around 'standard' specifications or apis . think of these as contracts defined by a standard body e.g. enterprise java beans (ejb), java persistence api (jpa), java message service (jms) etc. these contracts/specifications/apis are implemented by different vendors e.g. glassfish, oracle weblogic, apache tomee etc alright. what about containers? containers can be visualized as 'virtual/logical partitions' . each container supports a subset of the apis/specifications defined by the java ee platform they provide run-time 'services' to the 'applications' which they host the java ee specification lists 4 types of containers ejb container web container application client container applet container java ee containers i am not going to dwell into details of these containers in this post. services?? well, 'services' are nothing but a result of the vendor implementations of the standard 'specifications' (mentioned above). examples of specifications are - jersey for jax-rs (restful services), tyrus (web sockets), eclipselink (jpa), weld (cdi) etc. the 'container' is the interface between the deployed application ('service' consumer) and the application server. here is a list of 'services' which are rendered by the 'container' to the underlying 'components' (this is not an exhaustive list) persistence - offered by the java persistence api (jpa) which drives object relational mapping (orm) and an abstraction for the database operations. messaging - the java message service (jms) provides asynchronous messaging between disparate parts of your applications. contexts & dependency injection - cdi provides loosely coupled and type safe injection of resources. web services - jaxrs and jaxws provide support for rest and soap style services respectively transaction - provided by the java transaction api (jta) implementation what is a typical java ee 'application'? what does it comprise of? applications are composed of different ' components ' which in turn are supported by their corresponding ' container ' supported 'component' types are: enterprise applications - make use of the specifications like ejb, jms, jpa etc and are executed within an ejb container web applications - they leverage the servlet api, jsp, jsf etc and are supported by a web container application client - executed in client side. they need an application client container which has a set of supported libraries and executes in a java se environment. applets - these are gui applications which execute in a web browser. how are java ee applications structured? as far as java ee 'application' architecture is concerned, they generally tend follow the n-tier model consisting of client tier, server tier and of course the database (back end) tier client tier - consists of web browsers or gui (swing, java fx) based clients. web browsers tend to talk to the 'web components' on the server tier while the gui clients interact directly with the 'business' layer within the server tier server tier - this tier comprises of the dynamic web components (jsp, jsf, servlets) and the business layer driven by ejbs, jms, jpa, jta specifications. database tier - contains 'enterprise information systems' backed by databases or even legacy data repositories. generic 3-tier java ee application architecture java ee - bare bones, basics.... as quickly and briefly as i possibly could. that's all for now! :-) stay tuned for more java ee content, specifically around the latest and greatest version of the java ee platform --> java ee 7 happy reading!
April 29, 2014
by Abhishek Gupta DZone Core CORE
· 40,620 Views · 3 Likes
article thumbnail
Innodb redo log archiving
This post was originally written by Vlad Lesin for the MySQL Performance Blog. Percona Server 5.6.11-60.3 introduces a new “log archiving” feature. Percona XtraBackup 2.1.5 supports “apply archived logs.” What does it mean and how it can be used? Percona products propose three kinds of incremental backups. The first is full scan of data files and comparison the data with backup data to find some delta. This approach provides a history of changes and saves disk space by storing only data deltas. But the disadvantage is a full-data file scan that adds load to the disk subsystem. The second kind of incremental backup avoids extra disk load during data file scans. The idea is in reading only changed data pages. The information about what specific pages were changed is provided by the server itself which writes files with the information during work. It’s a good alternative but changed-pages tracking adds some small load. And Percona XtraBackup’s delta reading leads to non-sequential disk io. This is good alternative but there is one more option. The Innodb engine has a data log. It writes all operations which modify database pages to log files. This log is used in the case of unexpected server terminating to recover data. The Innodb log consists of the several log files which are filled sequentially in circular. The idea is to save those files somewhere and apply all modifications from archived logs to backup data files. The disadvantage of this approach is in using extra disk space. The advantage is there is no need to do an “explicit” backup on the host server. A simple script could sit and wait for logs to appear then scp/netcat them over to another machine. But why not use good-old replication? Maybe replication does not have such performance as logs recovering but it is more controlled and well-known. Archived logs allows you to do any number of things with them from just storing them to doing periodic log applying. You can not recover from a ‘DROP TABLE’, etc with replication. But with this framework one could maintain the idea of “point in time” backups. So the “archived logs” feature is one more option to organize incremental backups. It is not widely used as it was issued not so far and there is not A good understanding of how it works and how it can be used. We are open to any suggestions about its suggest improvements and use cases. The subject of this post is to describe how it works in depth. As log archiving is closely tied with innodb redo logs the internals of redo logs will be covered too. This post would be useful not only for DBA but also for Software Engineers because not only common principles are considered but the specific code too, and knowledge from this post can be used for further MySQL code exploring and patching. What is the innodb log and how it is written? Let’s remember what are innodb logs, why they are written, what they are used for. The Innodb engine has buffer pool. This is a cache of database pages. Any changes are done on page in buffer pool, then page is considered as “dirty,” which means it must be flushed, and pushed to the flush list which is processed periodically by special thread. If pages are not flushed to disk and server is terminated unexpectedly the changes will be lost. To avoid this innodb writes changes to redo log and recover data from redo log during start. This technique allows to delay buffer pool pages flushing. It can increase performance because several changes of one page can be accumulated in memory and then flushed by one io. Except that flushed pages can be grouped to decrease the number of non-sequential io’s. But the down-side of this approach is time for data recovering. Let’s consider how this log is stored, generated and used for data recovering. Log files Redo log consists of a several log files which are treated as a circular buffer. The number and the size of log files can be configured. Each log file has a header. The description of this header can be found in “storage/innobase/include/log0log.h” by “LOG_GROUP_ID” keyword. Each log file contains log records. Redo log records are written sequentially by log blocks of OS_FILE_LOG_BLOCK_SIZE size which is equal to 512 bytes by default and can be changed with innodb option. Each record has its LSN. LSN is a “Log Sequence Number” – the number of bytes written to log from the log creation to the certain log record. Each block consists of header, trailer and log records. Log blocks Let’s consider log block header. The first 4 bytes of the header is log block number. The block number is very similar as LSN but LSN is measured in bytes and block number is measured by OS_FILE_LOG_BLOCK_SIZE. Here is the simple formula how LSN is converted to block number: return(((ulint) (lsn / OS_FILE_LOG_BLOCK_SIZE) & 0x3FFFFFFFUL) + 1); This formula can be found in log_block_convert_lsn_to_no() function. The next two bytes is the number of bytes used in the block. The next two bytes is the offset of the first MTR log record in this block. What is MTR will be described below. Currently it can be considered as a synonym of bunch of log records which are gathered together as a description of some logical operation. For example it can be a group of log records for inserting new row to some table. This field is used when there are records of several MTR’s in one block. The next four bytes is a checkpoint number. The trailer is four bytes of log block checksum. The above description can be found in “storage/innobase/include/log0log.h” by “LOG_BLOCK_HDR_NO” keyword. Before writing to disk log blocks must be somehow formed and stored. And the question is: How log blocks are stored in memory and on disk? Where log blocks are stored before flushing to disk and how they are written and flushed? Global log object and log buffer The answer to the first part of the question is log buffer. Server holds very important global object log_sys in memory. It contains a lot of useful information about logging state. Log buffer is pointed by log_sys->buf pointer which is initialized in log_init(). I would highlight the following log_sys fields that are used for work with log buffer and flushing: log_sys->buf_size – the size of log buffer, can be set with innodb-log-buffer-size variable, the default value is 8M; log_sys->buf_free – the offset from which the next log record will be written; log_sys->max_buf_free – if log_sys->buf_free is greater then this value log buffer must be flushed, see log_free_check(); log_sys->buf_next_to_write – the offset of the next log record to write to disk; log_sys->write_lsn – the LSN up to which log is written; log_sys->flushed_to_disk_lsn – the LSN up to which log is written and flushed; log_sys->lsn – the last LSN in log buffer; So log_sys->buf_next_to_write is between 0 and log_sys->buf_free, log_sys->write_lsn is equal or less log_sys->lsn, log_sys->flushed_to_disk_lsn is less or equal to log_sys->write_lsn. The relationships for those fields can be easily traced with debugged by setting up watchpoints. Ok, we have log buffer, but how do log records come to this buffer? Where log records come from? Innodb has special objects that allow you to gather redo log records for some operations in one bunch before writing them to log buffer. These objects are called “mini-transactions” and corresponding functions and data types have “mtr” prefix in the code. The objects itself are described in mtr_t “c” structure. The most interesting fields of this structure are the following: mtr_t::log – contains log records for the mini-transaction, mtr_t::memo – contains pointers to pages which are changed or locked by the mini-transaction, it is used to push pages to flush list and release locks after logs records are copied to log buffer in mtr_commit() (see mtr_memo_pop_all() called in mtr_commit()). mtr_start() function initializes an object of mtr_t type and mtr_commit() writes log records from mtr_t::log to log_sys->buf + log_sys->buf_free. So the typical sequence of any operation which changes data is the following: mtr_start(); // initialize mtr object some_ops... // operations on data which are logged in mtr_t::log mtr_commit(); // write logged operations from mtr_t::log to log buffer log_sys->buf page_cur_insert_rec_write_log() is a good example of how mtr records can be written and mtr::memo can be filled. The low-level function which writes data to log buffer is log_write_low(). This function is invoked inside of mtr_commit() and not only copy the log records from mtr_t object to log buffer log_sys->buf but also creates a new log blocks inside of log_sys->buf, fills their header, trailer, calculates checksum. So log buffer contains log blocks which are sequentially filled with log records which are grouped in “mini-transactions” which logically can be treated as some logical operation over data which consists of a sequence of mini-operations(log records). As log records are written sequentially in log buffer one mini-transaction and even one log record can be written in two neighbour blocks. That is why the header field which would contain the offset of the first MTR in the block is necessary to calculate the point from which log records parsing can be started. This field was described in 2.2. So we have a buffer of log blocks in a memory. How is data from this buffer written to disk? The mysql documentation says that this depends on innodb_flush_log_at_trx_commit option. There can be three cases depending on the value of this option. Let’s consider each of them. Writing log buffer to disk: innodb_flush_log_at_trx_commit is 1 or 2. The first two cases is when innodb_flush_log_at_trx_commit is 1 or 2. In these cases flush log records are written for 2 and flushed for 1 on each transaction commit. If innodb_flush_log_at_trx_commit is 2 log records are flushed periodically by special thread which will be considered later. The low-level function which writes log records from buffer to file is log_group_write_buf(). But in the most cases it is not called directly but it is called from more high level log_write_up_to(). For the current case the calling stack is the following: (trx_commit_in_memory() or trx_commit_complete_for_mysql() or trx_prepare() e.t.c)-> trx_flush_log_if_needed()-> trx_flush_log_if_needed_low()-> log_write_up_to()-> log_group_write_buf(). It is quite easy to find the higher levels of calling stack, just set up breakpoint on log_group_write_buf() and execute any sql query that modifies innodb data. For example for the simple “insert” sql query the higher levels of calling stack are the following: mysql_execute_command()-> trans_commit_stmt()-> ha_commit_trans()-> TC_LOG_DUMMY::commit()-> ha_commit_low()-> innobase_commit()-> trx_commit_complete_for_mysql()-> trx_flush_log_if_needed()-> ... . log_io_complete() callback is invoked when i/o is finished for log files (see fil_aio_wait()). log_io_complete() flushes log files if this is not forbidden by innodb_flush_method or innodb_flush_log_at_trx_commit options. Writing log buffer to disk: innodb_flush_log_at_trx_commit is equal to 0 The third case is when innodb_flush_log_at_trx_commit is equal to 0. For this case log buffer is NOT written to disk on transaction commit, it is written and flushed periodically by separate thread “srv_master_thread”. If innodb_flush_log_at_trx_commit = 0 log files are flushed in the same thread by the same calls. The calling stack is the following: srv_master_thread()-> (srv_master_do_active_tasks() or srv_master_do_idle_tasks() or srv_master_do_shutdown_tasks())-> srv_sync_log_buffer_in_background()-> log_buffer_sync_in_background()->log_write_up_to()->... . Special cases for logs flushing While log_io_complete() do flushing depending on innodb_flush_log_at_trx_commit value among others log_write_up_to() has it’s own flushing criteria. This is flush_to_disk function argument. So it is possible to force log files flushing even if innodb_flush_log_at_trx_commit = 0. Here are examples of such cases: 1) buf_flush_write_block_low() Each page contains information about the last applied LSN(buf_flush_write_block_low::newest_modification), each log record is a description of change on certain page. Imagine we flushed some changed pages but log records for these pages were not flushed and server goes down. After starting the server some pages will have the newest modifications, but some of them were not flushed and the correspondent log records are lost too. We will have inconsistent database in this case. That is why log records must be flushed before the pages they refer. 2) srv_sync_log_buffer_in_background() As it was described above this function is called periodically by special thread and forces flushing. 3) log_checkpoint() When checkpoint is made log files must be reliably flushed. 4) The special handlerton innobase_flush_logs() which can be called through ha_flush_logs() from mysql server. For example ha_flush_logs() is called from MYSQL_BIN_LOG::reset_logs() when “RESET MASTER” or “RESET SLAVE” are executed. 5) srv_master_do_shutdown_tasks() – on shutdown, ha_innobase::create() – on table creating, ha_innobase::delete_table() – on table removing, innobase_drop_database() – on all database tables removing, innobase_rename_table() – on table rename e.t.c If log files are treated as circular buffer what happens when the buffer is overflown? Briefly. Innodb has a mechanism which allows you to avoid overflowing. It is called “checkpoints.” The checkpoint is a state when log files are synchronized with data files. In this case there is no need to keep the history of changes before checkpoint because all pages with the last modifications LSN less or equal to checkpoint LSN are flushed and the log files space from the last written LSN to the last checkpoint LSN can be reused. We will not describe a checkpoint process here because it is a separate interesting subject. The only thing we need to know is when checkpoint happens all pages with modification LSN less or equal to checkpoint LSN are reliably flushed. How archived logs are written by server. So the log contains information about page changes. But as we said, log files are the circular buffer. This means that they occupy fixed disk size and the oldest records can be rewritten by the newest ones as there are points when data files are synchronized with log files called checkpoints and there is no need to store the previous history of log records to guarantee database consistency. The idea is to save somewhere all log records to have the possibility of applying them to backuped data to have some kind if incremental backup. For example if we want to have an archive of log records. As log consists of log files it is reasonable to store log records in such files too, and these files are called “archived logs.” Archived log files are written to the directory which can be set with special innodb option. Each file has the same size as innodb log size and the suffix of each archived file is the LSN from which it is started. As well as log writing system log archiving system stores its data in global log_sys object. Here are the most valuable fields in log_sys from my point of view: log_sys->archive_buf, log_sys->archive_buf_size – logs archive buffer and its size, log records are copied from log buffer log_sys->buf to this buffer before writing to disk; log_sys->archiving_phase – the current phase of log archiving: LOG_ARCHIVE_READ when log records are being copied from log_sys->buf to log_sys->archive_buf, LOG_ARCHIVE_WRITE when log_sys->archive_buf is being written to disk; log_sys->archived_lsn – the LSN to which log files are written; log_sys->next_archived_lsn – the LSN to which write operations was invoked but not yet finished; log_sys->max_archived_lsn_age – the maximum difference between log_sys->lsn and log_sys->archived_lsn, if this difference exceeds the log are being archived synchronously, i.e. the difference is decreased; log_sys->archive_lock – this is rw-lock which is used for synchronizing LOG_ARCHIVE_WRITE and LOG_ARCHIVE_READ phases, it is x-locked on LOG_ARCHIVE_WRITE phase. So how is data copied from log_sys->buf to log_sys->archived_buf? log_archive_do() is used for this. It is not only set the proper state for archived log fields in log_sys but also invokes log_group_read_log_seg() with corresponding arguments which not only copy data from log buffer to archived log buffer but also invokes asynchronous write operation for archived log buffer. log_archive_do() can wait until io operations are finished using log_sys->archive_lock if corresponding function parameter is set. The main question is on what circumstances log_archive_do() is invoked, i.e. when log records are being written to archived log files. The first call stack is the following: log_free_check()-> log_check_margins()-> log_archive_margin()-> log_archive_do(). Here is text of log_free_check() with comments: /*********************************************************************// Checks if there is need for a log buffer flush or a new checkpoint, and does this if yes. Any database operation should call this when it has modified more than about 4 pages. NOTE that this function may only be called when the OS thread owns no synchronization objects except the dictionary mutex. */ UNIV_INLINE void log_free_check(void) /*================*/ { #ifdef UNIV_SYNC_DEBUG ut_ad(sync_thread_levels_empty_except_dict()); #endif /* UNIV_SYNC_DEBUG */ if (log_sys->check_flush_or_checkpoint) { log_check_margins(); } } log_sys->check_flush_or_checkpoint is set when there is no enough free space in log buffer or it is time to do checkpoint or any other bound case. log_archive_margin() is invoked only if the limit if the difference between log_sys->lsn and log_sys->archived_lsn is exceeded. Let’s refer to this difference as archived lsn age. One more call log_archive_do() is from log_open() when archived lsn age exceeds some limit. log_open() is called on each mtr_commit(). And for this case archived logs are written synchronously. The next synchronous call is from log_archive_all() during shutdown. Summarizing all above archived logs begins to be written when the log buffer is full enough to be written or when checkpoint happens or when the server is in the process of shut down. And there is no any delay between writing to archive log buffer and writing to disk. I mean there is no way to say that archived logs must be written once a second as it is possible for redo logs with innodb_flush_log_at_trx_commit = 0. As soon as data is copied to the buffer the write operation is invoked immediately for this buffer. Archived log buffer is not filled on each mtr_commit() so it does not slow down the usual logging process. The exception is when there are a lot of io operations what can be the reason of archive log age is too big. The result of big archive log age is the synchronous archived logs writing during mtr_commit(). Memory to memory copying is quite fast operation that is why the data is copied to archived log buffer and is written to disk asynchronously minimizing delays which can be caused by logs archiving. PS: Here is another call stack for writing archived log buffer to archived log files: log_io_complete()->log_io_complete_archive()->log_archive_check_completion_low()->log_archive_groups(). I propose to explore this stack yourself. Logs recovery process, how it is started and works inside. Archived logs applying. So we discovered how innodb redo logging works, and how redo logs are archived. And the last uncovered thing is how recovery works and how archived logs are applied. These two processes are very similar – that is why they are discussed in one section of this post. The story begins with innobase_start_or_create_for_mysql() which is invoked from innobase_init(). The following trident in innobase_start_or_create_for_mysql() can be used to search the relevant code: if (create_new_db) { ... } else if (srv_archive_recovery) { ... } else { ... } The second condition and the last one is the place from which archived logs applying and innodb logs recovery processes correspondingly start. These two blocks wrap two pairs of functions: recv_recovery_from_archive_start() recv_recovery_from_archive_finish() and recv_recovery_from_checkpoint_start() recv_recovery_from_checkpoint_finish() And all the magic happens in these pairs. As well as global log_sys object for redo logging there is global recv_sys object for innodb recovery and archived logs applying. It is created and initialized in recv_sys_create() and recv_sys_init() functions correspondingly. The following fields if recv_sys object are the most important from my point: recv_sys->limit_lsn – the LSN up to which recovery should be made, this value is initialized with the maximum value of uint64_t(see #define LSN_MAX) for the recovery process and with certain value which is passed as an argument of recv_recovery_from_archive_start() function and can be set via xtrabackup option for log applying; recv_sys->parse_start_lsn – the LSN from which logs parsing is started, for the the logs recovery this value equals to the last checkpoint LSN, for logs applying this is last applied LSN; recv_sys->scanned_lsn – the LSN up to which log files are scanned; recv_sys->recovered_lsn – the LSN up to which log records are applied, this value <= recv_sys->scanned_lsn; The first thing that must be done for starting recovery process is to find out the point in log files where the recovery must be started from. This is the last checkpoint LSN. recv_find_max_checkpoint() proceed this. As we can see in log_group_checkpoint() the following code writes checkpoint info into two places in the first log file depending on the checkpoint number: /* We alternate the physical place of the checkpoint info in the first log file */ if ((log_sys->next_checkpoint_no & 1) == 0) { write_offset = LOG_CHECKPOINT_1; } else { write_offset = LOG_CHECKPOINT_2; } So recv_find_max_checkpoint() reads checkpoint info from both places and selects the latest checkpoint. The same idea is applied for logs, too, but the last applied LSN instead of last checkpoint LSN must be found. Here is the call stack for reading last applied LSN: innobase_start_or_create_for_mysql()-> open_or_create_data_files()-> fil_read_first_page(). The last applied LSN is stored in the first page of data files in (min|max)_flushed_lsn fields(see FIL_PAGE_FILE_FLUSH_LSN offset). These values are written in fil_write_flushed_lsn_to_data_files() function on server shutdown. So the main difference between logs applying and recovery process at this stage is the manner of calculating LSN from which log records will be read. For logs applying the last flushed LSN is used but for recovery process it is the last checkpoint LSN. Why does this difference take place? Logs can be applied periodically. Assume we gather archived logs and apply them once an hour to have fresh backup. After applying the previous bunch of log files there can be unfinished transactions. For the recovery process any unfinished transactions are rolled back to have consistent db state at server starting. But for the logs applying process there is no need to roll back them because any unfinished transactions can be finished during the next logs applying. After calculating the start LSN the sequence of actions is the same for both recovering and applying. The next step is reading and parsing log records. See recv_group_scan_log_recs() which is invoked from recv_recovery_from_checkpoint_start_func() for logs recovering and recv_recovery_from_archive_start()->log_group_recover_from_archive_file() for logs applying. The first we read log records to some buffer and then invoke recv_scan_log_recs() to parse them. recv_scan_log_recs() checks each log block on consistency(checksum + comparing the log block number written in log block with log block number calculated from log block LSN) and other edge cases and copy it to parsing buffer recv_sys->buf with recv_sys_add_to_parsing_buf() function. The parsing buffer is then parsed by recv_parse_log_recs(). Log records are stored in hash table recv_sys->addr_hash. The key for this hash table is calculated basing on space id and page number pair. This pair refers to the page to which log records must be applied. The value of the hash table is object of recv_addr_t type. recv_addr_t type contains rec_list field which is the list of log records for applying to the (space id, page num) page (see recv_add_to_hash_table(). After parsing and storing log record in hash table recv_sys->addr_hash log records are applied. The function which is responsible for log records applying is recv_apply_hashed_log_recs(). It is invoked from recv_scan_log_recs() if there is no enough memory to store log records and at the end of recovering/applying process. For each element of recv_sys->addr_hash, i.e. for each DB page which must be changed with log records recv_recover_page() is invoked. It can be invoked as from recv_apply_hashed_log_recs() in the case if page is already in buffer pool of from buf_page_io_complete() on io completion, i.e. just after page was read from storage. Applying log records on page read completion is necessary and very convenient. Assume log records have not yet applied as we had enough memory to store the whole recovery log records. But we want for example to boot DB dictionary. I this case any records that concern to the pages of the dictionary will be applied to those pages just after reading them from storage to buffer pool. The function which applies log records to the certain page is recv_recover_page_func(). It gets the list of log records for the certain page from recv_sys->addr_hash hash table, for each element of this list it compares the lsn of last page changes with the LSN of the record, and if the former is greater the later it applies log record to the page. After applying all log records from archived logs xtrabackup writes last applied LSN to (min|max)_flushed LSN fields of each data file and finishes execution. The logs recovery process rollbacks all unfinished transactions unless this is forbidden with innodb-force-recovery parameter. Conclusion We covered the processes of redo logs writing and recovery in depth. These are very important processes as they provide data consistency on crashes. These two processes became a base for logs archiving and applying features. As log records can describe any data changes the idea is to store these records somewhere and then apply them to backups for organizing some kind of incremental backup. The features were implemented a short time ago and currently they are not widely used. So if you have something to say about them you are welcome to comment for discussion.
April 16, 2014
by Peter Zaitsev
· 6,170 Views
article thumbnail
A Docker ‘Hello World' With Mono
Docker is a lightweight virtualization technology for Linux that promises to revolutionize the deployment and management of distributed applications. Rather than requiring a complete operating system, like a traditional virtual machine, Docker is built on top of Linux containers, a feature of the Linux kernel, that allows light-weight Docker containers to share a common kernel while isolating applications and their dependencies. There’s a very good Docker SlideShare presentation here that explains the philosophy behind Docker using the analogy of standardized shipping containers. Interesting that the standard shipping container has done more to create our global economy than all the free-trade treaties and international agreements put together. A Docker image is built from a script, called a ‘Dockerfile’. Each Dockerfile starts by declaring a parent image. This is very cool, because it means that you can build up your infrastructure from a layer of images, starting with general, platform images and then layering successively more application specific images on top. I’m going to demonstrate this by first building an image that provides a Mono development environment, and then creating a simple ‘Hello World’ console application image that runs on top of it. Because the Dockerfiles are simple text files, you can keep them under source control and version your environment and dependencies alongside the actual source code of your software. This is a game changer for the deployment and management of distributed systems. Imagine developing an upgrade to your software that includes new versions of its dependencies, including pieces that we’ve traditionally considered the realm of the environment, and not something that you would normally put in your source repository, like the Mono version that the software runs on for example. You can script all these changes in your Dockerfile, test the new container on your local machine, then simply move the image to test and then production. The possibilities for vastly simplified deployment workflows are obvious. Docker brings concerns that were previously the responsibility of an organization’s operations department and makes them a first class part of the software development lifecycle. Now your infrastructure can be maintained as source code, built as part of your CI cycle and continuously deployed, just like the software that runs inside it. Docker also provides docker index, an online repository of docker images. Anyone can create an image and add it to the index and there are already images for almost any piece of infrastructure you can imagine. Say you want to use RabbitMQ, all you have to do is grab a handy RabbitMQ images such as https://index.docker.io/u/tutum/rabbitmq/ and run it like this: docker run -d -p 5672:5672 -p 55672:55672 tutum/rabbitmq The –p flag maps ports between the image and the host. Let’s look at an example. I’m going to show you how to create a docker image for the Mono development environment and have it built and hosted on the docker index. Then I’m going to build a local docker image for a simple ‘hello world’ console application that I can run on my Ubuntu box. First we need to create a Docker file for our Mono environment. I’m going to use the Mono debian packages from directhex. These are maintained by the official Debian/Ubuntu Mono team and are the recommended way of installing the latest Mono versions on Ubuntu. Here’s the Dockerfile: #DOCKER-VERSION 0.9.1 # #VERSION 0.1 # # monoxide mono-devel package on Ubuntu 13.10 FROM ubuntu:13.10 MAINTAINER Mike Hadlow RUN sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q software-properties-common RUN sudo add-apt-repository ppa:directhex/monoxide -y RUN sudo apt-get update RUN sudo DEBIAN_FRONTEND=noninteractive apt-get install -y -q mono-devel Notice the first line (after the comments) that reads, ‘FROM ubuntu:13.10’. This specifies the parent image for this Dockerfile. This is the official docker Ubuntu image from the index. When I build this Dockerfile, that image will be automatically downloaded and used as the starting point for my image. But I don’t want to build this image locally. Docker provide a build server linked to the docker index. All you have to do is create a public GitHub repository containing your dockerfile, then link the repository to your profile on docker index. You can read the documentation for the details. The GitHub repository for my Mono image is at https://github.com/mikehadlow/ubuntu-monoxide-mono-devel. Notice how the Docker file is in the root of the repository. That’s the default location, but you can have multiple files in sub-directories if you want to support many images from a single repository. Now any time I push a change of my Dockerfile to GitHub, the docker build system will automatically build the image and update the docker index. You can see image listed here:https://index.docker.io/u/mikehadlow/ubuntu-monoxide-mono-devel/ I can now grab my image and run it interactively like this: $ sudo docker pull mikehadlow/ubuntu-monoxide-mono-devel Pulling repository mikehadlow/ubuntu-monoxide-mono-devel f259e029fcdd: Download complete 511136ea3c5a: Download complete 1c7f181e78b9: Download complete 9f676bd305a4: Download complete ce647670fde1: Download complete d6c54574173f: Download complete 6bcad8583de3: Download complete e82d34a742ff: Download complete $ sudo docker run -i mikehadlow/ubuntu-monoxide-mono-devel /bin/bash mono --version Mono JIT compiler version 3.2.8 (Debian 3.2.8+dfsg-1~pre1) Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com TLS: __thread SIGSEGV: altstack Notifications: epoll Architecture: amd64 Disabled: none Misc: softdebug LLVM: supported, not enabled. GC: sgen exit Next let’s create a new local Dockerfile that compiles a simple ‘hello world’ program, and then runs it when we run the image. You can follow along with these steps. All you need is a Ubuntu machine with Docker installed. First here’s our ‘hello world’, save this code in a file named hello.cs: using System; namespace Mike.MonoTest { public class Program { public static void Main() { Console.WriteLine("Hello World"); } } } Next we’ll create our Dockerfile. Copy this code into a file called ‘Dockerfile’: #DOCKER-VERSION 0.9.1 FROM mikehadlow/ubuntu-monoxide-mono-devel ADD . /src RUN mcs /src/hello.cs CMD ["mono", "/src/hello.exe"] Once again, notice the ‘FROM’ line. This time we’re telling Docker to start with our mono image. The next line ‘ADD . /src’, tells Docker to copy the contents of the current directory (the one containing our Dockerfile) into a root directory named ‘src’ in the container. Now our hello.cs file is at /src/hello.cs in the container, so we can compile it with the mono C# compiler, mcs, which is the line ‘RUN mcs /src/hello.cs’. Now we will have the executable, hello.exe, in the src directory. The line ‘CMD [“mono”, “/src/hello.exe”]’ tells Docker what we want to happen when the container is run: just execute our hello.exe program. As an aside, this exercise highlights some questions around what best practice should be with Docker. We could have done this in several different ways. Should we build our software independently of the Docker build in some CI environment, or does it make sense to do it this way, with the Docker build as a step in our CI process? Do we want to rebuild our container for every commit to our software, or do we want the running container to pull the latest from our build output? Initially I’m quite attracted to the idea of building the image as part of the CI but I expect that we’ll have to wait a while for best practice to evolve. Anyway, for now let’s manually build our image: $ sudo docker build -t hello . Uploading context 1.684 MB Uploading context Step 0 : FROM mikehadlow/ubuntu-monoxide-mono-devel ---> f259e029fcdd Step 1 : ADD . /src ---> 6075dee41003 Step 2 : RUN mcs /src/hello.cs ---> Running in 60a3582ab6a3 ---> 0e102c1e4f26 Step 3 : CMD ["mono", "/src/hello.exe"] ---> Running in 3f75e540219a ---> 1150949428b2 Successfully built 1150949428b2 Removing intermediate container 88d2d28f12ab Removing intermediate container 60a3582ab6a3 Removing intermediate container 3f75e540219a You can see Docker executing each build step in turn and storing the intermediate result until the final image is created. Because we used the tag (-t) option and named our image ‘hello’, we can see it when we list all the docker images: $ sudo docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE hello latest 1150949428b2 10 seconds ago 396.4 MB mikehadlow/ubuntu-monoxide-mono-devel latest f259e029fcdd 24 hours ago 394.7 MB ubuntu 13.10 9f676bd305a4 8 weeks ago 178 MB ubuntu saucy 9f676bd305a4 8 weeks ago 178 MB ... Now let’s run our image. The first time we do this Docker will create a container and run it. Each subsequent run will reuse that container: $ sudo docker run hello Hello World And that’s it. Imagine that instead of our little hello.exe, this image contained our web application, or maybe a service in some distributed software. In order to deploy it, we’d simply ask Docker to run it on any server we like; development, test, production, or on many servers in a web farm. This is an incredibly powerful way of doing consistent repeatable deployments. To reiterate, I think Docker is a game changer for large server side software. It’s one of the most exciting developments to have emerged this year and definitely worth your time to check out.
April 3, 2014
by Mike Hadlow
· 11,309 Views
article thumbnail
Docker: Bulk Remove Images and Containers
I’ve just started looking at Docker. It’s a cool new technology that has the potential to make the management and deployment of distributed applications a great deal easier. I’d very much recommend checking it out. I’m especially interested in using it to deploy Mono applications because it promises to remove the hassle of deploying and maintaining the mono runtime on a multitude of Linux servers. I’ve been playing around creating new images and containers and debugging my Dockerfile, and I’ve wound up with lots of temporary containers and images. It’s really tedious repeatedly running ‘docker rm’ and ‘docker rmi’, so I’ve knocked up a couple of bash commands to bulk delete images and containers. Delete all containers: sudo docker ps -a -q | xargs -n 1 -I {} sudo docker rm {} Delete all un-tagged (or intermediate) images: sudo docker rmi $( sudo docker images | grep '' | tr -s ' ' | cut -d ' ' -f 3)
April 2, 2014
by Mike Hadlow
· 14,628 Views
article thumbnail
Distributed Counters Feature Design
this is another experiment with longer posts. previously, i used the time series example as the bed on which to test some ideas regarding feature design, to explain how we work and in general work out the rough patches along the way. i should probably note that these posts are purely fiction at this point. we have no plans to include a time series feature in ravendb at this time. i am trying to work out some thoughts in the open and get your feedback. at any rate, yesterday we had a request for cassandra style counters at the mailing list. and as long as i am doing feature design series, i thought that i could talk about how i would go about implementing this. again, consider this fiction, i have no plans of implementing this at this time. the essence of what we want is to be able to… count stuff. efficiently, in a distributed manner, with optional support for cross data center replication. very roughly, the idea is to have “sub counters”, unique for every node in the system. whenever you increment the value, we log this to our own sub counter, and then replicate it out. whenever you read it, we just sum all the data we have from all the sub counters. let us outline the various parts of the solution in the same order as the one i used for time series. storage a counter is just a named 64 bits signed integer. a counter name can be any string up to 128 printable characters. the external interface of the storage would look like this: 1: public struct counterincrement 2: { 3: public string name; 4: public long change; 5: } 6: 7: public struct counter 8: { 9: public string name; 10: public string source; 11: public long value; 12: } 13: 14: public interface icounterstorage 15: { 16: void localincrementbatch(counterincrement[] batch); 17: 18: counter[] read(string name); 19: 20: void replicatedupdates(counter[] updates); 21: } as you can see, this gives us very simple interface for the storage. we can either change the data locally (which modify our own storage) or we can get an update from a replica about its changes. there really isn’t much more to it, to be fair. the localincrementbatch() increment a local value, and read() will return all the values for a counter. there is a little bit of trickery involved in how exactly one would store the counter values. for now, i think we’ll store each counter as two step values. we’ll have a tree of multi tree values that will carry each value from each source. that means that a counter will take roughly 4kb or so. this is easy to work with and nicely fit the model voron uses internally. note that we’ll outline additional requirement for storage (searching for counter by prefix, iterating over counters, addresses of other servers, stats, etc) below. i’m not showing them here because they aren’t the major issue yet. over the wire skipping out on any optimizations that might be required, we will expose the following endpoints: get /counters/read?id=users/1/visits&users/1/posts <—will return json response with all the relevant values (already summed up). { “users/1/visits”: 43, “users/1/posts”: 3 } get /counters/read?id=users/1/visits&users/1/1/posts&raw=true <—will return json response with all the relevant values, per source. { “users/1/visits”: {“rvn1”: 21, “rvn2”: 22 } , “users/1/posts”: { “rvn1”: 2, “rvn3”: 1 } } post /counters/increment <– allows to increment counters. the request is a json array of the counter name and the change. for a real system, you’ll probably need a lot more stuff, metrics, stats, etc. but this is the high level design, so this would be enough. note that we are skipping the high performance stream based writes we outlined for time series. we’ll probably won’t need them, so that doesn’t matter, but they are an option if we need them. system behavior this is where it is really not interesting, there is very little behavior here, actually. we only have to read the data from the storage, sum it up, and send it to the user. hardly what i’ll call business logic. client api the client api will probably look something like this: 1: counters.increment("users/1/posts"); 2: counters.increment("users/1/visits", 4); 3: 4: using(var batch = counters.batch()) 5: { 6: batch.increment("users/1/posts"); 7: batch.increment("users/1/visits",5); 8: batch.submit(); 9: } note that we’re offering both batch and single api. we’ll likely also want to offer a fire & forget style, which will be able to offer even better performance (because they could do batching across more than a single thread), but that is out of scope for now. for simplicity sake, we are going to have the client just a container for all of endpoints that it knows about. the container would be responsible for… updating the client visible topology, selecting the best server to use at any given point, etc. user interface there isn’t much to it. just show a list of counter values in a list. allow to search by prefix, allow to dive into a particular counter and read its raw values, but that is about it. oh, and allow to delete a counter. deleting data honestly, i really hate deletes. they are very expensive to handle properly the moment you have more than a single node. in this case, there is an inherent race condition between a delete going out and another node getting an increment. and then there is the issue of what happens if you had a node down when you did the delete, etc. this just sucks. deletion are handled normally, (with the race condition caveat, obviously), and i’ll discuss how we replicate them in a bit. high availability / scale out by definition, we actually don’t want to have storage replication here. either log shipping or consensus based. we actually do want to have different values, because we are going to be modifying things independently on many servers. that means that we need to do replication at the database level. and that leads to some interesting questions. again, the hard part here is the deletes. actually, the really hard part is what we are going to do with the new server problem. the new server problem dictates how we are going to bring a new server into the cluster. if we could fix the size of the cluster, that would make things a lot easier. however, we are actually interested in being able to dynamically grow the cluster size. therefor, there are only two real ways to do it: add a new empty node to the cluster, and have it be filled from all the other servers. add a new node by backing up an existing node, and restoring as a new node. ravendb, for example, follows the first option. but it means that in needs to track a lot more information. the second option is actually a lot simpler, because we don’t need to care about keeping around old data. however, this means that the process of bringing up a new server would now be: update all nodes in the cluster with the new node address (node isn’t up yet, replication to it will fail and be queued). backup an existing node and restore at the new node. start the new node. the order of steps is quite important. and it would be easy to get it wrong. also, on large systems, backup & restore can take a long time. operationally speaking, i would much rather just be able to do something like, bring a new node into the cluster in “silent” mode. that is, it would get information from all the other nodes, and i can “flip the switch” and make it visible to clients at any point in time. that is how you do it with ravendb, and it is an incredibly powerful system, when used properly. that means that for all intents and purposes, we don’t do real deletes. what we’ll actually do is replace the counter value with delete marker. this turns deletes into a much simple “just another write”. it has the sad implication of not free disk space on deletes, but deletes tend to be rare, and it is usually fine to add a “purge” admin option that can be run on as needed basis. but that brings us to an interesting issue, how do we actually handle replication. the topology map to simplify things, we are going to go with one way replication from a node to another. that allows complex topologies like master-master, cluster-cluster, replication chain, etc. but in the end, this is all about a single node replication to another. the first question to ask is, are we going to replicate just our local changes, or are we going to have to replicate external changes as well? the problem with replicating external changes is that you may have the following topology: now, server a got a value and sent it to server b. server b then forwarded it to server c. however, at that point, we also have a the value from server a replicated directly to server c. which value is it supposed to pick? and what about a scenario where you have more complex topology? in general, because in this type of system, we can have any node accept writes, and we actually desire this to be the case , we don’t want this behavior. we want to only replicate local data, not all the data. of course, that leads to an annoying question, what happens if we have a 3 node cluster, and one node fails catastrophically. we can bring a new node in, and the other two nodes will be able to fill in their values via replication, but what about the node that is down? the data isn’t gone, it is still right there in the other two nodes, but we need a way to pull it out. therefor, i think that the best option would be to say that nodes only replicate their local state, except in the case of a new node. a new node will be told the address of an existing node in the cluster, at which point it will: register itself in all the nodes in the cluster (discoverable from the existing node). this assumes a standard two way replication link between all servers, if this isn’t the case, the operators would have the responsibility to setup the actual replication semantics on their own. new node now starts getting updates from all the nodes in the cluster. it keeps them in a log for now, not doing anything yet. ask that node for a complete update of all of its current state. when it has all the complete state of the existing node, it replays all of the remembered logs that it didn’t have a chance to apply yet. then it announces that it is in a valid state to start accepting client connections. note that this process is likely to be very sensitive to high data volumes. that is why you’ll usually want to select a backup node to read from, and that decision is an ops decision. you’ll also want to be able to report extensively on the current status of the node, since this can take a while, and ops will be watching this very closely. server name a node requires a unique name. we can use guids, but those aren’t readable, so we can use machine name + port, but those can change. ideally, we can require the user to set us up with a unique name. that is important for readability and for being able to alter see all the values we have in all the nodes. it is important that names are never repeated, so we’ll probably have a guid there anyway, just to be on the safe side. actual replication semantics since we have the new server problem down to an automated process, we can choose the drastically simpler model of just having an internal queue per each replication destination. whenever we make a change, we also make a note of that in the queue for that destination, then we start an async replication process to that server, sending all of our updates there. it is always safe to overwrite data using replication, because we are overwriting our own data, never anyone else. and… that is about it, actually. there are probably a lot of details that i am missing / would discover if we were to actually implement this. but i think that this is a pretty good idea about what this feature is about.
March 25, 2014
by Oren Eini
· 12,598 Views · 1 Like
article thumbnail
Cloud Automation with WinRM vs SSH
[Article originally written by Barak Merimovich.] Automation the Linux Way In the Linux world SSH, secure shell, is the de facto standard for remote connectivity and automation for the purpose of logging into a remote machine to install tools and run commands. It's pretty much ubiquitous, runs across multiple Linux versions and distributions, and every Linux admin worth their salt knows SSH and how to configure it. What's more, it's even the default enabled port on most clouds - port 22. An important feature available with SSH is support for file transfer via its secure copy protocol - AKA SCP, and secure file transfer protocol - AKA SFTP. These are a built-in part of the tool or exist as add-ons to the protocol that are almost always available. Therefore, using SSH for file transfer and remote execution is basically a given with Linux, and there are even tools to support SSH clients available for virtually every major programming language and operating system. WinRM in a Linux World So what comes out-of-the-box with Linux, is less of a given with Windows. SSH, obviously, is not built in with Windows; over the years there have been different protocols attempting to achieve the same functionality, such as Secure Telnet and others, however to date, none have really caught on. From Windows Server 2003, a new tool called WinRM - windows remote management, was introduced. WinRM is a SOAP-based protocol built on web services that among other things, allows you to connect to a remote system, providing a shell, essentially offering similar functionality to SSH. WinRM is currently the Windows world alternative to SSH. The Pros The advantage with WinRM is that you can use a vanilla VM with nothing pre-configured on it, with the only prerequisite being that the WinRM service needs to be running. EC2, the largest cloud provider today, supports this out-of-the-box, so if you want to run a standard Amazon machine image (AMI) for Windows, WinRM is enabled by default. This makes it possible to quickly start working with a cloud, all that needs to be done is bring up a standard Windows VM, and then it's possible to remotely configure it - and start using it. This is very useful in cloud environments where you are sometimes unable to create a custom Windows image or are limited to a very small number of images and want to limit your resource usage. The Challenges Where SSH has become the de facto protocol with Linux, WinRM is far less known tool in the Windows world, although it does offer comparable features as far as security, as well as connecting and executing commands to a remote machine. The standard tool for using WinRM is usually PowerShell, the new Windows shell that is intended to supersede the standard command prompt. To date though, there are still relatively few programming languages with built-in support for WinRM, making automation and remote execution of tasks over WinRM much more complex. To achieve these tasks, Cloudify employs PowerShell itself, as an external process to act as a client library for accessing WinRM. The primary issue with this, however, is that the client-side also needs to be running Windows, as PowerShell cannot run on Linux. Another aspect where WinRM differs from SSH is that it does not really have built-in file transfer. There is no direct equivalent for secure copy in SSH for WinRM. That said, it is possible to implement file transfer through PowerShell scripts. There are currently several open source initiatives looking to build a WinRM client for Linux - or specifically for some programming languages, such as Java, however, these are in different levels of maturity, where none of them are fully featured yet. Hence, PowerShell remains the default tool for Cloudify, which essentially provides the same level of functionality you would expect for running remote commands on a Linux machine with Windows. WinRM & Security Another interesting point to consider about WinRM is its support for encryption. WinRM supports three types of transfer protocols, HTTP, HTTPS, and encrypted HTTP. With HTTP, inevitably your wire protocol is unencrypted. It is only a good idea to use HTTP inside your own data center in the event that you are completely convinced that no one can monitor anything going over the wire. HTTPS is commonly used instead of HTTP, however with WinRM there's a chicken and egg issue. If you want to work with HTTPS you are required to set up an SSL certificate on the remote machine. The challenge here is when you're starting with a vanilla Windows VM that will not have the certificate installed, there is a need to automate the insertion of that certificate, however this often cannot be done, as WinRM is not running. Encrypted HTTP, which is also the default in EC2, basically uses your login credentials as your encryption key and it works. From a security perspective this is the recommended secure transfer protocol to use. It is worth noting that most attempts to create a WinRM client library tend to encounter problems around the encrypted HTTP protocol, as implementing MS' encrypted HTTP system - credSSP - is challenging. However, there are various projects working on achieving this, so it will hopefully be solved in the near future. Where Cloudify Comes Into the Mix Where WinRM comes into play with Cloudify, is during the cloud bootstrapping process. By using WinRM Cloudify is able to remotely connect to a vanilla VM provided by the cloud, and set up the Cloudify manager or agent to run on the machine. In addition to traditional cloud environments, WinRM also works on non-cloud and non-virtualized environments, such as a standard data center with multiple Windows servers running. All that needs to be done is provide Cloudify with the credentials, and it will use WinRM to connect and set up the machine remotely. Since WinRM is pre-packaged with Windows, there is no need to install anything. The only thing requirement, as mentioned above, is to have the WinRM service running, as not all Windows images will have this service running. Conclusion In short WinRM is the Window's world alternative to SSHD that allows you to remotely login securely and execute commands on Windows machines. From a cloud automation perspective, it provides virtually all the necessary functionality requirements, and thus it is recommended to have WinRM running in your Windows environment.
March 19, 2014
by Sharone Zitzman
· 25,969 Views
article thumbnail
Step-by-Step: Live Migrate Multiple (Clustered) VMs in One Line of PowerShell - Revisited
A while back, I wrote an article showing how to Live Migrate Your VMs in One Line of Powershell between non-clustered Windows Server 2012 Hyper-V hosts using Shared Nothing Live Migration. Since then, I’ve been asked a few times for how this type of parallel Live Migration would be performed for highly available virtual machines between Hyper-V hosts within a cluster. In this article, we’ll walk through the steps of doing exactly that … via Windows PowerShell on Windows Server 2012 or 2012 R2 or our FREE Hyper-V Server 2012 R2 bare-metal, enterprise-grade hypervisor in a clustered configuration. Wait! Do I need PowerShell to Live Migrate multiple VMs within a Cluster? Well, actually … No. You could certainly use the Failover Cluster Manager GUI tool to select multiple highly available virtual machines, right-click and select Move | Live Migration … Failover Cluster Manager – Performing Multi-VM Live Migration But, you may wish to script this process for other reasons … perhaps to efficiently drain all VM’s from a host as part of a maintenance script that will be performing other tasks. Can I use the same PowerShell cmdlets for Live Migrating within a Cluster? Well, actually … No again. When VMs are made highly available resources within a cluster, they’re managed as cluster group resources instead of being standalone VM resources. As a result, we have a different set of Cluster-aware PowerShell cmdlets that we use when managing these cluster groups. To perform a scripted multi-VM Live Migration, we’ll be leveraging three of these cmdlets: Get-ClusterNode, Get-ClusterGroup and Move-ClusterVirtualMachineRole Now, let’s see that one line of PowerShell! Before getting to the point of actually performing the multi-VM Live Migration in a single PowerShell command line, we first need to setup a few variables to handle the "what" and "where" of moving these VMs. First, let’s specify the name of the cluster with which we’ll be working. We’ll store it in a $clusterName variable. $clusterName = read-host -Prompt "Cluster name" Next, we’ll need to select the cluster node to which we’ll be Live Migrating the VMs. Lets use the Get-ClusterNode and Out-GridView cmdlets together to prompt for the cluster node and store the value in a $targetClusterNode variable. $targetClusterNode = Get-ClusterNode -Cluster $clusterName | Out-GridView -Title "Select Target Cluster Node" ` -OutputMode Single And then, we’ll need to create a list of all the VMs currently running in the cluster. We can use the Get-ClusterGroup cmdlet to retrieve this list. Below, we have an example where we are combining this cmdlet with a Where-Object cmdlet to return only the virtual machine cluster groups that are running on any node except the selected target cluster node. After all, it really doesn’t make any sense to Live Migrate a VM to the same node on which it’s currently running! $haVMs = Get-ClusterGroup -Cluster $clusterName | Where-Object {($_.GroupType -eq "VirtualMachine") ` -and ($_.OwnerNode -ne $targetClusterNode.Name)} We’ve stored the resulting list of VMs in a $haVMs variable. Ready to Live Migrate! OK … Now we have all of our variables defined for the cluster, the target cluster node and the list of VMs from which to choose. Here’s our single line of PowerShell to do the magic … $haVMs | Out-GridView -Title "Select VMs to Move" –PassThru | Move-ClusterVirtualMachineRole -MigrationType Live ` -Node $targetClusterNode.Name -Wait 0 Proceed with care: Keep in mind that your target cluster node will need to have sufficient available resources to run the VM's that you select for Live Migration. Of course, it's best to initially test tasks like this in your lab environment first. Here’s what is happening in this single PowerShell command line: We’re passing the list of VMs stored in the $haVMs variable to the Out-GridView cmdlet. Out-GridView prompts for which VMs to Live Migrate and then passes the selected VMs down the PowerShell object pipeline to the Move-ClusterVirtualMachineRole cmdlet. This cmdlet initiates the Live Migration for each selected VM, and because it’s using a –Wait 0 parameter, it initiates each Live Migration one-after-another without waiting for the prior task to finish. As a result, all of the selected VMs will Live Migrate in parallel, up to the maximum number of concurrent Live Migrations that you’ve configured on these cluster nodes. The VMs selected beyond this maximum will simply queue up and wait their turn. Unlike some competing hypervisors, Hyper-V doesn't impose an artificial hard-coded limit on how many VMs for you can Live Migrate concurrently. Instead, it's up to you to set the maximum to a sensible value based on your hardware and network capacity. Do you have your own PowerShell automation ideas for Hyper-V? Feel free to share your ideas in the Comments section below. See you in the Clouds! - Keith
March 3, 2014
by Keith Mayer
· 10,570 Views
article thumbnail
Jersey: Ignoring SSL certificate – javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException
Last week Alistair and I were working on an internal application and we needed to make a HTTPS request directly to an AWS machine using a certificate signed to a different host. We use jersey-client so our code looked something like this: Client client = Client.create(); client.resource("https://some-aws-host.compute-1.amazonaws.com").post(); // and so on When we ran this we predictably ran into trouble: com.sun.jersey.api.client.ClientHandlerException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching some-aws-host.compute-1.amazonaws.com found. at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.post(WebResource.java:241) at com.neotechnology.testlab.manager.bootstrap.ManagerAdmin.takeBackup(ManagerAdmin.java:33) at com.neotechnology.testlab.manager.bootstrap.ManagerAdminTest.foo(ManagerAdminTest.java:11) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching some-aws-host.compute-1.amazonaws.com found. at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1884) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:276) at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:270) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1341) at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:153) at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868) at sun.security.ssl.Handshaker.process_record(Handshaker.java:804) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1016) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1312) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1339) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1323) at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1300) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147) ... 31 more Caused by: java.security.cert.CertificateException: No subject alternative DNS name matching some-aws-host.compute-1.amazonaws.com found. at sun.security.util.HostnameChecker.matchDNS(HostnameChecker.java:191) at sun.security.util.HostnameChecker.match(HostnameChecker.java:93) at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:347) at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:203) at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:126) at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1323) ... 45 more We figured that we needed to get our client to ignore the certificate and came across this Stack Overflow thread which had some suggestions on how to do this. None of the suggestions worked on their own but we ended up with a combination of a couple of the suggestions which did the trick: public Client hostIgnoringClient() { try { SSLContext sslcontext = SSLContext.getInstance( "TLS" ); sslcontext.init( null, null, null ); DefaultClientConfig config = new DefaultClientConfig(); Map properties = config.getProperties(); HTTPSProperties httpsProperties = new HTTPSProperties( new HostnameVerifier() { @Override public boolean verify( String s, SSLSession sslSession ) { return true; } }, sslcontext ); properties.put( HTTPSProperties.PROPERTY_HTTPS_PROPERTIES, httpsProperties ); config.getClasses().add( JacksonJsonProvider.class ); return Client.create( config ); } catch ( KeyManagementException | NoSuchAlgorithmException e ) { throw new RuntimeException( e ); } } You’re welcome Future Mark.
March 2, 2014
by Mark Needham
· 43,042 Views · 8 Likes
article thumbnail
Running Hadoop MapReduce Application from Eclipse Kepler
it's very important to learn hadoop by practice. one of the learning curves is how to write the first map reduce app and debug in favorite ide, eclipse. do we need any eclipse plugins? no, we do not. we can do hadoop development without map reduce plugins this tutorial will show you how to set up eclipse and run your map reduce project and mapreduce job right from your ide. before you read further, you should have setup hadoop single node cluster and your machine. you can download the eclipse project from github . use case: we will explore the weather data to find maximum temperature from tom white’s book hadoop: definitive guide (3rd edition) chapter 2 and run it using toolrunner i am using linux mint 15 on virtualbox vm instance. in addition, you should have hadoop (mrv1 am using 1.2.1) single node cluster installed and running, if you have not done so, would strongly recommend you do it from here download eclipse ide, as of writing this, latest version of eclipse is kepler 1. create new java project 2. add dependencies jars right click on project properties and select java build path add all jars from $hadoop_home/lib and $hadoop_home (where hadoop core and tools jar lives) 3. create mapper package com.letsdobigdata; import java.io.ioexception; import org.apache.hadoop.io.intwritable; import org.apache.hadoop.io.longwritable; import org.apache.hadoop.io.text; import org.apache.hadoop.mapreduce.mapper; public class maxtemperaturemapper extends mapper { private static final int missing = 9999; @override public void map(longwritable key, text value, context context) throws ioexception, interruptedexception { string line = value.tostring(); string year = line.substring(15, 19); int airtemperature; if (line.charat(87) == '+') { // parseint doesn't like leading plus // signs airtemperature = integer.parseint(line.substring(88, 92)); } else { airtemperature = integer.parseint(line.substring(87, 92)); } string quality = line.substring(92, 93); if (airtemperature != missing && quality.matches("[01459]")) { context.write(new text(year), new intwritable(airtemperature)); } } } 4. create reducer package com.letsdobigdata; import java.io.ioexception; import org.apache.hadoop.io.intwritable; import org.apache.hadoop.io.text; import org.apache.hadoop.mapreduce.reducer; public class maxtemperaturereducer extends reducer { @override public void reduce(text key, iterable values, context context) throws ioexception, interruptedexception { int maxvalue = integer.min_value; for (intwritable value : values) { maxvalue = math.max(maxvalue, value.get()); } context.write(key, new intwritable(maxvalue)); } } 5. create driver for mapreduce job map reduce job is executed by useful hadoop utility class toolrunner package com.letsdobigdata; import org.apache.hadoop.conf.configured; import org.apache.hadoop.fs.path; import org.apache.hadoop.io.intwritable; import org.apache.hadoop.io.text; import org.apache.hadoop.mapreduce.job; import org.apache.hadoop.mapreduce.lib.input.fileinputformat; import org.apache.hadoop.mapreduce.lib.output.fileoutputformat; import org.apache.hadoop.util.tool; import org.apache.hadoop.util.toolrunner; /*this class is responsible for running map reduce job*/ public class maxtemperaturedriver extends configured implements tool{ public int run(string[] args) throws exception { if(args.length !=2) { system.err.println("usage: maxtemperaturedriver "); system.exit(-1); } job job = new job(); job.setjarbyclass(maxtemperaturedriver.class); job.setjobname("max temperature"); fileinputformat.addinputpath(job, new path(args[0])); fileoutputformat.setoutputpath(job,new path(args[1])); job.setmapperclass(maxtemperaturemapper.class); job.setreducerclass(maxtemperaturereducer.class); job.setoutputkeyclass(text.class); job.setoutputvalueclass(intwritable.class); system.exit(job.waitforcompletion(true) ? 0:1); boolean success = job.waitforcompletion(true); return success ? 0 : 1; } public static void main(string[] args) throws exception { maxtemperaturedriver driver = new maxtemperaturedriver(); int exitcode = toolrunner.run(driver, args); system.exit(exitcode); } } 6. supply input and output we need to supply input file that will be used during map phase and the final output will be generated in output directory by reduct task. edit run configuration and supply command line arguments. sample.txt reside in the project root. your project explorer should contain following ] 7. map reduce job execution 8. final output if you managed to come this far, once the job is complete, it will create output directory with _success and part_nnnnn , double click to view it in eclipse editor and you will see we have supplied 5 rows of weather data (downloaded from ncdc weather) and we wanted to find out the maximum temperature in a given year from input file and the output will contain 2 rows with max temperature in (centigrade) for each supplied year 1949 111 (11.1 c) 1950 22 (2.2 c) make sure you delete the output directory next time running your application else you will get an error from hadoop saying directory already exists. happy hadooping!
February 21, 2014
by Hardik Pandya
· 144,659 Views · 2 Likes
article thumbnail
To ServiceMix or Not to ServiceMix
This morning an interesting topic was posted to the Apache ServiceMix user forum, asking the question: To ServiceMix or not ServiceMix. In my mind the short answer is: NO Guillaume Nodet one of the key architects and long time committer on Apache ServiceMix already had his mind set 3 years ago when he wrong this blog post - Thoughts about ServiceMix. What has happened on the ServiceMix project was that the ServiceMix kernel was pulled out of ServiceMix into its own project - Apache Karaf. That happened in spring 2009, which Guillaume also blogged about. So is all that bad? No its IMHO all great. In fact having the kernel as a separate project, and Camel and CXF as the integration and WS/RS frameworks, would allow the ServiceMix team to focus on building the ESB that truly had value-add. But that did not happen. ServiceMix did not create a cross product security model, web console, audit and trace tooling, clustering, governance, service registry, and much more that people were looking for in an ESB (or related to a SOA suite). There were only small pieces of it, but never really baked well into the project. That said its not too late. I think the ServiceMix project is dying, but if a lot of people in the community step up, and contribute and work on these things, then it can bring value to some users. But I seriously doubt this will happen. PS: 6 years ago I was working as a consultant and looked at the next integration platform for a major Danish organization, and we looked at ServiceMix back then and dismissed it due its JBI nature, and the new OSGi based architecture was only just started. And frankly it has taken a long long time to mature Apache Karaf / Felix / Aries and the other pieces in OSGi to what they are today to offer a stable and sound platform for users to build their integration applications. That was not the case 4-6 years ago. Okay No to ServiceMix - what are my options then? So what should use you instead of ServiceMix? Well in my mind you have at least these two options. 1) Use Apache Karaf and add the pieces you need, such as Camel, CXF, ActiveMQ and build your own ESB. These individual projects have regular releases, and you can upgrade as you need. The ServiceMix project only has the JBI components in additional, that you should NOT use. Only legacy users that got on the old ServiceMix 3.x wagon may need to use this in a graceful upgrade from JBI to Karaf based containers. 2) Take a look at fabric8. IMHO fabric8 is all that value-add the ServiceMix project did not create, and a lot more. James Strachan, just blogged today about some of his thoughts on fabric8, JBoss Fuse, and Karaf. I encourage you to take a read. For example he talks about how fabric becomes poly container, so you have a much wider choice of which containers/JVM to run your integration applications. OSGi is no longer a requirement. (IMHO that is very very existing and potentially a changer). I encourage you to check out fabric8 web-site, and also read the overview and motivation sections of the documentation. And then check out some of the videos. After the upcoming JBoss Fuse 6.1 release, the Fuse team at Red Hat will have more time and focus to bring the documentation at fabric8 up to date covering all the functionality we have (there is a lot more), and as well bring out a 1.0 community released using pure community releases. This gives end users a 100% free to use out of the box release. And users looking for a commercial release can then use JBoss Fuse. Best of both worlds. Summary Okay back to the question - to ServiceMix or not. Then NO. Innovation happens outside ServiceMix, and also more and more outside Apache. If you have thoughts then you can share those in comments to this blog, or better yet, get involved in the discussion forum at the ServiceMix user forum. PPS: The thoughts on this blog is mine alone, and are not any official words from my employer.
February 12, 2014
by Claus Ibsen
· 16,940 Views
article thumbnail
Couchbase .NET SDK 2.0 Development Series: Part 1-1: Server Configuration
This article was originally written by Jeff Morris In the introduction to this series, I discussed some of the motivation for rewriting .NET SDK, the goals, objectives and the major features of the upcoming 2.0 release, and we examined the high-level architecture (10,000 feet view) of a Couchbase Server Client SDK. In this post we will go over the design and development of one of the core configuration components of a Couchbase SDK: Server Configuration. Introduction A Couchbase SDK client requires configuration from two sources: the Client Configuration, which defines the IP of the cluster to connect to, number of connections to use and other important information regarding how the client will interact with the cluster, and the Server Configuration, which defines the current state of the cluster (e.g. number of nodes, buckets that are available, etc.), thus driving the internal state of a client (Cluster Map) This post will only discuss the Server Configuration aspects and will largely revolve around implementing several well-defined interfaces or contracts. HTTP Streaming Configuration Currently, most clients use a “bootstrapping” technique via client configuration and a “Streaming Configuration” exposed by the Couchbase REST API. This is supported by versions of Couchbase from 2.2 and back. The usual approach is as follows: Within the “uris” element of a Client Configuration (semantics very per client), a URL is defined for which to start the bootstrapping process: http://[SERVER]:8091/pools The response is then parsed and the a request is made to get the buckets configuration: http://[SERVER]:8091/pools/default?uuid=[UUID] This response is parsed and another request is made to get streaming URL from: http://[SERVER]:8091/pools/default/buckets?v=[VERSION]&uuid=[UUID] Finally, the streaming URL connection is made which is long-lived and raises events in the client with respect to changes in the cluster: http://[SERVER]:8091/pools/default/bucketsStreaming/default?bucket_uuid=[UUID] The client will then change its internal state to match that of the current server configuration. There are some problems with this approach, among others: The “streaming URL” is resource intensive to create and maintain (mainly memory) on the server-side During a rebalance or failover situation, the cluster configuration may change many, many times. Each time this happens the client must tear down all of its resources (socket connections, VBucket mappings) and build its state up again and again, which can leads to reduced throughput, latency, higher than expected memory and CPU usage, and so on and so forth… Operations that are in-flight may be terminated and then re-tried on a new config state – it’s as if the “carpet has been pulled out from underneath them”. Responding to NOT_MY_VBUCKET responses are handled in-efficiently by simple trying the next node in the list – there is no information to help the client in which node to re-direct the operation to. A New Model for Configuration Management: CCCP While the streaming HTTP “bootstrapping” approach has worked reasonably well for most clients, the downsides have begun to outweigh the plusses, thus a new model for updating client configuration has been defined is available starting with the 2.5 version of the Couchbase Server: Client Cluster Configuration Publication or “CCCP”. CCCP introduces a new operation to be used before or after authentication to request configuration as well as a mechanism for returning configuration information when a NOT_MY_VBUCKET response is returned for a failed operation. In this case CCCP supporting SDK, the client will react by using the configuration to update itself before resending the operation. Note that a NOT_MY_VBUCKET is the standard response that is returned by the cluster when the cluster itself has changed (during a rebalance or failover scenario for example) and the client has not yet “synched” up and is using a stale configuration, resulting in an invalid key mapping. Whereas the “bootstrapping” approach is somewhat of a “pull” type operation, CCCP is either “push” or “pull” depending upon whether the request was initiated by the client (via an explicit CMD_GET_CLUSTER_CONFIG operation) or by the server itself (via a NOT_MY_VBUCKET response to an operation). We will go over CCCP in more detail in a later post. File Based Configuration One other semi-supported configuration option exists: file based configuration. File based configuration is primarily useful for testing and development and we will provide an implementation in the test projects to remove some of the dependencies that are difficult to replicate and or cause false positives when running the test suite. Structural Architecture View Internally the Server Configuration component of the client is a provider based model, in which multiple implementations of a configuration provider can be configured in the client and then a strategy can be chosen to determine which provider should be used. The default is a simple linear, fallback approach where the first configured provider is used and then if it fails the next provider in sequence will take its place. Here is a diagram showing the main actor objects and the relationships with some of other key objects within the client which will be discussed in subsequent posts: A description of each follows: ConfigurationProvider: a source which shall yield a new ConfigInfo. It’s the responsibility of the provider to provide the mechanism for fetching the configuration from its source. ConfigurationInformation: the configuration info contains a list of possible nodes and the VBucket map informing clients about which servers within said nodes a given key should be forwarded to. ConfigurationManager: bridge between the client and the providers and the strategy taken to determine which provider to use and what retry logic to apply. A more detailed document of this architecture can be found here. Please note that this, like all development, is an evolutionary process, so expect some changes and revisions over time. Conclusion and Next Steps This post discussed the history (HTTP Streaming) and the future (CCCP) of Couchbase SDK Server Configuration Management. In the next post we will go into detail the implementation of the HTTP Streaming configuration provider which is required for clients targeting pre-2.5 versions of the Couchbase Server.
February 7, 2014
by Don Pinto
· 3,755 Views
article thumbnail
Java: Handling a RuntimeException in a Runnable
At the end of last year I was playing around with running scheduled tasks to monitor a Neo4j cluster and one of the problems I ran into was that the monitoring would sometimes exit. I eventually realised that this was because a RuntimeException was being thrown inside the Runnable method and I wasn’t handling it. The following code demonstrates the problem: import java.util.ArrayList; import java.util.List; import java.util.concurrent.*; public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } If we run that code we’ll see the RuntimeException but the executor won’t exit because the thread died without informing it: Exception in thread "main" pool-1-thread-1 -> 1391212558074 java.util.concurrent.ExecutionException: java.lang.RuntimeException: game over at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at RunnableBlog.main(RunnableBlog.java:11) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Caused by: java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) At the time I ended up adding a try catch block and printing the exception like so: public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { try { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } catch (RuntimeException e) { e.printStackTrace(); } } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } This allows the exception to be recognised and as far as I can tell means that the thread executing the Runnable doesn’t die. java.lang.RuntimeException: game over pool-1-thread-1 -> 1391212651955 at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) pool-1-thread-1 -> 1391212652956 java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) pool-1-thread-1 -> 1391212653955 java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) This worked well and allowed me to keep monitoring the cluster. However, I recently started reading ‘Java Concurrency in Practice‘ (only 6 years after I bought it!) and realised that this might not be the proper way of handling the RuntimeException. public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { try { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } catch (RuntimeException e) { Thread t = Thread.currentThread(); t.getUncaughtExceptionHandler().uncaughtException(t, e); } } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } I don’t see much difference between the two approaches so it’d be great if someone could explain to me why this approach is better than my previous one of catching the exception and printing the stack trace.
February 6, 2014
by Mark Needham
· 19,605 Views
article thumbnail
How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1
Learn how to set up a four node Hadoop cluster using AWS EC2, PuTTy(gen), and WinSCP.
January 23, 2014
by Hardik Pandya
· 135,897 Views · 3 Likes
article thumbnail
Top Posts of 2013: Google's Big Data Papers
I’ll review Google’s most important Big Data publications and discuss where they are (as far as they’ve disclosed).
December 30, 2013
by Mikio Braun
· 117,043 Views
article thumbnail
Deconstructing the Azure Point-to-Site VPN for Command Line usage
when configuring an azure virtual network one of the most common things you'll want to do is setup a point-to-site vpn so that you can actually get to your servers to manage and maintain them. azure point-to-site vpns use client certificates to secure connections which can be quite complicated to configure so microsoft has gone the extra mile to make it easy for you to configure and get setup – sadly at the cost of losing the ability to connect through the command line or through powershell – let's change that. current state of play == no command line vpn connections normally when you want to launch a vpn from the cli or powershell in windows you can simply use the following command: rasdial "my home vpn" the azure pre-packaged vpn doesn't allow this because it's really just not a normal vpn. it's something else , something mysterious - not a normal native windows vpn connection. when you run the azure vpn through the command line you get this (you'll see a hint as to why i'd be using azure point-to-site in this screenshot): azure vpns don't appear to support this. if you want to keep your servers behind a private network in azure and use continuous deployment to get your code into production this makes it hard to deploy without a human being around. not really the best case scenario – especially when you remind yourself that automated builds aim to do away with human error altogether. what the azure point-to-site looks like out of the box when you first go to setup a point-to-site vpn into your azure virtual network microsoft points you at a page that walks you through creating a client certificate on your local machine to use as authentication. they then get you to download a package for setting up the azure vpn ras dialler on your local machine. this is accessed from within the azure "networks" page for your virtual network. you install this package and then whenever connecting you're greeted with a connection screen that you might of seen in a previous life. and by seen i don't mean that windows azure virtual networks have been around for ages. but more that the login screen may look familiar. this is because this login screen is a microsoft " connection manager " login screen and has been around for a while. example from technet (note extremely dated bitmap awesomeness): connection manager is used to pre-package vpn and dial up connections for easy-install distribution in a large organisation. this also means we can reconstruct the underlying vpn connection and use it as a normal vpn – claiming back our cli super powers. digging through the details so what we really want to know is: what is this mystical vpn technology the people at microsoft have bestowed upon us? here's how i started getting more information about the implementation: connecting once successfully then disconnect. open it up again to connect and click on properties then clicking on view log you'll then be greeted by something that looks like this: ****************************************************************** operating system : windows nt 6.2 dialler version : 7.2.9200.16384 connection name : my azure virtual network all users/single user : single user start date/time : 24/11/2013, 7:50:31 ****************************************************************** module name, time, log id, log item name, other info for connection type, 0=dial-up, 1=vpn, 2=vpn over dial-up ****************************************************************** [cmdial32] 7:50:31 03 pre-init event callingprocess = c:\windows\system32\cmmon32.exe [cmdial32] 7:50:39 04 pre-connect event connectiontype = 1 [cmdial32] 7:50:39 06 pre-tunnel event username = myclientsslcertificate domain = dunsetting = [obfuscated azure gateway id] tunnel devicename = tunneladdress = [obfuscated azure gateway id].cloudapp.net [cmdial32] 7:50:44 07 connect event [cmdial32] 7:50:44 08 custom action dll actiontype = connect actions description = to update your routing table actionpath = c:\users\doug\appdata\roaming\microsoft\network\connections\cm\[obfuscated azure gateway id]\cmroute.dll returnvalue = 0x0 [cmmon32] 7:56:21 23 external disconnect [cmdial32] 7:56:21 13 disconnect event callingprocess = c:\windows\explorer.exe more importantly you'll see this path included in the connection: within this folder is all the magic connection manager odds and ends. apologies for the [obfuscated], simply the path contains information to my azure endpoint. within this folder you'll see a bunch of files: most importantly there is a pbk file – a personal phonebook. this is what stores the connect settings for the vpn as is a commonly distributed way of sending out connection settings in the enterprise. if you run this on its own you'll actually be able to connect to the vpn directly (without your network routes being updated). this phonebook is where we can steal our settings from to recreate a command line driven connection. setting it up open up the properties of your azure point-to-site vpn phonebook above, and copy the connection address. it will look like this: azuregateway-[guid].cloudapp.net open network sharing centre , and create a new connection. then select connect to a workplace . select that you'll "use my internet connection". then enter your azure point-to-site vpn address and then give your new connection a name. remember this name for later then click create to save your vpn. now open the connection properties for your newly created vpn. this is where we'll use the settings in your azure diallers config to setup your connection. i'll save you the hassle of showing you me copying the settings from one connection to another and instead i'll just focus on what you need to set them to. flick over to the options tab and then click ppp settings . click the 2 missing options enable software compression and negotiate multi-link for single-link connections . set the type of vpn to secure socket tunnelling protocol (sstp), turn on eap and select microsoft: smart card of other certificate as the authentication type. then click on properties . select "use a certificate on this computer", un-tick "connect to these servers", and then select the certificate that uses your azure endpoint uri as its certificate name and then save out. then flick over to the network tab. open tcp/ipv4 then advanced then untick use default gateway on remote network . this setting stops internet traffic going over the vpn while you're connected so you can still surf reddit while managing your azure environment. close the vpn configuration panel. you now have a working vpn connection to azure. when you connect using windows you'll be asked to select the name of the client certificate you'll be authenticating with. you select the certificate you created and uploaded into azure before you setup your connection. when you connect using the command line you don't need to specify your certificate: rasdial "azure vpn" but there's one catch: your local machine's route table doesn't know when to send any traffic to your azure virtual network. the network link is there, but windows doesn't know what to send over your internet link and what to send over the vpn link. you see microsoft did a few things when they packaged your connection manager, and one of these things was to also copy a file called "cmroute.dll" and call this after connection to route your traffic onto your virtual network. this file altered your routing table to route traffic to your virtual network subnets through the vpn connection . we can do the same thing – so lets go about it. what's this about routing... rooting (for the english speakers in the room) my azure virtual network consists of the following network range: 10.0.0.0/8 i also have the following subnets for different machines groups. 10.0.1.0/24 (web servers) 10.0.2.0/24 (application servers) 10.0.3.0/24 (management services) my pptp connections, or point-to-site connections sit on the range: 172.16.0/24 this means that when i connect to the azure vpn i will get an ip address in this range. example: 172.16.0.17 when this happens we need to tell windows to route all traffic going to my 10.0.x.x range ip addresses through the ip address that has been given to us by azure's vpn rras service. you can see your current routing table by entering route print into a command prompt or powershell console. automating the routing additions luckily the windows task scheduler supports event listeners that allow us to watch for vpn connections and run commands off the back of them. take the below powershell script below and save it for arguments sake in c:\scripts\updateroutetableforazurevpn.ps1 ############################################################# # adds ip routes to azure vpn through the point-to-site vpn ############################################################# # define your azure subnets $ips = @("10.0.1.0", "10.0.2.0","10.0.3.0") # point-to-site ip address range # should be the first 4 octets of the ip address '172.16.0.14' == '172.16.0. $azurepptprange = "172.16.0." # find the current new dhcp assigned ip address from azure $azureipaddress = ipconfig | findstr $azurepptprange # if azure hasn't given us one yet, exit and let u know if (!$azureipaddress){ "you do not currently have an ip address in your azure subnet." exit 1 } $azureipaddress = $azureipaddress.split(": ") $azureipaddress = $azureipaddress[$azureipaddress.length-1] $azureipaddress = $azureipaddress.trim() # delete any previous configured routes for these ip ranges foreach($ip in $ips) { $routeexists = route print | findstr $ip if($routeexists) { "deleting route to azure: " + $ip route delete $ip } } # add our new routes to azure virtual network foreach($subnet in $ips) { "adding route to azure: " + $subnet echo "route add $ip mask 255.255.255.0 $azureipaddress" route add $subnet mask 255.255.255.0 $azureipaddress } now execute the following from an elevated command prompt window. this tells windows to add an event listener based task that looks for events to our "azure vpn" connection and if it sees them, it runs our powershell script. schtasks /create /f /tn "vpn connection update" /tr "powershell.exe -noninteractive -command c:\scripts\updateroutetableforazurevpn.ps1" /sc onevent /ec application /mo "*[system[(level=4 or level=0) and (eventid=20225)]] and *[eventdata[data='azure vpn']] " if i then connect to my vpn the above script should execute. after connecting if i check my routing table by entering route print into a console application we have our routes to azure added correctly. we're done! with that we're now able to fully use an azure point-to-site vpn simply from the command line. this means we can use it as part of a build server deployment, or if you're working on it all the time you can simply set it up to connect every time you login to windows . command line usage rasdial "[connection name]" rasdial "[connection name]" /disconnect for my connection named "azure vpn" this command line usage becomes: rasdial "azure vpn" rasdial "azure vpn" /disconnect
November 29, 2013
by Douglas Rathbone
· 10,437 Views
article thumbnail
Securing Docker’s Remote API
One piece to Docker that is interesting AMAZING is the Remote API that can be used to programatically interact with docker. I recently had a situation where I wanted to run many containers on a host with a single container managing the other containers through the API. But the problem I soon discovered is that at the moment when you turn networking on it is an all or nothing type of thing… you can’t turn networking off selectively on a container by container basis. You can disable IPv4 forwarding, but you can still reach the docker remote API on the machine if you can guess the IP address of it. One solution I came up with for this is to use nginx to expose the unix socket for docker over HTTPS and utilize client-side ssl certificates to only allow trusted containers to have access. I liked this setup a lot so I thought I would share how it’s done. Disclaimer: assumes some knowledge of docker! Generate The SSL Certificates We’ll use openssl to generate and self-sign the certs. Since this is for an internal service we’ll just sign it ourselves. We also remove the password from the keys so that we aren’t prompted for it each time we start nginx. # Create the CA Key and Certificate for signing Client Certs openssl genrsa -des3 -out ca.key 4096 openssl rsa -in ca.key -out ca.key # remove password! openssl req -new -x509 -days 365 -key ca.key -out ca.crt # Create the Server Key, CSR, and Certificate openssl genrsa -des3 -out server.key 1024 openssl rsa -in server.key -out server.key # remove password! openssl req -new -key server.key -out server.csr # We're self signing our own server cert here. This is a no-no in production. openssl x509 -req -days 365 -in server.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out server.crt # Create the Client Key and CSR openssl genrsa -des3 -out client.key 1024 openssl rsa -in client.key -out client.key # no password! openssl req -new -key client.key -out client.csr # Sign the client certificate with our CA cert. Unlike signing our own server cert, this is what we want to do. openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt Another option may be to leave the passphrase in and provide it as an environment variable when running a docker container or through some other means as an extra layer of security. We’ll move ca.crt, server.key and server.crt to /etc/nginx/certs. Setup Nginx The nginx setup for this is pretty straightforward. We just listen for traffic on localhost on port 4242. We require client-side ssl certificate validation and reference the certificates we generated in the previous step. And most important of all, set up an upstream proxy to the docker unix socket. I simply overwrote what was already in /etc/nginx/sites-enabled/default. upstream docker { server unix:/var/run/docker.sock fail_timeout=0; } server { listen 4242; server localhost; ssl on; ssl_certificate /etc/nginx/certs/server.crt; ssl_certificate_key /etc/nginx/certs/server.key; ssl_client_certificate /etc/nginx/certs/ca.crt; ssl_verify_client on; access_log on; error_log /dev/null; location / { proxy_pass http://docker; proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; client_max_body_size 10m; client_body_buffer_size 128k; proxy_connect_timeout 90; proxy_send_timeout 120; proxy_read_timeout 120; proxy_buffer_size 4k; proxy_buffers 4 32k; proxy_busy_buffers_size 64k; proxy_temp_file_write_size 64k; } } One important piece to make this work is you should add the user nginx runs as to the docker group so that it can read from the socket. This could be www-data, nginx, or something else! Hack It Up! With this setup and nginx restarted, let’s first run a curl command to make sure that this setup correctly. First we’ll make a call without the client cert to double check that we get denied access then a proper one. # Is normal http traffic denied? curl -v http://localhost:4242/info # How about https, sans client cert and key? curl -v -s -k https://localhost:4242/info # And the final good request! curl -v -s -k --key client.key --cert client.crt https://localhost:4242/info For the first two we should get some run of the mill 400 http response codes before we get a proper JSON response from the final command! Woot! But wait there’s more… let’s build a container that can call the service to launch other containers! For this example we’ll simply build two containers: one that has the client certificate and key and one that doesn’t. The code for these examples are pretty straightforward and to save space I’ll leave the untrusted container out. You can view the untrusted container on github (although it is nothing exciting). First, the node.js application that will connect and display information: https = require 'https' fs = require 'fs' options = host: 172.42.1.62 port: 4242 method: 'GET' path: '/containers/json' key: fs.readFileSync('ssl/client.key') cert: fs.readFileSync('ssl/client.crt') headers: { 'Accept': 'application/json'} # not required, but being semantic here! req = https.request options, (res) -> console.log res req.end() And the Dockerfile used to build the container. Notice we add the client.crt and client.key as part of building it! FROM shykes/nodejs MAINTAINER James R. Carr ADD ssl/client* /srv/app/ssl ADD package.json /srv/app/package.json ADD app.coffee /srv/app/app.coffee RUN cd /srv/app && npm install . CMD cd /srv/app && npm start That’s about it. Run docker build . and docker run -n >IMAGE ID< and we should see a json dump to the console of the actively running containers. Doing the same in the untrusted directory should present us with some 400 error about not providing a client ssl certificate. I’ve shared a project with all this code plus a vagrant file on github for your own prusual. Enjoy!
October 31, 2013
by James Carr
· 14,313 Views
article thumbnail
JMS-style selectors on Amazon SQS with Apache Camel
This blog post demonstrates how easy it is to use Apache Camel and its new json-path component along with the camel-sqs component to produce and consume messages on Amazon SQS. Amazon Web Services SQS is a message queuing “software as a service” (SaaS) in the cloud. To be able to use it, you need to sign up for AWS. It’s primary access mechanism is XML over HTTP through various AWS SDK clients provided by Amazon. Please check out the SQS documentation for more. And as “luck” would have it, one of the users in the Apache Camel community created a component to be able to integrate with SQS. This makes it trivial to add a producer or consumer to an SQS queue and plugs in nicely with the Camel DSL. SQS, however, is not a “one-size fits all” queueing service; you must be aware of your use case and make sure it fits (current requirements as well as somewhat into the future…). There are limitations that, if not studied and accounted for ahead of time, could come back to sink your project. An example of a viable alternative, and one that more closely fits the profile of a high performance and full featured message queue is Apache ActiveMQ. For example, one limitation to keep in mind is that unlike traditional JMS consumers, you cannot create a subscription to a queue that filters messages based on some predicate (at least not using the AWS-SQS API — you’d have to build that into your solution). Some other things to keep in mind when using SQS: The queue does not preserve FIFO messaging That is, message order is not preserved. They can arrive out of order from when they were sent. Apache Camel can help with its resequencer pattern. Bilgin Ibryam, now a colleague of mine at Red Hat, has written a great blog post about how to restore message order using the resequencer pattern. Message size is limited to 256K This is probably sufficient, but if your message sizes are variable, or contain more data that 256K, you will have to chunk them and send in smaller chunks. No selector or selective consumption If you’re familiar with JMS, you know that you can specify consumers to use a “selector” or a predicate expression that is evaluated on the broker side to determine whether or not a specific message should be dispatched to a specific consumer. For example, Durability constraints Some use cases call for the message broker to store messages until consumers return. SQS allows a limit of up to 14 days. This is most likely sufficient, but something to keep in mind. Binary payloads not allowed SQS only allows text-based messages, e.g., XML, JSON, fixed format text, etc. Binary such as Avro, Protocol Buffers, or Thrift are not allowed. For some of these limitations, you can work around them by building out the functionality yourself. I would always recommend taking a look at how an integration library like Apache Camel can help — which has out-of-the-box support for doing some of these things. Doing JMS-style selectors So the basic problem is we want to subscribe to a SQS queue, but we want to filter which messages we process. For those messages that we do not process, those should be left in the queue. To do this, we will make use of Apache Camel’s Filter EIP as well as the visibility timeouts available on the SQS queue. By default, SQS will dispatch all messages in its queue when it’s queried. We cannot change this, and thus not avoid the message being dispatched to us — we’ll have to do the filtering on our side (this is different than how a full-featured broker like ActiveMQ does it, i.e., filtering is done on the broker side so the consumer doesn’t even see the message it does not want to see). Once SQS dispatches a message, it does not remove it from the queue unless the consumer has acknowledged that it has it and is finished with it. The consumer does this by sending a DeleteMessage command. Until the DeleteMessage command is sent, the message is always in the queue, however visibility comes in to play here. When a message is dispatched to a consumer, there is a period of time which it will not be visible to other consumers. So if you browsed the queue, you would not see it (it should appear in the stats as “in-flight”). However, there is a configurable period of time you can specify for how long this “visibility timeout” should be active. So if you set the visibility to a lower time period (default is 30 seconds), you can more quickly get messages re-dispatched to consumers that would be able to handle the message. Take a look at the following Camel route which does just that: @Override public void configure() throws Exception { // every two seconds, send a message to the "demo" queue in SQS from("timer:kickoff?period=5000") .setBody().method(this, "generateJsonString") .to("aws-sqs://demo?amazonSQSClient=#sqsClient&defaultVisibilityTimeout=2"); } In the above Camel Route, we create a new message every 5 seconds and send it to an SQS queue named demo — note we set the defaultVisibilityTimeout to 2 seconds. This means that after a message gets dispatched to a consumer, SQS will wait about 2 seconds before considering it eligible to be dispatched to another consumer if it has not been deleted. On the consumer side, we take advantage of a couple Apache Camel conveniences Using JSON Path + Filter EIP Camel has an excellent new component named JSON-Path. Claus Ibsen tweeted about it when he hacked it up. This allows you to do Content-Based Routing on a JSON payload very easily by using XPath-style expressions to pick out and evaluate attributes in a JSON encoded object. So in the following example, we can test an attribute named ‘type’ to be equal to ‘LOGIN’ and use Camel’s Filter EIP to allow only those messages that match to go through and continue processing: public class ConsumerRouteBuilder extends RouteBuilder { @Override public void configure() throws Exception { from("aws-sqs://demo?amazonSQSClient=#sqsClient&deleteIfFiltered=false") .setHeader("identity").jsonpath("$['type']") .filter(simple("${header.identity} == 'login'")) .log("We have a message! ${body}") .to("file:target/output?fileName=login-message-${date:now:MMDDyy-HHmmss}.json"); } } To complete the functionality, we have to pay attention to a new configuration option added for the Camel-SQS component: deleteIfFiltered — Whether or not to send the DeleteMessage to the SQS queue if an exchange fails to get through a filter. If ‘false’ and exchange does not make it through a Camel filter upstream in the route, then don’t send DeleteMessage. By default, Camel will send the “DeleteMessage” command to SQS after a route has completed successfully (without an exception). However, in this case, we are specifying to not send the DeleteMessage command if the message had been previously filtered by Camel. This example demonstrates how easy it is to use Apache Camel and its new json-path component along with the camel-sqs component to produce and consume messages on Amazon SQS. Please take a look at the source code on my github repo to play with the live code and try it out yourself.
October 28, 2013
by Christian Posta
· 12,093 Views
article thumbnail
Examples of the Windows Azure Storage Services REST API
The examples in this post were updated in September to work with the current version of the Windows Azure Storage REST API. In the Windows Azure MSDN Azure Forum there are occasional questions about the Windows Azure Storage Services REST API. I have occasionally responded to these with some code examples showing how to use the API. I thought it would be useful to provide some examples of using the REST API for tables, blobs and queues – if only so I don’t have to dredge up examples when people ask how to use it. This post is not intended to provide a complete description of the REST API. The REST API is comprehensively documented (other than the lack of working examples). Since the REST API is the definitive way to address Windows Azure Storage Services I think people using the higher level Storage Client API should have a passing understanding of the REST API to the level of being able to understand the documentation. Understanding the REST API can provide a deeper understanding of why the Storage Client API behaves the way it does. Fiddler The Fiddler Web Debugging Proxy is an essential tool when developing using the REST (or Storage Client) API since it captures precisely what is sent over the wire to the Windows Azure Storage Services. Authorization Nearly every request to the Windows Azure Storage Services must be authenticated. The exception is access to blobs with public read access. The supported authentication schemes for blobs, queues and tables and these are described here. The requests must be accompanied by an Authorization header constructed by making a hash-based message authentication code using the SHA-256 hash. The following is an example of performing the SHA-256 hash for the Authorization header: public static String CreateAuthorizationHeader(String canonicalizedString) { String signature = String.Empty; using (HMACSHA256 hmacSha256 = new HMACSHA256( Convert.FromBase64String(storageAccountKey) )) { Byte[] dataToHmac = System.Text.Encoding.UTF8.GetBytes(canonicalizedString); signature = Convert.ToBase64String(hmacSha256.ComputeHash(dataToHmac)); } String authorizationHeader = String.Format( CultureInfo.InvariantCulture, "{0} {1}:{2}", AzureStorageConstants.SharedKeyAuthorizationScheme, AzureStorageConstants.Account, signature ); return authorizationHeader; } This method is used in all the examples in this post. AzureStorageConstants is a helper class containing various constants. Key is a secret key for Windows Azure Storage Services account specified by Account. In the examples given here, SharedKeyAuthorizationScheme is SharedKey. The trickiest part in using the REST API successfully is getting the correct string to sign. Fortunately, in the event of an authentication failure the Blob Service and Queue Service responds with the authorization string they used and this can be compared with the authorization string used in generating the Authorization header. This has greatly simplified the us of the REST API. Table Service API The Table Service API supports the following table-level operations: Create Table Delete Table Query Tables The Table Service API supports the following entity-level operations: Delete Entity Insert Entity Merge Entity Update Entity Query Entities These operations are implemented using the appropriate HTTP VERB: DELETE – delete GET – query MERGE – merge POST – insert PUT – update This section provides examples of the Insert Entity and Query Entities operations. Insert Entity The InsertEntity() method listed in this section inserts an entity with two String properties, Artist and Title, into a table. The entity is submitted as an ATOM entry in the body of a request POSTed to the Table Service. In this example, the ATOM entry is generated by the GetRequestContentInsertXml() method. The date must be in RFC 1123 format in the x-ms-date header supplied to the canonicalized resource used to create the Authorization string. Note that the storage service version is set to “2012-02-12″ which requires the DataServiceVersion and MaxDataServiceVersion to be set appropriately. public void InsertEntity(String tableName, String artist, String title) { String requestMethod = "POST"; String urlPath = tableName; String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String contentMD5 = String.Empty; String contentType = "application/atom+xml"; String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n{1}\n{2}\n{3}\n{4}", requestMethod, contentMD5, contentType, dateInRfc1123Format, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] content = utf8Encoding.GetBytes(GetRequestContentInsertXml(artist, title)); Uri uri = new Uri(AzureStorageConstants.TableEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Accept = "application/atom+xml,application/xml"; request.ContentLength = content.Length; request.ContentType = contentType; request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Headers.Add("Accept-Charset", "UTF-8"); request.Headers.Add("DataServiceVersion", "2.0;NetFx"); request.Headers.Add("MaxDataServiceVersion", "2.0;NetFx"); using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(content, 0, content.Length); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } private String GetRequestContentInsertXml(String artist, String title) { String defaultNameSpace = "http://www.w3.org/2005/Atom"; String dataservicesNameSpace = "http://schemas.microsoft.com/ado/2007/08/dataservices"; String metadataNameSpace = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"; XmlWriterSettings xmlWriterSettings = new XmlWriterSettings(); xmlWriterSettings.OmitXmlDeclaration = false; xmlWriterSettings.Encoding = Encoding.UTF8; StringBuilder entry = new StringBuilder(); using (XmlWriter xmlWriter = XmlWriter.Create(entry)) { xmlWriter.WriteProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\""); xmlWriter.WriteWhitespace("\n"); xmlWriter.WriteStartElement("entry", defaultNameSpace); xmlWriter.WriteAttributeString("xmlns", "d", null, dataservicesNameSpace); xmlWriter.WriteAttributeString("xmlns", "m", null, metadataNameSpace); xmlWriter.WriteElementString("title", null); xmlWriter.WriteElementString("updated", String.Format("{0:o}", DateTime.UtcNow)); xmlWriter.WriteStartElement("author"); xmlWriter.WriteElementString("name", null); xmlWriter.WriteEndElement(); xmlWriter.WriteElementString("id", null); xmlWriter.WriteStartElement("content"); xmlWriter.WriteAttributeString("type", "application/xml"); xmlWriter.WriteStartElement("properties", metadataNameSpace); xmlWriter.WriteElementString("PartitionKey", dataservicesNameSpace, artist); xmlWriter.WriteElementString("RowKey", dataservicesNameSpace, title); xmlWriter.WriteElementString("Artist", dataservicesNameSpace, artist); xmlWriter.WriteElementString("Title", dataservicesNameSpace, title + "\n" + title); xmlWriter.WriteEndElement(); xmlWriter.WriteEndElement(); xmlWriter.WriteEndElement(); xmlWriter.Close(); } String requestContent = entry.ToString(); return requestContent; } This generates the following request (as captured by Fiddler): POST https://STORAGE_ACCOUNT.table.core.windows.net/authors HTTP/1.1 Accept: application/atom+xml,application/xml Content-Type: application/atom+xml x-ms-date: Sun, 08 Sep 2013 06:31:12 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:w7Uu4wHZx4fFwa2bsxd/TJVZZ1AqMPwxvW+pYtoWHd0= Accept-Charset: UTF-8 DataServiceVersion: 2.0;NetFx MaxDataServiceVersion: 2.0;NetFx Host: STORAGE_ACCOUNT.table.core.windows.net Content-Length: 514 Expect: 100-continue Connection: Keep-Alive The body of the request is: 2013-09-08T07:19:07Z Beckett Molloy 2013-09-08T07:19:07.2189243Z Beckett Molloy Molloy Note that I should have URLEncoded the PartitionKey and RowKey but did not do so for simplicity. There are, in fact, some issues with the URL encoding of spaces and other symbols. Get Entity The GetEntity() method described in this section retrieves the single entity inserted in the previous section. The particular entity to be retrieved is identified directly in the URL. public void GetEntity(String tableName, String partitionKey, String rowKey) { String requestMethod = "GET"; String urlPath = String.Format("{0}(PartitionKey='{1}',RowKey='{2}')", tableName, partitionKey, rowKey); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n{2}", requestMethod, dateInRfc1123Format, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.TableEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Headers.Add("Accept-Charset", "UTF-8"); request.Accept = "application/atom+xml,application/xml"; request.Headers.Add("DataServiceVersion", "2.0;NetFx"); request.Headers.Add("MaxDataServiceVersion", "2.0;NetFx"); using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } This generates the following request (as captured by Fiddler): GET https://STORAGE_ACCOUNT.table.core.windows.net/authors(PartitionKey='Beckett',RowKey='Molloy') HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:31:14 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:1hWbr4aNq4JWCpNJY3rsLH1SkIyeFTJflbqyKMPQ1Gk= Accept-Charset: UTF-8 Accept: application/atom+xml,application/xml DataServiceVersion: 2.0;NetFx MaxDataServiceVersion: 2.0;NetFx Host: STORAGE_ACCOUNT.table.core.windows.net The Table Service generates the following response: HTTP/1.1 200 OK Cache-Control: no-cache Content-Type: application/atom+xml;charset=utf-8 ETag: W/"datetime'2013-09-08T06%3A31%3A14.1579056Z'" Server: Windows-Azure-Table/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: f4bd4c77-6fb6-42a8-8dff-81ea8d28fa2e x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:31:15 GMT Content-Length: 1108 The returned entities, in this case a single entity, are returned in ATOM entry format in the response body: https://STORAGE_ACCOUNT.table.core.windows.net/authors(PartitionKey='Beckett',RowKey='Molloy') 2013-09-08T06:31:15Z Beckett Molloy 2013-09-08T06:31:14.1579056Z Beckett Molloy Molloy Blob Service API The Blob Service API supports the following account-level operation: List Containers The Blob Service API supports the following container-level operation: Create Container Delete Container Get Container ACL Get Container Properties Get Container Metadata List Blobs Set Container ACL Set Container Metadata The Blob Service API supports the following blob-level operation: Copy Blob Delete Blob Get Blob Get Blob Metadata Get Blob Properties Lease Blob Put Blob Set Blob Metadata Set Blob Properties Snapshot Blob The Blob Service API supports the following operations on block blobs: Get Block List Put Block Put Block List The Blob Service API supports the following operations on page blobs: Get Page Regions Put Page This section provides examples of the Put Blob and Lease Blob operations. Put Blob The Blob Service and Queue Service use a different form of shared-key authentication from the Table Service so care should be taken in creating the string to be signed for authorization. The blob type, BlockBlob or PageBlob, must be specified as a request header and consequently appears in the authorization string. public void PutBlob(String containerName, String blobName) { String requestMethod = "PUT"; String urlPath = String.Format("{0}/{1}", containerName, blobName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String content = "Andrew Carnegie was born in Dunfermline"; UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] blobContent = utf8Encoding.GetBytes(content); Int32 blobLength = blobContent.Length; const String blobType = "BlockBlob"; String canonicalizedHeaders = String.Format( "x-ms-blob-type:{0}\nx-ms-date:{1}\nx-ms-version:{2}", blobType, dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, blobLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.BlobEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-blob-type", blobType); request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = blobLength; using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(blobContent, 0, blobLength); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String ETag = response.Headers["ETag"]; } } This generates the following request: PUT https://STORAGE_ACCOUNT.blob.core.windows.net/fife/dunfermline HTTP/1.1 x-ms-blob-type: BlockBlob x-ms-date: Sun, 08 Sep 2013 06:28:29 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:ntvh/lamVmikvwHhy6vRVBIh87kibkPlEOiHyLDia6g= Host: STORAGE_ACCOUNT.blob.core.windows.net Content-Length: 39 Expect: 100-continue Connection: Keep-Alive The body of the request is: Andrew Carnegie was born in Dunfermline The Blob Service generates the following response: HTTP/1.1 201 Created Transfer-Encoding: chunked Content-MD5: RYJnWGXLyt94l5jG82LjBw== Last-Modified: Sun, 08 Sep 2013 06:28:31 GMT ETag: "0x8D07A73C5704A86" Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: b74ef0a2-294d-4581-b8f1-6cda724bbdbf x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:28:30 GMT Lease Blob The Blob Service allows a user to lease a blob for a minute at a time and so acquire a write lock on it. The use case for this is the locking of a page blob used to store the VHD backing an writeable Azure Drive. The LeaseBlob() example in this section demonstrates a subtle issue with the creation of authorization strings. The URL has a query string, comp=lease. Rather than using this directly in creating the authorization string it must be converted into comp:lease with a colon replacing the equal symbol – see modifiedURL in the example. Furthermore, the Lease Blob operation requires the use of an x-ms-lease-action to indicate whether the lease is being acquired, renewed, released or broken. public void LeaseBlob(String containerName, String blobName) { String requestMethod = "PUT"; String urlPath = String.Format("{0}/{1}?comp=lease", containerName, blobName); String modifiedUrlPath = String.Format("{0}/{1}\ncomp:lease", containerName, blobName); const Int32 contentLength = 0; String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String leaseAction = "acquire"; String leaseDuration = "60"; String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-lease-action:{1}\nx-ms-lease-duration:{2}\nx-ms-version:{3}", dateInRfc1123Format, leaseAction, leaseDuration, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, modifiedUrlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, contentLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.BlobEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-lease-action", leaseAction); request.Headers.Add("x-ms-lease-duration", leaseDuration); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = contentLength; using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String leaseId = response.Headers["x-ms-lease-id"]; } } This generates the following request: PUT https://STORAGE_ACCOUNT.blob.core.windows.net/fife/dunfermline?comp=lease HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:28:31 GMT x-ms-lease-action: acquire x-ms-lease-duration: 60 x-ms-version: 2012-02-12 Authorization: SharedKey rebus:+SQ5+RFZg3hUaws5XCRHxsDgXb1ycdRIz5EKyHJWP7s= Host: rebus.blob.core.windows.net Content-Length: 0 The Blob Service generates the following response: HTTP/1.1 201 Created Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: 4b6ff77f-f885-4f74-803a-c92920d225c3 x-ms-version: 2012-02-12 x-ms-lease-id: b1320c2c-65ad-41d6-a7bd-85a4242c0ac5 Date: Sun, 08 Sep 2013 06:28:31 GMT Content-Length: 0 Queue Service API The Queue Service API supports the following queue-level operation: List Queues The Queue Service API supports the following queue-level operation: Create Queue Delete Queue Get Queue Metadata Set Queue Metadata The Queue Service API supports the following message-level operations: Clear Messages Delete Message Get Messages Peek Messages Put Message This section provides examples of the Put Message and Get Message operations. Put Message The most obvious curiosity about Put Message is that it uses the HTTP verb POST rather than PUT. The issue is presumably the interaction of the English language and the HTTP standard which states that PUT should be idempotent and that the Put Message operation is clearly not since each invocation merely adds another message to the queue. Regardless, it did catch me out when I failed to read the documentation well enough – so take that as a warning. The content of a message posted to the queue must be formatted in a specified XML schema and must then be UTF8 encoded. public void PutMessage(String queueName, String message) { String requestMethod = "POST"; String urlPath = String.Format("{0}/messages", queueName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String messageText = String.Format( "{0}", message); UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] messageContent = utf8Encoding.GetBytes(messageText); Int32 messageLength = messageContent.Length; String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-version:{1}", dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, messageLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.QueueEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = messageLength; using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(messageContent, 0, messageLength); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String requestId = response.Headers["x-ms-request-id"]; } } This generates the following request: POST https://rebus.queue.core.windows.net/revolution/messages HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:34:08 GMT x-ms-version: 2012-02-12 Authorization: SharedKey rebus:nyASTVWifnxHKnj2wXwuzzzXz5CxUBZj58SToV5QFK8= Host: rebus.queue.core.windows.net Content-Length: 76 Expect: 100-continue Connection: Keep-Alive The body of the request is: Saturday in the cafe The Queue Service generates the following response: HTTP/1.1 201 Created Server: Windows-Azure-Queue/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: 14c6e73b-15d9-480c-b251-c4c01b48e529 x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:34:09 GMT Content-Length: 0 Get Messages The Get Messages operation described in this section retrieves a single message with the default message visibility timeout of 30 seconds. public void GetMessage(String queueName) { string requestMethod = "GET"; String urlPath = String.Format("{0}/messages", queueName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-version:{1}", dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n\n\n\n\n\n\n\n\n\n{1}\n{2}", requestMethod, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.QueueEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Accept = "application/atom+xml,application/xml"; using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } This generates the following request: GET https://rebus.queue.core.windows.net/revolution/messages HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:34:11 GMT x-ms-version: 2012-02-12 Authorization: SharedKey rebus:K67XooYhokw0i0AlCzYQ4GeLLrJih1r1vSqiO9DBo0c= Accept: application/atom+xml,application/xml Host: rebus.queue.core.windows.net The Queue Service generates the following response: HTTP/1.1 200 OK Content-Type: application/xml Server: Windows-Azure-Queue/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: efb21a86-7d66-47fd-b13d-7aa74fce0568 x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:34:12 GMT Content-Length: 484 The message is returned in the response body as follows: 05fd902f-6031-4ef4-8298-ef3844ec3bc6Sun, 08 Sep 2013 06:34:11 GMTSun, 15 Sep 2013 06:34:11 GMT1AgAAAAMAAAAAAAAAAL+zgF2szgE=Sun, 08 Sep 2013 06:34:43 GMTSaturday in the cafe I noticed that some newline specifiers in strings (\n) were lost when the blog was auto-ported from Windows Live Spaces to WordPress. I have put them back in but it is possible I missed some. Consequently, in the event of a problem you should check the newlines in canonicalizedHeaders and stringToSign.
October 24, 2013
by Neil Mackenzie
· 38,787 Views
article thumbnail
ElasticSearch: Java API
ElasticSearch provides Java API, thus it executes all operations asynchronously by using client object.
September 30, 2013
by Hüseyin Akdoğan DZone Core CORE
· 137,562 Views · 4 Likes
  • Previous
  • ...
  • 273
  • 274
  • 275
  • 276
  • 277
  • 278
  • 279
  • 280
  • 281
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×