DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Cloud Architecture Topics

article thumbnail
Top 10 Causes of Java EE Enterprise Performance Problems
Performance problems are one of the biggest challenges to expect when designing and implementing Java EE related technologies.
June 20, 2012
by Pierre - Hugues Charbonneau
· 274,068 Views · 20 Likes
article thumbnail
Infographics: Cloud Computing and History
infographic: clouds computing and history i have prepared three new infographics for you;aall of them related with cloud computing. these infographics will tell you about history of cloud computing, its definition, and who needs this cloud. i think that this will be interesting for you. information graphics (known as infographics) are one of the best ways to transfer some information into a reader’s mind. it can be something new, or other useful information gathered in one place. nowadays many people don’t have enough time to read a lot of text on multiple screens. infographics makes the information intuitive and understandable. that’s why we would like to share the best relevant infographics from all over the web. original source: cloud computing by the small business authority original source: a complete history of cloud computing original source: hosting decisions, from the chalkboard
June 6, 2012
by Andrei Prikaznov
· 11,879 Views
article thumbnail
The Limited Usefulness of AsyncContext.start()
Some time ago I came across What's the purpose of AsyncContext.start(...) in Servlet 3.0? question. Quoting the Javadoc of aforementioned method: Causes the container to dispatch a thread, possibly from a managed thread pool, to run the specified Runnable. To remind all of you, AsyncContext is a standard way defined in Servlet 3.0 specification to handle HTTP requests asynchronously. Basically HTTP request is no longer tied to an HTTP thread, allowing us to handle it later, possibly using fewer threads. It turned out that the specification provides an API to handle asynchronous threads in a different thread pool out of the box. First we will see how this feature is completely broken and useless in Tomcat and Jetty - and then we will discuss why the usefulness of it is questionable in general. Our test servlet will simply sleep for given amount of time. This is a scalability killer in normal circumstances because even though sleeping servlet is not consuming CPU, but sleeping HTTP thread tied to that particular request consumes memory - and no other incoming request can use that thread. In our test setup I limited the number of HTTP worker threads to 10 which means only 10 concurrent requests are completely blocking the application (it is unresponsive from the outside) even though the application itself is almost completely idle. So clearly sleeping is an enemy of scalability. @WebServlet(urlPatterns = Array("/*")) class SlowServlet extends HttpServlet with Logging { protected override def doGet(req: HttpServletRequest, resp: HttpServletResponse) { logger.info("Request received") val sleepParam = Option(req.getParameter("sleep")) map {_.toLong} TimeUnit.MILLISECONDS.sleep(sleepParam getOrElse 10) logger.info("Request done") } } Benchmarking this code reveals that the average response times are close to sleep parameter as long as the number of concurrent connections is below the number of HTTP threads. Unsurprisingly the response times begin to grow the moment we exceed the HTTP threads count. Eleventh connection has to wait for any other request to finish and release worker thread. When the concurrency level exceeds 100, Tomcat begins to drop connections - too many clients are already queued. So what about the the fancy AsyncContext.start() method (do not confuse with ServletRequest.startAsync())? According to the JavaDoc I can submit any Runnable and the container will use some managed thread pool to handle it. This will help partially as I no longer block HTTP worker threads (but still another thread somewhere in the servlet container is used). Quickly switching to asynchronous servlet: @WebServlet(urlPatterns = Array("/*"), asyncSupported = true) class SlowServlet extends HttpServlet with Logging { protected override def doGet(req: HttpServletRequest, resp: HttpServletResponse) { logger.info("Request received") val asyncContext = req.startAsync() asyncContext.setTimeout(TimeUnit.MINUTES.toMillis(10)) asyncContext.start(new Runnable() { def run() { logger.info("Handling request") val sleepParam = Option(req.getParameter("sleep")) map {_.toLong} TimeUnit.MILLISECONDS.sleep(sleepParam getOrElse 10) logger.info("Request done") asyncContext.complete() } }) } } We are first enabling the asynchronous processing and then simply moving sleep() into a Runnable and hopefully a different thread pool, releasing the HTTP thread pool. Quick stress test reveals slightly unexpected results (here: response times vs. number of concurrent connections): Guess what, the response times are exactly the same as with no asynchronous support at all (!) After closer examination I discovered that when AsyncContext.start() is called Tomcat submits given task back to... HTTP worker thread pool, the same one that is used for all HTTP requests! This basically means that we have released one HTTP thread just to utilize another one milliseconds later (maybe even the same one). There is absolutely no benefit of calling AsyncContext.start() in Tomcat. I have no idea whether this is a bug or a feature. On one hand this is clearly not what the API designers intended. The servlet container was suppose to manage separate, independent thread pool so that HTTP worker thread pool is still usable. I mean, the whole point of asynchronous processing is to escape the HTTP pool. Tomcat pretends to delegate our work to another thread, while it still uses the original worker thread pool. So why I consider this to be a feature? Because Jetty is "broken" in exactly same way... No matter whether this works as designed or is only a poor API implementation, using AsyncContext.start() in Tomcat and Jetty is pointless and only unnecessarily complicates the code. It won't give you anything, the application works exactly the same under high load as if there was no asynchronous logic at all. But what about using this API feature on correct implementations like IBM WAS? It is better, but still the API as is doesn't give us much in terms of scalability. To explain again: the whole point of asynchronous processing is the ability to decouple HTTP request from an underlying thread, preferably by handling several connections using the same thread. AsyncContext.start() will run the provided Runnable in a separate thread pool. Your application is still responsive and can handle ordinary requests while long-running request that you decided to handle asynchronously are processed in a separate thread pool. It is better, unfortunately the thread pool and thread per connection idiom is still a bottle-neck. For the JVM it doesn't matter what type of threads are started - they still occupy memory. So we are no longer blocking HTTP worker threads, but our application is not more scalable in terms of concurrent long-running tasks we can support. In this simple and unrealistic example with sleeping servlet we can actually support thousand of concurrent (waiting) connections using Servlet 3.0 asynchronous support with only one extra thread - and without AsyncContext.start(). Do you know how? Hint: ScheduledExecutorService. Postscriptum: Scala goodness I almost forgot. Even though examples were written in Scala, I haven't used any cool language features yet. Here is one: implicit conversions. Make this available in your scope: implicit def blockToRunnable[T](block: => T) = new Runnable { def run() { block } } And suddenly you can use code block instead of instantiating Runnable manually and explicitly: asyncContext start { logger.info("Handling request") val sleepParam = Option(req.getParameter("sleep")) map { _.toLong} TimeUnit.MILLISECONDS.sleep(sleepParam getOrElse 10) logger.info("Request done") asyncContext.complete() } Sweet!
May 22, 2012
by Tomasz Nurkiewicz
· 17,534 Views · 1 Like
article thumbnail
Virtualization in WPF with VirtualizingStackPanel
First blogged about this on my previous blog site here: http://consultingblogs.emc.com/merrickchaffer/archive/2011/02/14/virtualization-in-wpf-with-virtualizingstackpanel.aspx However, having come across this again today on a project, I thought it was important enough to re-blog! Finally managed to figure out how to get virtualization to actually behave itself in a listbox wpf control. Turns out that in order for Virtualization to work, you need three things satisfied. Use a control that supports virtualization (e.g. list box or list view). (see Controls That Implement Performance Features section at bottom of this page for more info http://msdn.microsoft.com/en-us/library/cc716879.aspx#Controls ) Ensure that the ScrollViewer.CanContentScroll attached property is set to True on the containing list box / list view control. Ensure that either the list box has a height set, or that it is contained within a parent Grid row, where that row definition has a height set (Height="*" will do if you want it to occupy the Client window height). Note: Do not use height=”Auto” as this will not work, as this instructs WPF to simply size the row to the height needed to fit all the items of the list box in, hence you do not get the vertical scroll bar appearing. Ensure that there is no wrapping ScrollViewer control around the list box, as this will prevent virtualization from occuring. Ensure that you use a VirtualizingStackPanel in the ItemsPanelTemplate for the ListBox.ItemsPanel Example
May 14, 2012
by Merrick Chaffer
· 28,278 Views
article thumbnail
Managing and Monitoring Drupal Sites on Windows Azure
A few weeks ago, I co-authored an article (with my colleague Rama Ramani) about how the Screen Actors Guild Awards website migrated its Drupal deployment from LAMP to Windows Azure: Azure Real World: Migrating a Drupal Site from LAMP to Windows Azure. Since then, Rama and another colleague, Jason Roth, have been working on writing up how the SAG Awards website was managed and monitored in Windows Azure. The article below is the fruit of their work…a very interesting/educational read. Overview Drupal is an open source content management system that runs on PHP. Windows Azure offers a flexible platform for hosting, managing, and scaling Drupal deployments. This paper focuses on an approach to host Drupal sites on Windows Azure, based on learning from a BPD Customer Programs Design Win engagement with the Screen Actors Guild Awards Drupal website. This paper covers guidelines and best practices for managing an existing Drupal web site in Windows Azure. For more information on how to migrate Drupal applications to Windows Azure, see Azure Real World: Migrating a Drupal Site from LAMP to Windows Azure. The target audience for this paper is Drupal administrators who have some exposure to Windows Azure. More detailed pointers to Windows Azure content is provided throughout the paper as links. Drupal Application Architecture on Windows Azure Before reviewing the management and monitoring guidelines, it is important to understand the architecture of a typical Drupal deployment on Windows Azure. First, the following diagram displays the basic architecture of Drupal running on Windows and IIS7. In the Windows Server scenario, you could have one or more machines hosting the web site in a farm. Those machines would either persist the site content to the file system or point to other network shares. For Windows Azure, the basic architecture is the same, but there are some differences. In Windows Azure the site is hosted on a web role. A web role instance is hosted on a Windows Server 2008 virtual machine within the Windows Azure datacenter. Like the web farm, you can have multiple instances running the site. But there is no persistence guarantee for the data on the file system. Because of this, much of the shared site content should be stored in Windows Azure Blob storage. This allows them to be highly available and durable. Usually, a large portion of the site caters to static content which lends well to caching. And caching can be applied in a set of places – browser level caching, CDN to cache content in the edge closer to the browser clients, caching in Azure to reduce the load on backend, etc. Finally, the database can be located in SQL Azure. The following diagram shows these differences. For monitoring and management, we will look at Drupal on Windows Azure from three perspectives: Availability: Ensure the web site does not go down and that all tiers are setup correctly. Apply best practices to ensure that the site is deployed across data centers and perform backup operations regularly. Scalability: Correctly handle changes in user load. Understand the performance characteristics of the site. Manageability: Correctly handle updates. Make code and site changes with no downtime when possible. Although some management tasks span one or more of these categories, it is still helpful to discuss Drupal management on Windows Azure within these focus areas. Availability One main goal is that the Drupal site remains running and accessible to all end-users. This involves monitoring both the site and the SQL Azure database that the site depends on. In this section, we will briefly look at monitoring and backup tasks. Other crossover areas that affect availability will be discussed in the next section on scalability. Monitoring With any application, monitoring plays an important role with managing availability. Monitoring data can reveal whether users are successfully using the site or whether computing resources are meeting the demand. Other data reveals error counts and possibly points to issues in a specific tier of the deployment. There are several monitoring tools that can be used. The Windows Azure Management Portal. Windows Azure diagnostic data. Custom monitoring scripts. System Center Operations Manager. Third party tools such as Azure Diagnostics Manager and Azure Storage Explorer. The Windows Azure Management Portal can be used to ensure that your deployments are successful and running. You can also use the portal to manage features such as Remote Desktop so that you can directly connect to machines that are running the Drupal site. Windows Azure diagnostics allows you to collect performance counters and logs off of the web role instances that are running the Drupal site. Although there are many options for configuring diagnostics in Azure, the best solution with Drupal is to use a diagnostics configuration file. The following configuration file demonstrates some basic performance counters that can monitor resources such as memory, processor utilization, and network bandwidth. For more information about setting up diagnostic configuration files, see How to Use the Windows Azure Diagnostics Configuration File. This information is stored locally on each role instance and then transferred to Windows Azure storage per a defined schedule or on-demand. See Getting Started with Storing and Viewing Diagnostic Data in Windows Azure Storage. Various monitoring tools, such as Azure Diagnostics Manager, help you to more easily analyze diagnostic data. Monitoring the performance of the machines hosting the Drupal site is only part of the story. In order to plan properly for both availability and scalability, you should also monitor site traffic, including user load patterns and trends. Standard and custom diagnostic data could contribute to this, but there are also third-party tools that monitor web traffic. For example, if you know that spikes occur in your application during certain days of the week, you could make changes to the application to handle the additional load and increase the availability of the Drupal solution. Backup Tasks To remain highly available, it is important to backup your data as a defense-in-depth strategy for disaster recovery. This is true even though SQL Azure and Windows Azure Storage both implement redundancy to prevent data loss. One obvious reason is that these services cannot prevent administrator error if data is accidentally deleted or incorrectly changed. SQL Azure does not currently have a formal backup technology, although there are many third-party tools and solutions that provide this capability. Usually the database size for a Drupal site is relatively small. In the case of SAG Awards, it was only ~100-150 MB. So performing an entire backup using any strategy was relatively fast. If your database is much larger, you might have to test various backup strategies to find the one that works best. Apart from third-party SQL Azure backup solutions, there are several strategies for obtaining a backup of your data: · Use the Drush tool and the portabledb-export command. · Periodically copy the database using the CREATE DATABASE Transact-SQL command. · Use Data-tier applications (DAC) to assist with backup and restore of the database. SQL Azure backup and data security techniques are described in more detail in the topic, Business Continuity in SQL Azure. Note that bandwidth costs accrue with any backup operation that transfers information outside of the Windows Azure datacenter. To reduce costs, you can copy the database to a database within the same datacenter. Or you can export the data-tier applications to blob storage in the same datacenter. Another potential backup task involves the files in Blob storage. If you keep a master copy of all media files uploaded to Blob storage, then you already have an on-premises backup of those files. However, if multiple administrators are loading files into Blob storage for use on the Drupal site, it is a good idea to enumerate the storage account and to download any new files to a central location. The following PHP script demonstrates how this can be done by backing up all files in Blob storage after a specified modification date. setProxy(true, 'YOUR_PROXY_IF_NEEDED', 80); $blobs = (array)$blobObj->listBlobs(AZURE_STORAGE_CONTAINER, '', '', 35000); backupBlobs($blobs, $blobObj); function backupBlobs($blobs, $blobObj) { foreach ($blobs as $blob) { if (strtotime($blob->lastmodified) >= DEFAULT_BACKUP_FROM_DATE && strtotime($blob->lastmodified) <= DEFAULT_BACKUP_TO_DATE) { $path = pathinfo($blob->name); if ($path['basename'] != '$$$.$$$') { $dir = $path['dirname']; $oldDir = getcwd(); if (handleDirectory($dir)) { chdir($dir); $blobObj->getBlob( AZURE_STORAGE_CONTAINER, $blob->name, $path['basename'] ); chdir($oldDir); } } } } } function handleDirectory($dir) { if (!checkDirExists($dir)) { return mkdir($dir, 0755, true); } return true; } function checkDirExists($dir) { if(file_exists($dir) && is_dir($dir)) { return true; } return false; } ?> This script has a dependency on the Windows Azure SDK for PHP. Also note there are several parameters that you must modify such as the storage account, secret, and backup location. As with SQL Azure, bandwidth and transaction charges apply to a backup script like this. Scalability Drupal sites on Windows Azure can scale as load increased through typical strategies of scale-up, scale-out, and caching. The following sections describe the specifics of how these strategies are implemented in Windows Azure. Typically you make scalability decisions based on monitoring and capacity planning. Monitoring can be done in staging during testing or in production with real-time load. Capacity planning factors in projections for changes in user demand. Scale Up When you configure your web role prior to deployment, you have the option of specifying the Virtual Machine (VM) size, such as Small or ExtraLarge. Each size tier adds additional memory, processing power, and network bandwidth to each instance of your web role. For cost efficiency and smaller units of scale, you can test your application under expected load to find the smallest virtual machine size that meets your requirements. The workload usually in most popular Drupal websites can be separated out into a limited set of Drupal admins making content changes and a large user base who perform mostly read-only workload. End users can be allowed to make ‘writes’, such as uploading blogs or posting in forums, but those changes are not ‘content changes’. Drupal admins are setup to operate without caching so that the writes are made directly to SQL Azure or the corresponding backend database. This workload performs well with Large or ExtraLarge VM sizes. Also, note that the VM size is closely tied to all hardware resources, so if there are many content-rich pages that are streaming content, then the VM size requirements are higher. To make changes to the Virtual Machine size setting, you must change the vmsize attribute of the WebRole element in the service definition file, ServiceDefinition.csdef. A virtual machine size change requires existing applications to be redeployed. Scale Out In addition to the size of each web role instance, you can increase or decrease the number of instances that are running the Drupal site. This spreads the web requests across more servers, enabling the site to handle more users. To change the number of running instances of your web role, see How to Scale Applications by Increasing or Decreasing the Number of Role Instances. Note that some configuration changes can cause your existing web role instances to recycle. You can choose to handle this situation by applying the configuration change and continue running. This is done by handling the RoleEnvironment.Changing event. For more information see, How to Use the RoleEnvironment.Changing Event. A common question for any Windows Azure solution is whether there is some type of built-in automatic scaling. Windows Azure does not provide a service that provides auto-scaling. However, it is possible to create a custom solution that scales Azure services using the Service Management API. For an example of this approach, see An Auto-Scaling Module for PHP Applications in Windows Azure. Caching Caching is an important strategy for scaling Drupal applications on Windows Azure. One reason for this is that SQL Azure implements throttling mechanisms to regulate the load on any one database in the cloud. Code that uses SQL Azure should have robust error handling and retry logic to account for this. For more information, see Error Messages (SQL Azure Database). Because of the potential for load-related throttling as well as for general performance improvement, it is strongly recommended to use caching. Although Windows Azure provides a Caching service, this service does not currently have interoperability with PHP. Because of this, the best solution for caching in Drupal is to use a module that uses an open-source caching technology, such as Memcached. Outside of a specific Drupal module, you can also configure Memcached to work in PHP for Windows Azure. For more information, see Running Memcached on Windows Azure for PHP. Here is also an example of how to get Memcached working in Windows Azure using a plugin: Windows Azure Memcached plugin. In a future paper, we hope to cover this architecture in more detail. For now, here are several design and management considerations related to caching. Area Consideration Design and Implementation For a technology like Memcached, will the cache be collocated (spread across all web role instances)? Or will you attempt to setup a dedicated cache ring with worker roles that only run Memcached? Configuration What memory is required and how will items in the cache be invalidated? Performance and Monitoring What mechanisms will be used to detect the performance and overall health of the cache? For ease of use and cost savings, collocation of the cache across the web role instances of the Drupal site works best. However, this assumes that there is available reserve memory on each instance to apply toward caching. It is possible to increase the virtual machine size setting to increase the amount of available memory on each machine. It is also possible to add additional web role instances to add to the overall memory of the cache while at the same time improving the ability of the web site to respond to load. It is possible to create a dedicated cache cluster in the cloud, but the steps for this are beyond the scope of this paper[RR1] . For Windows Azure Blob storage, there is also a caching feature built into the service called the Content Delivery Network (CDN). CDN provides high-bandwidth access to files in Blob storage by caching copies of the files in edge nodes around the world. Even within a single geographic region, you could see performance improvements as there are many more edge nodes than Windows Azure datacenters. For more information, see Delivering High-Bandwidth Content with the Windows Azure CDN. Manageability It is important to note that each hosted service has a Staging environment and a Production environment. This can be used to manage deployments, because you can load and test and application in staging before performing a VIP swap with production. From a manageability standpoint, Drupal has an advantage on Windows Azure in the way that site content is stored. Because the data necessary to serve pages is stored in the database and blob storage, there is no need to redeploy the application to change the content of the site. Another best practice is to use a separate storage account for diagnostic data than the one that is used for the application itself. This can improve performance and also helps to separate the cost of diagnostic monitoring from the cost of the running application. As mentioned previously, there are several tools that can assist with managing Windows Azure applications. The following table summarizes a few of these choices. Tool Description Windows Azure Management Portal The web interface of the Windows Azure management portal shows deployments, instance counts and properties, and supports many different common management and monitoring tasks. Azure Diagnostics Managerq[RR2] [JR3] A Red Gate Software product that provides advanced monitoring and management of diagnostic data. This tool can be very useful for easily analyzing the performance of the Drupal site to determine appropriate scaling decisions. Azure Storage Explorer A tool created by Neudesic for viewing Windows Azure storage account. This can be useful for viewing both diagnostic data and the files in Blob storage.
April 25, 2012
by Brian Swan
· 8,746 Views
article thumbnail
Amazon EMR Tutorial: Running a Hadoop MapReduce Job Using Custom JAR
See original post at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html Introduction Amazon EMR is a web service which can be used to easily and efficiently process enormous amounts of data. It uses a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon S3. Amazon EMR removes most of the cumbersome details of Hadoop while taking care of provisioning of Hadoop, running the job flow, terminating the job flow, moving the data between Amazon EC2 and Amazon S3, and optimizing Hadoop. In this tutorial, we will use a developed WordCount Java example using Hadoop and thereafter, we execute our program on Amazon Elastic MapReduce. Prerequisites You must have valid AWS account credentials. You should also have a general familiarity with using the Eclipse IDE before you begin. The reader can also use any other IDE of their choice. Step 1 – Develop MapReduce WordCount Java Program In this section, we are first going to develop a WordCount application. A WordCount program will determine how many times different words appear in a set of files. In Eclipse (or whatever the IDE you are using), Create simple Java Project with the name "WordCount". Create a java class name Map and override the map method as follow, public class Map extends Mapper { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); @Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } Create a java class named Reduce and override the reduce method as shown below, public class Reduce extends Reducer { @Override protected void reduce(Text key, java.lang.Iterable values, org.apache.hadoop.mapreduce.Reducer.Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable value : values) { sum += value.get(); } context.write(key, new IntWritable(sum)); } } Create a java class named WordCount and defined the main method as below, public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf, "wordcount"); job.setJarByClass(WordCount.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true); } Export the WordCount program in a jar using eclipse and save it to some location on disk. Make sure that you have provided the Main Class (WordCount.jar) during extraction ofu8u the jar file as shown below. Our jar is ready!!! Step 2 – Upload the WordCount JAR and Input Files to Amazon S3 Now we are going to upload the WordCount jar to Amazon S3. First, go to the following URL: https://console.aws.amazon.com/s3/home Next, click “Create Bucket”, give your bucket a name, and click the “Create” button. Select your new S3 bucket in the left-hand pane. Upload the WordCount JAR and sample input file for counting the words. Step 3 – Running an Elastic MapReduce job Now that the JAR is uploaded into S3, all we need to do is to create a new Job flow. let's execute the steps below. (I encourage readers to check out the following link for details regarding each step, How to Create a Job Flow Using a Custom JAR ) Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/ Click Create New Job Flow. In the DEFINE JOB FLOW page, enter the following details, a) Job Flow Name = WordCountJob b) Select Run your own applications) Select Custom JAR in the drop-down list) Click Continue In the SPECIFY PARAMETERS page, enter values in the boxes using the following table as a guide, and then click Continue.JAR Location = bucketName/jarFileLocationJAR Arguments =s3n://bucketName/inputFileLocations3n://bucketName/outputpath Please note that the output path must be unique each time we execute the job. The Hadoop always create a folder with the same name specified here. After executing the job, just wait and monitor your job that runs through the Hadoop flow. You can also look for errors by using the Debug button. The job should be complete within 10 to 15 minutes (can also depend on the size of the input). After completing the job, You can view results in the S3 Browser panel. You can also download the files from S3 and can analyze the outcome of the job. Amazon Elastic MapReduce Resources Amazon Elastic MapReduce Documentation,http://aws.amazon.com/documentation/elasticmapreduce/ Amazon Elastic MapReduce Getting Started Guide,http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/ Amazon Elastic MapReduce Developer Guide,http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/ Apache Hadoop,http://hadoop.apache.org/ See more at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html
April 23, 2012
by Muhammad Ali Khojaye
· 59,029 Views
article thumbnail
How to Use Sigma.js with Neo4j
i’ve done a few posts recently using d3.js and now i want to show you how to use two other great javascript libraries to visualize your graphs. we’ll start with sigma.js and soon i’ll do another post with three.js . we’re going to create our graph and group our nodes into five clusters. you’ll notice later on that we’re going to give our clustered nodes colors using rgb values so we’ll be able to see them move around until they find their right place in our layout. we’ll be using two sigma.js plugins, the gefx (graph exchange xml format) parser and the forceatlas2 layout. you can see what a gefx file looks like below. notice it comes from gephi which is an interactive visualization and exploration platform, which runs on all major operating systems, is open source, and is free. ... ... in order to build this file, we will need to get the nodes and edges from the graph and create an xml file. get '/graph.xml' do @nodes = nodes @edges = edges builder :graph end we’ll use cypher to get our nodes and edges: def nodes neo = neography::rest.new cypher_query = " start node = node:nodes_index(type='user')" cypher_query << " return id(node), node" neo.execute_query(cypher_query)["data"].collect{|n| {"id" => n[0]}.merge(n[1]["data"])} end we need the node and relationship ids, so notice i’m using the id() function in both cases. def edges neo = neography::rest.new cypher_query = " start source = node:nodes_index(type='user')" cypher_query << " match source -[rel]-> target" cypher_query << " return id(rel), id(source), id(target)" neo.execute_query(cypher_query)["data"].collect{|n| {"id" => n[0], "source" => n[1], "target" => n[2]} } end so far we have seen graphs represented as json, and we’ve built these manually. today we’ll take advantage of the builder ruby gem to build our graph in xml. xml.instruct! :xml xml.gexf 'xmlns' => "http://www.gephi.org/gexf", 'xmlns:viz' => "http://www.gephi.org/gexf/viz" do xml.graph 'defaultedgetype' => "directed", 'idtype' => "string", 'type' => "static" do xml.nodes :count => @nodes.size do @nodes.each do |n| xml.node :id => n["id"], :label => n["name"] do xml.tag!("viz:size", :value => n["size"]) xml.tag!("viz:color", :b => n["b"], :g => n["g"], :r => n["r"]) xml.tag!("viz:position", :x => n["x"], :y => n["y"]) end end end xml.edges :count => @edges.size do @edges.each do |e| xml.edge:id => e["id"], :source => e["source"], :target => e["target"] end end end end you can get the code on github as usual and see it running live on heroku. you will want to see it live on heroku so you can see the nodes in random positions and then move to form clusters. use your mouse wheel to zoom in, and click and drag to move around. credit goes out to alexis jacomy and mathieu jacomy . you’ve seen me create numerous random graphs, but for completeness here is the code for this graph. notice how i create 5 clusters and for each node i assign half its relationships to other nodes in their cluster and half to random nodes? this is so the forceatlas2 layout plugin clusters our nodes neatly. def create_graph neo = neography::rest.new graph_exists = neo.get_node_properties(1) return if graph_exists && graph_exists['name'] names = 500.times.collect{|x| generate_text} clusters = 5.times.collect{|x| {:r => rand(256), :g => rand(256), :b => rand(256)} } commands = [] names.each_index do |n| cluster = clusters[n % clusters.size] commands << [:create_node, {:name => names[n], :size => 5.0 + rand(20.0), :r => cluster[:r], :g => cluster[:g], :b => cluster[:b], :x => rand(600) - 300, :y => rand(150) - 150 }] end names.each_index do |from| commands << [:add_node_to_index, "nodes_index", "type", "user", "{#{from}"] connected = [] # create clustered relationships members = 20.times.collect{|x| x * 10 + (from % clusters.size)} members.delete(from) rels = 3 rels.times do |x| to = members[x] connected << to commands << [:create_relationship, "follows", "{#{from}", "{#{to}"] unless to == from end # create random relationships rels = 3 rels.times do |x| to = rand(names.size) commands << [:create_relationship, "follows", "{#{from}", "{#{to}"] unless (to == from) || connected.include?(to) end end batch_result = neo.batch *commands end
April 12, 2012
by Max De Marzi
· 15,379 Views
article thumbnail
Hadoop Basics—Creating a MapReduce Program
The Map Reduce Framework works in two main phases to process the data, which are the "map" phase and the "reduce" phase.
March 18, 2012
by Carlo Scarioni
· 212,721 Views · 4 Likes
article thumbnail
Joins with MapReduce
i have been reading up on join implementations available for hadoop for past few days. in this post i recap some techniques i learnt during the process. the joins can be done at both map side and join side according to the nature of data sets of to be joined. reduce side join let’s take the following tables containing employee and department data. let’s see how join query below can be achieved using reduce side join. select employees.name, employees.age, department.name from employees inner join department on employees.dept_id=department.dept_id map side is responsible for emitting the join predicate values along with the corresponding record from each table so that records having same department id in both tables will end up at on same reducer which would then do the joining of records having same department id. however it is also required to tag the each record to indicate from which table the record originated so that joining happens between records of two tables. following diagram illustrates the reduce side join process. here is the pseudo code for map function for this scenario. map (k table, v rec) { dept_id = rec.dept_id tagged_rec.tag = table tagged_rec.rec = rec emit(dept_id, tagged_rec) } at reduce side join happens within records having different tags. reduce (k dept_id, list tagged_recs) { for (tagged_rec : tagged_recs) { for (tagged_rec1 : taagged_recs) { if (tagged_rec.tag != tagged_rec1.tag) { joined_rec = join(tagged_rec, tagged_rec1) } emit (tagged_rec.rec.dept_id, joined_rec) } } map side join (replicated join) using distributed cache on smaller table for this implementation to work one relation has to fit in to memory. the smaller table is replicated to each node and loaded to the memory. the join happens at map side without reducer involvement which significantly speeds up the process since this avoids shuffling all data across the network even-though most of the records not matching are later dropped. smaller table can be populated to a hash-table so look-up by dept_id can be done. the pseudo code is outlined below. map (k table, v rec) { list recs = lookup(rec.dept_id) // get smaller table records having this dept_id for (small_table_rec : recs) { joined_rec = join (small_table_rec, rec) } emit (rec.dept_id, joined_rec) } using distributed cache on filtered table if the smaller table doesn’t fit the memory it may be possible to prune the contents of it if filtering expression has been specified in the query. consider following query. select employees.name, employees.age, department.name from employees inner join department on employees.dept_id=department.dept_id where department.name="eng" here a smaller data set can be derived from department table by filtering out records having department names other than “eng”. now it may be possible to do replicated map side join with this smaller data set. replicated semi-join reduce side join with map side filtering even of the filtered data of small table doesn’t fit in to the memory it may be possible to include just the dept_id s of filtered records in the replicated data set. then at map side this cache can be used to filter out records which would be sent over to reduce side thus reducing the amount of data moved between the mappers and reducers. the map side logic would look as follows. map (k table, v rec) { // check if this record needs to be sent to reducer boolean sendtoreducer = check_cache(rec.dept_id) if (sendtoreducer) { dept_id = rec.dept_id tagged_rec.tag = table tagged_rec.rec = rec emit(dept_id, tagged_rec) } } reducer side logic would be same as the reduce side join case. using a bloom filter a bloom filter is a construct which can be used to test the containment of a given element in a set. a smaller representation of filtered dept_ids can be derived if dept_id values can be augmented in to a bloom filter. then this bloom filter can be replicated to each node. at the map side for each record fetched from the smaller table the bloom filter can be used to check whether the dept_id in the record is present in the bloom filter and only if so to emit that particular record to reduce side. since a bloom filter is guaranteed not to provide false negatives the result would be accurate. references [1] hadoop in action [2] hadoop : the definitive guide
March 12, 2012
by Buddhika Chamith
· 31,048 Views
article thumbnail
Computing a disparity map in OpenCV
A disparity map contains information related to the distance of the objects of a scene from a viewpoint. In this example we will see how to compute a disparity map from a stereo pair and how to use the map to cut the objects far from the cameras. The stereo pair is represented by two input images, these images are taken with two cameras separated by a distance and the disparity map is derived from the offset of the objects between them. There are various algorithm to compute a disparity map, the one implemented in OpenCV is the graph cut algorithm. To use it we have to call the function CreateStereoGCState() to initialize the data structure needed by the algorithm and use the function FindStereoCorrespondenceGC() to get the disparity map. Let's see the code: def cut(disparity, image, threshold): for i in range(0, image.height): for j in range(0, image.width): # keep closer object if cv.GetReal2D(disparity,i,j) > threshold: cv.Set2D(disparity,i,j,cv.Get2D(image,i,j)) # loading the stereo pair left = cv.LoadImage('scene_l.bmp',cv.CV_LOAD_IMAGE_GRAYSCALE) right = cv.LoadImage('scene_r.bmp',cv.CV_LOAD_IMAGE_GRAYSCALE) disparity_left = cv.CreateMat(left.height, left.width, cv.CV_16S) disparity_right = cv.CreateMat(left.height, left.width, cv.CV_16S) # data structure initialization state = cv.CreateStereoGCState(16,2) # running the graph-cut algorithm cv.FindStereoCorrespondenceGC(left,right, disparity_left,disparity_right,state) disp_left_visual = cv.CreateMat(left.height, left.width, cv.CV_8U) cv.ConvertScale( disparity_left, disp_left_visual, -20 ); cv.Save( "disparity.pgm", disp_left_visual ); # save the map # cutting the object farthest of a threshold (120) cut(disp_left_visual,left,120) cv.NamedWindow('Disparity map', cv.CV_WINDOW_AUTOSIZE) cv.ShowImage('Disparity map', disp_left_visual) cv.WaitKey() These are the two input image I used to test the program (respectively left and right): Result using threshold = 100 Result using threshold = 120 Result using threshold = 180 Source: http://glowingpython.blogspot.com/2011/11/computing-disparity-map-in-opencv.html
February 21, 2012
by Giuseppe Vettigli
· 26,107 Views
article thumbnail
Django: Excluding Some Views from Middleware
In my Django applications, I tend to use custom middleware extensively for common tasks. I have middleware that logs page runtime, middleware that sets context that most views will end up needing anyway, and middleware that copies the HTTP_REFERRER header from an entry page into the session scope for use later in the session. At some point, I inadvertently created a middleware class invalidated the browser cache for certain views. Typically, just wrapping a view in @cache_control(max_age=3600) is enough to have the browser cache that view for an hour. But if you do something innocuous like evaluate request.user.is_authenticated() in a middleware class, then Django will set the Vary: Cookie header, invalidating the cache. In my case, what I really wanted was a decorator that I could attach to a view that would skip my custom middleware, like an exclude list. Of course, you could just attach your middleware explicitly to each view that needs it, but that's needless code repetition if a middleware should wrap almost all views. You could also change each of your middleware classes to exclude particular views by URL, but you might end up having to alter many different middleware classes with that logic. As another option, you can use the following decorator/middleware pair to short-circuit the middleware execution of any view, for any middleware defined in your settings file AFTER this one. """ Allows short-curcuiting of ALL remaining middleware by attaching the @shortcircuitmiddleware decorator as the TOP LEVEL decorator of a view. Example settings.py: MIDDLEWARE_CLASSES = ( 'django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', # THIS MIDDLEWARE 'myapp.middleware.shortcircuit.ShortCircuitMiddleware', # SOME OTHER MIDDLE WARE YOU WANT TO SKIP SOMETIMES 'myapp.middleware.package.MostOfTheTimeMiddleware', # MORE MIDDLEWARE YOU WANT TO SKIP SOMETIMES HERE ) Example view to exclude from MostOfTheTimeMiddleware (and any subsequent): @shortcircuitmiddleware def myview(request): ... """ def shortcircuitmiddleware(f): """ view decorator, the sole purpose to is 'rename' the function '_shortcircuitmiddleware' """ def _shortcircuitmiddleware(*args, **kwargs): return f(*args, **kwargs) return _shortcircuitmiddleware class ShortCircuitMiddleware(object): """ Middleware; looks for a view function named '_shortcircuitmiddleware' and short-circuits. Relies on the fact that if you return an HttpResponse from a view, it will short-circuit other middleware, see: https://docs.djangoproject.com/en/dev/topics/http/middleware/#process-request """ def process_view(self, request, view_func, view_args, view_kwargs): if view_func.func_name == "_shortcircuitmiddleware": return view_func(request, *view_args, **view_kwargs) return None Source: http://bitkickers.blogspot.com/2011/08/django-exclude-some-views-from.html
February 20, 2012
by Chase Seibert
· 13,183 Views
article thumbnail
How to deploy a neo4j instance in Amazon EC2 in 10 minutes
Neo4j is a high-performance, NOSQL graph database with all the features of a mature and robust database. In this post I will explain how to deploy a neo4j instance in Amazon EC2 web service. For this tutorial to take you no more than 10 minutes you should be able to execute properly some bash commands like mv, tar, ssh and scp (secure copy). I also assume that you have an account in Amazon Web Services and you are familiar to the process of launching instances. If not, I strongly recommend you to follow this starting guide and complete it till you manage to connect to your instance with ssh. Start downloading the latest stable version of neo4j. Which you can find here. The “Community Edition” fits well for development purposes. Do not forget to select the Unix version of the server. This will download a tar.gz file which you will copy to your EC2 instance later. While you download the neo4j server open the AWS Management Console and launch a Basic 32-bit Amazon Linux AMI. If you want to launch an Ubuntu AMI please notice that it doesn’t ship with Java, which is required for running neo4j. If you are not familiar with key pairs, pem files or security groups I insist you to follow the EC2 starting guide I mentioned above. You can either create a new security group or use the default, but you will need to configure a new security rule for the neo4j server port. After launching the instance, create a TCP rule on port 7474 with source 0.0.0.0/0. Here you are opening port 7474 for anyone. If you are planning to use the neo4j REST API and remotely call it from another server, for example a Rails application hosted in Heroku, for security reasons, you may want to change the source field to the address of your Heroku server. Do not forget to open port 22 (SSH), this is typically the first rule normal people create after launching an instance. You are almost done! You should now install neo4j in your instance. Open a terminal in your localhost and navigate to the path where you downloaded neo4j. Copy the file to your Amazon instance by using the scp command: scp -i your_pem_file.pem neo4j-community-1.6.M01-unix.tar.gz ec2-user@YOUR_PUBLIC_INSTANCE_DNS:/home/ec2-user Please notice that you will need to change the path to your pem file, typically placed in ~/.ssh, the filename of the neo4j server you just downloaded and the plublic DNS of your instance. Now connect to your instance with SSH: ssh -i your_pem_file.pem ec2-user@YOUR_PUBLIC_INSTANCE_DNS Untar the neo4j server: tar xvfz neo4j-community-1.6.M01-unix.tar.gz.tar.gz Move it to /usr/local and rename the folder to neo4j: sudo mv neo4j-community-1.6.M01 /usr/local/neo4j Almost done!!! You should now open neo4j-server.properties under the conf directory and add the following line: org.neo4j.server.webserver.address=0.0.0.0 This lines allows anyone to connect remotely to your neo4j database server. Now run the start script. From the neo4j server folder. sudo ./bin/neo4j start Finally, open a browser and access the webadmin interface of your neo4j database by typing http://YOUR_PUBLIC_INSTANCE_DNS:7474. You should see the Neo4j Monitoring and Management Tool, pretty cool! If not, ask me You can now try using the REST API and the curl bash command to insert nodes and relationships. I hope this post helped you, good luck! Follow me on Twitter @negarnil Source: http://www.cloudtmp.com/java/how-to-deploy-a-neo4j-instance-in-amazon-ec2-in-10-minutes/
December 27, 2011
by Nicolas Garnil
· 27,399 Views · 1 Like
article thumbnail
Zero Downtime – What is it and why is it important?
For most large web applications, uptime is of foremost importants. Any outage can be seen by customers as a frustration, or opportunity to move to a competitor. What's more for a site that also includes e-commerce, it can mean real lost sales. Zero Downtime describes a site without service interruption. To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure. If you're using cloud hosting, are you redundant to alternate availability zones and regions? Are you using geographically distributed load balancing? Do you have multiple clustered databases on the backend, and multiple webservers load balanced. All of these requirements will increase uptime, but may not bring you close to zero downtime. For that you'll need thorough testing. The solution is to pull the trigger on sections of your infrastructure, and prove that it fails over quickly without noticeable outage. The ultimate test is the outage itself. Sean Hull on Quora: What is zero downtime and why is it important? Source: http://www.iheavy.com/2011/06/23/zero-downtime-what-is-it-and-why-is-it-important/
November 23, 2011
by Sean Hull
· 26,149 Views
article thumbnail
Handling PHP Sessions in Windows Azure
One of the challenges in building a distributed web application is in handling sessions. When you have multiple instances of an application running and session data is written to local files (as is the default behavior for the session handling functions in PHP) a user session can be lost when a session is started on one instance but subsequent requests are directed (via a load balancer) to other instances. To successfully manage sessions across multiple instances, you need a common data store. In this post I’ll show you how the Windows Azure SDK for PHP makes this easy by storing session data in Windows Azure Table storage. In the 4.0 release of the Windows Azure SDK for PHP, session handling via Windows Azure Table and Blob storage was included in the newly added SessionHandler class. Note: The SessionHandler class supports storing session data in Table storage or Blob storage. I will focus on using Table storage in this post largely because I haven’t been able to come up with a scenario in which using Blob storage would be better (or even necessary). If you have ideas about how/why Blob storage would be better, I’d love to hear them. The SessionHandler class makes it possible to write code for handling sessions in the same way you always have, but the session data is stored on a Windows Azure Table instead of local files. To accomplish this, precede your usual session handling code with these lines: require_once 'Microsoft/WindowsAzure/Storage/Table.php'; require_once 'Microsoft/WindowsAzure/SessionHandler.php'; $storageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', 'your storage account name', 'your storage account key'); $sessionHandler = new Microsoft_WindowsAzure_SessionHandler($storageClient , 'sessionstable'); $sessionHandler->register(); Now you can call session_start() and other session functions as you normally would. Nicely, it just works. Really, that’s all there is to using the SessionHandler, but I found it interesting to take a look at how it works. The first interesting thing to note is that the register method is simply calling the session_set_save_handler function to essentially map the session handling functionality to custom functions. Here’s what the method looks like from the source code: public function register() { return session_set_save_handler(array($this, 'open'), array($this, 'close'), array($this, 'read'), array($this, 'write'), array($this, 'destroy'), array($this, 'gc') ); } The reading, writing, and deleting of session data is only slightly more complicated. When writing session data, the key-value pairs that make up the data are first serialized and then base64 encoded. The serialization of the data allows for lots of flexibility in the data you want to store (i.e. you don’t have to worry about matching some schema in the data store). When storing data in a table, each entry must have a partition key and row key that uniquely identify it. The partition key is a string (“sessions” by default, but this is changeable in the class constructor) and the the row key is the session ID. (For more information about the structure of Tables, see this post.) Finally, the data is either updated (it it already exists in the Table) or a new entry is inserted. Here’s a portion of the write function: $serializedData = base64_encode(serialize($serializedData)); $sessionRecord = new Microsoft_WindowsAzure_Storage_DynamicTableEntity($this->_sessionContainerPartition, $id); $sessionRecord->sessionExpires = time(); $sessionRecord->serializedData = $serializedData; try { $this->_storage->updateEntity($this->_sessionContainer, $sessionRecord); } catch (Microsoft_WindowsAzure_Exception $unknownRecord) { $this->_storage->insertEntity($this->_sessionContainer, $sessionRecord); } Not surprisingly, when session data is read from the table, it is retrieved by session ID, base64 decoded, and unserialized. Again, here’s a snippet that show’s what is happening: $sessionRecord = $this->_storage->retrieveEntityById( $this->_sessionContainer, $this->_sessionContainerPartition, $id ); return unserialize(base64_decode($sessionRecord->serializedData)); As you can see, the SessionHandler class makes good use of the storage APIs in the SDK. To learn more about the SessionHandler class (and the storage APIs), check out the documentation on Codeplex. You can, of course, get the complete source code here: http://phpazure.codeplex.com/SourceControl/list/changesets. As I investigated the session handling in the Windows Azure SDK for PHP, I noticed that the absence of support for SQL Azure as a session store was conspicuous. I’m curious about how many people would prefer to use SQL Azure over Azure Tables as a session store. If you have an opinion on this, please let me know in the comments.
October 19, 2011
by Brian Swan
· 7,882 Views
article thumbnail
EC2 Interview – AWS Interview – Cloud Interview – 8 Questions
If you're looking for a cloud expert, specifically someone who knows Amazon Web Services and EC2, you'll want to have a battery of questions to assess their knowledge.
September 15, 2011
by Sean Hull
· 111,875 Views · 1 Like
article thumbnail
Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS
The integration framework Apache Camel already supports several important cloud services (see my overview article at http://www.kai-waehner.de/blog/2011/07/09/cloud-computing-heterogeneity-will-require-cloud-integration-apache-camel-is-already-prepared for more details). This article describes the combination of Apache Camel and the Amazon Web Services (AWS) interfaces of Simple Storage Service (S3), Simple Queue Service (SQS) and Simple Notification Service (SNS). Thus, The concept of Infrastructure as a Service (IaaS) is used to access messaging systems and data storage without any need for configuration. Registration to AWS and Setup of Camel First, you have to register to the Amazon Web Services (for free). Most AWS services include a free monthly quota, which is absolutely sufficient to play around and develop some simple applications. As its name states, AWS uses technology-independent web services. Besides, APIs for several different programming languages are available to ease development. By the way, Camel uses the AWS SDK for Java (http://aws.amazon.com/sdkforjava), of course. The documentation is detailed and easy to understand, including tutorials, screenshots and code examples . Hint 1: You should read the introductions to S3, SQS and SNS (go to http://aws.amazon.com and click on „products“) and play around with the AWS Management Console (http://aws.amazon.com/console) before you continue. This step is very easy and takes less than one hour. Then, you will have a much better understanding about AWS and where Camel can help you! Hint 2: It really helps to look at the source code of the camel-aws component, It helps you to understand how Camel uses the AWS Java API internally. If you want to write tests, you can do it the same way. In the past, I was afraid of looking at „complex“ source code of open source frameworks. But there is no need to be scared! The camel-aws component (and most other camel components) contain only of a few classes. Everything is easy to understand. It helps you to understand Camel internals, the AWS API, and to spot and solve errors due to exceptions in your code. In the meanwhile, the current Camel version 2.8 supports three AWS services: S3, SQS and SNS. All of them use similar concepts. Therefore, they are included in one single camel component: „camel-aws“. You have to add the libraries to your existing Camel project. As always, the simplest way is to use Maven and add the following dependency to the pom.xml: org.apache.camel camel-aws ${camel-version} Configuration of the Camel Endpoint The implementation and configuration of all three services is very similar. The URI looks like this (the code shows the SQS service): aws-sqs://queue-name[?options] There are two alternatives to configure your endpoint. Using Parameters The easy way is to use two paramters in the URI of your endpoint: „accessKey“ and „secretKey“ (you receive both after your AWS registration). “aws-sqs://unique-queue-name?accessKey=“INSERT_ME“&secretKey=INSERT_ME” Be aware of the following problem, which can result in a strange, non-speaking exception (thanks to Brendan Long): You’ll need to URL encode any +’s in your secret key (otherwise, they’ll be treated as spaces). + = %2B, so if your secretkey was “my+secret\key”, your Camel URL should have “secretKey=my%2Bsecret\key”. “Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.” Source: WC3 URI Recommendations Adding a configured AmazonClient to the Registry If you need to do more configuration (e.g. because your system is behind a firewall), you have to add an AmazonClient object to your registry. The following code shows an example using SQS, but SNS and S3 use exactly the same concept. @Override protected JndiRegistry createRegistry() throws Exception { JndiRegistry registry = super.createRegistry(); AWSCredentials awsCredentials = new BasicAWSCredentials(“INSERT_ME”, “INSERT_ME”); ClientConfiguration clientConfiguration = new ClientConfiguration(); clientConfiguration.setProxyHost(“http://myProxyHost”); clientConfiguration.setProxyPort(8080); AmazonSQSClient client = new AmazonSQSClient(awsCredentials, clientConfiguration); registry.bind(“amazonSQSClient”, client); return registry; } This example overwrites the createRegistry() method of a JUnit test (extending CamelTestSupport). You can also add this information to your runtime Camel application, of course. Apache Camel and the Simple Storage Service (S3) Simple Storage Service (S3) is a key-value-store. You can store small to very large data. The usage is very easy. You create buckets and put key-value data into these buckets. You can also create folders within buckets to organize your data. That’s it. You can monitor your buckets using the AWS Management Console – an intuitive GUI supporting most AWS services. The following example shows both alternatives for accessing the Amazon services (as described above): Paramenters and the AmazonClient. // Transfer data from your file inbox to the AWS S3 service from(“file:files/inbox”) // This is the key of your key-value data .setHeader(S3Constants.KEY, simple(“This is a static key”)) // Using parameters for accessing the AWS service .to(“aws-s3://camel-integration-bucket-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME&region=eu-west-1″); // Transfer data from the AWS S3 service to your file outbox from(“aws-s3://camel-integration-bucket-mwea-kw?amazonS3Client=#amazonS3Client&region=eu-wes”) .to(“file:files/outbox”); There are some additional parameters, for instance you can submit the desired AWS region or delete data after receiving it (see http://camel.apache.org/aws-s3.html and the corresponding SQS and SNS sites for more details about parameters and message headers). As you see in the code, you can use the AWS-S3 endpoint for producing and for consuming messages. Each bucket must be unique, thus you have to add some specific information such as your company to its name. Hint: If a bucket does not exist, Camel is creating it automatically (as the AWS API does). This concept is also used for SQS queues and SNS topics. Apache Camel and the Simple Queue Service (SQS) The Simple Queue Service (SQS) is similar to a JMS provider such as WebSphere MQ or ActiveMQ (but with some differences). You create queues and send messages to them. Consumers receive the messages. Contrary to most other AWS services, you cannot monitor queues by using the AWS management console directly. You have to use the service „Cloudwatch“ (http://aws.amazon.com/cloudwatch) and start an EC2 instance to monitor queues and its content. As you can see in the following code example, the syntax and concepts are almost the same as for the S3 service: from(“file:inbox”) .to(“aws-sqs://camel-integration-queue-mwea-kw?accessKey=INSERT_ME&secretKey=INSERT_ME”); from(“aws-sqs://camel-integration-queue-mwea-kw?amazonSQSClient=#amazonSQSClient”) .to(“file:outbox?fileName=sqs-${date:now:yyyy.MM.dd-hh:mm:ss:SS}”); Again, you can use the AWS-SQS endpoint for producing and for consuming messages. Each queue name must be unique. There exist two important differences to JMS (copy & paste from the AWS documentation): Q: How many times will I receive each message? Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies. Q: Why are there separate ReceiveMessage and DeleteMessage operations? When Amazon SQS returns a message to you, that message stays in the queue, whether or not you actually received the message. You are responsible for deleting the message; the delete request acknowledges that you’re done processing the message. If you don’t delete the message, Amazon SQS will deliver it again on another receive request. Apache Camel and the Simple Notification Service (SNS) The Simple Notification Service (SNS) acts like JMS topics. You create a topic, consumers subscribe to the topic and then receive notifications. Several transport protocols are supported: HTTP(S), Email and SQS. Further interfaces will be added in the future, e.g. the Short Message Service (SMS) for mobile phones. Contrary to S3 and SQS, Camel only offers a producer endpoint for this AWS service. You can only create topics and send messages via Camel. The reason is simple: Camel already offers endpoints for consuming these messages: HTTP, Email and SQS are already available. There is one tradeoff: A consumer cannot subscribe to topics using Camel – at the moment. The AWS Management Console has to be used. A very interesting discussion can be read on the Camel JIRA issue regarding the following questions: Should Camel be able to subscribe to topics? Should the producer contain this feature or should there be a consumer? In my opinion, there should be a consumer which is able to subscribe to topics, otherwise Camel is missing a key part of the AWS SNS service! Please read the discussion and contribute your opinion: https://issues.apache.org/jira/browse/CAMEL-3476. Apache Camel is already ready for the Cloud Computing Era AWS offers many more services for the cloud. Probably, it does not make sense to integrate everyone into Camel, but more AWS services will be supported in the future. For instance, SimpleDB and the Relational Database Service (RDS) are already planned and make sende, too: http://camel.apache.org/aws.html. The conclusion is easy: Apache Camel is already ready for the cloud computing era. Several important cloud services are already supported. Cloud integration will become very important in the future. Thus, Camel is on a very good way. Hopefully, we will see more cloud components, soon. I will continue to write articles about other Camel cloud components (and new AWS addons, ouf course). For instance, a component for the Platform as a Service (PaaS) product Google App Engine (GAE) is already available. If you have any additional important information, questions or other feedback, please write a comment. Thank you in advance… Best regards, Kai Wähner (Twitter: @KaiWaehner) [Content from my Blog: Cloud Integration with Apache Camel and Amazon Web Services (AWS): S3, SQS and SNS]
August 30, 2011
by Kai Wähner DZone Core CORE
· 26,153 Views
article thumbnail
Java NIO vs. IO
when studying both the java nio and io api's, a question quickly pops into mind: when should i use io and when should i use nio? in this text i will try to shed some light on the differences between java nio and io, their use cases, and how they affect the design of your code. main differences of java nio and io the table below summarizes the main differences between java nio and io. i will get into more detail about each difference in the sections following the table. io nio stream oriented buffer oriented blocking io non blocking io selectors stream oriented vs. buffer oriented the first big difference between java nio and io is that io is stream oriented, where nio is buffer oriented. so, what does that mean? java io being stream oriented means that you read one or more bytes at a time, from a stream. what you do with the read bytes is up to you. they are not cached anywhere. furthermore, you cannot move forth and back in the data in a stream. if you need to move forth and back in the data read from a stream, you will need to cache it in a buffer first. java nio's buffer oriented approach is slightly different. data is read into a buffer from which it is later processed. you can move forth and back in the buffer as you need to. this gives you a bit more flexibility during processing. however, you also need to check if the buffer contains all the data you need in order to fully process it. and, you need to make sure that when reading more data into the buffer, you do not overwrite data in the buffer you have not yet processed. blocking vs. non-blocking io java io's various streams are blocking. that means, that when a thread invokes a read() or write(), that thread is blocked until there is some data to read, or the data is fully written. the thread can do nothing else in the meantime. java nio's non-blocking mode enables a thread to request reading data from a channel, and only get what is currently available, or nothing at all, if no data is currently available. rather than remain blocked until data becomes available for reading, the thread can go on with something else. the same is true for non-blocking writing. a thread can request that some data be written to a channel, but not wait for it to be fully written. the thread can then go on and do something else in the mean time. what threads spend their idle time on when not blocked in io calls, is usually performing io on other channels in the meantime. that is, a single thread can now manage multiple channels of input and output. selectors java nio's selectors allow a single thread to monitor multiple channels of input. you can register multiple channels with a selector, then use a single thread to "select" the channels that have input available for processing, or select the channels that are ready for writing. this selector mechanism makes it easy for a single thread to manage multiple channels. how nio and io influences application design whether you choose nio or io as your io toolkit may impact the following aspects of your application design: the api calls to the nio or io classes. the processing of data. the number of thread used to process the data. the api calls of course the api calls when using nio look different than when using io. this is no surprise. rather than just read the data byte for byte from e.g. an inputstream, the data must first be read into a buffer, and then be processed from there. the processing of data the processing of the data is also affected when using a pure nio design, vs. an io design. in an io design you read the data byte for byte from an inputstream or a reader. imagine you were processing a stream of line based textual data. for instance: name: anna age: 25 email: [email protected] phone: 1234567890 this stream of text lines could be processed like this: inputstream input = ... ; // get the inputstream from the client socket bufferedreader reader = new bufferedreader(new inputstreamreader(input)); string nameline = reader.readline(); string ageline = reader.readline(); string emailline = reader.readline(); string phoneline = reader.readline(); notice how the processing state is determined by how far the program has executed. in other words, once the first reader.readline() method returns, you know for sure that a full line of text has been read. the readline() blocks until a full line is read, that's why. you also know that this line contains the name. similarly, when the second readline() call returns, you know that this line contains the age etc. as you can see, the program progresses only when there is new data to read, and for each step you know what that data is. once the executing thread have progressed past reading a certain piece of data in the code, the thread is not going backwards in the data (mostly not). this principle is also illustrated in this diagram: java io: reading data from a blocking stream. a nio implementation would look different. here is a simplified example: bytebuffer buffer = bytebuffer.allocate(48); int bytesread = inchannel.read(buffer); notice the second line which reads bytes from the channel into the bytebuffer. when that method call returns you don't know if all the data you need is inside the buffer. all you know is that the buffer contains some bytes. this makes processing somewhat harder. imagine if, after the first read(buffer) call, that all what was read into the buffer was half a line. for instance, "name: an". can you process that data? not really. you need to wait until at leas a full line of data has been into the buffer, before it makes sense to process any of the data at all. so how do you know if the buffer contains enough data for it to make sense to be processed? well, you don't. the only way to find out, is to look at the data in the buffer. the result is, that you may have to inspect the data in the buffer several times before you know if all the data is inthere. this is both inefficient, and can become messy in terms of program design. for instance: bytebuffer buffer = bytebuffer.allocate(48); int bytesread = inchannel.read(buffer); while(! bufferfull(bytesread) ) { bytesread = inchannel.read(buffer); } the bufferfull() method has to keep track of how much data is read into the buffer, and return either true or false, depending on whether the buffer is full. in other words, if the buffer is ready for processing, it is considered full. the bufferfull() method scans through the buffer, but must leave the buffer in the same state as before the bufferfull() method was called. if not, the next data read into the buffer might not be read in at the correct location. this is not impossible, but it is yet another issue to watch out for. if the buffer is full, it can be processed. if it is not full, you might be able to partially process whatever data is there, if that makes sense in your particular case. in many cases it doesn't. the is-data-in-buffer-ready loop is illustrated in this diagram: java nio: reading data from a channel until all needed data is in buffer. summary nio allows you to manage multiple channels (network connections or files) using only a single (or few) threads, but the cost is that parsing the data might be somewhat more complicated than when reading data from a blocking stream. if you need to manage thousands of open connections simultanously, which each only send a little data, for instance a chat server, implementing the server in nio is probably an advantage. similarly, if you need to keep a lot of open connections to other computers, e.g. in a p2p network, using a single thread to manage all of your outbound connections might be an advantage. this one thread, multiple connections design is illustrated in this diagram: java nio: a single thread managing multiple connections. if you have fewer connections with very high bandwidth, sending a lot of data at a time, perhaps a classic io server implementation might be the best fit. this diagram illustrates a classic io server design: java io: a classic io server design - one connection handled by one thread. from http://tutorials.jenkov.com/java-nio/nio-vs-io.html
August 28, 2011
by Jakob Jenkov
· 133,865 Views · 19 Likes
article thumbnail
Developing Android Apps with NetBeans, Maven, and VirtualBox
I am an experienced Java developer who has used various IDEs and prefer NetBeans IDE over all others by a long shot. I am also very fond of Maven as the tool to simplify and automate nearly every aspect of the development of my Java project throughout its lifecycle. Recently, I started developing Android applications and naturally I looked for a Maven plugin that would manage my Android projects. Luckily I found the maven-android-plugin which worked like a charm and allowed me to use Maven for developing my Android projects. The Android Emulator from the Android SDK seemed unusably slow. Lucklily, I found a way to use an Android Virtual Machine for VirtualBox that worked nearly as fast as my native computer! This page documents my experiences. Tested Environment Dev machine: Ubuntu 11.04 Linux IDE: NetBeans VirtualBox: 4.0.8 r71778 Android SDK Revision 11, Add on XML Schema #1, Repository XML Schema #3 (from About in SDK and AVD Manager) Android Version: 2.2 Overview of Steps Download and install the Android SDK on your dev machine Attach an Android Device to dev machine Configure and load your device for development and other use Create an initial Android maven project Connect Android Device to Android SDK Debug Android app using NetBeans Graphical Debuger Download and Install Android SDK Download and install the Android SDK on your dev machine as described here. Make sure to set the following in dev machine ~/.bashrc file: export ANDROID_HOME=$HOME/android-sdk-linux_x86 #Change as needed export PATH="$ANDROID_HOME/tools:$ANDROID_HOME/platform-tools:$PATH" Attaching an Android Device to Dev Machine If you have an actual device that is usually always best. If not, you must use a virtual Android device which usually has various limitations (e.g. no GPS, Camera etc.). The Android SDK makes it easy to create a new Virtual Device but the resulting device is painfully slow in my experience and not usable. Do not bother with this. Instead, create a virtual Android device using VirtualBox as described in the following steps: Install virtual box and initial Android VM as described here: http://androidspin.com/2011/01/24/howto-install-android-x86-2-2-in-virtualbox/ http://geeknizer.com/how-to-run-google-android-in-virtualbox-vmware-on-netbooks/ Configure Android VM so it is connected bidirectionally with your dev machine over TCP as described here: http://stackoverflow.com/questions/61156/virtualbox-host-guest-network-setup I used the approach of configuring a HOST ONLY network adapater and a second NAT adapter on the Android VM within virtual box. Configuring your Android Device This section describes various things I did to setup a dev environment for my Android device: Root the device. I used Universal AndRoot Install ConnectBot so you have ssh and related network utilities Creating Initial Android Maven Application Create initial project using instructions here. I found it best to create stub project structure using the maven-archtype-plugin and the archtypes at https://github.com/akquinet/android-archetypes/wiki Connecting Android VM Device to Android SDK In order for your code to be deployed from NetBeans IDE to Android Device and in order for you to monitor your deployed app from the Dalvik Debug Monitor (ddms) you need to connect your android VM device to the android sdk over TCP as described in the following steps. On Android Device open the Terminal Emulator Type su to become root (your device must be rooted for this Type following commands in root shell: setprop service.adb.tcp.port 5555 stop adbd start adbd Type the following commands on dev machine shell. TODO: Note that IP address below is whatever is the ip address associated with the device (see ifconfig on linux for device vboxnet0) adb tcpip 5555 adb connect 192.168.0.101:5555 For details on above steps see: http://stackoverflow.com/questions/2604727/how-can-i-connect-to-android-with-adb-over-tcp Set up port forwarding as described here http://redkrieg.com/2010/10/11/adb-over-ssh-fun-with-port-forwards/ (this is where I am most fuzzy) Build your maven android project using Right-Click / Clean and Build Now for the acid test whether you can deploy your app to the device from NetBeans IDE! Right-click / Custom / Goal to show Run Maven dialog. Enter android:deploy in Goals field. Select Remember As button and enter android:deploy for its text field. If all is well, the app will deploy to the device and will show up in its "Applications" screen. Debugging Android App Using NetBeans Graphical Debugger Once you can build and deploy your app to the real or virtual Android device, here are the steps to debug the app using NetBeans debugger: On Device: Start the app (TODO: determine how to start app on device with JVM options so it can wait for debugger connection. This should be easy) On Dev Machine run Dalvik Debug Monitor (ddms) in background: $ANDROID_HOME/tools/ddms & Lookup your app in ddms and get its debug port. This is described here but does not address NetBeans specifically In NetBeans do: Debug / Attach Debugger and specify the port looked up in ddms in previous step. You may leave rest of the fields with defaults. Click OK
June 18, 2011
by Farrukh Najmi
· 173,472 Views
article thumbnail
Document Management in the iCloud
Apple’s new iCloud platform provides a centralized cloud storage solution for managing documents from any computer and iOS device. iCloud document storage is not just “cloud storage”. It provides a complete document management solution that keeps files synced between a user’s mobile and desktop devices, with a copy always available in “the cloud”. When any documents is created on a device using iCloud document storage it first gets stored in the application’s local sandbox and moved to a user’s iCloud account later. On its way to the iCloud, it is first moved out of the application’s local sandbox and into a local system-managed directory where it can be monitored by the iCloud service. After that transfer, the file is transferred to iCloud and to the user’s other devices when network conditions are optimal. The iCloud storage service requires files to be stored locally to prevent large numbers of conflicting changes from occurring at the same time. The iCloud storage service uses file coordinators to mediate changes between applications and the service that facilitates the transfer of the document to and from iCloud. The file coordinator acts like a locking mechanism for documents, preventing applications and the storage service from modifying the document simultaneously. Applications that store documents to iCloud specify one or more containers and sub-directories in which to store documents in a user’s iCloud account. Applications have control over naming and creating their own storage containers, but have to request access for a user’s iCloud Storage, allowing users to manage exactly what gets stored in the cloud. When storing documents using iCloud, applications don’t need to store the full URL to document. The iCloud storage service handles assigning a URL to the most recent document, with appropriate container and sub-directories as it transitions between devices and the iCloud. When an application queries a document it will always return the correct URL, location and version. In addition to file sync and storage users can also opt to have their applications and application settings backed up directly to their iCloud account, making it easier to restore applications to their most recent state on any new or existing iOS device. The iCloud Storage service handles end to end syncing and backing up for all user’s documents–handling local storage, version control, conflict resolution, transferring to the cloud, and between all their devices. Apple offers access to iCloud Storage services to iOS application developers via iCloud Storage APIs, so the same document management features are available across any application installed on Mac Desktop, Laptops, IPhone, IPod and IPad devices. Apple’s intent is to not just provide “cloud storage”, they want to provide a seamless experience for users across all applications and devices so they don’t have to think about where their music, video, images, and other documents are. It just happens.
June 12, 2011
by Kin Lane
· 7,538 Views
article thumbnail
Bootstrapping CDI in several environments
i feel like writing some posts about cdi (contexts and dependency injection). so this is the first one of a series of x posts ( 0 javax.enterprise cdi-api 1.0 provided an empty beans.xml will do to enable cdi you must have a beans.xml file in your project (under the meta-inf or web-inf). that’s because cdi needs to identify the beans in your classpath (this is called bean discovery) and build its internal metamodel. with the beans.xml file cdi knows it has beans to discover. so, for all the following examples i’ll make it simple and will leave this file completely empty. java ee 6 containers let’s start with the easiest possible environment : java ee 6 containers . why is it the simplest ? well, because you don’t have to do anything : cdi is part of java ee 6 as well as the web profile 1.0 so you don’t need to manually bootstrap it. let’s see how to inject a cdi bean within an ejb 3.1 and a servlet 3.0 . ejb 3.1 since ejb 3.1 you can use the ejbcontainer api to get an in-memory embedded ejb container and you can easily unit test your ejbs. so let’s write an ejb and a test class. first let’s have a look at the code of the ejb. as you can see, with version 3.1 an ejb is just a pojo : no inheritance, no interface, just one @stateless annotation. it gets a reference of the hello bean buy using the @inject annotation and uses it in the saysomething() method. @stateless public class mainejb31 { @inject hello hello; public string saysomething() { return hello.sayhelloworld(); } } you can now package the mainejb31, hello and world classes with the empty beans.xml file into a jar, deploy it to glassfish 3.x , and it will work. but if you don’t want to bother deploying it to glassfish and just unit test it, this is what you need to do : public class mainejbtest { private static ejbcontainer ec; private static context ctx; @beforeclass public static void initcontainer() throws exception { map properties = new hashmap(); properties.put(ejbcontainer.modules, new file("target/classes")); ec = ejbcontainer.createejbcontainer(properties); ctx = ec.getcontext(); } @afterclass public static void closecontainer() throws exception { if (ec != null) ec.close(); } @test public void shoulddisplayhelloworld() throws exception { // looks up the ejb mainejb31 mainejb = (mainejb31) ctx.lookup("java:global/classes/mainejb!org.antoniogoncalves.cdi.helloworld.mainejb"); assertequals("should say hello world !!!", "hello world !!!", mainejb.saysomething()); } } in the code above the method initcontainer() initializes the ejbcontainer. the shoulddisplayhelloworld() looks up the ejb (using the new portable jndi name ), invokes it and makes sure the saysomething() method returns hello world !!!. green test. that was pretty easy too. servlet 3.0 servlet 3.0 is part of java ee 6, so again, there is no needed configuration to bootstrap cdi. let’s use the new @webservlet annotation and write a very simple one that injects a reference of hello and displays an html page with hello world !!!. this is what the servlet looks like : @webservlet(urlpatterns = "/mainservlet") public class mainservlet30 extends httpservlet { @inject hello hello; @override protected void service(httpservletrequest req, httpservletresponse resp) throws servletexception, ioexception { resp.setcontenttype("text/html"); printwriter out = resp.getwriter(); out.println(""); out.println(""); out.println(""); out.println(saysomething()); out.println(""); out.println(""); out.close(); } public string saysomething() { return hello.sayhelloworld(); } } thanks to the @webservlet i don’t need any web.xml (it’s optional in servlet 3.0) to map the mainservlet30 to the /mainservlet url. you can now package the mainservlet30, hello and world classes with the empty beans.xml and no web.xml into a war, deploy it to glassfish 3.x , go to http://localhost:8080/bootstrapping-servlet30-1.0/mainservlet and it will work. unfortunately servlet 3.0 doesn’t have an api for the container (such as ejbcontainer). there is no servletcontainer api that would let you use an embedded servlet container in a standard way and, why not, easily unit test it. application client container not many people know it, but java ee (or even older j2ee versions) comes with an application client container (acc). it’s like an ejb or servlet container but for plain pojos. for example you can develop a swing application (yes, i’m sure that some of you still use swing), run it into the acc and get some extra services given by the container (security, naming, certain annotations…). glassfish v3 has an acc that you can launch in a command line : appclient -jar . so i thought, great, i can use cdi with acc the same way i use it within ejb or servlet container, no need to bootstrap anything, it’s all out of the box. i was wrong . as per the cdi specification (section 12.1), cdi is not required to support application client bean archives. so the glassfish application client container doesn’t support it. i haven’t tried the jboss acc , maybe it works. other containers the beauty of cdi is that it doesn’t require java ee 6 . you can use cdi with simple pojos in a java se environment, as well as some servlet 2.5 containers. of course it’s not as easy to bootstrap because you need a bit of configuration. but it then works fine (not always but). java se 6 ok, so until now there was nothing to do to bootstrap cdi. it is already bundled with the ejb 3.1 and servlet 3.0 containers of java ee 6 (and web profile). so the idea here is to use cdi in a simple java se environment. coming back to our hello and world classes, we need a pojo with an entry point that will bootstrap cdi so we can use injection to get those classes. in standard java se when we say entry point , we think of a public static void main(string[] args) method. well, we need something similar… but different. weld is the reference implementation of cdi. that means it implements the specification, the standard apis (mostly found in javax.inject and javax.enterprise.context packages) but also some proprietary code (in org.jboss.weld package). bootstrapping cdi in java se is not specified so you will need to use specific weld features. you can do that in two different flavors: by observing the containerinitialized event or using the programatic bootstrap api consisting of the weld and weldcontainer classes. the following code uses the containerinitialized event. as you can see, it uses the @observes annotation that i’ll explain in a future post. but the idea is that this class is listening to the event and processes the code once the event is triggered. import org.jboss.weld.environment.se.events.containerinitialized; import javax.enterprise.event.observes; import javax.inject.inject; public class mainjavase6 { @inject hello hello; public void saysomething(@observes containerinitialized event) { system.out.println(hello.sayhelloworld()); } } but who trigers the containerinitialized event ? well, it’s the org.jboss.weld.environment.se.startmain class. i’m using maven so a nice trick is to use the exec-maven-plugin to run the startmain class. download the code , have a look at the pom.xml and give it a try. the other possibility is to programmatically bootstrap the weld container. this can be handy in unit testing. the code below initializes the weld container (with new weld().initialize()) and then looks for the hello class (using weld.instance().select(hello.class).get()). import org.jboss.weld.environment.se.weld; import org.jboss.weld.environment.se.weldcontainer; import org.junit.beforeclass; import org.junit.test; import static junit.framework.assert.assertequals; public class hellotest { @test public void shoulddisplayhelloworld() { weldcontainer weld = new weld().initialize(); hello hello = weld.instance().select(hello.class).get(); assertequals("should say hello world !!!", "hello world !!!", hello.sayhelloworld()); } } execute the test with mvn test and it should be green. as you can see, there is a bit more work using cdi in a java se environment, but it’s not that complicated. tomcat 6.x ok, and what about your legacy servlet 2.5 containers ? the first one that comes in mind is tomcat 6.x ( note that tomcat 7.x will implement servlet 3.0 but is still in beta version at the time of writing this post ). weld provides support for tomcat but you need to configure it a bit to make cdi work. first of all, this is a servlet 2.5, not a 3.0. so the code of the servlet is slightly different from the one seen before (no annotation allowed) and of course, you need your good old web.xml file : public class mainservlet25 extends httpservlet { @inject hello hello; @override protected void service(httpservletrequest req, httpservletresponse resp) throws servletexception, ioexception { resp.setcontenttype("text/html"); printwriter out = resp.getwriter(); out.println(""); out.println(""); out.println(""); out.println(saysomething()); out.println(""); out.println(""); out.close(); } public string saysomething() { return hello.sayhelloworld(); } } because we don’t have a @webservlet annotation in servlet 2.5, we need to declare and map it in the web.xml (using the servlet and servlet-mapping tags). then, you need to explicitly specify the servlet listener to boot weld and control its interaction with requests (org.jboss.weld.environment.servlet.listener). tomcat has a read-only jndi, so weld can’t automatically bind the beanmanager extension spi. to bind the beanmanager into jndi, you should populate meta-inf/context.xml and make the beanmanager available to your deployment by adding it to your web.xml: mainservlet25 org.antoniogoncalves.cdi.bootstrapping.servlet.mainservlet25 mainservlet25 /mainservlet org.jboss.weld.environment.servlet.listener beanmanager javax.enterprise.inject.spi.beanmanager the meta-inf/context.xml file is an optional file which contains a context for a single tomcat web application. this can be used to define certain behaviours for your application, jndi resources and other settings. package all the files (mainservlet25, hello, world, meta-inf/context.xml, beans.xml and web.xml) into a war and deploy it into tomcat 6.x. go to http://localhost:8080/bootstrapping-servlet25-tomcat-1.0/mainservlet and you will see your hello world page. jetty 6.x another famous servlet 2.5 containers is jetty 6.x (at codehaus) and jetty 7.x ( note that jetty 8.x will implement servlet 3.0 but it’s still in experimental stage at the time of writing this post ). if you look at the weld documentation, there is actually support for jetty 6.x and 7.x . the code is the same one as tomcat (because it’s a servlet 2.5 container), but the configuration changes. with jetty you need to add two files under web-inf : jetty-env.xml and jetty-web.xml : beanmanager javax.enterprise.inject.spi.beanmanager org.jboss.weld.resources.managerobjectfactory true package all the files (mainservlet25, hello, world, web-inf/jetty-env.xml, web-inf/jetty-web.xml, beans.xml and web.xml) into a war and deploy it into jetty 6.x. go to http://localhost:8080/bootstrapping-servlet25-jetty6/mainservlet and you will see your hello world page. there was a mistake in the weld documentation so i couldn’t make it work. i started a thread on the weld forum and thanks to dan allen , pete muir and all the weld team, this was fixed and i managed to make it work. simple as posting an email to the forum . thanks for your help guys. spring 3.x here is the tricky part. spring 3.x implements the jsr 330 : dependency injection for java , which means that @inject works out of the box. but i didn’t find a way to integrate cdi with spring 3.x . the weld documentation mentions that because of its extension points, “ integration with third-party frameworks such as spring (…) was envisaged by the designers of cdi “. i did find this blog that simulates cdi features by enabling spring ones. what i didn’t find is a clear statement or roadmap on springsource about supporting cdi or not in future releases. the last trace of this topic is a comment on a long tss flaming thread . at that time (16 december 2009), juergen huller said “ with respect to implementing cdi on top of spring (…) trying to hammer it into the semantic frame of another framework such as cdi would be an exercise that is certainly achievable (…) but ultimately pointless “. but if you have any fresh news about it, let me know. conclusion as i said, this post is not about explaining cdi, i’ll do that in future posts. i just wanted to focus on how to bootstrap it in several environments so you can try by yourself. as you saw, it’s much simpler to use cdi within an ejb 3.1 or servlet 3.0 container in java ee 6. i’ve used glassfish 3.x but it should also work with other java ee 6 or web profile containers such as jboss 6 or resin . when you don’t use java ee 6, there is a bit more work to do. depending on your environment or servlet container you need some configuration to bootstrap weld. by the way, i’ve used weld because it’s the reference implementation, the one bunddled with glassfish and jboss. but you could also use openwebbeans , another cdi implementation. download the code , give it a try, and give me some feedback. from http://agoncal.wordpress.com/2011/01/12/bootstrapping-cdi-in-several-environments/
April 28, 2011
by Antonio Goncalves
· 31,390 Views
  • Previous
  • ...
  • 276
  • 277
  • 278
  • 279
  • 280
  • 281
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×