Taking Browser Screenshots With No Display (Selenium/Xvfb)
In my last two blog posts, I showed examples of using Selenium WebDriver to capture screenshots and of running in a headless (no X server) mode. This example combines the two solutions to capture screenshots inside a virtual display. To achieve this, I use a combination of Selenium WebDriver and pyvirtualdisplay (which uses Xvfb) to run a browser in a virtual display and capture screenshots.

The setup you need is:

  • Selenium 2 Python bindings: PyPI
  • pyvirtualdisplay Python package (depends on Xvfb): PyPI

On Debian/Ubuntu Linux systems, you can install everything with:

    $ sudo apt-get install python-pip xvfb xserver-xephyr
    $ sudo pip install selenium pyvirtualdisplay

Once you have it set up, the following code example should work:

    #!/usr/bin/env python

    from pyvirtualdisplay import Display
    from selenium import webdriver

    # Start an invisible 800x600 virtual display (backed by Xvfb)
    display = Display(visible=0, size=(800, 600))
    display.start()

    # Firefox renders into the virtual display instead of a real screen
    browser = webdriver.Firefox()
    browser.get('http://www.google.com')
    browser.save_screenshot('screenie.png')
    browser.quit()

    display.stop()

This will:

  • launch a virtual display
  • launch the Firefox browser inside the virtual display
  • navigate to google.com
  • capture and save a screenshot
  • close the browser
  • stop the virtual display
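One thing the script above does not cover is cleanup when something fails midway: if the page load or the screenshot raises, the browser and the Xvfb process can be left running. The following is a minimal sketch of the same flow wrapped in try/finally. The function name `capture_screenshot` and the two factory parameters are hypothetical, added here only so the cleanup order can be exercised without a real browser; the `Display`, `webdriver.Firefox`, `get`, `save_screenshot`, `quit`, and `stop` calls are the ones used in the article.

```python
def capture_screenshot(url, path, make_display=None, make_browser=None):
    """Save a screenshot of `url` to `path`, always releasing browser and display.

    `make_display` and `make_browser` are hypothetical hooks (not part of
    Selenium or pyvirtualdisplay); when omitted, the real classes are used.
    """
    if make_display is None:
        # Imported lazily so the function can be tested without Xvfb installed
        from pyvirtualdisplay import Display
        make_display = lambda: Display(visible=0, size=(800, 600))
    if make_browser is None:
        from selenium import webdriver
        make_browser = webdriver.Firefox

    display = make_display()
    display.start()
    try:
        browser = make_browser()
        try:
            browser.get(url)
            browser.save_screenshot(path)
        finally:
            # Close the browser even if navigation or the screenshot failed
            browser.quit()
    finally:
        # Stop the virtual display last, after the browser is gone
        display.stop()
```

With the real packages installed, calling `capture_screenshot('http://www.google.com', 'screenie.png')` should behave like the original script, with the difference that the display is stopped even when an exception propagates.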
May 16, 2012
by Corey Goldberg
· 25,479 Views
Managing and Monitoring Drupal Sites on Windows Azure
A few weeks ago, I co-authored an article (with my colleague Rama Ramani) about how the Screen Actors Guild Awards website migrated its Drupal deployment from LAMP to Windows Azure: Azure Real World: Migrating a Drupal Site from LAMP to Windows Azure. Since then, Rama and another colleague, Jason Roth, have been working on writing up how the SAG Awards website was managed and monitored in Windows Azure. The article below is the fruit of their work…a very interesting/educational read. Overview Drupal is an open source content management system that runs on PHP. Windows Azure offers a flexible platform for hosting, managing, and scaling Drupal deployments. This paper focuses on an approach to host Drupal sites on Windows Azure, based on learning from a BPD Customer Programs Design Win engagement with the Screen Actors Guild Awards Drupal website. This paper covers guidelines and best practices for managing an existing Drupal web site in Windows Azure. For more information on how to migrate Drupal applications to Windows Azure, see Azure Real World: Migrating a Drupal Site from LAMP to Windows Azure. The target audience for this paper is Drupal administrators who have some exposure to Windows Azure. More detailed pointers to Windows Azure content is provided throughout the paper as links. Drupal Application Architecture on Windows Azure Before reviewing the management and monitoring guidelines, it is important to understand the architecture of a typical Drupal deployment on Windows Azure. First, the following diagram displays the basic architecture of Drupal running on Windows and IIS7. In the Windows Server scenario, you could have one or more machines hosting the web site in a farm. Those machines would either persist the site content to the file system or point to other network shares. For Windows Azure, the basic architecture is the same, but there are some differences. In Windows Azure the site is hosted on a web role. 
A web role instance is hosted on a Windows Server 2008 virtual machine within the Windows Azure datacenter. Like the web farm, you can have multiple instances running the site. But there is no persistence guarantee for the data on the file system. Because of this, much of the shared site content should be stored in Windows Azure Blob storage. This allows them to be highly available and durable. Usually, a large portion of the site caters to static content which lends well to caching. And caching can be applied in a set of places – browser level caching, CDN to cache content in the edge closer to the browser clients, caching in Azure to reduce the load on backend, etc. Finally, the database can be located in SQL Azure. The following diagram shows these differences. For monitoring and management, we will look at Drupal on Windows Azure from three perspectives: Availability: Ensure the web site does not go down and that all tiers are setup correctly. Apply best practices to ensure that the site is deployed across data centers and perform backup operations regularly. Scalability: Correctly handle changes in user load. Understand the performance characteristics of the site. Manageability: Correctly handle updates. Make code and site changes with no downtime when possible. Although some management tasks span one or more of these categories, it is still helpful to discuss Drupal management on Windows Azure within these focus areas. Availability One main goal is that the Drupal site remains running and accessible to all end-users. This involves monitoring both the site and the SQL Azure database that the site depends on. In this section, we will briefly look at monitoring and backup tasks. Other crossover areas that affect availability will be discussed in the next section on scalability. Monitoring With any application, monitoring plays an important role with managing availability. 
Monitoring data can reveal whether users are successfully using the site or whether computing resources are meeting the demand. Other data reveals error counts and possibly points to issues in a specific tier of the deployment. There are several monitoring tools that can be used. The Windows Azure Management Portal. Windows Azure diagnostic data. Custom monitoring scripts. System Center Operations Manager. Third party tools such as Azure Diagnostics Manager and Azure Storage Explorer. The Windows Azure Management Portal can be used to ensure that your deployments are successful and running. You can also use the portal to manage features such as Remote Desktop so that you can directly connect to machines that are running the Drupal site. Windows Azure diagnostics allows you to collect performance counters and logs off of the web role instances that are running the Drupal site. Although there are many options for configuring diagnostics in Azure, the best solution with Drupal is to use a diagnostics configuration file. The following configuration file demonstrates some basic performance counters that can monitor resources such as memory, processor utilization, and network bandwidth. For more information about setting up diagnostic configuration files, see How to Use the Windows Azure Diagnostics Configuration File. This information is stored locally on each role instance and then transferred to Windows Azure storage per a defined schedule or on-demand. See Getting Started with Storing and Viewing Diagnostic Data in Windows Azure Storage. Various monitoring tools, such as Azure Diagnostics Manager, help you to more easily analyze diagnostic data. Monitoring the performance of the machines hosting the Drupal site is only part of the story. In order to plan properly for both availability and scalability, you should also monitor site traffic, including user load patterns and trends. 
Standard and custom diagnostic data could contribute to this, but there are also third-party tools that monitor web traffic. For example, if you know that spikes occur in your application during certain days of the week, you could make changes to the application to handle the additional load and increase the availability of the Drupal solution. Backup Tasks To remain highly available, it is important to backup your data as a defense-in-depth strategy for disaster recovery. This is true even though SQL Azure and Windows Azure Storage both implement redundancy to prevent data loss. One obvious reason is that these services cannot prevent administrator error if data is accidentally deleted or incorrectly changed. SQL Azure does not currently have a formal backup technology, although there are many third-party tools and solutions that provide this capability. Usually the database size for a Drupal site is relatively small. In the case of SAG Awards, it was only ~100-150 MB. So performing an entire backup using any strategy was relatively fast. If your database is much larger, you might have to test various backup strategies to find the one that works best. Apart from third-party SQL Azure backup solutions, there are several strategies for obtaining a backup of your data: · Use the Drush tool and the portabledb-export command. · Periodically copy the database using the CREATE DATABASE Transact-SQL command. · Use Data-tier applications (DAC) to assist with backup and restore of the database. SQL Azure backup and data security techniques are described in more detail in the topic, Business Continuity in SQL Azure. Note that bandwidth costs accrue with any backup operation that transfers information outside of the Windows Azure datacenter. To reduce costs, you can copy the database to a database within the same datacenter. Or you can export the data-tier applications to blob storage in the same datacenter. Another potential backup task involves the files in Blob storage. 
If you keep a master copy of all media files uploaded to Blob storage, then you already have an on-premises backup of those files. However, if multiple administrators are loading files into Blob storage for use on the Drupal site, it is a good idea to enumerate the storage account and to download any new files to a central location. The following PHP script demonstrates how this can be done by backing up all files in Blob storage after a specified modification date.

    setProxy(true, 'YOUR_PROXY_IF_NEEDED', 80);
    $blobs = (array)$blobObj->listBlobs(AZURE_STORAGE_CONTAINER, '', '', 35000);
    backupBlobs($blobs, $blobObj);

    // Download every blob whose last-modified time falls inside the backup window
    function backupBlobs($blobs, $blobObj) {
        foreach ($blobs as $blob) {
            if (strtotime($blob->lastmodified) >= DEFAULT_BACKUP_FROM_DATE
                && strtotime($blob->lastmodified) <= DEFAULT_BACKUP_TO_DATE) {
                $path = pathinfo($blob->name);
                if ($path['basename'] != '$$$.$$$') {
                    $dir = $path['dirname'];
                    $oldDir = getcwd();
                    if (handleDirectory($dir)) {
                        chdir($dir);
                        $blobObj->getBlob(
                            AZURE_STORAGE_CONTAINER,
                            $blob->name,
                            $path['basename']
                        );
                        chdir($oldDir);
                    }
                }
            }
        }
    }

    // Create the local directory (recursively) if it does not exist yet
    function handleDirectory($dir) {
        if (!checkDirExists($dir)) {
            return mkdir($dir, 0755, true);
        }
        return true;
    }

    function checkDirExists($dir) {
        if (file_exists($dir) && is_dir($dir)) {
            return true;
        }
        return false;
    }
    ?>

This script has a dependency on the Windows Azure SDK for PHP. Also note that there are several parameters that you must modify, such as the storage account, secret, and backup location. As with SQL Azure, bandwidth and transaction charges apply to a backup script like this.

Scalability

Drupal sites on Windows Azure can scale as load increases through the typical strategies of scale-up, scale-out, and caching. The following sections describe the specifics of how these strategies are implemented in Windows Azure. Typically you make scalability decisions based on monitoring and capacity planning. Monitoring can be done in staging during testing or in production with real-time load.
Capacity planning factors in projections for changes in user demand. Scale Up When you configure your web role prior to deployment, you have the option of specifying the Virtual Machine (VM) size, such as Small or ExtraLarge. Each size tier adds additional memory, processing power, and network bandwidth to each instance of your web role. For cost efficiency and smaller units of scale, you can test your application under expected load to find the smallest virtual machine size that meets your requirements. The workload usually in most popular Drupal websites can be separated out into a limited set of Drupal admins making content changes and a large user base who perform mostly read-only workload. End users can be allowed to make ‘writes’, such as uploading blogs or posting in forums, but those changes are not ‘content changes’. Drupal admins are setup to operate without caching so that the writes are made directly to SQL Azure or the corresponding backend database. This workload performs well with Large or ExtraLarge VM sizes. Also, note that the VM size is closely tied to all hardware resources, so if there are many content-rich pages that are streaming content, then the VM size requirements are higher. To make changes to the Virtual Machine size setting, you must change the vmsize attribute of the WebRole element in the service definition file, ServiceDefinition.csdef. A virtual machine size change requires existing applications to be redeployed. Scale Out In addition to the size of each web role instance, you can increase or decrease the number of instances that are running the Drupal site. This spreads the web requests across more servers, enabling the site to handle more users. To change the number of running instances of your web role, see How to Scale Applications by Increasing or Decreasing the Number of Role Instances. Note that some configuration changes can cause your existing web role instances to recycle. 
You can choose to handle this situation by applying the configuration change and continuing to run. This is done by handling the RoleEnvironment.Changing event. For more information, see How to Use the RoleEnvironment.Changing Event.

A common question for any Windows Azure solution is whether there is some type of built-in automatic scaling. Windows Azure does not provide a built-in auto-scaling service. However, it is possible to create a custom solution that scales Azure services using the Service Management API. For an example of this approach, see An Auto-Scaling Module for PHP Applications in Windows Azure.

Caching

Caching is an important strategy for scaling Drupal applications on Windows Azure. One reason for this is that SQL Azure implements throttling mechanisms to regulate the load on any one database in the cloud. Code that uses SQL Azure should have robust error handling and retry logic to account for this. For more information, see Error Messages (SQL Azure Database). Because of the potential for load-related throttling as well as for general performance improvement, it is strongly recommended to use caching.

Although Windows Azure provides a Caching service, this service does not currently have interoperability with PHP. Because of this, the best solution for caching in Drupal is to use a module that uses an open-source caching technology, such as Memcached. Outside of a specific Drupal module, you can also configure Memcached to work in PHP for Windows Azure. For more information, see Running Memcached on Windows Azure for PHP. Here is also an example of how to get Memcached working in Windows Azure using a plugin: Windows Azure Memcached plugin.

In a future paper, we hope to cover this architecture in more detail. For now, here are several design and management considerations related to caching.

Area | Consideration
Design and Implementation | For a technology like Memcached, will the cache be collocated (spread across all web role instances)? Or will you attempt to set up a dedicated cache ring with worker roles that only run Memcached?
Configuration | What memory is required and how will items in the cache be invalidated?
Performance and Monitoring | What mechanisms will be used to detect the performance and overall health of the cache?

For ease of use and cost savings, collocation of the cache across the web role instances of the Drupal site works best. However, this assumes that there is available reserve memory on each instance to apply toward caching. It is possible to increase the virtual machine size setting to increase the amount of available memory on each machine. It is also possible to add web role instances to add to the overall memory of the cache while at the same time improving the ability of the web site to respond to load. It is possible to create a dedicated cache cluster in the cloud, but the steps for this are beyond the scope of this paper.

For Windows Azure Blob storage, there is also a caching feature built into the service called the Content Delivery Network (CDN). CDN provides high-bandwidth access to files in Blob storage by caching copies of the files in edge nodes around the world. Even within a single geographic region, you could see performance improvements as there are many more edge nodes than Windows Azure datacenters. For more information, see Delivering High-Bandwidth Content with the Windows Azure CDN.

Manageability

It is important to note that each hosted service has a Staging environment and a Production environment. This can be used to manage deployments, because you can load and test an application in staging before performing a VIP swap with production.

From a manageability standpoint, Drupal has an advantage on Windows Azure in the way that site content is stored. Because the data necessary to serve pages is stored in the database and blob storage, there is no need to redeploy the application to change the content of the site.
Another best practice is to use a storage account for diagnostic data that is separate from the one used for the application itself. This can improve performance and also helps to separate the cost of diagnostic monitoring from the cost of the running application.

As mentioned previously, there are several tools that can assist with managing Windows Azure applications. The following table summarizes a few of these choices.

Tool | Description
Windows Azure Management Portal | The web interface of the Windows Azure management portal shows deployments, instance counts and properties, and supports many common management and monitoring tasks.
Azure Diagnostics Manager | A Red Gate Software product that provides advanced monitoring and management of diagnostic data. This tool can be very useful for easily analyzing the performance of the Drupal site to determine appropriate scaling decisions.
Azure Storage Explorer | A tool created by Neudesic for viewing Windows Azure storage accounts. This can be useful for viewing both diagnostic data and the files in Blob storage.
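The Caching section above recommends robust error handling and retry logic for code that talks to SQL Azure, because the service can throttle a busy database. The paper does not show what that logic looks like, so here is a minimal, language-neutral sketch in Python of retry with exponential backoff. Everything named here is an assumption for illustration: `TransientDatabaseError` is a hypothetical stand-in for whatever throttling or dropped-connection error your database driver raises, and the attempt count and delays are arbitrary starting values, not figures from the paper.

```python
import random
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a throttling or dropped-connection error."""


def run_with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `operation()` and retry on transient errors with exponential backoff.

    `sleep` is injectable so the waiting behaviour can be tested without delays.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller
                raise
            # Wait base_delay, then 2x, 4x, ... plus up to 10% random jitter
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

The jitter keeps several web role instances from retrying in lockstep after the same throttling event. Only errors known to be transient should be retried this way; anything else is better left to fail immediately.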
April 25, 2012
by Brian Swan
· 8,742 Views
Amazon EMR Tutorial: Running a Hadoop MapReduce Job Using Custom JAR
See original post at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html

Introduction

Amazon EMR is a web service for processing enormous amounts of data easily and efficiently. It uses a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon S3. Amazon EMR removes most of the cumbersome details of Hadoop by taking care of provisioning Hadoop, running the job flow, terminating the job flow, moving the data between Amazon EC2 and Amazon S3, and optimizing Hadoop. In this tutorial, we will develop a WordCount Java example using Hadoop and then execute the program on Amazon Elastic MapReduce.

Prerequisites

You must have valid AWS account credentials. You should also have a general familiarity with the Eclipse IDE before you begin. You can also use any other IDE of your choice.

Step 1 – Develop MapReduce WordCount Java Program

In this section, we first develop a WordCount application. A WordCount program determines how many times different words appear in a set of files. In Eclipse (or whichever IDE you are using), create a simple Java project with the name "WordCount".
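Create a Java class named Map and override the map method as follows:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every token in the input line
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

Create a Java class named Reduce and override the reduce method as shown below:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // Sum all the counts emitted for one word
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

Create a Java class named WordCount and define the main method as below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "wordcount");
            job.setJarByClass(WordCount.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            // args[0] is the input location, args[1] the output location
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.waitForCompletion(true);
        }
    }

Export the WordCount program in a jar using Eclipse and save it to some location on disk. Make sure that you have provided the Main Class (WordCount) during export of the jar file as shown below. Our jar is ready!!!

Step 2 – Upload the WordCount JAR and Input Files to Amazon S3

Now we are going to upload the WordCount jar to Amazon S3.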
First, go to the following URL: https://console.aws.amazon.com/s3/home

Next, click “Create Bucket”, give your bucket a name, and click the “Create” button. Select your new S3 bucket in the left-hand pane. Upload the WordCount JAR and a sample input file for counting the words.

Step 3 – Running an Elastic MapReduce job

Now that the JAR is uploaded into S3, all we need to do is create a new job flow. Let's execute the steps below. (I encourage readers to check out the following link for details regarding each step: How to Create a Job Flow Using a Custom JAR.)

  • Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/
  • Click Create New Job Flow.
  • In the DEFINE JOB FLOW page, enter the following details:
    a) Job Flow Name = WordCountJob
    b) Select Run your own application
    c) Select Custom JAR in the drop-down list
    d) Click Continue
  • In the SPECIFY PARAMETERS page, enter values in the boxes using the following table as a guide, and then click Continue.
    JAR Location = bucketName/jarFileLocation
    JAR Arguments = s3n://bucketName/inputFileLocation s3n://bucketName/outputpath

Please note that the output path must be unique each time we execute the job. Hadoop always creates a folder with the same name specified here.

After executing the job, just wait and monitor your job as it runs through the Hadoop flow. You can also look for errors by using the Debug button. The job should be complete within 10 to 15 minutes (this can also depend on the size of the input). After the job completes, you can view the results in the S3 Browser panel. You can also download the files from S3 and analyze the outcome of the job.
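Because the output path has to be unique for every run, it can help to generate it rather than type it by hand. The following is a small sketch of one way to do that, assuming a timestamp suffix is acceptable. The function name `unique_output_path` and the `output` prefix are hypothetical; only the `s3n://bucketName/...` form comes from the tutorial.

```python
from datetime import datetime, timezone


def unique_output_path(bucket, prefix="output", now=None):
    """Build an s3n:// output path that differs for each run (to the second).

    `now` can be passed in for reproducible results; it defaults to the
    current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return "s3n://{}/{}/{}".format(bucket, prefix, stamp)
```

The result can be pasted as the second JAR argument. Two runs started within the same second would still collide, so this is only a convenience for manual runs, not a guarantee of uniqueness.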
Amazon Elastic MapReduce Resources

  • Amazon Elastic MapReduce Documentation, http://aws.amazon.com/documentation/elasticmapreduce/
  • Amazon Elastic MapReduce Getting Started Guide, http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/
  • Amazon Elastic MapReduce Developer Guide, http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/
  • Apache Hadoop, http://hadoop.apache.org/

See more at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html
April 23, 2012
by Muhammad Ali Khojaye
· 59,014 Views
Face Detection using HTML5, Javascript, Webrtc, Websockets, Jetty and OpenCV
How to create a real-time face detection system using HTML5, JavaScript, and OpenCV, leveraging WebRTC for webcam access and WebSockets for client-server communication.
April 23, 2012
by Jos Dirksen
· 53,088 Views
How to Use Sigma.js with Neo4j
I’ve done a few posts recently using D3.js and now I want to show you how to use two other great JavaScript libraries to visualize your graphs. We’ll start with Sigma.js and soon I’ll do another post with Three.js.

We’re going to create our graph and group our nodes into five clusters. You’ll notice later on that we’re going to give our clustered nodes colors using RGB values so we’ll be able to see them move around until they find their right place in our layout. We’ll be using two Sigma.js plugins, the GEXF (Graph Exchange XML Format) parser and the ForceAtlas2 layout.

You can see what a GEXF file looks like below. Notice it comes from Gephi, which is an interactive visualization and exploration platform that runs on all major operating systems, is open source, and is free.

    ... ...

In order to build this file, we will need to get the nodes and edges from the graph and create an XML file.

    get '/graph.xml' do
      @nodes = nodes
      @edges = edges
      builder :graph
    end

We’ll use Cypher to get our nodes and edges:

    def nodes
      neo = Neography::Rest.new
      cypher_query =  " START node = node:nodes_index(type='user')"
      cypher_query << " RETURN ID(node), node"
      neo.execute_query(cypher_query)["data"].collect{|n| {"id" => n[0]}.merge(n[1]["data"])}
    end

We need the node and relationship ids, so notice I’m using the ID() function in both cases.

    def edges
      neo = Neography::Rest.new
      cypher_query =  " START source = node:nodes_index(type='user')"
      cypher_query << " MATCH source -[rel]-> target"
      cypher_query << " RETURN ID(rel), ID(source), ID(target)"
      neo.execute_query(cypher_query)["data"].collect{|n| {"id" => n[0], "source" => n[1], "target" => n[2]} }
    end

So far we have seen graphs represented as JSON, and we’ve built these manually. Today we’ll take advantage of the Builder Ruby gem to build our graph in XML.

    xml.instruct! :xml
    xml.gexf 'xmlns' => "http://www.gephi.org/gexf", 'xmlns:viz' => "http://www.gephi.org/gexf/viz" do
      xml.graph 'defaultedgetype' => "directed", 'idtype' => "string", 'type' => "static" do
        xml.nodes :count => @nodes.size do
          @nodes.each do |n|
            xml.node :id => n["id"], :label => n["name"] do
              xml.tag!("viz:size", :value => n["size"])
              xml.tag!("viz:color", :b => n["b"], :g => n["g"], :r => n["r"])
              xml.tag!("viz:position", :x => n["x"], :y => n["y"])
            end
          end
        end
        xml.edges :count => @edges.size do
          @edges.each do |e|
            xml.edge :id => e["id"], :source => e["source"], :target => e["target"]
          end
        end
      end
    end

You can get the code on GitHub as usual and see it running live on Heroku. You will want to see it live on Heroku so you can see the nodes start in random positions and then move to form clusters. Use your mouse wheel to zoom in, and click and drag to move around. Credit goes out to Alexis Jacomy and Mathieu Jacomy.

You’ve seen me create numerous random graphs, but for completeness here is the code for this graph. Notice how I create 5 clusters and for each node I assign half its relationships to other nodes in its cluster and half to random nodes? This is so the ForceAtlas2 layout plugin clusters our nodes neatly.

    def create_graph
      neo = Neography::Rest.new
      graph_exists = neo.get_node_properties(1)
      return if graph_exists && graph_exists['name']

      names = 500.times.collect{|x| generate_text}
      clusters = 5.times.collect{|x| {:r => rand(256),
                                      :g => rand(256),
                                      :b => rand(256)} }
      commands = []
      names.each_index do |n|
        cluster = clusters[n % clusters.size]
        commands << [:create_node, {:name => names[n],
                                    :size => 5.0 + rand(20.0),
                                    :r => cluster[:r],
                                    :g => cluster[:g],
                                    :b => cluster[:b],
                                    :x => rand(600) - 300,
                                    :y => rand(150) - 150 }]
      end

      names.each_index do |from|
        # "{n}" refers to the result of the n-th command in the batch
        commands << [:add_node_to_index, "nodes_index", "type", "user", "{#{from}}"]
        connected = []

        # create clustered relationships
        members = 20.times.collect{|x| x * 10 + (from % clusters.size)}
        members.delete(from)
        rels = 3
        rels.times do |x|
          to = members[x]
          connected << to
          commands << [:create_relationship, "follows", "{#{from}}", "{#{to}}"] unless to == from
        end

        # create random relationships
        rels = 3
        rels.times do |x|
          to = rand(names.size)
          commands << [:create_relationship, "follows", "{#{from}}", "{#{to}}"] unless (to == from) || connected.include?(to)
        end
      end

      batch_result = neo.batch *commands
    end
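Since the sample GEXF file is not reproduced above, it may help to see the shape of the document that the Builder template produces. The sketch below rebuilds the same structure with Python's standard `xml.etree.ElementTree` for a hand-made two-node graph. The function name `build_gexf` and the sample node values are hypothetical; the element names, attributes, and namespace URIs are copied from the Builder template in the post.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gephi.org/gexf"
VIZ_NS = "http://www.gephi.org/gexf/viz"


def build_gexf(nodes, edges):
    """Return a GEXF document (as a string) mirroring the Builder template."""
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "xmlns:viz": VIZ_NS})
    graph = ET.SubElement(root, "graph", {
        "defaultedgetype": "directed", "idtype": "string", "type": "static"})

    nodes_el = ET.SubElement(graph, "nodes", {"count": str(len(nodes))})
    for n in nodes:
        node_el = ET.SubElement(nodes_el, "node",
                                {"id": str(n["id"]), "label": n["name"]})
        # viz:* children carry the size, colour, and starting position
        ET.SubElement(node_el, "viz:size", {"value": str(n["size"])})
        ET.SubElement(node_el, "viz:color",
                      {"b": str(n["b"]), "g": str(n["g"]), "r": str(n["r"])})
        ET.SubElement(node_el, "viz:position",
                      {"x": str(n["x"]), "y": str(n["y"])})

    edges_el = ET.SubElement(graph, "edges", {"count": str(len(edges))})
    for e in edges:
        ET.SubElement(edges_el, "edge", {"id": str(e["id"]),
                                         "source": str(e["source"]),
                                         "target": str(e["target"])})
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data: two nodes in the same colour cluster, one edge
SAMPLE_NODES = [
    {"id": 0, "name": "alice", "size": 10, "r": 200, "g": 30, "b": 30, "x": -50, "y": -20},
    {"id": 1, "name": "bob", "size": 12, "r": 200, "g": 30, "b": 30, "x": 40, "y": -90},
]
SAMPLE_EDGES = [{"id": 0, "source": 0, "target": 1}]
```

Calling `print(build_gexf(SAMPLE_NODES, SAMPLE_EDGES))` prints a single `gexf` element containing one `graph`, which in turn holds a `nodes` list and an `edges` list, each with a `count` attribute.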
April 12, 2012
by Max De Marzi
· 15,373 Views
Filtering the Stack Trace From Hell
I love stack traces. Not because I love errors, but because the moment they occur, a stack trace is a priceless source of information. For instance, in a web application the stack trace shows you the complete request processing path, from the HTTP socket, through filters, servlets, controllers, services, DAOs, etc., up to the place where an error occurred. You can read them like a good book, where every event has a cause and an effect. I even implemented some enhancements in the way Logback prints exceptions, see Logging exceptions root cause first.

But one thing's been bothering me for a while: the infamous "stack trace from hell" symptom, that is, stack traces containing hundreds of irrelevant, cryptic, often auto-generated methods. AOP frameworks and over-engineered libraries tend to produce insanely long execution traces. Let me show a real-life example. In a sample application I am using the following technology stack:

Colours are important. According to framework/layer colour I painted a sample stack trace, caused by an exception thrown somewhere deep while trying to fetch data from the database:

No longer that pleasant, don't you think? Placing Spring between the application and Hibernate in the first diagram was a huge oversimplification. The Spring framework is glue code that wires up and intercepts your business logic with surrounding layers. That is why application code is scattered and interleaved with dozens of lines of technical invocations (see green lines). I put as much stuff as I could into the application (Spring AOP, method-level @Secured annotations, custom aspects and interceptors, etc.) to emphasize the problem, but it is not Spring specific. EJB servers generate equally terrible stack traces (...from hell) between EJB calls.

Should I care? Think about it: when you innocently call BookService.listBooks() from BookController.listBooks(), do you expect to see this?
    at com.blogspot.nurkiewicz.BookService.listBooks()
    at com.blogspot.nurkiewicz.BookService$$FastClassByCGLIB$$e7645040.invoke()
    at net.sf.cglib.proxy.MethodProxy.invoke()
    at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint()
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed()
    at com.blogspot.nurkiewicz.LoggingAspect.logging()
    at sun.reflect.NativeMethodAccessorImpl.invoke0()
    at sun.reflect.NativeMethodAccessorImpl.invoke()
    at sun.reflect.DelegatingMethodAccessorImpl.invoke()
    at java.lang.reflect.Method.invoke()
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs()
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod()
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke()
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()
    at org.springframework.aop.interceptor.AbstractTraceInterceptor.invoke()
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke()
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke()
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()
    at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept()
    at com.blogspot.nurkiewicz.BookService$$EnhancerByCGLIB$$7cb147e4.listBooks()
    at com.blogspot.nurkiewicz.web.BookController.listBooks()

And have you even noticed there is a custom aspect in between? That's the thing: there is so much noise in stack traces nowadays that following the actual business logic is virtually impossible. One of the best troubleshooting tools we have is bloated with irrelevant framework-related stuff we don't need in 99% of the cases.
Tools and IDEs are doing a good job of reducing the noise. Eclipse has stack trace filter patterns for JUnit, and IntelliJ IDEA supports console folding customization. See also: Cleaning noise out of Java stack traces, which inspired me to write this article. So why not have such a possibility at the very root, in a logging framework such as Logback?

I implemented a very simple enhancement in Logback. Basically, you can define a set of stack trace frame patterns that are supposed to be excluded from stack traces. Typically you will use the names of packages or classes that you are not interested in seeing. This is a sample logback.xml excerpt with the new feature enabled:

    <pattern>%d{HH:mm:ss.SSS} | %-5level | %thread | %logger{1} | %m%n%rEx{full,
        java.lang.reflect.Method,
        org.apache.catalina,
        org.springframework.aop,
        org.springframework.security,
        org.springframework.transaction,
        org.springframework.web,
        sun.reflect,
        net.sf.cglib,
        ByCGLIB
    }</pattern>

I am a bit extreme in filtering out almost the whole Spring framework plus the Java reflection and CGLIB classes, but it is just to give you an impression of how much you can gain. The very same error after applying my enhancement to Logback:

Just as a reminder, green is our application. Finally it is all in one place, and you can really see what your code was doing when the error occurred:

    at com.blogspot.nurkiewicz.DefaultBookHelper.findBooks()
    at com.blogspot.nurkiewicz.BookService.listBooks()
    at com.blogspot.nurkiewicz.LoggingAspect.logging()
    at com.blogspot.nurkiewicz.web.BookController.listBooks()

Simpler? If you like this feature, I opened a ticket, LBCLASSIC-325: Filtering out selected stack trace frames. Vote and discuss. This is only a proof of concept, but if you would like to have a look at the implementation (improvements are welcome!), it is available in my fork of Logback (around 20 lines of code).
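The filtering idea itself does not depend on Logback. Below is a minimal Python sketch of it, assuming frames arrive as plain strings and that a frame is dropped whenever it contains one of the excluded substrings. The name `filter_frames` and the substring-matching rule are illustrative assumptions, not the actual Logback patch.

```python
def filter_frames(frames, excluded):
    """Return only the frames that contain none of the excluded substrings."""
    return [f for f in frames if not any(pattern in f for pattern in excluded)]


# A subset of the patterns from the logback.xml excerpt above.
EXCLUDED = [
    "java.lang.reflect.Method",
    "org.springframework.aop",
    "org.springframework.transaction",
    "sun.reflect",
    "net.sf.cglib",
    "ByCGLIB",
]

# A shortened version of the noisy trace shown earlier.
frames = [
    "com.blogspot.nurkiewicz.BookService.listBooks()",
    "com.blogspot.nurkiewicz.BookService$$FastClassByCGLIB$$e7645040.invoke()",
    "net.sf.cglib.proxy.MethodProxy.invoke()",
    "org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()",
    "com.blogspot.nurkiewicz.LoggingAspect.logging()",
    "sun.reflect.NativeMethodAccessorImpl.invoke()",
    "java.lang.reflect.Method.invoke()",
    "org.springframework.transaction.interceptor.TransactionInterceptor.invoke()",
    "com.blogspot.nurkiewicz.BookService$$EnhancerByCGLIB$$7cb147e4.listBooks()",
    "com.blogspot.nurkiewicz.web.BookController.listBooks()",
]

for frame in filter_frames(frames, EXCLUDED):
    print("at " + frame)
```

With these inputs only the three application frames (BookService, LoggingAspect, BookController) survive, which mirrors the shape of the cleaned-up trace above.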
March 20, 2012
by Tomasz Nurkiewicz
· 65,858 Views · 4 Likes
Hadoop Basics—Creating a MapReduce Program
The Map Reduce Framework works in two main phases to process the data, which are the "map" phase and the "reduce" phase.
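To make the two phases concrete, here is a minimal single-process Python sketch of a word count. It is not Hadoop code: the function names and the in-memory `shuffle` step are assumptions that only imitate what the framework does between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines)))


print(word_count(["a b a", "b c"]))
```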
March 18, 2012
by Carlo Scarioni
· 212,702 Views · 4 Likes
Intellij vs. Eclipse: Why IDEA is Better
The one major difference between IDEA and Eclipse is that IDEA "feels context", which effectively makes IDEA intelligent.
March 15, 2012
by Andrei Solntsev
· 667,351 Views · 26 Likes
Why You Need a Git Pre-Commit Hook and Why Most Are Wrong
A pre-commit hook is a piece of code that runs before every commit and determines whether or not the commit should be accepted. Think of it as the gatekeeper to your codebase. Want to ensure you didn't accidentally leave any pdbs in your code? Pre-commit hook. Want to make sure your JavaScript is JSHint approved? Pre-commit hook. Want to guarantee clean, readable PEP8-compliant code? Pre-commit hook. Want to pipe all of the comments in your codebase through Strunk & White? Please don't.

The pre-commit hook is just an executable file that runs before every commit. If it exits with zero status, the commit is accepted. If it exits with a non-zero status, the commit is rejected. (Note: a pre-commit hook can be bypassed by passing the --no-verify argument.) Along with the pre-commit hook there are numerous other Git hooks available: post-commit, post-merge, pre-receive, and others that can be found here.

Why most pre-commit hooks are wrong

Be wary of the examples above, as the majority of pre-commit hooks you'll see on the web are wrong. Most test against whatever files are currently on disk, not what is in the staging area (the files actually being committed). We avoid this in our hook by stashing all changes that are not part of the staging area before running our checks and then popping the changes afterwards. This is very important because a file could be fine on disk while the changes that are being committed are wrong.

The code below is the pre-commit hook we use at Yipit. Our hook is simply a set of checks to be run against any files that have been modified in this commit. Each check can be configured to include/exclude particular types of files. It is designed for a Django environment, but should be adaptable to other environments with minor changes.
Note that you need Git 1.7.7+.

    #!/usr/bin/env python

    import os
    import re
    import subprocess
    import sys

    # Matches `git status --porcelain` lines whose first column is
    # M (modified) or A (added), i.e. changes staged for this commit.
    modified = re.compile(r'^(?:M|A)(\s+)(?P<name>.*)')

    CHECKS = [
        {
            'output': 'Checking for pdbs...',
            'command': 'grep -n "import pdb" %s',
            'ignore_files': ['.*pre-commit'],
            'print_filename': True,
        },
        {
            'output': 'Checking for ipdbs...',
            'command': 'grep -n "import ipdb" %s',
            'ignore_files': ['.*pre-commit'],
            'print_filename': True,
        },
        {
            'output': 'Checking for print statements...',
            'command': 'grep -n print %s',
            'match_files': [r'.*\.py$'],
            'ignore_files': ['.*migrations.*', '.*management/commands.*', '.*manage.py', '.*/scripts/.*'],
            'print_filename': True,
        },
        {
            'output': 'Checking for console.log()...',
            'command': 'grep -n console.log %s',
            'match_files': [r'.*yipit/.*\.js$'],
            'print_filename': True,
        },
        {
            'output': 'Checking for debugger...',
            'command': 'grep -n debugger %s',
            'match_files': [r'.*\.js$'],
            'print_filename': True,
        },
        {
            'output': 'Running JSHint...',
            # By default, jshint prints 'Lint Free!' upon success. We want to filter this out.
            'command': 'jshint %s | grep -v "Lint Free!"',
            'match_files': [r'.*yipit/.*\.js$'],
            'print_filename': False,
        },
        {
            'output': 'Running pyflakes...',
            'command': 'pyflakes %s',
            'match_files': [r'.*\.py$'],
            'ignore_files': ['.*settings/.*', '.*manage.py', '.*migrations.*', '.*/terrain/.*'],
            'print_filename': False,
        },
        {
            'output': 'Running pep8...',
            'command': 'pep8 -r --ignore=E501,W293 %s',
            'match_files': [r'.*\.py$'],
            'ignore_files': ['.*migrations.*'],
            'print_filename': False,
        },
        {
            'output': 'Checking for Sass changes...',
            'command': 'sass --quiet --update %s',
            'match_files': [r'.*\.scss$'],
            'print_filename': True,
        },
    ]


    def matches_file(file_name, match_files):
        return any(re.compile(match_file).match(file_name) for match_file in match_files)


    def check_files(files, check):
        result = 0
        print(check['output'])
        for file_name in files:
            if 'match_files' not in check or matches_file(file_name, check['match_files']):
                if 'ignore_files' not in check or not matches_file(file_name, check['ignore_files']):
                    # universal_newlines=True makes the pipes return text rather than bytes
                    process = subprocess.Popen(check['command'] % file_name,
                                               stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                               shell=True, universal_newlines=True)
                    out, err = process.communicate()
                    if out or err:
                        if check['print_filename']:
                            prefix = '\t%s:' % file_name
                        else:
                            prefix = '\t'
                        output_lines = ['%s%s' % (prefix, line) for line in out.splitlines()]
                        print('\n'.join(output_lines))
                        if err:
                            print(err)
                        result = 1
        return result


    def main(all_files):
        # Stash any changes to the working tree that are not going to be committed
        subprocess.call(['git', 'stash', '-u', '--keep-index'], stdout=subprocess.PIPE)

        files = []
        if all_files:
            for root, dirs, file_names in os.walk('.'):
                for file_name in file_names:
                    files.append(os.path.join(root, file_name))
        else:
            p = subprocess.Popen(['git', 'status', '--porcelain'],
                                 stdout=subprocess.PIPE, universal_newlines=True)
            out, err = p.communicate()
            for line in out.splitlines():
                match = modified.match(line)
                if match:
                    files.append(match.group('name'))

        result = 0

        print('Running Django code validator...')
        return_code = subprocess.call('$VIRTUAL_ENV/bin/python manage.py validate', shell=True)
        result = return_code or result

        for check in CHECKS:
            result = check_files(files, check) or result

        # Unstash changes to the working tree that we had stashed
        subprocess.call(['git', 'reset', '--hard'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        subprocess.call(['git', 'stash', 'pop', '-q'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        sys.exit(result)


    if __name__ == '__main__':
        all_files = False
        if len(sys.argv) > 1 and sys.argv[1] == '--all-files':
            all_files = True
        main(all_files)

To use this hook or a hook that you create yourself, simply copy the file to .git/hooks/pre-commit inside your project and make sure that it is executable, or add it to your Git repo and set up a symlink.
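The two decisions the hook makes for every file (is it staged, and does this check apply to it) can be tried without a repository. The sketch below is a hedged, standalone restatement of that logic: `staged_files` and `should_check` are hypothetical helper names, and the sample status text is made up for illustration.

```python
import re

# Lines of `git status --porcelain` whose FIRST column is M or A describe
# changes that are staged; a leading space means "modified but not staged".
STAGED = re.compile(r'^(?:M|A)(\s+)(?P<name>.*)')


def staged_files(porcelain_output):
    """Return the names of files staged as modified or added."""
    names = []
    for line in porcelain_output.splitlines():
        match = STAGED.match(line)
        if match:
            names.append(match.group('name'))
    return names


def should_check(file_name, check):
    """Apply a check's optional match_files / ignore_files patterns."""
    def matches(patterns):
        return any(re.match(p, file_name) for p in patterns)

    if 'match_files' in check and not matches(check['match_files']):
        return False
    if 'ignore_files' in check and matches(check['ignore_files']):
        return False
    return True


status = "M  app/views.py\n M app/models.py\nA  static/site.js\n?? notes.txt\n"
print(staged_files(status))  # ['app/views.py', 'static/site.js']

py_check = {'match_files': [r'.*\.py$'], 'ignore_files': ['.*migrations.*']}
print(should_check('app/views.py', py_check))            # True
print(should_check('app/migrations/0001.py', py_check))  # False
```

Note how ` M app/models.py` is skipped: the modification exists only in the working tree, which is exactly the class of file that on-disk hooks check by mistake.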
March 3, 2012
by Steve Pulec
· 23,474 Views · 1 Like
Creating a build pipeline using Maven, Jenkins, Subversion and Nexus.
For a while now, we had been operating in the Wild West when it came to building our applications and deploying to production. Builds were typically done straight from the developer's IDE and manually deployed to one of our app servers. We had a manual process in place, where the developer would do the following steps:

  • Check all project code into Subversion and tag.
  • Build the application.
  • Archive the application binary to a network drive.
  • Deploy to production.
  • Update our deployment wiki with the date and version number of the app that was just deployed.

The problem is that occasionally one of these steps was missed, and it always seemed to be at a time when we needed to either roll back to the previous version or branch from the tag to do a bugfix. Sometimes the previous version had not been archived to the network, or the developer forgot to tag SVN. We were already using Jenkins to perform automated builds, so we wanted to look at extending it further to perform release builds.

The Maven release plug-in provides a good starting point for creating an automated release process. We had also just started using the Nexus Maven repository and wanted to incorporate that as well, archiving our binaries there rather than to a network drive.

The first step is to set up the project's POM file with the release plugin, as well as include configuration information about our Nexus and Subversion repositories.

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-release-plugin</artifactId>
        <version>2.2.2</version>
        <configuration>
            <tagBase>http://mks:8080/svn/jrepo/tags/frameworks/siestaframework</tagBase>
        </configuration>
    </plugin>

The release plugin configuration is pretty straightforward. The configuration takes the Subversion URL of the location where the tags will reside for this project. The next step is to configure the SVN location where the code will be checked out from.
    <scm>
        <connection>scm:svn:http://mks:8080/svn/jrepo/trunk/frameworks/siestaframework</connection>
        <url>http://mks:8080/svn</url>
    </scm>

The last step in configuring the project is to set up the location where the binaries will be archived. In our case, that is the Nexus repository.

    <distributionManagement>
        <repository>
            <id>lynden-java-release</id>
            <name>lynden release repository</name>
            <url>http://cisunwk:8081/nexus/content/repositories/lynden-java-release</url>
        </repository>
    </distributionManagement>

The project is now ready to use the Maven release plug-in. The release plugin provides a number of useful goals.

  • release:clean cleans the workspace in the event the last release process was not successful.
  • release:prepare performs a number of operations:
      • Checks to make sure that there are no uncommitted changes.
      • Ensures that there are no SNAPSHOT dependencies in the POM file.
      • Changes the version of the application and removes SNAPSHOT from the version, i.e. 1.0.3-SNAPSHOT becomes 1.0.3.
      • Runs project tests against the modified POMs.
      • Commits the modified POM.
      • Tags the code in Subversion.
      • Increments the version number and appends SNAPSHOT, i.e. 1.0.3 becomes 1.0.4-SNAPSHOT.
      • Commits the modified POM.
  • release:perform performs the release process:
      • Checks out the code using the previously defined tag.
      • Runs the deploy Maven goal to move the resulting binary to the repository.

Putting it all together

The last step in this process is to configure Jenkins to allow release builds on demand, meaning we want the user to have to explicitly kick off a release build for this process to take place. We downloaded and installed the Jenkins Release plug-in in order to allow developers to kick off release builds from Jenkins. The Release plug-in will execute tasks after the normal build has finished. Below is a screenshot of the configuration of one of our projects. The release build option for the project is enabled by selecting the "Configure release build" option in the "Build Environment" section. The Maven release plug-in is activated by adding the goals to the "After successful release build" section.
(The -B option enables batch mode so that the release plug-in will not ask the user for input, but will use defaults instead.)

Once the release option has been configured for a project, there will be a "Release" icon on the left navigation menu for the project. Selecting this will kick off a build and then the Maven release process, assuming the build succeeds. Finally, a look at SVN and Nexus verifies that the build for version 1.0.4 of the siesta-framework project has been tagged in SVN and uploaded to Nexus.

The next steps for this project will be to generate release notes for release builds, and also to automate a deployment pipeline, so that developers can deploy to our test, staging and production servers via Jenkins rather than manually from their development workstations.

Twitter: @robterp
Blog: http://rterp.wordpress.com
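The version bookkeeping that release:prepare performs (1.0.3-SNAPSHOT becomes 1.0.3, then 1.0.4-SNAPSHOT) can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example above: the function names are hypothetical, and the real plug-in handles more version formats than this sketch does.

```python
SNAPSHOT = "-SNAPSHOT"


def release_version(dev_version):
    """Strip the snapshot qualifier: 1.0.3-SNAPSHOT -> 1.0.3."""
    if not dev_version.endswith(SNAPSHOT):
        raise ValueError("not a snapshot version: %s" % dev_version)
    return dev_version[:-len(SNAPSHOT)]


def next_dev_version(released):
    """Bump the last numeric component: 1.0.3 -> 1.0.4-SNAPSHOT."""
    parts = released.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts) + SNAPSHOT


released = release_version("1.0.3-SNAPSHOT")
print(released)                    # 1.0.3
print(next_dev_version(released))  # 1.0.4-SNAPSHOT
```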
February 29, 2012
by Rob Terpilowski
· 86,811 Views
How to Write Vim Plugins with Python
Originally Authored by Dejan Noveski

I'm not going to dive into how good or extensible Vim is. If you are reading this article, you probably know that. The thing that makes Vim so good is the scripting environment behind it, called VimL. Using this scripting language, you can write any functionality/plugin you need for Vim. Each plugin you use is written in this language. Here's the best part: you need very little knowledge of VimL to be able to write plugins if you know Python (or Ruby).

What's a Vim plugin anyway?

A Vim plugin is a .vim script that defines functions, mappings, syntax rules, and commands that may or may not manipulate the windows, buffers, and lines. It is a complete piece of code with some specific functionality. Usually, a plugin consists of several functions, mappings, command definitions, and event hooks. When writing Vim plugins with Python, everything outside the functions is often written in VimL. But those are Vim commands and they can be learned fast. In fact, VimL can be learned fast, but using Python gives so much flexibility. Think about using urllib/httplib/simplejson for accessing some web service that helps editing in Vim. This is why most of the plugins that work with web services are done in VimL+Python.

Any prerequisites?

You must have Vim compiled with +python support. You can check that using the command:

    vim --version | grep +python

The Vim package in Ubuntu and its derivatives comes with +python support.

To Work - Vimmit.vim

What's better than starting with a simple example? This is a plugin that, when called, will retrieve the homepage of Reddit and display it in the current buffer. Start by opening the "vimmit.vim" file (in Vim). Since we are writing Python code, it's good to check whether Vim supports Python:

    if !has('python')
        echo "Error: Required vim compiled with +python"
        finish
    endif

This piece is written in VimL. It's best if we stick to VimL for things like this, mappings and event hooks.
This snippet checks whether Vim has Python support; if it does not, the script ends with an error message. We continue with the main function, Reddit(). This is where we use Python and implement the main functionality:

    " Vim comments start with a double quote.
    " Function definition is VimL. We can mix VimL and Python in
    " function definition.
    function! Reddit()

    " We start the python code like the next line.
    python << EOF
    # the vim module contains everything we need to interface with vim from
    # python. We need urllib2 for the web service consumer.
    import vim, urllib2
    # we need json for parsing the response
    import json

    # we define a timeout that we'll use in the API call. We don't want
    # users to wait much.
    TIMEOUT = 20
    URL = "http://reddit.com/.json"

    try:
        # Get the posts and parse the json response
        response = urllib2.urlopen(URL, None, TIMEOUT).read()
        json_response = json.loads(response)

        posts = json_response.get("data", {}).get("children", [])

        # vim.current.buffer is the current buffer. It's a list-like object.
        # Each line is an item in the list. We can loop through them, delete
        # them, alter them etc.
        # Here we delete all lines in the current buffer
        del vim.current.buffer[:]

        # The emptied buffer still holds one blank line; we turn it into a
        # separator. Aesthetics.
        vim.current.buffer[0] = 80*"-"

        for post in posts:
            # In the next few lines, we get the post details
            post_data = post.get("data", {})
            up = post_data.get("ups", 0)
            down = post_data.get("downs", 0)
            title = post_data.get("title", "NO TITLE").encode("utf-8")
            score = post_data.get("score", 0)
            permalink = post_data.get("permalink").encode("utf-8")
            url = post_data.get("url").encode("utf-8")
            comments = post_data.get("num_comments")

            # And here we append line by line to the buffer.
            # First the upvotes
            vim.current.buffer.append("↑ %s"%up)
            # Then the title and the url
            vim.current.buffer.append(" %s [%s]"%(title, url,))
            # Then the downvotes and number of comments
            vim.current.buffer.append("↓ %s | comments: %s [%s]"%(down, comments, permalink,))
            # And last we append some "-" for visual appeal.
            vim.current.buffer.append(80*"-")

    except Exception, e:
        print e

    EOF
    " Here the python code is closed. We can continue writing VimL or python again.
    endfunction

Save the file, source it in Vim (:source vimmit.vim) and run:

    :call Reddit()

Now, the way we call the function is not so elegant, so we define a command:

    command! -nargs=0 Reddit call Reddit()

We define the command :Reddit to call the function. After adding this, open a new buffer and run :Reddit. The home page will be loaded in the buffer. The -nargs argument states how many arguments the command will take.

Function Arguments, Eval and Command

Q: How does one access function arguments?

    function! SomeName(arg1, arg2, arg3)
        " Named arguments are read with the a: prefix in VimL
        let firstarg=a:arg1
        let secondarg=a:arg2

    " Get the arguments in python
    python << EOF
    import vim

    first_argument = vim.eval("a:arg1")
    second_argument = vim.eval("a:arg2")
    EOF
    endfunction

You can define a function with an arbitrary number of arguments by putting "..." in place of argument names. These extra arguments can be accessed only by position: a:0 holds how many of them were passed, a:1 is the first one, a:2 the second, and so on. You can mix them with named arguments (arg1, arg2, ...).

Q: How can I call Vim commands from Python?

    vim.command("[vim-command-here]")

Q: How do I define global variables and access them in VimL and Python?

Global vars are prefixed with g:.
If you want to define one in your script, the best thing to do is to check whether it exists and, if it doesn't, define it and assign a default value to it:

    if !exists("g:reddit_apicall_timeout")
        let g:reddit_apicall_timeout=40
    endif

You can access it from Python using the vim module. Note that vim.eval returns numbers as strings, so convert the value before using it as a timeout:

    TIMEOUT = int(vim.eval("g:reddit_apicall_timeout"))

If you want to override this setting, you can write the following in .vimrc:

    let g:reddit_apicall_timeout=60

Additional Notes

VimL is pretty easy once you try it. Remember that print works, and everything you can do with Python you can do in here. Here you can find the documentation for the vim Python module. Vimdoc is possibly the only resource you will need when writing Vim plugins. You can also check this IBM developerWorks article.

Now, try to extend "vimmit.vim" so the user is able to choose a subreddit (as a first function argument).

Source: http://brainacle.com/how-to-write-vim-plugins-with-python.html
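As a starting point for that exercise, the Vim-independent parts can be pulled into plain functions and tried outside the editor. The sketch below is an assumption-laden outline, not the original plugin: `listing_url` and `format_posts` are hypothetical names, the `/r/<name>/.json` URL shape is assumed to follow the front-page URL used above, and plain "up:"/"down:" labels stand in for the arrow characters.

```python
def listing_url(subreddit=None):
    """Build the JSON listing URL; no subreddit means the front page."""
    if subreddit:
        return "http://reddit.com/r/%s/.json" % subreddit
    return "http://reddit.com/.json"


def format_posts(listing, width=80):
    """Turn a decoded listing into the lines the plugin writes to the buffer."""
    rule = width * "-"
    lines = [rule]
    for post in listing.get("data", {}).get("children", []):
        data = post.get("data", {})
        lines.append("up: %s" % data.get("ups", 0))
        lines.append(" %s [%s]" % (data.get("title", "NO TITLE"), data.get("url", "")))
        lines.append("down: %s | comments: %s [%s]" % (
            data.get("downs", 0), data.get("num_comments", 0), data.get("permalink", "")))
        lines.append(rule)
    return lines


sample = {"data": {"children": [{"data": {
    "ups": 5, "downs": 1, "title": "T", "url": "http://x",
    "num_comments": 2, "permalink": "/r/vim/1"}}]}}

print(listing_url("vim"))
for line in format_posts(sample, width=20):
    print(line)
```

Inside the plugin, the subreddit name would come from the function argument (for example via `vim.eval("a:subreddit")`, if the argument is named `subreddit`) and the resulting lines would be appended to `vim.current.buffer` as in the original function.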
February 23, 2012
by Chris Smith
· 17,104 Views · 2 Likes
Installing Puppet on Oracle Linux: Avoid the Pitfalls
Oracle publishes a list of public yum repositories for Oracle Linux on its website, but they don't come configured with the builds, so if you're trying to install Puppet, you'll want to avoid this pitfall, along with a few others.

We've been spending some time trying to set up our developer environment on an Oracle Linux 5.7 build, and one of the first steps was to install Puppet, as we've already created scripts which automate the installation of most things. Unfortunately, Oracle Linux builds don't come with any yum repos configured, so when you run the following command...

    ls -alh /etc/yum.repos.d/

...you don't see anything. We eventually realised that there is a list of public yum repositories on the Oracle website, of which we needed to download the definition for Oracle Linux 5 like so:

    cd /etc/yum.repos.d
    wget http://public-yum.oracle.com/public-yum-el5.repo

We then need to edit that file to enable the appropriate repository. In this case we want to enable ol5_u7_base:

    [ol5_u7_base]
    name=Oracle Linux $releasever - U7 - $basearch - base
    baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL5/7/base/$basearch/
    gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-el5
    gpgcheck=1
    enabled=1

I made the mistake of enabling ol5_u5_base, which led to some really weird problems: yum got confused as to which version of libselinux we had installed and was therefore unable to install libselinux-ruby, as its dependencies weren't being properly satisfied. Calling 'yum list installed' suggested that we had libselinux 1.33.4.5-7 installed, but if we ran 'yum install libselinux' then it suggested we already had 1.33.4.5-5 installed. Very confusing! After I had tried to uninstall and downgrade libselinux, pretty much destroying the installation in the process, another colleague spotted my mistake.
We also found that we had to add the EPEL repo, which gave us access to some other packages that we needed:

    rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm

After all that was done we were able to run the command to install Puppet:

    yum install puppet

That installs Puppet 2.6.12, as that's the latest version in that repo. The latest stable version is 2.7.9, but I think we'll need to hook up a Puppet-specific repo to get that working.

Source: http://www.markhneedham.com/blog/2012/01/18/installing-puppet-on-oracle-linux
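Since enabling the wrong base repository was the root of the libselinux confusion, a quick check of which sections are enabled can save some time. The following Python sketch parses .repo text with the standard configparser module. The helper name `enabled_repos` and the sample text are illustrative assumptions, and sections that omit the `enabled` key are simply reported as not enabled here, which is a simplification of this sketch.

```python
import configparser

SAMPLE = """
[ol5_u5_base]
name=Oracle Linux $releasever - U5 - $basearch - base
enabled=0

[ol5_u7_base]
name=Oracle Linux $releasever - U7 - $basearch - base
enabled=1
"""


def enabled_repos(repo_text):
    """Return the ids of the sections that set enabled=1."""
    # interpolation=None keeps values such as $basearch untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return [name for name in parser.sections()
            if parser.get(name, "enabled", fallback="0").strip() == "1"]


print(enabled_repos(SAMPLE))  # ['ol5_u7_base']
```

To check a real machine, the text of /etc/yum.repos.d/public-yum-el5.repo could be read from disk and passed to the same function.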
January 18, 2012
by Mark Needham
· 8,274 Views
HowTo: Build a VNC Client for the Browser
VNC is just a special case of client-server, though perhaps an especially cool one. Quite a few rising web technologies do robust client-server work extra well (Node.js, WebSockets, etc.), and in-browser VNC is nothing new. Here are two clients (open-source, of course):

noVNC is more ambitiously HTML5-duplexed, using WebSockets as well as Canvas. It's quite popular, and has its own 10-page GitHub wiki. It also supports wss:// encryption. Use this if you want a reliable, battle-tested HTML5 client. (WebSocket fallback is provided by web-socket-js.)

vnc.js was written in 24 hours, during LinkedIn's first public Intern Hackday. So of course it hasn't been tested thoroughly, and probably could be written a little more cleanly. But there's something beautifully coherent about an app written in a single session. If the app really does work, then some of the decisions will make a little more sense. It's possible to get into the developer's mind a little more easily, and breaking down the code doesn't result in as many 'why did they do this??' moments, because the developers' minds were never far from any part of the project at any moment during development.

vnc.js doesn't use WebSockets directly (it uses Socket.io instead), but that's fine: a little less HTML5 and a little more slick JavaScript doesn't hurt anyone. Plus, the marathoning hackers behind vnc.js put together a sweet little tutorial detailing the decisions made in that 24-hour period, emphasizing the rapid thought process behind the architecture (in clear diagrams) and a very practical abstraction for easier in-browser work with TCP (using Node.js and Socket.io) and RFB.

Both packages are worth checking out; the hacking tutorial is a fun read for any web developer interested in coding a VNC client, or even just in working with different network protocols in the browser.
December 30, 2011
by John Esposito
· 19,801 Views
How to deploy a neo4j instance in Amazon EC2 in 10 minutes
Neo4j is a high-performance NoSQL graph database with all the features of a mature and robust database. In this post I will explain how to deploy a Neo4j instance on the Amazon EC2 web service.

For this tutorial to take you no more than 10 minutes, you should be comfortable with bash commands like mv, tar, ssh and scp (secure copy). I also assume that you have an Amazon Web Services account and are familiar with the process of launching instances. If not, I strongly recommend that you follow this starting guide and work through it until you manage to connect to your instance with ssh.

Start by downloading the latest stable version of Neo4j, which you can find here. The "Community Edition" fits well for development purposes. Do not forget to select the Unix version of the server. This will download a tar.gz file which you will copy to your EC2 instance later.

While the Neo4j server downloads, open the AWS Management Console and launch a Basic 32-bit Amazon Linux AMI. If you want to launch an Ubuntu AMI, please note that it doesn't ship with Java, which is required for running Neo4j. If you are not familiar with key pairs, pem files or security groups, I urge you to follow the EC2 starting guide I mentioned above. You can either create a new security group or use the default, but you will need to configure a new security rule for the Neo4j server port.

After launching the instance, create a TCP rule on port 7474 with source 0.0.0.0/0. This opens port 7474 to anyone. If you are planning to use the Neo4j REST API and call it remotely from another server, for example a Rails application hosted on Heroku, you may want to change the source field to the address of your Heroku server for security reasons. Do not forget to open port 22 (SSH); this is typically the first rule people create after launching an instance.

You are almost done! You should now install Neo4j on your instance.
Open a terminal on your local machine and navigate to the path where you downloaded Neo4j. Copy the file to your Amazon instance by using the scp command:

    scp -i your_pem_file.pem neo4j-community-1.6.M01-unix.tar.gz ec2-user@YOUR_PUBLIC_INSTANCE_DNS:/home/ec2-user

Please note that you will need to change the path to your pem file (typically placed in ~/.ssh), the filename of the Neo4j server you just downloaded, and the public DNS of your instance. Now connect to your instance with SSH:

    ssh -i your_pem_file.pem ec2-user@YOUR_PUBLIC_INSTANCE_DNS

Untar the Neo4j server:

    tar xvfz neo4j-community-1.6.M01-unix.tar.gz

Move it to /usr/local and rename the folder to neo4j:

    sudo mv neo4j-community-1.6.M01 /usr/local/neo4j

Almost done! You should now open neo4j-server.properties under the conf directory and add the following line:

    org.neo4j.server.webserver.address=0.0.0.0

This line allows anyone to connect remotely to your Neo4j database server. Now run the start script from the Neo4j server folder:

    sudo ./bin/neo4j start

Finally, open a browser and access the webadmin interface of your Neo4j database by typing http://YOUR_PUBLIC_INSTANCE_DNS:7474. You should see the Neo4j Monitoring and Management Tool, pretty cool! If not, ask me.

You can now try using the REST API and the curl command to insert nodes and relationships. I hope this post helped you, good luck!

Follow me on Twitter @negarnil

Source: http://www.cloudtmp.com/java/how-to-deploy-a-neo4j-instance-in-amazon-ec2-in-10-minutes/
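As an alternative to curl, a node-creation request can be sketched with Python's standard library. This is a hedged sketch: it assumes the REST API of this 1.x server accepts a POST with a JSON body of properties at `/db/data/node`, so verify the path against the documentation of the version you installed. The helper name `create_node_request` is hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request


def create_node_request(host, properties, port=7474):
    """Build (but do not send) a POST request that would create one node."""
    # Assumed endpoint for the 1.x REST API; check it for your server version.
    url = "http://%s:%d/db/data/node" % (host, port)
    body = json.dumps(properties).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = create_node_request("YOUR_PUBLIC_INSTANCE_DNS", {"name": "first node"})
print(request.get_method(), request.full_url)
```

Sending it is then a matter of calling `urllib.request.urlopen(request)` from a machine that the security group rule for port 7474 allows.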
December 27, 2011
by Nicolas Garnil
· 27,397 Views · 1 Like
HTML5 Canvas + WebSockets = Multiplayer Space Shooter In Browser
Recently I ran across Rawkets, a slick site taking two emerging web technologies, HTML5 Canvas and WebSockets, and combining them in the most obvious way possible: a multiplayer space shooter. Why Canvas? No plugins, so graphics: yes. Why WebSockets? Low latency, so multiplayer: yes.

Sadly, every time I join the game, nobody else is there. If I wanted single-player HTML5 gaming, I could check out another project by Rawkets' creator, Rob Hawkes: straight-up Asteroids, using the HTML5 game engine Impact. But WebSockets won't help Asteroids, because Asteroids runs entirely on one client. Rawkets, on the other hand, has multiple clients running Canvas, each with its own JavaScript, connecting via WebSockets and all talking through Node.js on the server, producing something like this:

I can't tell whether the game is any fun, because I've never seen anyone else in there. (Also, it doesn't seem to work in Chrome.) But as a tech demo it's a cool idea, and conceptually straightforward enough to inspire. (If you're impressed, Rob also links from the game site to his HTML5 Canvas book, though apparently the book assumes virtually no knowledge of Canvas or JavaScript and doesn't progress all that far.)

Check it out, and maybe shoot someone else's ship down. Fairly, of course, because WebSockets will keep multiplexed channels persistently open...
December 26, 2011
by John Esposito
· 10,335 Views
MySQL vs. Neo4j on a Large-Scale Graph Traversal
This post presents an analysis of MySQL (a relational database) and Neo4j (a graph database) in a side-by-side comparison on a simple graph traversal. The data set that was used was an artificially generated graph with natural statistics. The graph has 1 million vertices and 4 million edges. The degree distribution of this graph on a log-log plot is provided below. A visualization of a 1,000 vertex subset of the graph is diagrammed above.

Loading the graph

The graph data set was loaded into both MySQL and Neo4j. In MySQL a single table was used with the following schema.

    CREATE TABLE graph (
        outv INT NOT NULL,
        inv INT NOT NULL
    );
    CREATE INDEX outv_index USING BTREE ON graph (outv);
    CREATE INDEX inv_index USING BTREE ON graph (inv);

After loading the data, the table appears as below. The first line reads: "vertex 0 is connected to vertex 1."

    mysql> SELECT * FROM graph LIMIT 10;
    +------+-----+
    | outv | inv |
    +------+-----+
    |    0 |   1 |
    |    0 |   2 |
    |    0 |   6 |
    |    0 |   7 |
    |    0 |   8 |
    |    0 |   9 |
    |    0 |  10 |
    |    0 |  12 |
    |    0 |  19 |
    |    0 |  25 |
    +------+-----+
    10 rows in set (0.04 sec)

The 1 million vertex graph data set was also loaded into Neo4j. In Gremlin, the graph edges appear as below. The first line reads: "vertex 0 is connected to vertex 992915."

    gremlin> g.E[1..10]
    ==>e[183][0-related->992915]
    ==>e[182][0-related->952836]
    ==>e[181][0-related->910150]
    ==>e[180][0-related->897901]
    ==>e[179][0-related->871349]
    ==>e[178][0-related->857804]
    ==>e[177][0-related->798969]
    ==>e[176][0-related->773168]
    ==>e[175][0-related->725516]
    ==>e[174][0-related->700292]

Warming up the caches

Before traversing the graph data structure in both MySQL and Neo4j, each database had a "warm up" procedure run on it. In MySQL, a "SELECT * FROM graph" was evaluated and all of the results were iterated through. In Neo4j, every vertex in the graph was iterated through and the outgoing edges of each vertex were retrieved.
Finally, for both MySQL and Neo4j, the experiment discussed next was run twice in a row and the results of the second run were evaluated.

Traversing the graph

The traversal that was evaluated on each database started from some root vertex and emanated n steps out. There was no sorting, no distinct-ing, etc. The only two variables for the experiments are the length of the traversal and the root vertex to start the traversal from. In MySQL, the following 5 queries denote traversals of length 1 through 5. Note that the "?" is a variable parameter of the query that denotes the root vertex.

    SELECT a.inv FROM graph AS a WHERE a.outv=?
    SELECT b.inv FROM graph AS a, graph AS b WHERE a.inv=b.outv AND a.outv=?
    SELECT c.inv FROM graph AS a, graph AS b, graph AS c WHERE a.inv=b.outv AND b.inv=c.outv AND a.outv=?
    SELECT d.inv FROM graph AS a, graph AS b, graph AS c, graph AS d WHERE a.inv=b.outv AND b.inv=c.outv AND c.inv=d.outv AND a.outv=?
    SELECT e.inv FROM graph AS a, graph AS b, graph AS c, graph AS d, graph AS e WHERE a.inv=b.outv AND b.inv=c.outv AND c.inv=d.outv AND d.inv=e.outv AND a.outv=?

For Neo4j, the Blueprints Pipes framework was used. A pipe of length n was constructed using the following static method.

    public static Pipeline createPipeline(final Integer steps) {
        final ArrayList<Pipe> pipes = new ArrayList<Pipe>();
        for (int i = 0; i < steps; i++) {
            Pipe pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
            Pipe pipe2 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
            pipes.add(pipe1);
            pipes.add(pipe2);
        }
        return new Pipeline(pipes);
    }

For both MySQL and Neo4j, the results of the query (SQL and Pipes) were iterated through. Thus, all results were retrieved for each query. In MySQL, this was done as follows.

    while (resultSet.next()) {
        resultSet.getInt(finalColumn);
    }

In Neo4j, this is done as follows.
    while (pipeline.hasNext()) {
        pipeline.next();
    }

Experimental results

The artificial graph data set was constructed with a “rich get richer”, preferential attachment model. Thus, the vertices created earlier are the densest (i.e. they have the highest number of adjacent vertices). This property was used to limit the amount of time it would take to evaluate the tests for each traversal. Only the first 250 vertices were used as roots of the traversals. Before presenting timing results, note that all of these experiments were run on a MacBook Pro with a 2.66GHz Intel Core 2 Duo and 4 GB of 1067 MHz DDR3 RAM. The packages used were Java 1.6, MySQL JDBC 5.0.8, and Blueprints Pipes 0.1.2.

    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-10M3025)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)

The following Java virtual machine parameters were used:

    -Xmx1000m -Xms500m

Below are the total running times for both MySQL (red) and Neo4j (blue) for traversals of length 1, 2, 3, and 4. The raw data is presented below along with the total number of vertices returned by each traversal, which, of course, is the same for both MySQL and Neo4j given that it's the same graph data set being processed. Also realize that traversals can loop and thus many of the same vertices are returned multiple times. Finally, note that only Neo4j has a running time for a traversal of length 5. MySQL had not finished after 2 hours. In comparison, Neo4j took 14.37 minutes to complete a 5 step traversal.
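To make the traversal semantics concrete, here is a minimal Python sketch (not the benchmark code) of an n-step outgoing traversal over an edge list shaped like the rows of the graph table. The function name traverse and the tiny five-edge graph are assumptions made purely for illustration. Because nothing is deduplicated, a vertex reached along several paths is returned once per path, which is why the returned counts grow so quickly with traversal length.

```python
from collections import defaultdict


def traverse(edges, root, steps):
    """Return every vertex reached by walking exactly `steps` outgoing edges from `root`.

    Results are not deduplicated: there is one entry per distinct path,
    mirroring the chained self-joins and the Pipes pipeline.
    """
    # Build an adjacency list: outv -> [inv, inv, ...]
    out = defaultdict(list)
    for outv, inv in edges:
        out[outv].append(inv)

    frontier = [root]
    for _ in range(steps):
        # Expand every vertex in the frontier along all of its outgoing edges.
        frontier = [inv for v in frontier for inv in out[v]]
    return frontier


# Hypothetical toy graph; each tuple is one (outv, inv) row.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
one_step = traverse(edges, 0, 1)
two_steps = traverse(edges, 0, 2)
three_steps = traverse(edges, 0, 3)
print(len(one_step), len(two_steps), len(three_steps))
```

In this toy graph the cycle 0 -> 2 -> 0 means the root itself reappears in the two-step result, the same looping effect the article notes for the real data set.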
    [mysql steps-1] time(ms):124 -- vertices_returned:11360
    [mysql steps-2] time(ms):922 -- vertices_returned:162640
    [mysql steps-3] time(ms):8851 -- vertices_returned:2206437
    [mysql steps-4] time(ms):112930 -- vertices_returned:28125623
    [mysql steps-5] n/a
    [neo4j steps-1] time(ms):27 -- vertices_returned:11360
    [neo4j steps-2] time(ms):474 -- vertices_returned:162640
    [neo4j steps-3] time(ms):3366 -- vertices_returned:2206437
    [neo4j steps-4] time(ms):49312 -- vertices_returned:28125623
    [neo4j steps-5] time(ms):862399 -- vertices_returned:358765631

Next, the individual data points for both MySQL and Neo4j are presented in the plot below. Each point denotes how long it took to return n vertices for the varying traversal lengths. Finally, the data below provides the number of vertices returned per millisecond (on average) for each of the traversals. Again, MySQL did not finish within its 2 hour limit for a traversal of length 5.

    [mysql steps-1] vertices/ms:91.6128847554668
    [mysql steps-2] vertices/ms:176.399127537985
    [mysql steps-3] vertices/ms:249.286746556076
    [mysql steps-4] vertices/ms:249.053599519823
    [mysql steps-5] n/a
    [neo4j steps-1] vertices/ms:420.740351166341
    [neo4j steps-2] vertices/ms:343.122344772028
    [neo4j steps-3] vertices/ms:655.507125256186
    [neo4j steps-4] vertices/ms:570.360621871775
    [neo4j steps-5] vertices/ms:416.00886711325

Conclusion

In conclusion, given a traversal of an artificial graph with natural statistics, the graph database Neo4j outperformed the relational database MySQL: it completed traversals of length 1 through 4 about 1.9 to 4.6 times faster, and it was the only one of the two to complete a traversal of length 5 within the time limit. However, no attempts were made to optimize the Java VM, the SQL queries, etc. These experiments were run with both Neo4j and MySQL “out of the box” and with a “natural syntax” for both types of queries.

Source: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
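The vertices-per-millisecond figures are simply the number of vertices returned divided by the elapsed time. The short Python sketch below recomputes them, and the MySQL-to-Neo4j time ratio, from the raw numbers reported in the article. The dictionary layout and the function names are illustrative assumptions; the recomputed values agree with the published ones to about three decimal places.

```python
# (time_ms, vertices_returned) per traversal length, copied from the raw results.
RESULTS = {
    "mysql": {
        1: (124, 11360),
        2: (922, 162640),
        3: (8851, 2206437),
        4: (112930, 28125623),
    },
    "neo4j": {
        1: (27, 11360),
        2: (474, 162640),
        3: (3366, 2206437),
        4: (49312, 28125623),
        5: (862399, 358765631),
    },
}


def throughput(time_ms, vertices):
    """Average number of vertices returned per millisecond."""
    return vertices / time_ms


def speedup(steps):
    """Ratio of MySQL's running time to Neo4j's for a given traversal length."""
    return RESULTS["mysql"][steps][0] / RESULTS["neo4j"][steps][0]


for db, runs in RESULTS.items():
    for steps, (time_ms, vertices) in sorted(runs.items()):
        print(f"[{db} steps-{steps}] vertices/ms: {throughput(time_ms, vertices):.2f}")

for steps in (1, 2, 3, 4):
    print(f"steps-{steps}: MySQL time / Neo4j time = {speedup(steps):.2f}")
```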
December 5, 2011
by Marko Rodriguez
· 58,361 Views · 1 Like
Freight Management System on NetBeans
Lynden is a family of transportation and logistics companies specializing in shipping to Alaska and other locations worldwide. Over land, on the water, in the air, or in any combination, Lynden has been helping customers solve transportation problems for over a century. The Lynden Freight Management System is a NetBeans Platform application that serves a dual purpose as both a planning tool and a freight tracking tool. The Planning module allows terminal managers to see all freight that is currently inbound to their location, as well as freight that is scheduled to depart from their location, so they can make the most efficient use possible of their dock space and resources. The Trace module allows customer service personnel to search for customer account information, view the tracking history of any given freight item in the system, and display any documents related to the shipment, such as bills of lading or delivery receipts.

NetBeans Platform

Lynden has benefited from the NetBeans Platform because it allows developers to focus on the business logic of our applications rather than the underlying "plumbing". We are able to leverage built-in support for event handling, enabling and disabling UI controls, dockable windows, and automatic updates for our application with minimal work compared to rolling our own framework. We chose to go the desktop application route because we have a number of existing desktop applications that this application will likely need to interface with at some point, as well as a commercial set of rich UI components that we have been using for some time now. For the initial deployment, we will be pushing the installer out to employee PCs via the Landesk remote desktop administration tool. Future updates to various modules within the application will be done via the update center functionality built into the NetBeans Platform.

Screenshots
November 19, 2011
by Rob Terpilowski
· 11,744 Views · 3 Likes
RDF data in Neo4J - the Tinkerpop story
My previous blog post discussed the use of Neo4J as an RDF triple store. Michael Hunger, however, informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop’s Sail implementation. As mentioned in my previous blog post, I was recently asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data. Traditional RDF stores are not really an option, as my solution should also provide the ability to calculate shortest paths between random subjects. Calculating shortest paths is, however, one of the strong selling points of Graph Databases, and more specifically Neo4J. Unfortunately, the neo-rdf-sail component, which suits my requirements perfectly, is no longer under active development. Tinkerpop’s Sail implementation, however, fills the void with an even better alternative!

1. What is Tinkerpop?

Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered the JDBC of Graph Databases. By providing a collection of generic interfaces, it lets you develop graph-based applications without introducing explicit dependencies on concrete Graph Database implementations. Additionally, Blueprints provides concrete bindings for the Neo4J, OrientDB and Dex Graph Databases. On top of Blueprints, the Tinkerpop team developed an entire range of graph technologies, including Gremlin, a powerful, domain-specific language designed for traversing graphs. Hence, once a Blueprints binding is available for a particular Graph Database, an entire range of technologies can be leveraged.

2. Tinkerpop and Sail

Last time, I talked about exposing a Neo4J Graph Database (containing RDF triples) through the Sail interface, which is part of the openrdf.org project.
By doing so, we can reuse an entire range of RDF utilities (parsers and query evaluators) that are part of the openrdf.org project. The Blueprints framework provides us with a similar ability: each Graph Database binding that implements the Tinkerpop TransactionalGraph and IndexableGraph interfaces can be exposed as a GraphSail, which is Tinkerpop’s implementation of the Sail interface. Once you have your Sail available, storing and querying RDF is analogous to the piece of code shown in my previous blog article.

    // Create the sail graph database
    graph = new MyNeo4jGraph("var/flights", 100000);
    graph.setTransactionMode(TransactionalGraph.Mode.MANUAL);
    sail = new GraphSail(graph);
    // Initialize the sail store
    sail.initialize();
    // Get the sail repository connection
    connection = new SailRepository(sail).getConnection();
    // Import the data
    connection.add(getResource("sneeair.rdf"), null, RDFFormat.RDFXML);
    // Execute SPARQL query
    TupleQuery durationquery = connection.prepareTupleQuery(QueryLanguage.SPARQL,
        "PREFIX io: " +
        "PREFIX fl: " +
        "SELECT ?number ?departure ?destination " +
        "WHERE { " +
        "?flight io:flight ?number . " +
        "?flight fl:flightFromCityName ?departure . " +
        "?flight fl:flightToCityName ?destination . " +
        "?flight io:duration \"1:35\" . " +
        "}");
    TupleQueryResult result = durationquery.evaluate();

The first two lines of code require some more clarification. A TransactionalGraph can be run in MANUAL or AUTOMATIC transaction mode. In AUTOMATIC mode, transactions are basically ignored, in the sense that each item that gets created is immediately persisted in the underlying Graph Database. Although this fits my needs, AUTOMATIC mode is extremely slow in the case of Neo4J because of the continuous IO access. MANUAL mode, on the other hand, is very fast; a new transaction is created at the moment the import of the RDF data file starts and is only committed to the Neo4J data store once all RDF triples are parsed and created.
Unfortunately, MANUAL mode does not scale either in my specific situation; as some of my RDF data files contain over 50 million RDF triples, a single transaction cannot fit into memory (i.e. Java heap space error). Requiring fast imports, I extended the default Neo4J Blueprints binding to support intermediate commits. I based my implementation on Neo4J’s best practices for big transactions. The idea is rather simple: you specify the maximum number of items that can be kept in memory before they should be committed to the Neo4J data store. Once this number is reached, the current transaction is committed and a new one is automatically started. Simple, but very effective!

    public class MyNeo4jGraph extends Neo4jGraph {

        private long numberOfItems = 0;
        private long maxNumberOfItems = 1;

        public MyNeo4jGraph(final String directory, long maxNumberOfItems) {
            super(directory, null);
            this.maxNumberOfItems = maxNumberOfItems;
        }

        public MyNeo4jGraph(final String directory, final Map configuration, long maxNumberOfItems) {
            super(directory, configuration);
            this.maxNumberOfItems = maxNumberOfItems;
        }

        public Vertex addVertex(final Object id) {
            Vertex vertex = super.addVertex(id);
            commitIfRequired();
            return vertex;
        }

        public Edge addEdge(final Object id, final Vertex outVertex, final Vertex inVertex, final String label) {
            Edge edge = super.addEdge(id, outVertex, inVertex, label);
            commitIfRequired();
            return edge;
        }

        private void commitIfRequired() {
            // Check whether commit should be executed
            if (++numberOfItems % maxNumberOfItems == 0) {
                // Stop the transaction
                stopTransaction(Conclusion.SUCCESS);
                // Immediately start a new one
                startTransaction();
            }
        }
    }

3. Shortest path calculation

Although Blueprints allows you to abstract away the Neo4J implementation details, it still provides you with access to the raw Neo4J data store if needed. Hence, one can still use the graph algorithms provided in the neo4j-graph-algo component to calculate shortest paths between random subjects.
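The shortest-path work in this setup is delegated to the neo4j-graph-algo component, whose API is not shown in this article. As a language-neutral illustration of what an unweighted shortest-path search does, here is a minimal breadth-first search in Python over a plain adjacency dictionary. The function name shortest_path, the adjacency layout, and the sample route map are all hypothetical and are not part of Neo4J or Blueprints.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None if unreachable."""
    if start == goal:
        return [start]

    parents = {start: None}  # visited set that also remembers how each node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in parents:
                continue  # already reached by a path that is at least as short
            parents[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical directed route map: subject -> directly connected subjects.
routes = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(routes, "A", "E"))
```

Breadth-first search is only appropriate when every edge counts equally; a weighted search would need a different algorithm such as Dijkstra's.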
The complete source code can be found on the Datablend public GitHub repository.
October 24, 2011
by Davy Suvee
· 25,274 Views
Using a Java Servlet Filter to intercept the response HTTP status code with NetBeans IDE 7 and Maven
Version 2.3 of the Java servlet spec introduced the concept of filters. According to the documentation from Oracle’s site: “A filter dynamically intercepts requests and responses to transform or use the information contained in the requests or responses”. Today I’ll show you how to build a simple filter to intercept the HTTP response status code using annotations introduced in the Servlet 3.0 specification.

With NetBeans IDE 7, create a new Maven Java Web Application called: Intercept

Delete the index.jsp file under the Web Pages folder.

Right-click on the project and add a new servlet called: MainServlet

Since we are using the new Servlet 3 annotations, we don’t need to set a whole lot of properties. Maven generates a decent MainServlet.java file for us; I just removed the comments for the output. My file looks like this:

    package com.giantflyingsaucer.intercept;

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    @WebServlet(name = "MainServlet", urlPatterns = {"/"})
    public class MainServlet extends HttpServlet {

        protected void processRequest(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/html;charset=UTF-8");
            PrintWriter out = response.getWriter();
            try {
                out.println("");
                out.println("");
                out.println("");
                out.println("");
                out.println("");
                out.println("Servlet MainServlet");
                out.println("");
                out.println("");
            } finally {
                out.close();
            }
        }

        /**
         * Handles the HTTP GET method.
         * @param request servlet request
         * @param response servlet response
         * @throws ServletException if a servlet-specific error occurs
         * @throws IOException if an I/O error occurs
         */
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            processRequest(request, response);
        }

        /**
         * Handles the HTTP POST method.
         * @param request servlet request
         * @param response servlet response
         * @throws ServletException if a servlet-specific error occurs
         * @throws IOException if an I/O error occurs
         */
        @Override
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            processRequest(request, response);
        }

        /**
         * Returns a short description of the servlet.
         * @return a String containing servlet description
         */
        @Override
        public String getServletInfo() {
            return "Short description";
        }
    }

Right-click on the project and add a Filter called: InterceptFilter

We will add the following two lines to the doFilter method, after the call to chain.doFilter so that the status reflects what the servlet actually set (read any earlier, it is still the default).

    HttpServletResponse hsr = (HttpServletResponse) response;
    System.out.println("HTTP Status: " + hsr.getStatus());

My doFilter method looks like this:

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        if (debug) {
            log("InterceptFilter:doFilter()");
        }
        doBeforeProcessing(request, response);
        Throwable problem = null;
        try {
            chain.doFilter(request, response);
        } catch (Throwable t) {
            problem = t;
            t.printStackTrace();
        }
        // Read the status only after the rest of the chain has run.
        HttpServletResponse hsr = (HttpServletResponse) response;
        System.out.println("HTTP Status: " + hsr.getStatus());
        doAfterProcessing(request, response);
        if (problem != null) {
            if (problem instanceof ServletException) {
                throw (ServletException) problem;
            }
            if (problem instanceof IOException) {
                throw (IOException) problem;
            }
            sendProcessingError(problem, response);
        }
    }

Clean and Build the project and deploy it to Apache Tomcat.
Access the URL with a browser, then take a look at your catalina.out file; you should see the HTTP response code. Note: You shouldn’t need to make any changes to the web.xml file for this project to work. From http://www.giantflyingsaucer.com/blog/?p=3279
October 23, 2011
by Chad Lung
· 43,899 Views
Handling PHP Sessions in Windows Azure
One of the challenges in building a distributed web application is in handling sessions. When you have multiple instances of an application running and session data is written to local files (as is the default behavior for the session handling functions in PHP), a user session can be lost when a session is started on one instance but subsequent requests are directed (via a load balancer) to other instances. To successfully manage sessions across multiple instances, you need a common data store. In this post I’ll show you how the Windows Azure SDK for PHP makes this easy by storing session data in Windows Azure Table storage.

In the 4.0 release of the Windows Azure SDK for PHP, session handling via Windows Azure Table and Blob storage was included in the newly added SessionHandler class.

Note: The SessionHandler class supports storing session data in Table storage or Blob storage. I will focus on using Table storage in this post, largely because I haven’t been able to come up with a scenario in which using Blob storage would be better (or even necessary). If you have ideas about how/why Blob storage would be better, I’d love to hear them.

The SessionHandler class makes it possible to write code for handling sessions in the same way you always have, but the session data is stored in a Windows Azure Table instead of local files. To accomplish this, precede your usual session handling code with these lines:

    require_once 'Microsoft/WindowsAzure/Storage/Table.php';
    require_once 'Microsoft/WindowsAzure/SessionHandler.php';
    $storageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', 'your storage account name', 'your storage account key');
    $sessionHandler = new Microsoft_WindowsAzure_SessionHandler($storageClient , 'sessionstable');
    $sessionHandler->register();

Now you can call session_start() and other session functions as you normally would. Nicely, it just works.
Really, that’s all there is to using the SessionHandler, but I found it interesting to take a look at how it works. The first interesting thing to note is that the register method simply calls the session_set_save_handler function to map the session handling functionality to custom functions. Here’s what the method looks like in the source code:

    public function register()
    {
        return session_set_save_handler(array($this, 'open'),
                                        array($this, 'close'),
                                        array($this, 'read'),
                                        array($this, 'write'),
                                        array($this, 'destroy'),
                                        array($this, 'gc')
        );
    }

The reading, writing, and deleting of session data is only slightly more complicated. When writing session data, the key-value pairs that make up the data are first serialized and then base64 encoded. The serialization of the data allows for lots of flexibility in the data you want to store (i.e. you don’t have to worry about matching some schema in the data store). When storing data in a table, each entry must have a partition key and row key that uniquely identify it. The partition key is a string (“sessions” by default, but this is changeable in the class constructor) and the row key is the session ID. (For more information about the structure of Tables, see this post.) Finally, the data is either updated (if it already exists in the Table) or a new entry is inserted. Here’s a portion of the write function:

    $serializedData = base64_encode(serialize($serializedData));
    $sessionRecord = new Microsoft_WindowsAzure_Storage_DynamicTableEntity($this->_sessionContainerPartition, $id);
    $sessionRecord->sessionExpires = time();
    $sessionRecord->serializedData = $serializedData;
    try {
        $this->_storage->updateEntity($this->_sessionContainer, $sessionRecord);
    } catch (Microsoft_WindowsAzure_Exception $unknownRecord) {
        $this->_storage->insertEntity($this->_sessionContainer, $sessionRecord);
    }

Not surprisingly, when session data is read from the table, it is retrieved by session ID, base64 decoded, and unserialized.
Again, here’s a snippet that shows what is happening:

    $sessionRecord = $this->_storage->retrieveEntityById(
        $this->_sessionContainer,
        $this->_sessionContainerPartition,
        $id
    );
    return unserialize(base64_decode($sessionRecord->serializedData));

As you can see, the SessionHandler class makes good use of the storage APIs in the SDK. To learn more about the SessionHandler class (and the storage APIs), check out the documentation on Codeplex. You can, of course, get the complete source code here: http://phpazure.codeplex.com/SourceControl/list/changesets.

As I investigated the session handling in the Windows Azure SDK for PHP, I noticed the conspicuous absence of support for SQL Azure as a session store. I’m curious about how many people would prefer to use SQL Azure over Azure Tables as a session store. If you have an opinion on this, please let me know in the comments.
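The write and read paths described in this article (serialize, base64 encode, store under a partition key plus the session ID, then reverse the steps on read) can be sketched independently of the SDK. The Python stand-in below uses an in-memory dictionary instead of Table storage and json instead of PHP's serialize. The class name, method names, and sample session data are hypothetical and are not part of the Windows Azure SDK for PHP.

```python
import base64
import json
import time


class InMemorySessionStore:
    """Toy session store keyed by (partition key, row key), like a Table entity."""

    def __init__(self, partition="sessions"):
        self.partition = partition
        self.rows = {}  # (partition key, session ID) -> entity dict

    def write(self, session_id, data):
        # Serialize first, then base64 encode, so any structure fits one string field.
        encoded = base64.b64encode(json.dumps(data).encode("utf-8")).decode("ascii")
        # Assigning to the dict covers both cases: update if present, insert if not.
        self.rows[(self.partition, session_id)] = {
            "sessionExpires": int(time.time()),
            "serializedData": encoded,
        }
        return True

    def read(self, session_id):
        entity = self.rows.get((self.partition, session_id))
        if entity is None:
            return {}  # unknown session: start with empty data
        decoded = base64.b64decode(entity["serializedData"]).decode("utf-8")
        return json.loads(decoded)

    def destroy(self, session_id):
        return self.rows.pop((self.partition, session_id), None) is not None


store = InMemorySessionStore()
store.write("abc123", {"user": "brian", "cart": [1, 2]})
print(store.read("abc123"))
```

Because every instance of an application would read and write the same shared store, a request routed to a different instance still finds the session; the dictionary here only stands in for that shared store.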
October 19, 2011
by Brian Swan
· 7,873 Views