Open Source Resources for Engineers

The Latest Open Source Topics

Bixo Labs shows how to use Solr as a NoSQL solution for big data Many people use the Hadoop open source project to process large data sets because it’s a great solution for scalable, reliable data processing workflows. Hadoop is by far the most popular system for handling big data, with companies using massive clusters to store and process petabytes of data on thousands of servers. Since it emerged from the Nutch open source web crawler project in 2006, Hadoop has grown in every way imaginable – users, developers, associated projects (aka the “Hadoop ecosystem”). Starting at roughly the same time, the Solr open source project has become the most widely used search solution on planet Earth. Solr wraps the API-level indexing and search functionality of Lucene with a RESTful API, GUI, and lots of useful administrative and data integration functionality. The interesting thing about combining these two open source projects is that you can use Hadoop to crunch the data, and then serve it up in Solr. And we’re not talking about just free-text search; Solr can be used as a key-value store (i.e. a NoSQL database) via its support for range queries. Even on a single server, Solr can easily handle many millions of records (“documents” in Lucene lingo). Even better, Solr now supports sharding and replication via the new, cutting-edge SolrCloud functionality. Background I started using Hadoop & Solr about five years ago, as key pieces of the Krugle code search startup I co-founded in 2005. Back then, Hadoop was still part of the Nutch web crawler we used to extract information about open source projects. And Solr was fresh out of the oven, having just been released as open source by CNET. At Bixo Labs we use Hadoop, Solr, Cascading, Mahout, and many other open source technologies to create custom data processing workflows. The web is a common source of our input data, which we crawl using the Bixo open source project. The Problem During a web crawl, the state of the crawl is contained in something commonly called a “crawl DB”. For broad crawls, this has to be something that works with billions of records, since you need one entry for each known URL. Each “record” has the URL as the key, and contains important state information such as the time and result of the last request. For Hadoop-based crawlers such as Nutch and Bixo, the crawl DB is commonly kept in a set of flat files, where each file is a Hadoop “SequenceFile”. These are just a packed array of serialized key/value objects. Sometimes we need to poke at this data, and here’s where the simple flat-file structure creates a problem. There’s no easy way run queries against the data, but we can’t store it in a traditional database since billions of records + RDBMS == pain and suffering. Here is where scalable NoSQL solutions shine. For example, the Nutch project is currently re-factoring this crawl DB layer to allow plugging in HBase. Other options include Cassandra, MongoDB, CouchDB, etc. But for simple analytics and exploration on smaller datasets, a Solr-based solution works and is easier to configure. Plus you get useful and surprising fun functionality like facets, geospatial queries, range queries, free-form text search, and lots of other goodies for free. Architecture So what exactly would such a Hadoop + Solr system look like? As mentioned previously, in this example our input data comes from a Bixo web crawler’s CrawlDB, with one entry for each known URL. But the input data could just as easily be log files, or records from a traditional RDBMS, or the output of another data processing workflow. The key point is that we’re going to take a bunch of input data, (optionally) munge it into a more useful format, and then generate a Lucene index that we access via Solr. Hadoop For the uninitiated, Hadoop implements both a distributed file system (aka “HDFS”) and an execution layer that supports the map-reduce programming model. Typically data is loaded and transformed during the map phase, and then combined/saved during the reduce phase. In our example, the map phase reads in Hadoop compressed SequenceFiles that contain the state of our web crawl, and our reduce phase write out Lucene indexes. The focus of this article isn’t on how to write Hadoop map-reduce jobs, but I did want to show you the code that implements the guts of the job. Note that it’s not typical Hadoop key/value manipulation code, which is painful to write, debug, and maintain. Instead we use Cascading, which is an open source workflow planning and data processing API that creates Hadoop jobs from shorter, more representative code. The snippet below reads SequenceFiles from HDFS, and pipes those records into a sink (output) that stores them using a LuceneScheme, which in turn saves records as Lucene documents in an index. Tap source = new Hfs(new SequenceFile(CRAWLDB_FIELDS), inputDir); Pipe urlPipe = new Pipe("crawldb urls"); urlPipe = new Each(urlPipe, new ExtractDomain()); Tap sink = new Hfs(new LuceneScheme(SOLR_FIELDS, STORE_SETTINGS, INDEX_SETTINGS, StandardAnalyzer.class, MAX_FIELD_LENGTH), outputDir, true); FlowConnector fc = new FlowConnector(); fc.connect(source, sink, urlPipe).complete(); We defined CRAWLDB_FIELDS and SOLR_FIELDS to be the set of input and output data elements, using names like “url” and “status”. We take advantage of the Lucene Scheme that we’ve created for Cascading, which lets us easily map from Cascading’s view of the world (records with fields) to Lucene’s index (documents with fields). We don’t have a Cascading Scheme that directly supports Solr (wouldn’t that be handy?), but we can make-do for now since we can do simple analysis for this example. We indexed all of the fields so that we can perform queries against them. Only the status message contains normal English text, so that’s the only one we have to analyze (i.e., break the text up into terms using spaces and other token delimiters). In addition, the ExtractDomain operation pulls the domain from the URL field and builds a new Solr field containing just the domain. This will allow us to do queries against the domain of the URL as well as the complete URL. We could also have chosen to apply a custom analyzer to the URL to break it into several pieces (i.e., protocol, domain, port, path, query parameters) that could have been queried individually. Running the Hadoop Job For simplicity and pay-as-you-go, it’s hard to beat Amazon’s EC2 and Elastic Mapreduce offerings for running Hadoop jobs. You can easily spin up a cluster of 50 servers, run your job, save the results, and shut it down – all without needing to buy hardware or pay for IT support. There are many ways to create and configure a Hadoop cluster; for us, we’re very familiar with the (modified) EC2 Hadoop scripts that you can find in the Bixo distribution. Step-by-step instructions are available at http://openbixo.org/documentation/running-bixo-in-ec2/ The code for this article is available via GitHub, at http://github.com/bixolabs/hadoop2solr. The README displayed on that page contains step-by-step instructions for building and running the job. After the job is done, we’ll copy the resulting index out of the Hadoop distributed file system (HDFS) and onto the Hadoop cluster’s master server, then kill off the one slave we used. The Hadoop master is now ready to be configured as our Solr server. Solr On the Solr side of things, we need to create a schema that matches the index we’re generating. The key section of our schema.xml file is where we define the fields. These fields have a one-to-one correspondence with the SOLR_FIELDS we defined in our Hadoop workflow. They also need to use the same Lucene settings as what we defined in the static IndexWorkflow.java STORE_SETTINGS and INDEX_SETTINGS. Once we have this defined, all that’s left is to set up a server that we can use. To keep it simple, we’ll use the single EC2 instance in Amazon’s cloud (m1.large) that we used as our master for the Hadoop job, and run the simple Solr search server that relies on embedded Jetty to provide the webapp container. Similar to the Hadoop job, step-by-step instructions are in the README for the hadoop2solr project on GitHub. But in a nutshell, we’ll copy and unzip a Solr 1.4.1 setup on the EC2 server, do the same for our custom Solr configuration, create a symlink to the index, and then start it running with: Giving it a Try Now comes the interesting part. Since we opened up the default Jetty port used by Solr (8983) on this EC2 instance, we can directly access Solr’s handy admin console by pointing our browser at http://:8983/solr/admin % cd solr % java -Dsolr.solr.home=../solr-conf -Dsolr.data.dir=../solr-data -jar start.jar From here we can run queries against Solr: We can also use curl to talk to the server via HTTP requests: curl http://:8983/solr/select/?q=-status%3AFETCHED+and+-status%3AUNFETCHED The response is XML by default. Below is an example of the response from the above request, where we found 2,546 matches in 94ms. Now here’s what I find amazing. For an index of 82 million documents, running on a fairly wimpy box (EC2 m1.large = 2 virtual cores), the typical response time for a simple query like “status:FETCHED” is only 400 milliseconds, to find 9M documents. Even a complex query such as (status not FETCHED and not UNFETCHED) only takes six seconds. Scaling Obviously we could use beefier boxes. If we switched to something like m1.xlarge (15GB of memory, 4 virtual cores) then it’s likely we could handle upwards of 200M “records” in our Solr index and still get reasonable response times. If we wanted to scale beyond a single box, there are a number of solutions. Even out of the box Solr supports sharding, where your HTTP request can specify multiple servers to use in parallel. More recently, the Solr trunk has support for SolrCloud. This uses the ZooKeeper open source project to simplify coordination of multiple Solr servers. Finally, the Katta open source project supports Lucene-level distributed search, with many of the features needed for production quality distributed search that have not yet been added to SolrCloud. Summary The combination of Hadoop and Solr makes it easy to crunch lots of data and then quickly serve up the results via a fast, flexible search & query API. Because Solr supports query-style requests, it’s suitable as a NoSQL replacement for traditional databases in many situations, especially when the size of the data exceeds what is reasonable with a typical RDBMS. Solr has some limitations that you should be aware of, specifically: · Updating the index works best as a batch operation. Individual records can be updated, but each commit (index update) generates a new Lucene segment, which will impact performance. · Current support for replication, fail-over, and other attributes that you’d want in a production-grade solution aren’t yet there in SolrCloud. If this matters to you, consider Katta instead. · Many SQL queries can’t be easily mapped to Solr queries. The code for this article is available via GitHub, at http://github.com/bixolabs/hadoop2solr. The README displayed on that page contains additional technical details.

April 4, 2011

by Ken Krugler

· 119,696 Views

Configuring Sonar with Maven

Sonar is an open source web-based application to manage code quality which covers seven axes of code quality as: Architecture and design, comments, duplications, unit tests, complexity, potential bugs and coding rules. Developed in Java and can cover projects in Java, Flex, PHP, PL/SQL, Cobol and Visual Basic 6. It's very efficient to navigate, offering visual reporting and you can follow metrics evolution of your project and combine them. There is an online project called Nemo dedicated to open source projects, as you can see projects like Jetty, Apache Lucene and Apache Tomcat. So, let's config Sonar to work together with Maven. First of all set up Sonar server and other configurations in Maven's settings.xml file: sonar true jdbc:postgresql://localhost/sonar org.postgresql.Driver user password http://localhost:9000 You must to have a Sonar server running and define it in sonar.host.url parameter. In this example I'm using the default Sonar URL, http://localhost:9000 , and you must set user name and password for your database. To install local Sonar Server you can see this link. After that, to execute the code analysers and save results in Sonar database just execute mvn sonar:sonar. You can config Sonar to execute in your CI (Continuous Integration) application. Hudson is a perfect match, because there is a Sonar plugin for it. See the final result below: I prefer Teamcity CI, but there isn't a stable plugin as you can see in compatibility matrix. To resolve this plugin's problem, just add sonar:sonar in the maven command executed by Teamcity. Sonar is an essential tool for your software project to help you evaluating how cohesive your classes are, warning for future problems as code complexity or duplication problems arise, and will help you to have a cleaner code. If you have any question or some difficulties, contact me.

March 21, 2011

by Valdemar Júnior

· 143,625 Views · 2 Likes

Apache Solr: Get Started, Get Excited!

we've all seen them on various websites. crappy search utilities. they are a constant reminder that search is not something you should take lightly when building a website or application. search is not just google's game anymore. when a java library called lucene was introduced into the apache ecosystem, and then solr was built on top of that, open source developers began to wield some serious power when it came to customizing search features. in this article you'll be introduced to apache solr and a wealth of applications that have been built with it. the content is divided as follows: introduction setup solr applications summary 1. introduction apache solr is an open source search server. it is based on the full text search engine called apache lucene . so basically solr is an http wrapper around an inverted index provided by lucene. an inverted index could be seen as a list of words where each word-entry links to the documents it is contained in. that way getting all documents for the search query "dzone" is a simple 'get' operation. one advantage of solr in enterprise projects is that you don't need any java code, although java itself has to be installed. if you are unsure when to use solr and when lucene, these answers could help. if you need to build your solr index from websites, you should take a look into the open source crawler called apache nutch before creating your own solution. to be convinced that solr is actually used in a lot of enterprise projects, take a look at this amazing list of public projects powered by solr . if you encounter problems then the mailing list or stackoverflow will help you. to make the introduction complete i would like to mention my personal link list and the resources page which lists books, articles and more interesting material. 2. setup solr 2.1. installation as the very first step, you should follow the official tutorial which covers the basic aspects of any search use case: indexing - get the data of any form into solr. examples: json, xml, csv and sql-database. this step creates the inverted index - i.e. it links every term to its documents. querying - ask solr to return the most relevant documents for the users' query to follow the official tutorial you'll have to download java and the latest version of solr here . more information about installation is available at the official description . next you'll have to decide which web server you choose for solr. in the official tutorial, jetty is used, but you can also use tomcat. when you choose tomcat be sure you are setting the utf-8 encoding in the server.xml . i would also research the different versions of solr, which can be quite confusing for beginners: the current stable version is 1.4.1. use this if you need a stable search and don't need one of the latest features. the next stable version of solr will be 3.x the versions 1.5 and 2.x will be skipped in order to reach the same versioning as lucene. version 4.x is the latest development branch. solr 4.x handles advanced features like language detection via tika, spatial search , results grouping (group by field / collapsing), a new "user-facing" query parser ( edismax handler ), near real time indexing, huge fuzzy search performance improvements, sql join-a like feature and more. 2.2. indexing if you've followed the official tutorial you have pushed some xml files into the solr index. this process is called indexing or feeding. there are a lot more possibilities to get data into solr: using the data import handler (dih) is a really powerful language neutral option. it allows you to read from a sql database, from csv, xml files, rss feeds, emails, etc. without any java knowledge. dih handles full-imports and delta-imports. this is necessary when only a small amount of documents were added, updated or deleted. the http interface is used from the post tool, which you have already used in the official tutorial to index xml files. client libraries in different languages also exist. (e.g. for java (solrj) or python ). before indexing you'll have to decide which data fields should be searchable and how the fields should get indexed. for example, when you have a field with html in it, then you can strip irrelevant characters , tokenize the text into 'searchable terms', lower case the terms and finally stem the terms . in contrast, if you would have a field with text in it that should not be interpreted (e.g. urls) you shouldn't tokenize it and use the default field type string. please refer to the official documentation about field and field type definitions in the schema.xml file. when designing an index keep in mind the advice from mauricio : "the document is what you will search for. " for example, if you have tweets and you want to search for similar users, you'll need to setup a user index - created from the tweets. then every document is a user. if you want to search for tweets, then setup a tweet index; then every document is a tweet. of course, you can setup both indices with the multi index options of solr. please also note that there is a project called solr cell which lets you extract the relevant information out of several different document types with the help of tika. 2.3. querying for debugging it is very convenient to use the http interface with a browser to query solr and get back xml. use firefox and the xml will be displayed nicely: you can also use the velocity contribution , a cross-browser tool, which will be covered in more detail in the section about 'search application prototyping' . to query the index you can use the dismax handler or standard query handler . you can filter and sort the results: q=superman&fq=type:book&sort=price asc you can also do a lot more ; one other concept is boosting. in solr you can boost while indexing and while querying. to prefer the terms in the title write: q=title:superman^2 subject:superman when using the dismax request handler write: q=superman&qf=title^2 subject check out all the various query options like fuzzy search , spellcheck query input , facets , collapsing and suffix query support . 3. applications now i will list some interesting use cases for solr - in no particular order. to see how powerful and flexible this open source search server is. 3.1. drupal integration the drupal integration can be seen as generic use case to integrate solr into php projects. for the php integration you have the choice to either use the http interface for querying and retrieving xml or json. or to use the php solr client library . here is a screenshot of a typical faceted search in drupal : for more information about faceted search look into the wiki of solr . more php projects which integrates solr: open source typo3- solr module magento enterprise - solr module . the open source integration is out dated. oxid - solr module . no open source integration available. 3.2. hathi trust the hathi trust project is a nice example that proves solr's ability to search big digital libraries. to quote directly from the article : "... the index for our one million book index is over 200 gigabytes ... so we expect to end up with a two terabyte index for 10 million books" other examples for libraries: vufind - aims to replace opac internet archive national library of australia 3.3. auto suggestions mainly, there are two approaches to implement auto-suggestions (also called auto-completion) with solr: via facets or via ngramfilterfactory . to push it to the extreme you can use a lucene index entirely in ram. this approach is used in a large music shop in germany. live examples for auto suggestions: kaufda.de 3.4. spatial search applications when mentioning spatial search, people have geographical based applications in mind. with solr, this ordinary use case is attainable . some examples for this are : city search - city guides yellow pages kaufda.de spatial search can be useful in many different ways : for bioinformatics, fingerprints search, facial search, etc. (getting the fingerprint of a document is important for duplicate detection). the simplest approach is implemented in jetwick to reduce duplicate tweets, but this yields a performance of o(n) where n is the number of queried terms. this is okay for 10 or less terms, but it can get even better at o(1)! the idea is to use a special hash set to get all similar documents. this technique is called local sensitive hashing . read this nice paper about 'near similarity search and plagiarism analysis' for more information. 3.5. duckduckgo duckduckgo is made with open source and its "zero click" information is done with the help of solr using the dismax query handler: the index for that feature contains 18m documents and has a size of ~12gb. for this case had to tune solr: " i have two requirements that differ a bit from most sites with respect to solr: i generally only show one result, with sometimes a couple below if you click on them. therefore, it was really important that the first result is what people expected. false positives are really bad in 0-click, so i needed a way to not show anything if a match wasn't too relevant. i got around these by a) tweaking dismax and schema and b) adding my own relevancy filter on top that would re-order and not show anything in various situations. " all the rest is done with tuned open source products. to quote gabriel again: "the main results are a hybrid of a lot of things, including external apis, e.g. bing, wolframalpha, yahoo, my own indexes and negative indexes (spam removal), etc. there are a bunch of different types of data i'm working with. " check out the other cool features such as privacy or bang searches . 3.6. clustering support with carrot2 carrot2 is one of the "contributed plugins" of solr. with carrot2 you can support clustering : " clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. " see some research papers regarding clustering here . here is one visual example when applying clustering on the search "pannous" - our company : 3.7. near real time search solr isn't real time yet, but you can tune solr to the point where it becomes near real time, which means that the time ('real time latency') that a document takes to be searchable after it gets indexed is less than 60 seconds even if you need to update frequently. to make this work, you can setup two indices. one write-only index "w" for the indexer and one read-only index "r" for your application. index r refers to the same data directory of w, which has to be defined in the solrconfig.xml of r via: /pathto/indexw/data/ to make sure your users and the r index see the indexed documents of w, you have to trigger an empty commit every 60 seconds: wget -q http://localhost:port/solr/update?stream.body=%3ccommit/%3e -o /dev/null everytime such a commit is triggered a new searcher without any cache entries is created. this can harm performance for visitors hitting the empty cache directly after this commit, but you can fill the cache with static searches with the help of the newsearcher entry in your solrconfig.xml. additionally, the autowarmcount property needs to be tuned, which fills the cache with a newsearcher from old entries. also, take a look at the article 'scaling lucene and solr' , where experts explain in detail what to do with large indices (=> 'sharding') and what to do for high query volume (=> 'replicating'). 3.8. loggly = full text search in logs feeding log files into solr and searching them at near real-time shows that solr can handle massive amounts of data and queries the data quickly. i've setup a simple project where i'm doing similar things , but loggly has done a lot more to make the same task real-time and distributed. you'll need to keep the write index as small as possible otherwise commit time will increase too great. loggly creates a new solr index every 5 minutes and includes this when searching using the distributed capabilities of solr ! they are merging the cores to keep the number of indices small, but this is not as simple as it sounds. watch this video to get some details about their work. 3.9. solandra = solr + cassandra solandra combines solr and the distributed database cassandra , which was created by facebook for its inbox search and then open sourced. at the moment solandra is not intended for production use. there are still some bugs and the distributed limitations of solr apply to solandra too. tthe developers are working very hard to make solandra better. jetwick can now run via solandra just by changing the solrconfig.xml. solandra also has the advantages of being real-time (no optimize, no commit!) and distributed without any major setup involved. the same is true for solr cloud. 3.10. category browsing via facets solr provides facets , which make it easy to show the user some useful filter options like those shown in the "drupal integration" example. like i described earlier , it is even possible to browse through a deep category tree. the main advantage here is that the categories depend on the query. this way the user can further filter the search results with this category tree provided by you. here is an example where this feature is implemented for one of the biggest second hand stores in germany. a click on 'schauspieler' shows its sub-items: other shops: game-change 3.11. jetwick - open twitter search you may have noticed that twitter is using lucene under the hood . twitter has a very extreme use case: over 1,000 tweets per second, over 12,000 queries per second, but the real-time latency is under 10 seconds! however, the relevancy at that volume is often not that good in my opinion. twitter search often contains a lot of duplicates and noise. reducing this was one reason i created jetwick in my spare time. i'm mentioning jetwick here because it makes extreme use of facets which provides all the filters to the user. facets are used for the rss-alike feature (saved searches), the various filters like language and retweet-count on the left, and to get trending terms and links on the right: to make jetwick more scalable i'll need to decide which of the following distribution options to choose: use solr cloud with zookeeper use solandra move from solr to elasticsearch which is also based on apache lucene other examples with a lot of facets: cnet reviews - product reviews. electronics reviews, computer reviews & more. shopper.com - compare prices and shop for computers, cell phones, digital cameras & more. zappos - shoes and clothing. manta.com - find companies. connect with customers. 3.12. plaxo - online address management plaxo.com , which is now owned by comcast, hosts web addresses for more than 40 million people and offers smart search through the addresses - with the help of solr. plaxo is trying to get the latest 'social' information of your contacts through blog posts, tweets, etc. plaxo also tries to reduce duplicates . 3.13. replace fast or google search several users report that they have migrated from a commercial search solution like fast or google search appliance (gsa) to solr (or lucene). the reasons for that migration are different: fast drops linux support and google can make integration problems. the main reason for me is that solr isn't a black box —you can tweak the source code, maintain old versions and fix your bugs more quickly! 3.14. search application prototyping with the help of the already integrated velocity plugin and the data import handler it is possible to create an application prototype for your search within a few hours. the next version of solr makes the use of velocity easier. the gui is available via http://localhost:port/solr/browse if you are a ruby on rails user, you can take a look into flare. to learn more about search application prototyping, check out this video introduction and take a look at these slides. 3.15. solr as a whitelist imagine you are the new google and you have a lot of different types of data to display e.g. 'news', 'video', 'music', 'maps', 'shopping' and much more. some of those types can only be retrieved from some legacy systems and you only want to show the most appropriated types based on your business logic . e.g. a query which contains 'new york' should result in the selection of results from 'maps', but 'new yorker' should prefer results from the 'shopping' type. with solr you can set up such a whitelist-index that will help to decide which type is more important for the search query. for example if you get more or more relevant results for the 'shopping' type then you should prefer results from this type. without the whitelist-index - i.e. having all data in separate indices or systems, would make it nearly impossible to compare the relevancy. the whitelist-index can be used as illustrated in the next steps. 1. query the whitelist-index, 2. decide which data types to display, 3. query the sub-systems and 4. display results from the selected types only. 3.16. future solr is also useful for scientific applications, such as a dna search systems. i believe solr can also be used for completely different alphabets so that you can query nucleotide sequences - instead of words - to get the matching genes and determine which organism the sequence occurs in, something similar to blast . another idea you could harness would be to build a very personalized search. every user can drag and drop their websites of choice and query them afterwards. for example, often i only need stackoverflow, some wikis and some mailing lists with the expected results, but normal web search engines (google, bing, etc.) give me results that are too cluttered. my final idea for a future solr-based app could be a lucene/solr implementation of desktop search. solr's facets would be especially handy to quickly filter different sources (files, folders, bookmarks, man pages, ...). it would be a great way to wade through those extra messy desktops. 4. summary the next time you think about a problem, think about solr! even if you don't know java and even if you know nothing about search: solr should be in your toolbox. solr doesn't only offer professional full text search, it could also add valuable features to your application. some of them i covered in this article, but i'm sure there are still some exciting possibilities waiting for you!

January 25, 2011

by Peter Karussell

· 147,415 Views

Interview: Music Composer on the NetBeans Platform

Steven Yi (pictured right) is a programmer and composer living in Rochester, NY. He studied music composition in college and became a programmer afterwards. He started off as a Flash and server side developer (which he did for about 7 years), and has spent the past few years at his current company doing mobile development with J2ME, Android, and iPhone, as well as server-side development with Spring, Hibernate, etc. He started learning and using Java and Swing for personal work in 2000 and has been using it since then for the development of blue, the focus of the interview that follows below. In the interview, Steven talks about the "blue" music composer, how it works, and how the NetBeans Platform and Python form the basis of this cool open-sourced Java music composer. What's "blue"? blue is a music composition environment I started in the fall of 2000. It was actually my very first Java program! At the time, I had started using the music software Csound (http://www.csounds.com) to compose, but found it slow to work with when it came to accomplishing what I was interested in musically. I had the idea to create a simple program that would have a timeline and the ability to scale musical score material in time. Fast forward many years later: in trying to solve other musical problems, and responding to feedback from the community of users, I've expanded blue's features a great deal. It now includes things like a mixer and effects system, a GUI builder tool for creating synthesizer interfaces, embedded Jython processing of musical scripts, and more. It's been quite satisfying to create a tool that can express my musical interests and to find a community of users who have found value in this program for their own musical work. Some screenshots: The Orchestra manager shows a BlueSynthBuilder instrument being edited. The "Reson 6" instrument is shown in edit mode. The BSB Object Properties panel shows the properties for the selected knob: The Score timeline shows a project using multiple parameter automations. The values automated include things like volume, panning, and a time pointer for a phase-vocoder instrument. All BlueSynthBuilder instruments, Effects, and the Mixer volume sliders can be automated: The Score timeline showing the author's composition "Reminiscences". The timeline shows multiple Python SoundObjects used. The SoundObject Editor shows the editor for the selected SoundObject in the timeline. The SoundObject Properties panel shows different properties for the selected SoundObject: The Score timeline showing a Tracker SoundObject being used. The timeline is configured to snap at every 4 beats and the time bar has been configured to show in numbers rather than time: The Score timeline showing a PianoRoll SoundObject being used. The PianoRoll is unique in that it is microtonal, meaning it can adapt the number of steps per "octave", depending on the values configured from a tuning file. In this screenshot, the scale loaded was a Bohlen-Pierce scale, which has 13-tones per tritave (octave and a half): The blue Mixer is shown docked into the bottom bar and in an open state. The interfaces for user-created Chorus and Reverb effects are shown. The interfaces were created using the same GUI builder tool that is found in the BlueSynthBuilder instrument: It's got a very special appearance. How did that come about? blue's custom look and feel started off one day when I was using my Palm PDA. I remember thinking that I enjoyed the look of the device with the backlight on, and so I wanted to recreate that kind of look for my program. Later, I modified the color scheme to tone it down in some ways, but I also introduced more colors than white and cyan to highlight secondary and tertiary features. Maybe now it is now more like Tron than it is like Palm. :) Overall, I enjoy the darker look of the application when I'm working on music. I tend to work on music when I have free time, and that is usually only late at night—I've found having a darker screen has been easier on my eyes. Also, if anyone was wondering, yes, blue is my favorite color. The blue look and feel is encapsulated in a module named "blue-plaf" and is available in the "blue" Mercurial repository (http://bluemusic.hg.sourceforge.net/hgweb/bluemusic/blue). The look and feel is quite hacked up (redoing it properly has been another item on my todo list), but it can be dropped into another application and it should work, as shown below with the CRUD Sample (which can be created from a tutorial found here): Can you explain how blue's timeline works? blue has a concept of SoundLayers and SoundObjects. SoundObjects are objects that primarily produce notes and have a start and duration. There are many different types of SoundObjects in blue and each has an editor (viewed in the SoundObject Editor TopComponent when a SoundObject is selected), and a BarRenderer, which is used to draw the content area of the bar on the timeline. A PolyObject is a special SoundObject. It consists of SoundLayers, which contain SoundObjects. The root timeline is itself just a PolyObject that you can add as many layers to as you like. You can also group individual SoundObjects into their own PolyObjects, and then use the resulting PolyObjects just like any other SoundObject on the timeline. If you double-click a PolyObject, the timeline is then reset with the timeline of the PolyObject you selected. As a result, PolyObjects allow timelines to be embedded within other timelines. If you think about how music is grouped into motives, phrases, sections, and even larger groupings, you can see how PolyObjects might represent these kinds of musical abstractions. For the component design, the ScoreTopComponent starts off with a JSplitPane to split between a SoundLayerListPanel on the left and a JScrollPane on the right. The JScrollPane has a ScoreTimeCanvas (the main timeline) in the main viewPort's view, a panel with the the time bar and tempo editor in the column header, and the corner is used to open up an extra panel to modify properties for the timeline. The JScrollPane has customized JScrollBars used to add the ± buttons that perform zooming on the timeline. There are a number of other features involved that are implemented amongst a number of classes, but the details of how viewPorts are synchronized (among other things) may be a bit too technical to discuss here. For those who are interested, the code can be viewed within the blue.ui.core.score package within the blue-ui-core module. How did blue come to find itself atop the NetBeans Platform? I first started to be interested in NetBeans IDE around the time 4.1 came out, but didn't really get into using it until the release of 5.0. At that time, I had hand-written Swing components for about 4-5 years (I don't really remember when 5.0 was released), and I found Matisse to be quite nice and began using it here and there. I had looked at the NetBeans Platform as an RCP at that time, but found it to be quite a bit to understand. However, I still kept it on my radar. Around the time 6.0 or 6.5 came out, I started to reconsider migrating to the NetBeans Platform once again. By this time, I had moved over to using NetBeans IDE full time for blue development and had been using NetBeans IDE more in general—particularly Java Web development and Ruby on Rails. One of the biggest things I found attractive about NetBeans IDE is its windowing system... and the things I read about in the platform development articles I'd seen online made me curious once again to see what the NetBeans Platform offered. I still felt that there was going to be a big learning curve to learn the NetBeans Platform, but the NetBeans Platform tutorials online were really quite helpful, as were the members of the NetBeans Platform mailing list, and there were also many more books available to help me get started. I think I ultimately spent about 6-8 months migrating blue to using the NetBeans Platform. Granted, it was a busy time in my life and I was working on this only in my spare time, so I think in the end it was a reasonable amount of time. Users have been very positive about the new blue interface and application as a whole, and I think it has been worth spending the time to use the NetBeans Platform. blue's window layout is quite unexpected for a NetBeans Platform application. By the time I had started migrating blue to the NetBeans Platform, the application was already some 7-8 years old. The interface I designed for blue in pure Swing was influenced by my experiences in using Flash, looking at other music composition environments (Digital Audio Workstations and Sequencing Programs), and evaluating the different aspects of working with Csound. Mapping the components from the Swing-based application to the NetBeans Platform was a little tricky in that I couldn't quite get the exact same design of panels as I had in pure Swing. In the end, I tried to think about where most of the components resided physically, and created TopComponents and placed them in the center, left, right, or bottom parts of the main window. I kept some of the dialogs from the old codebase as-is, but I migrated others to be TopComponents so that they could be docked, opened, or dragged out into a dialog as the user wished. In the end, the GUI is different and took a little getting used to after years of using and building the old interface, but I quickly adjusted to the changes and I think there is much greater consistency and usability now. The users have responded very positively to the general polish of the application and to being able to customize their environment. I myself have very much enjoyed being able to dock all of the windows as well as using full-screen mode, especially when I am on my netbook and composing. Excellent! What features of the NetBeans Platform are you using and what do you find to be most useful? Currently, I am using only a very small part of the NetBeans Platform. By the time I started to move my code to the NetBeans Platform, the codebase was already some 7 or 8 years old. I took the approach recommended to me on the mailing list and started off small, focusing primarily on migrating my project to using the Windows System API, the Options API, and a few other utility API's like IO and Dialogs. Having an old codebase, I found that I spent most of my time during migration just reorganizing my UI into TopComponents and working out communications between the components. I also spent time looking at API's that I had developed myself and seeing which ones could be replaced with API's provided by the NetBeans Platform. At this time, the application is still using a number of API's I wrote from the old codebase, but over time I would like to migrate more of the appplication to use the Nodes and Visual Library API's. I think migrating a codebase of this size in phases really worked out well. In the first phase, I was able to take advantage of the Window System API and have a very visible result on the application and gained a lot for usability. Also, a big part of the migration involved moving the codebase from a monolithic source tree and partitioning it into logical modules. I think there really is a great deal of benefit to working with a codebase with modular design, and that too is a very positive result of working with the NetBeans Platform. Please say something about how Jython relates to this application, how you are using it (what the benefits are), and your general opinions on Jython. I have had a Python SoundObject in blue for quite some time—I think since 2002. For me, it is one of the most important tools in blue when it comes to accomplishing what I want musically. With computer music, we have a lot of tools for what I call Common Practice computer music: PianoRolls, Pattern Editors, and Notation Objects. For computer musicians who are interested in Uncommon Practice music, the ability to use a scripting language opens up a number of ways to express musical ideas that cannot be easily conveyed using those other tools. In blue, Jython is primarily used to allow users to write scripts that will generate notes. For myself, I use Python scripts to model orchestral composition, creating Performer and PerformerGroup objects that I write in Python. I also write performance functions, usually per-project, to perform different musical material in different ways. Other users have used Python scripts in exploring things like algorithmic composition and genetic algorithms in their work. A blue project can contain any number of Python objects. The score generated by each Python object is translated and scaled in time by moving and resizing the SoundObject in the timeline. This allows a user who may want to use scripting to create musical material to also take advantage of blue's timeline to organize how the different musical objects will work together. One of the things I most appreciate about Jython (and scripting languages on the JVM in general) is that it is embeddable within a Java application. By packaging and embedding the Jython interpreter within the blue application, users can rest assured that the Python scripts they write can be interpreted anywhere that blue is installed. It's an extra assurance that their musical projects will be long-lasting, but they can still take advantage of a full programming language like Python in their work. Overall, I think that Jython is a fine piece of software and I hope that it will continue to grow and develop for years to come. Is the application open source and are you looking for code contributions and, if so, in which areas? Yes, the application is available under the GPL v2 license, and the source code can be viewed from the Mercurial repository on SourceForge at http://bluemusic.hg.sourceforge.net/hgweb/bluemusic/blue/. I am a strong proponent of open source, especially for creative work. In the same way that we can today look at and study musical works by composers of the past (like Josquin and Bach), I would like to imagine that the work composers and other artists are now creating with computers will also be open and available for study in the future. I believe that using open source software for creative work greatly helps in making musical projects available for the years ahead. I have done most of the development of blue myself, and over the years I've certainly built up a long list of things that I would like to implement. Users have also made wonderful feature requests that I would love to see in the program—but unfortunately, there are only so many hours in the day. It would certainly be nice to have others contributing code! Beyond new features, there are a number of infrastructural things that would be nice to address. The codebase is many years old, and while the application has been refactored multiple times over its lifetime, there are still some areas of the application that could be much more cleanly implemented. Also, in moving over to the NetBeans Platforms, I only really took the first steps. There are a number of components within the application that could probably be better served by migrating to using more of the NetBeans API's. For internal work, things like modifying the timeline to implement zooming to use Graphics2d and transforms, implementing a better waveform renderer for audio files, and further enhancing the instrument GUI builder are all things I'd like to see. I'd also love to get help in migrating all of the tables and trees to using the Nodes API, something that I have not yet had the time to do. It would also be nice to get the manual (currently in HTML and PDF, generated from DocBook) integrated into the application as JavaHelp, but this is another thing that I have had to postpone due to lack of time. For features, some interesting things I'd love to see are a Notation SoundObject, a separate graphical instrument builder using the Visual Library API, and a Sampler instrument. There's also a sound drawing SoundObject, enhancements to existing SoundObjects, and more I'd love to see moving forward. Maybe someone will find these kinds of things interesting and will take a look at blue's code sometime! Thanks Steven and happy music making with blue!

May 25, 2010

by Geertjan Wielenga

· 17,924 Views

Open Source NoSQL Databases

For almost a year now, the idea of "NoSQL" has been spreading due to the demand for relational database alternatives. Maybe the biggest motivation behind NoSQL is scalability. Relational databases don't lend themselves well to the kind of horizontal scalability that's required for large-scale social networking or cloud applications, and ORMs can abstract away impedance mismatch only so much. In other cases, companies just don't need as many of the complex features and rigid schemas provided by relational databases. Most people are not suggesting that we all ditch the RDBMS, in fact, many companies don't really need to switch. Relational databases will probably be necessary for many applications years and years from now. In essence, NoSQL is a movement that aims to reexamine the way we structure data and draw attention to innovation in hopes of finding the solution to the next generation's data persistence problems. Here are some of the better known open source data stores/models labeled as "NoSQL": CouchDB- Document Store Maps keys to data It provides a RESTful JSON API and is written in Erlang You can upload functions to index data and then you can call those functions Has a very simple REST interface Provides an innovative replication strategy - nodes can reconnect, sync, and reconcile differences after being disconnected for long periods of time Enables new distributed types of applications and data MongoDB - Document Store Free-form key-value-like data store with good performance Powerful, expansive query model Usability rivals that of Redis Good for complex data storage needs. Production-quality sharding capabilities Neo4j - GraphDB Disk-based Has a restricted, single-threaded model for graph traversal Has optional layers to expose Neo4j as an RDF store Can handle graphs of several billion nodes, relationships, or properties on a single machine Released under a dual license - free for non-commercial use Apache Hbase - Wide Column Store/Column Families Built on top of Hadoop, which has functionality similar to Google's GFS and MapReduce systems Hadoop's HDFS provides a mechanism that reliably stores and organizes large amounts of data Random access performance is on par with MySQL Has a high performance Thrift gateway Cascading source and sink modules Redis - Key Value/Tuple Store Provides a rich API and does more operations in memory, using disk only periodically. It's extremely fast Lets you append a value to the end of a list of items that's already been stored on a key. Has atomic operations, making it a best-of-breed tally server. Memcached - Key Value/Tuple Store High-performance, distributed memory object caching Free and open source Generic and agnostic to the objects/strings it caches It's all in-memory data Simple yet elegant design enables easy development and deployment Language neutral caching scheme. Most of the large properties on the web are using it now, except for Microsoft Project Voldemort - Eventually Consistent Key Value Store Used by LinkedIn Handles server failure transparently Pluggable serialization supports rich keys and values including lists and tuples with named fields Supports common serialization frameworks including Protocol Buffers, Thrift, and Java Serialization Data items are versioned Supports pluggable data placement strategies Memory caching and the storage system are combined Tokyo Cabinet and Tokyo Tyrant - Key Value/Tuple Store Supports hashtable mode, b-tree mode, and table mode It's fast and straightforward Good for small to medium-sized amounts of data that require rapid updating and can be easily modeled in terms of keys and values Cassandra - Wide Column Store/Column Families First developed by Facebook SuperColumns can turn a simple key-value architecture into an architecture that handles sorted lists, based on an index specified by the user. Can scale from one node to several thousand nodes clustered in different data centers. Can be tuned for more consistency or availability Smooth node replacement if one goes down ____ Some other well known NoSQL-style data stores that are closed source include Google BigTable and Amazon SimpleDB. GigaSpaces is a popular space-based Grid solution that has NoSQL qualities. Check out this informative post on NoSQL patterns.

February 23, 2010

by Mitch Pronschinske

· 46,075 Views

Four Methods to Automate Development Environment Setup

There are at least four methods that can be used in different combinations to make the process of setting up a complete development environment a lot less painful.

February 16, 2010

by Mitch Pronschinske

· 31,793 Views

Promiscuous Integration vs. Continuous Integration

The emergence of version control systems makes both promiscuous and continuous integration merging techniques more attractive. Which is better?

February 10, 2010

by Martin Fowler

· 50,146 Views · 2 Likes

Top Open Source ESB Projects

In today's software markets, open source technologies are giving commercial products some stiff competition. Enterprise Service Busses are no exception. Don Rippert, the chief technology officer at Accenture says, "ESBs are software products that allow you to create a business process with web services running on different platforms." Rippert believes an ESB is essential for achieveing the full potential of service-oriented architecture. In general, an ESB should provide flexibility built on a basis of standards. Jos Dirksen, an author of "Open Source ESBs in Action," said in a recent interview that today's top open source ESBs were "on par with commercial alternatives." Competition drives innovation, and this page has a list of the most competitive open source ESBs on the market. Here are the the forerunners among open source ESBs (in no particular order): JBoss ESB JBoss JBoss generally has mature components in its GA releases with no vendor-lockin characteristics. Their ESB leverages JEMStechnologies like the JBoss business rules engine for content-basedrouting and messaging. Content-based routing on the JBoss ESB can use Drools or XPath. The JBoss ESB supports XSLT and the Smookstransformation engine for XML and non-XML data formats. JBoss' ESBalso runs on the JBoss application server and features a pluggable architecture for swapping out ESBsubsystems. Apache ServiceMix Apache Apache ServiceMix 4 is OSGi based and a great option for integrating with an XML standards focussed landscape. Apache ServiceMix makes it very easy to hot-deploy new integration flows. Even the pluggable integration components are hot deployable. ServiceMix uses a JBI standard which provides a lot of components like JMS, BPEL, Web service, and Camel. The inclusion of Camel is a strong point for ServiceMix along with the Spring Framework, which is also supported. FUSE ESB is another great distribution of Apache ServiceMix. OpenESB Sun(Oracle) OpenESBhas an easy learning curve due to its solid integration with theGlassFish Application Server and Sun's popular IDE, NetBeans. TheNetbeans IDE provides countless integrated functions for administrationand development. The best thing about OpenESB is its toolset. OpenESB's tools include WSDL and schema editors, a JPI manager integrated into the service manager, and Antrunning in the background. Another tool is the Composite ApplicationService Assembly (CASA) editor, which gives you a graphical overview ofintegration applications. Many Java developers will love OpenESBbecause it comes straight from the home of Java. OpenESB is also OSGi based. MuleESB MuleSoft Mule is the most used open source integration platform. MuleESB's low cost along with easy configuration, expansion, and flexibility make it very popular. Java developers will find MuleESB easy to work with because it is Java centric. There’s also a powerful set of XML schemas in MuleESB. The creation of integration flows is very straightforward. MuleESB can have fairly complex integration flows up and running in minutes. It has many connectivity, routing, and transformation options right out of the box. WSO2 ESB WSO2 Other ESB products take a relatively heavyweight approach by using the JBI specification, but the relative newcomer, WSO2, takes a lightweight approach in its ESB. It does this by focusing on Web service standards for integration. The WSO2 ESB uses Apache Synapse, a nimble Web service mediation and routing engine that focuses on providing fast XML message processing. WSO2 takes advantage of Synapse's non-blocking http://s transport implementation over the Apache HttpComponents/NIO module. This allows the WSO2 ESB to handle thousands of parallel requests using a small amount of resources and threads. You can always expect great XML support from the WSO2 ESB because well-known XML expert James Clark is a company director at WSO2.

October 29, 2009

by Mitch Pronschinske

· 241,647 Views

Top 10 Open Source Projects for .NET Developers

There are many useful, practically essential, open source solutions available to .NET developers. This is a list of the ones that have been the most useful in my experience. Agree? Disagree? Leave a comment with your own suggestions. NHibernate - NHibernate is a mature, open source object-relational mapper for the .NET framework. It's actively developed, fully featured and used in thousands of successful projects. http://nhforge.org/Default.aspx NUnit - NUnit is a unit-testing framework for all .Net languages. Initially ported from JUnit, the current production release, version 2.4, is the fifth major release of this xUnit based unit testing tool for Microsoft .NET. It is written entirely in C# and has been completely redesigned to take advantage of many .NET language features, for example custom attributes and other reflection related capabilities. NUnit brings xUnit to all .NET languages. http://www.nunit.org/index.php jQuery - jQuery is a fast and concise JavaScript Library that simplifies HTML document traversing, event handling, animating, and Ajax interactions for rapid web development. jQuery is designed to change the way that you write JavaScript. http://jquery.com/ Rhino.Mocks - A dynamic mock object framework for the .Net platform. It's purpose is to ease testing by allowing the developer to create mock implementations of custom objects and verify the interactions using unit testing. http://ayende.com/projects/rhino-mocks.aspx MVC Contrib - This is the contrib project for the ASP.NET MVC framework. This project adds additional functionality on top of the MVC Framework. These enhancements can increase your productivity using the MVC Framework. It is written in C#. Founded by Eric Hexter and Jeffrey Palermo. http://www.codeplex.com/MVCContrib CruiseControl.NET - CruiseControl.NET is an Automated Continuous Integration server, implemented using the Microsoft .NET Framework. http://confluence.public.thoughtworks.org/display/CCNET/Welcome+to+CruiseControl.NET S#arp Architecture - Pronounced "Sharp Architecture," this is a solid architectural foundation for rapidly building maintainable web applications leveraging the ASP.NET MVC framework with NHibernate. The primary advantage to be sought in using any architectural framework is to decrease the code one has to write while increasing the quality of the end product. A framework should enable developers to spend little time on infrastructure details while allowing them to focus their attentions on the domain and user experience. http://code.google.com/p/sharp-architecture/ Spark View Engine - Spark is a view engine for Asp.Net Mvc and Castle Project MonoRail frameworks. The idea is to allow the html to dominate the flow and the code to fit seamlessly. http://sparkviewengine.com/ TortoiseSVN - A Subversion client, implemented as a windows shell extension. TortoiseSVN is a really easy to use Revision control / version control / source control software for Windows. Since it's not an integration for a specific IDE you can use it with whatever development tools you like. TortoiseSVN is free to use. You don't need to get a loan or pay a full years salary to use it. http://tortoisesvn.tigris.org/ Castle Windsor - Castle Project offers two Inversion of Control Containers. The MicroKernel and the Windsor Container. Castle Windsor aggregates the MicroKernel and exposes a powerful configuration support. It is suitable for common enterprise application needs. It is able to register facilities and components based on the configuration and adds support for interceptors. http://www.castleproject.org/container/index.html

May 4, 2009

by Alvin Ashcraft

· 88,267 Views

Enterprise Integration Patterns with Apache Camel Refcard Now Available!

Apache Camel is a powerful open source integration platform based on Enterprise Integration Patterns with Bean Integration. This Refcard provides you with eleven of the most essential patterns that anyone working with integration must know. This Refcard is targeted for software developers and enterprise architects, but anyone in the integration space can benefit as well. Download Now! About the Author: Claus Ibsen is a passionate open-source enthusiast who specializes in the integration space. As an engineer in the FuseSource Open Source Division he works full time on Apache Camel, FUSE Mediation Router (Apache Camel Enterprise) and related projects. Claus is very active in the Apache Camel and Fuse communities, writing blogs, twittering, assisting on the forums, irc channels and is driving the Apache Camel roadmap.

March 30, 2009

by Wei Ling Chen

· 4,977 Views

Performance Monitoring Using Glassbox

The industry is recognizing the fact that performance testing & engineering should be part of the project execution road map starting from the requirements gathering phase. At many times during project executions, performance engineering related activities are executed based on customer need or slow response time of application after development phase gets completed. Glassbox can be leveraged (by developers/testers/business users) during and after the development cycle to monitor the response times of requests with-out being aware of underlying application structure and code details. Analysis generated by Glassbox gives direct pointers on where is the bottleneck which causes slow response time for that particular request/page/URL. About Glassbox Glassbox is an open source web application which aid in performance monitoring and troubleshooting of multiple web applications deployed in container. Troubleshooting It contains the built-in knowledge repository of common problems which are used to pinpoint the issues and suggestions on causes as Java code executes. Performance Monitoring It monitors the requests as Java code executes and provides details about response times. Glassbox web client (AJAX GUI) provides nice summary dashboard view which contains various attributes like (server-name, application name, operation/request-URL, average time, no. of executions, status (slow / OK) and analysis details). By default, an operation that takes more than 1 sec execution time is marked as SLOW status. Such SLA can be modified using Glassbox properties file. Analysis part describes the problem precisely and very clearly in plain English words, rather than displaying large code/exception trace. This definitely increases developer productivity by reducing developer’s time spent in log files and using IDE debuggers. Internals The two main components of Glassbox are Monitor and Agent. Monitor uses Aspect-Oriented Programming (AOP) to monitor the JVM activity. Agent diagnoses and presents the monitoring results and uses knowledge repository to cross reference the problem with suggestions/solutions. Glassbox agent supports viewing of the analysis results using JMX (eg. Java 5 JConsole) Consoles. Glassbox extensively uses the AOP approach internally to monitor the Java code. This gives the benefit of not making any changes to source code or build-process and hence can work with any legacy web application/jar file as well. Technologies Glassbox should work on any application server that supports Servlet 2.3 or later. The servers where Glassbox is tested and installation process is automated are Apache Tomcat, weblogic, websphere, Resin, Oracle OC4J, websphere, Resin, Jetty & GlassFish. Overhead Having Glassbox application running on same container would generate a performance overhead. Typically this would affect the response time and memory overhead. Hence it is recommended to start the Glassbox application only when it’s required for performance monitoring. Licensing Glassbox is an open source project, it is free to download and run. Glassbox uses the GNU Lesser General Public License to distribute software and documentation. Demo Application Development & Deployment to Tomcat To test the capabilities of Glassbox, a sample application is developed which has a TestServlet class. This servlet calls DelayGenerator class’s generateDelay() method. This method calls Thread class’s sleep() method which suspends the execution of servlet. A counter is being initialized in DelayGenerator class which determines the time interval till which servlet is needed to be suspended. TestServlet.java /** * File: TestServlet.java * @author Viral Thakkar */ package com.infosys.star.glassbox; import java.io.IOException; import java.io.PrintWriter; import javax.servlet.ServletException; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; public class TestServlet extends HttpServlet { protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { DelayGenerator delayObj = new DelayGenerator(); int delay = delayObj.generateDelay(); response.setContentType("text/html"); PrintWriter out = response.getWriter(); out.println(""); out.println(" Hello World from Test Servlet : "+delay+" milliseconds "); out.println(""); out.flush(); } } DelayGenerator.java /** * File: DelayGenerator.java * @author Viral Thakkar */ package com.infosys.star.glassbox; public class DelayGenerator { private static int counter = 1; public int generateDelay() { try { Thread.sleep(counter * 100); counter++; } catch (InterruptedException e) { e.printStackTrace(); } return counter*100; } } Glassbox Installation & Integration to Apache Tomcat 6.0 Glassbox installation is very straightforward for non-clustered environment for the server where it’s automated. Simply drop the glassbox.war file at the appropriate folder inside server folder or perform the server specific steps/configuration to deploy the war file. Browse to server url with context root as glassbox – http://<>:/glassbox. Follow the instructions available on this page. According to specific server, this page would suggest the configuration changes for a server. Please refer to Glassbox User Guide document for details on how to install Glassbox for clustered application server environment. For Apache Tomcat 6.0- Add following command line arguments to Tomcat’s Java options: -Dglassbox.install.dir=C:\Tomcat6.0\lib\glassbox -Djava.rmi.server.useCodebaseOnly=true -javaagent:C:\Tomcat6.0\lib\aspectjweaver.jar Monitoring & Technical Analysis Glassbox web client (URL- http://<>:<>/glassbox ) shows the summary and detailed view of all the requests/operations that container/JVM has executed. Summary Section View Different attributes (columns) which gets displayed in this table are as below - Attribute / Column Name Comments Status This indicates whether operation/request is performing OK, SLOW or FAILING Analysis For SLOW/FAILING status, this value provides the small summary of the cause of the problem. Operation This is name of the operation/request of an application Server Name of the server where monitoring is being done. In a clustered environment, this allows to distinguish operations on different servers. Executions This value indicates how many times this operation has run since the application server was started or Glassbox’s statistics were last reset. Click the request in above summary table to view its detailed analysis in below detailed section. Detailed Section View The details area provides information relating to operations selected in the summary table. Different sub-sections which gets displayed in this view are as below - Sub-section Name Comments Executive Summary High level summary view of the selected operation gets displayed in a table format. This is neat view to senior stake holders who are not interested in technical details. Technical Summary This section contains more technical details in paragraph and table representation formats to provide insight into root cause of the problem if any, like which operation, query is slow and statistics of same. Details like stack trace, thread lock name are provided to find and fix the problem. “Common solutions” sub section shows pointers to resolve the identified problem/s. “Glassbox has ruled out other potential problems” sub section saves time to know what problems have already been ruled out. Executive Summary View Technical Summary -> Technical Details Views Above two snapshots are parts of the Technical Details section and provide minute details at code level with line number so as to pinpoint where the problem is. Here cause is identified at Class com.infosys.star.glassbox.DelayGenerator inside Method generateDelay at line number 12 where Thread.sleep is invoked. Perform Load Testing Using JMeter and Monitor Using Glassbox Apache JMeter is used to test performance both on static and dynamic resources (files, Servlets, Perl scripts, Java Objects, Data Bases and Queries, FTP Servers and more). It can be used to simulate a heavy load on a server, network or object to test its strength or to analyze overall performance under different load types. It can be used to make a graphical analysis of performance or to test server/script/object behavior under heavy concurrent load. Using JMeter, create a test plan that simulates 10 users requesting for 1 page 5 times. i.e. 10 x 1 x 5 = 50 HTTP requests. First step is to add a Thread Group element. The Thread Group tells JMeter the number of users to simulate, how often the users should send requests, and the how many requests they should send. Next step is to add HTTP Request element to added Thread Group. In parallel, have the Glassbox up and running to monitor response time statistics of the load generated by JMeter application. Below is the Executive summary view of above test in Glassbox web UI interface. Section “Monitoring & Technical Analysis” contains the details to understand the Glassbox generated analysis. Conclusion Glassbox is not the replacement for performance testing tool like load runner. Glassbox aids in the project to various stakeholders in finding, conveying and fixing the performance problems at all phases starting build (development) to post deployment. Glassbox application to be started/installed only during monitoring time so as to avoid the performance overhead for other applications due to CPU & memory footprint occupied by Glassbox application on the container. During load testing of the application, Glassbox turns out to be good option to figure out the root causes inside an application code. References Glassbox web site - http://www.glassbox.com/glassbox/Home.html Glassbox User Guide - http://nchc.dl.sourceforge.net/sourceforge/glassbox/Glassboxv2.0UserGuide.pdf Apache JMeter - http://jakarta.apache.org/jmeter/ Download & Support Glassbox Download Link - http://www.glassbox.com/glassbox/Downloads.html Glassbox forum Link - http://sourceforge.net/forum/forum.php?forum_id=575670 About Author Viral Thakkar is a Technical Architect with the Banking and Capital Markets vertical at Infosys. He has 9.5 years of technology consulting experience mainly on Java/JEE technologies and frameworks with large banks and financial institutions across the globe. He has been part of many small and large-scale initiatives related to application development, architecture creation and strategy definition. From http://viralpatel.net/blogs

March 5, 2009

by Viral Thakkar

· 20,704 Views

Open Source : How Do You Stay Up To Date?

I Love the concepts and beliefs behind Open Source. I use Open Source libraries, applications etc. all the time. One of the things I have always found a challenge though, is knowing when a new release comes to be. Not only that, is this a simple point release, a major release or a security release. For the most part I want to stay up to date, especially with the libraries I use, when a security releases is made. Currently I play the, I hope I have the most stable, most secure version. Howeaver, I always thought that having a single place that I can sign-up to be informed about releases would be awesome. Do you have a way to do this or, are you subscribing to a multitude of RSS feeds and mailing lists to stay informed?

January 10, 2009

by Schalk Neethling

· 5,147 Views

The Three Pillars of Continuous Integration

Continuous Integration commonly known as CI is a process that consists of continuously compiling, testing, inspecting, and deploying source code. In any typical CI environment, this means running a new build every time code changes within a version control repository. Martin Fowler describes CI as: A software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly. While CI is actually a process, the term Continuous Integration often is associated with three important tools in particular. As shown in the image the three pillars of CI are: 1. A version control repository like Subversion, or CVS. 2. A CI Server such as Hudson, or Cruise Control 3. An automated build process like Ant or Nant So, let’s look at each of these in detail: Version Control Repository: Version control repositories also known as SCM (source code management) play a crucial role in any software development environment. They also play a very important role for a successful CI process. The SCM is a central place for the team to store every needed artifact for the project. It is mandatory for the teams to put everything needed for a successful build into this repository. This includes the build scripts, property files, database scripts, all the libraries required to build the software and so on. The CI Server: For CI to function properly, we also need to have an automated process that monitors a version control repository and runs a build when any changes are detected. There are several CI servers available, both open source and commercial. Most of them are similar in their basic configuration and monitor a particular version control repository and run builds when any changes are detected. Some of the most commonly used open source CI servers are; Cruise Control, Continuum, and Hudson. Hudson is particularly interesting because of its ease of configuration and compelling plug-ins, which makes integration with test and static analysis tools much easier. Automated Build: The process of CI is about building software often, which is accomplished through the use of a build. A sturdy build strategy is by far the most important aspect of a successful CI process. In the absence of a solid build that does more than compile your code, CI withers. With automated builds, teams can reliably perform (in an automated fashion) otherwise manual tasks like compilation, testing, and even more interesting things like software inspection and deployment. Now that we have seen the important tools in our CI process, let’s see how a typical CI scenario looks like for a developer: CI server is configured to poll the version control repository continuously for changes. Developer commits code to the repository. CI server detects this change, and retrieves the latest code from the repository. This causes the CI server to invoke the build script with the given targets and options. If configured, CI Server will send out an e-mail to the specified recipients when a certain important event occurs. The CI server continues to poll for changes. Why is CI Important? This is one of the most frequently asked questions, and here are a few points to note about this powerful technique: Building software often greatly increases the likelihood that you will spot defects early, when they still are relatively manageable. Extends defect visibility. CI ensures that you have production ready software at every change. CI also ensures that you have reduced the risk of integration issues by building software at every change. CI server can also be configured to run continuous inspection which can assist the development team in finding potential bugs, bad programming practice, automatically check coding standards, and also provide valuable feedback on the quality of code being written. Over the past several months, I have assisted several companies in implementing CI. There was a little bit of resistance from the developers in the early stages when we implemented continuous feedback. But, never heard a single negative comment about this approach. If you already have a version control repository and automated builds, you are very close to the CI process. Download one of the open source CI servers, configure and setup a simple project. It should take less than an hour if you have automated build scripts. Start adding additional features like code inspections, generating reports, metrics, documentation and so on. Most important, send continuous feedback to your team. Give this process a try, you sure will be surprised to see how effective it is. And, as always share your thoughts, concerns or questions.

December 15, 2008

by Meera Subbarao

· 23,965 Views

Compute Grids vs. Data Grids

in a nutshell, grid computing is a way to distribute your computations across multiple computers (nodes). however, even jms does that, but jms is not a grid computing product - it's a messaging protocol. to correctly classify grid computing products we have to split them into 2 categories: compute grids and data grids. compute grid compute grids allow you to take a computation, optionally split it into multiple parts, and execute them on different grid nodes in parallel. the obvious benefit here is that your computation will perform faster as it now can use resources from all grid nodes in parallel. one of the most common design patterns for parallel execution is mapreduce . however, compute grids are useful even if you don't need to split your computation - they help you improve overall scalability and fault-tolerance of your system by offloading your computations onto most available nodes. some of the "must have" compute grid features are: automatic deployment - allows for automatic deployment of classes and resources onto grid without any extra steps from user. this feature alone provides one of the largest productivity boosts in distributed systems. users usually are able to simply execute a task from one grid node and as task execution penetrates the grid, all classes and resources are also automatically deployed. topology resolution - allows to provision nodes based on any node characteristic or user-specific configuration. for example, you can decide to only include linux nodes for execution, or to only include a certain group of nodes within certain time window. you should also be able to choose all nodes with cpu loaded, say, under 50% that have more than 2gb of available heap memory. collision resolution - allows users to control which jobs get executed, which jobs get rejected, how many jobs can be executed in parallel, order of overall execution, etc. load balancing - allows to balance properly balance your system load within grid. usually range of load balancing policies varies within products. some of the most common ones are round robin, random, or adaptive. more advanced vendors also provide affinity load balancing where grid jobs always end up on the same node based on job's affinity key. this policy works well with data grids described below. fail-over - grid jobs should automatically fail-over onto other nodes in case of node crash or some other job failure. checkpoints - long running jobs should be able to periodically store their intermediate state. this is useful for fail-overs, when a failed job should be able to pick up its execution from the latest checkpoint, rather than start from scratch. grid events - a querying mechanism for all grid events is essential. any grid node should be able to query all events that happened on remote grid nodes during grid task execution. node metrics - a good compute grid solution should be able to provide dynamic grid metrics for all grid nodes. metrics should include vital node statistics, from cpu load to average job execution time. this is especially useful for load balancing, when the system or user need to pick the least loaded node for execution. pluggability - in order to blend into any environment a good compute grid should have well thought out pluggability points. for example, if running on top of jboss, a compute grid should totally reuse jboss communication and discovery protocols. data grid integration - it is important that compute grid are able to natively integrate with data grids as quite often businesses will need both, computational and data features working within same application. some compute grid vendors: - gridgain - professional open source - jppf - open source data grid data grids allow you to distribute your data across the grid. most of us are used to the term distributed cache rather than data grid (data grid does sound more savvy though). the main goal of data grid is to provide as much data as possible from memory on every grid node and to ensure data coherency. some of the important data grid features include: data replication - all data is fully replicated to all nodes in the grid. this strategy consumes the most resources, however it is the most effective solution for read-mostly scenarios, as data is available everywhere for immediate access. data invalidation - in this scenario, nodes load data on demand. whenever data changes on one of the nodes, then the same data on all other nodes is purged (invalidated). then this data will be loaded on-demand the next time it is accessed. distributed transactions - transactions are required to ensure data coherency. cache updates must work just like database updates - whenever an update failed, then the whole transaction must be rolled back. most data grid support various transaction policies, such as read committed, write committed, serializable, etc... data backups - useful for fail-over. some data grid products provide ability to assign backup nodes for the data. this way whenever a node crashes, the data is immediately available from another node. data affinity/partitioning - data affinity allows you to split/partition your whole data set into multiple subsets and assign every subset to a grid node. in the purest form, data is not replicated between nodes at all, every node is only responsible for it's own subset of data. however, various data grid products may provide different flavors of data affinity, such as replication only to back up nodes for example. data affinity is one of the more advanced features, and is not provided by every vendor. to my knowledge, according to product websites, out of commercial vendors oracle coherence and gemstone have it (there may be others). in professional open source space you can take a look at combination of gridgain with affinity load balancing and jbosscache . some data grid/cache vendors: - oracle coherence - commercial - gemstone - commercial - gigaspaces - commercial - jbosscache - professional open source - ehcache - open source

July 31, 2008

by Dmitriy Setrakyan

· 28,405 Views · 3 Likes

Understanding HBase and BigTable

The hardest part about learning Hbase (the open source implementation of Google's BigTable), is just wrapping your mind around the concept of what it actually is. I find it rather unfortunate that these two great systems contain the words table and base in their names, which tend to cause confusion among RDBMS indoctrinated individuals (like myself). This article aims to describe these distributed data storage systems from a conceptual standpoint. After reading it, you should be better able to make an educated decision regarding when you might want to use Hbase vs when you'd be better off with a "traditional" database. It's all in the terminology Fortunately, Google's BigTable Paper clearly explains what BigTable actually is. Here is the first sentence of the "Data Model" section: A Bigtable is a sparse, distributed, persistent multidimensional sorted map. Note: At this juncture I like to give readers the opportunity to collect any brain matter which may have left their skulls upon reading that last line. The BigTable paper continues, explaining that: The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Along those lines, the HbaseArchitecture page of the Hadoop wiki posits that: HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes. Although all of that may seem rather cryptic, it makes sense once you break it down a word at a time. I like to discuss them in this sequence: map, persistent, distributed, sorted, multidimensional, and sparse. Rather than trying to picture a complete system all at once, I find it easier to build up a mental framework piecemeal, to ease into it... map At its core, Hbase/BigTable is a map. Depending on your programming language background, you may be more familiar with the terms associative array (PHP), dictionary (Python), Hash (Ruby), or Object (JavaScript). From the wikipedia article, a map is "an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value." Using JavaScript Object Notation, here's an example of a simple map where all the values are just strings: { "zzzzz" : "woot", "xyz" : "hello", "aaaab" : "world", "1" : "x", "aaaaa" : "y" } persistent Persistence merely means that the data you put in this special map "persists" after the program that created or accessed it is finished. This is no different in concept than any other kind of persistent storage such as a file on a filesystem. Moving along... distributed Hbase and BigTable are built upon distributed filesystems so that the underlying file storage can be spread out among an array of independent machines. Hbase sits atop either Hadoop's Distributed File System (HDFS) or Amazon's Simple Storage Service (S3), while a BigTable makes use of the Google File System (GFS). Data is replicated across a number of participating nodes in an analogous manner to how data is striped across discs in a RAID system. For the purpose of this article, we don't really care which distributed filesystem implementation is being used. The important thing to understand is that it is distributed, which provides a layer of protection against, say, a node within the cluster failing. sorted Unlike most map implementations, in Hbase/BigTable the key/value pairs are kept in strict alphabetical order. That is to say that the row for the key "aaaaa" should be right next to the row with key "aaaab" and very far from the row with key "zzzzz". Continuing our JSON example, the sorted version looks like this: { "1" : "x", "aaaaa" : "y", "aaaab" : "world", "xyz" : "hello", "zzzzz" : "woot" } Because these systems tend to be so huge and distributed, this sorting feature is actually very important. The spacial propinquity of rows with like keys ensures that when you must scan the table, the items of greatest interest to you are near each other. This is important when choosing a row key convention. For example, consider a table whose keys are domain names. It makes the most sense to list them in reverse notation (so "com.jimbojw.www" rather than "www.jimbojw.com") so that rows about a subdomain will be near the parent domain row. Continuing the domain example, the row for the domain "mail.jimbojw.com" would be right next to the row for "www.jimbojw.com" rather than say "mail.xyz.com" which would happen if the keys were regular domain notation. It's important to note that the term "sorted" when applied to Hbase/BigTable does not mean that "values" are sorted. There is no automatic indexing of anything other than the keys, just as it would be in a plain-old map implementation. multidimensional Up to this point, we haven't mentioned any concept of "columns", treating the "table" instead as a regular-old hash/map in concept. This is entirely intentional. The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience. Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will. Adding one dimension to our running JSON example gives us this: { "1" : { "A" : "x", "B" : "z" }, "aaaaa" : { "A" : "y", "B" : "w" }, "aaaab" : { "A" : "world", "B" : "ocean" }, "xyz" : { "A" : "hello", "B" : "there" }, "zzzzz" : { "A" : "woot", "B" : "1337" } } In the above example, you'll notice now that each key points to a map with exactly two keys: "A" and "B". From here forward, we'll refer to the top-level key/map pair as a "row". Also, in BigTable/Hbase nomenclature, the "A" and "B" mappings would be called "Column Families". A table's column families are specified when the table is created, and are difficult or impossible to modify later. It can also be expensive to add new column families, so it's a good idea to specify all the ones you'll need up front. Fortunately, a column family may have any number of columns, denoted by a column "qualifier" or "label". Here's a subset of our JSON example again, this time with the column qualifier dimension built in: { // ... "aaaaa" : { "A" : { "foo" : "y", "bar" : "d" }, "B" : { "" : "w" } }, "aaaab" : { "A" : { "foo" : "world", "bar" : "domination" }, "B" : { "" : "ocean" } }, // ... } Notice that in the two rows shown, the "A" column family has two columns: "foo" and "bar", and the "B" column family has just one column whose qualifier is the empty string (""). When asking Hbase/BigTable for data, you must provide the full column name in the form ":". So for example, both rows in the above example have three columns: "A:foo", "A:bar" and "B:". Note that although the column families are static, the columns themselves are not. Consider this expanded row: { // ... "zzzzz" : { "A" : { "catch_phrase" : "woot", } } } In this case, the "zzzzz" row has exactly one column, "A:catch_phrase". Because each row may have any number of different columns, there's no built-in way to query for a list of all columns in all rows. To get that information, you'd have to do a full table scan. You can however query for a list of all column families since these are immutable (more-or-less). The final dimension represented in Hbase/BigTable is time. All data is versioned either using an integer timestamp (seconds since the epoch), or another integer of your choice. The client may specify the timestamp when inserting data. Consider this updated example utilizing arbitrary integral timestamps: { // ... "aaaaa" : { "A" : { "foo" : { 15 : "y", 4 : "m" }, "bar" : { 15 : "d", } }, "B" : { "" : { 6 : "w" 3 : "o" 1 : "w" } } }, // ... } Each column family may have its own rules regarding how many versions of a given cell to keep (a cell is identified by its rowkey/column pair) In most cases, applications will simply ask for a given cell's data, without specifying a timestamp. In that common case, Hbase/BigTable will return the most recent version (the one with the highest timestamp) since it stores these in reverse chronological order. If an application asks for a given row at a given timestamp, Hbase will return cell data where the timestamp is less than or equal to the one provided. Using our imaginary Hbase table, querying for the row/column of "aaaaa"/"A:foo" will return "y" while querying for the row/column/timestamp of "aaaaa"/"A:foo"/10 will return "m". Querying for a row/column/timestamp of "aaaaa"/"A:foo"/2 will return a null result. sparse The last keyword is sparse. As already mentioned, a given row can have any number of columns in each column family, or none at all. The other type of sparseness is row-based gaps, which merely means that there may be gaps between keys. This, of course, makes perfect sense if you've been thinking about Hbase/BigTable in the map-based terms of this article rather than perceived similar concepts in RDBMS's. And that's about it Well, I hope that helps you understand conceptually what the Hbase data model feels like. As always, I look forward to your thoughts, comments and suggestions.

May 22, 2008

by Jim Wilson

· 84,947 Views · 5 Likes

Interview: Game Over for the JDK's Date and Time Classes

JSR 310 aims to modernize the date and calendar classes. The goal is to provide a more advanced and comprehensive model for date and time than those found in the Date and Calendar APIs. The JSR's leaders, Stephen Colebourne and Michael Nascimento, are presenting their work at JavaOne and give an overview below. Firstly, please briefly introduce yourselves. Michael Nascimento. I'm a senior technical consultant at Summa technologies and the founder of the Genesis open source project. I have also served as an expert on a few JSRs, such as the Common Annotations for the Java Platform (JSR-250). Stephen Colebourne. I am employed building travel e-commerce booking engines and am involved with many open source projects, such as Apache Commons and JodaTime. I am involved in this JSR because of my JodaTime project. JodaTime? What's that? Stephen: JodaTime provides a complete replacement of the date and time classes in the JDK: public boolean isAfterPayDay(DateTime datetime) { if (datetime.getMonthOfYear() == 2) { // February is month 2!! return datetime.getDayOfMonth() > 26; } return datetime.getDayOfMonth() > 28; } public Days daysToNewYear(LocalDate fromDate) { LocalDate newYear = fromDate.plusYears(1).withDayOfYear(1); return Days.daysBetween(fromDate, newYear); } public boolean isRentalOverdue(DateTime datetimeRented) { Period rentalPeriod = new Period().withDays(2).withHours(12); return datetimeRented.plus(rentalPeriod).isBeforeNow(); } public String getBirthMonthText(LocalDate dateOfBirth) { return dateOfBirth.monthOfYear().getAsText(Locale.ENGLISH); } What are the main things that are wrong with the current date and time classes? Stephen: The existing classes are pretty bad—probably the worst APIs in the JDK. They're buggy, mutable, cumbersome, many bugs, and they tend not to be threadsafe. Michael: The original date class comes from JDK 1.0. At the time, James Gosling tried to follow the related functions in C and didn't put much force into designing them from scratch. For example, they can't be internationalized and only local timezones are supported. Stephen: Right. The Gregorian calendar class is a direct port of the C-class, such as "January = 0". So, if you enter the month "12", the month is January because the algorithm wraps around. The algorithm performs calculations such as this that you don't expect. For example, with the Gregorian calendar class, getYear(), getMonth(), and getDay() are quick, while if you call combinations of getyear(), setyear() (and getMonth() setMonth(), and so on), performance will be bad because lots of calculations are done unexpectedly. Politely put, one can describe these classes as exhibiting "unusual performance characteristics". Why has it taken so long to fix these various problems? Stephen: People have known of these problems for several years. Some attempts have been made to fix the Calendar class, but it only got worse. Fixing these issues once and for all has never been a high enough priority. So why now and why you? Stephen: I started JodaTime in 2000/2001 and gradually solved the standard date and time class problems, releasing it in 2003. My solution has been picked up across the board, from small applications to the largest advertizing systems in the world. The point is that I wanted the solution to exist a few years as JodaTime, before heading into a JSR so that all the issues would have been identified in preparation for the JSR. In a nutshell, what does JodaTime offer me? Michael: Firstly, a better quality API. Stephen: Secondly, JodaTime supports a number of additional concepts. Firstly, "periods", such as if you wanted to store the concept of 5 weeks and 3 days. Secondly, "intervals", so that you'll be able to store the interval between the start of JavaOne and its end, i.e., for example, from Monday May 5, 9 a.m. to May 9, 3 p.m. Thirdly, an updated timezone implementation to make it easy to pick up timezone changes, which could even be on an annual basis. Finally, handling of different calendar systems, such as Islamic calendar systems / Coptic calendar systems, and so on, which don't exist in the standard JDK. Michael: The third point is why I got interested in this JSR in the first place. I'm from Brazil where the daylight saving systems change each year and there's always one or two weeks of chaos. I asked myself why things go wrong every year around this issue. Stephen. Possibly we could offer a solution consisting of a JAR file with the latest set of rules, which you could then put on the classpath. However, sometimes you'd need both sets of rules at the same time. We're still thinking about these situations and ought to be able to come up with something. By the way, where does the name "Joda" come from? Stephen: "Joda" was a 4 letter domain name starting with "J" that was free in 2003. I simply typed random things beginning with "J" and found that that one was free... Where is the JSR process now? Michael: We are progressing it in an open manner. All discussions are on public mailing lists and Wikis. All repositories are open and Issuezilla is open. Stephen: We are using java.net to build a reference implementation and a testing kit in Subversion. People can go there and try it out. It is all "work in progress". The basic API is there. Right now, parsing needs to be finished and some loose ends need to be tidied up. Parsing, intervals, and multiple calendar systems are missing at the moment. Can you say something about the JSR's timeline? Stephen: We hope that we'll be in Java 7, but given that there's no date for it, there's no guarantee that we'll finish in time. We received a little bit of funding from the OpenJDK challenge to get to early draft review by August. Michael: It's really important that people get involved, the last chance to influence design aspects is the early draft review, scheduled for August, which is coming near. Two previous attempts have been made for rewriting these classes and it's unlikely there'll be another one after ours. So it is really important to let your voice be heard because the more feedback we get the better. Stephen: There's been good quality feedback. We've had suggestions consisting of sample implementations of intervals, people pointing to different ISO specifications, and suggestions to expand into areas outside our scope. People should take a look at the algorithms too. Maybe someone could come up with better algorithms than those that we already have. Michael: We've also been nominated for a JCP Program Award, probably because we're the main examples of individuals, rather than a company, leading a JSR. The results will be announced on Tuesday during JavaOne. Will you present something around your JSR at JavaOne? Michael: Our technical session on Thursday at 1.30 is completely full and there's a repeat session on Friday at the same time, that is, at 1.30. In the session, we will cover all the basic classes, show examples of the code and how to get started with it. Stephen: There'll be little bit of explanation around the design principles, with examples of how bad the current date and time classes are. There'll also be a small puzzler, asking participants to identify the number of bugs in an existing bit of JDK code... Further Reading JSR 310 JSR 310 Technical Sessions at JavaOne JSR 310 General Purpose Area for Anyone to Leave Messages JSR 310 Mailing List Stephen Colebourne's blog Michael Nascimento's blog JCP Program's Award Nominations

May 5, 2008

by Geertjan Wielenga

· 13,323 Views

1st Binary Release of Java Music Composer

Today marks the availability of the first binary release of the open source JFugue Music NotePad. The core basic functionality of the tool is available and ready to be tried out. After downloading and unzipping the archive, you need to specify the location of the JDK (JDK 5 or above, so Mac users are welcome too), in etc/mnotepad.conf, which is in the unzipped archive's main directory. The binary can be downloaded here: mnotepad_9April2008.zip Once you have done so, you can launch it, which should result in the following main window at start up: Here it is on the Mac: Then choose File | New, to define a new composition: Click Finish and you can begin composing music. To do so, choose notes from the toolbar and point/click to add them to the composition. Currently you can't add notes between existing notes. However, select one/more notes with your mouse, the notes turn red to show they are selected, and then use up/down to move notes up the scale and left/right to decrease/increase their length. While doing this part of the work, compositions typically look something like this: You can change instruments by right-clicking one in the Instruments window and choosing "Select". Finally, click Play to play the music or Save to save it. Compositions are saved to disk in midi format, the status bar shows where they are saved to, in the installation directory. Architecture. Instead of dealing directly with the Midi API, the JFugue API is used instead. The JFugue API provides an extremely transparent, simple, yet surprisngly powerful layer of functionality on top of the complexities of typical Midi programming. The user interface is pure Swing on top of the NetBeans Platform, therefore the application has a very mature window system (max/minimize, dock/undock) and is pluggable out of the box, among other features. User Comments. One of the user comments thus far: "It's definitely very cool! And as promised, you already got a whole bunch of subtle features for free from the NetBeans Platform (windowing, favorites, etc). :-D It already looks more stable and trustworthy than any other app with the same amount of work invested, just because of the solid and consistent windowing system." Known Issues. Amongst others, the following: several usability issues, such as that it isn't easy to know/see from the ui that notes can be changed by selecting them and moving up/down/left/right. Should be able to specify where saved midi file should be saved to. Some users report not being able to change instruments. Keyboard window currently unused. Playing of tune blocks the ui. Feedback Welcome. The JFugue Music NotePad is an open source project at https://nbjfuguesupport.dev.java.net/. You are welcome to join, especially if you are able to offer time/insight to add missing features. Especially programmers who are also musicians are welcome. For example, this application could do with a metronome, which could be provided by a separate plugin...

April 9, 2008

by Geertjan Wielenga

· 12,799 Views

John Wilson: Groovy and XML

John Wilson is mainly known to the Groovy community because of his work on XmlSlurper, one of the easiest ways to work with XML in the JVM. Continue reading to learn what inspired John to get into Groovy. Enjoy! [img_assist|nid=2119|title=|desc=|link=none|align=right|width=220|height=188]Q. John, you are the creator of XmlSlurper, what motivated you to make it ? A. I had a problem with processing very large XML documents. XmlParser uses a simple and robust way of implementing GPath expressions which involves building an array to hold the result of each term on the expression. Unfortunately this means that you can consume large amounts of memory if the original document is large. I was getting out of memory errors quite a bit so I wrote the original version of XmlSlurper to use Iterarors rather than arrays which cut down the memory footprint quite a bit. It had the happy side effect of being faster too. I have rewritten XmlSlurper a couple of times since then and it now plays very well with StreamingMarkupBuilder, handles namespaces nicely and has an interesting way of doing edits on the fly as the slurped document is written out. Q. XmlSluper and XmlParser are so similar, is there a reason to have both ? A. On the one hand it's a disadvantage because users are unsure which one to use (The answer is - for most things it doesn't matter). However they have differences which are significant and, in my view, valuable. The most significant difference is their approach to editing the document. XmlParser is very straightforward, you just change the in memory tree structure which represents the document. XmlSlurper does not let you have direct access to the in memory data structure. It forces you to put you editing code in closures and specify where in the document the closures should be applied. It then does the editing on the fly as the document is written out. The first mechanism is simple but limited the second is more complicated but very powerful. Most of us only want to do relatively simple operations on small XML documents and XmlParser is excellent for that. For those people who want to do rather complex operations on XMl documents which can be quite large then XmlSlurper is a better choice. Fortunately they support an almost identical GPath syntax so switching from one to the other is no big deal. This was not always so - the community owes a big debt of gratitude to Paul King for doing a large amount of work on documenting these implementations and aligning them. My long term aim is to be able to do away altogether with the need to hold the whole document in memory but to stream the document through memory whist executing the GPath expressions. Q. You are also the creator of the xmlrpc module, what can you tell us about it ? A. It's a module I'm very fond of. It's an excellent example of how Groovy can make something that is quite complex in Java completely trivial. I can build an XML-RPC server in 4 lines of Groovy and a client in 1 line. Performance is excellent and it interoperates well. It is based on code I wrote a few years ago to implement XML-RPC on the Dallas Semiconductor TINI. The TINI is an amazing device the size of a memory SIMM which runs Java on a 8051 (the processor which controlled the keyboard in the original IBM PC) with 1Mb of battery backed up RAM and an Ethernet port. One of my favourite TINI apps was a weather forecasting toaster! [it's true! check it out for yourselves here] Q. Have you participated in other Open Source projects ? A. Yes, mostly in the embedded Java arena. I wrote tiny XML parsers (MinML and MinML2) and a tiny XML-RPC server (MinMl-RPC) i have also contributed to VNC and the Snort intrusion detections system. In the background I'm working on Ng which is an attempt to build a runtime system for dynamic languages on the JVM which is both simple and fast. Q. Ng, can you share more about it? A. Ng is a solo (at the moment) project which tries to answer the question: How can we implement a fully dynamic language on the current JVM which runs no more than ten times slower than Java? This is looking for an improvement of one to two orders of magnitude over current implementations (Groovy, JRuby, Jython, etc.). The idea is to design a programming language "backwards". I start with a highly optimised runtime system and then derive a language which can be optimally compiled for that runtime system. I'm hoping that some of the insights I get whist doing this can be fed back into the Groovy 2.0 MOP redesign. I'm making good but slow progress. I have arithmetic operations executing at less than twice as slow as Java in some benchmarks and method calls are coming below ten times as slow. I have started to document some of the techniques I have developed http://docs.google.com/View?docid=ah76zbd6xsx2_9ck33c8dp Q. How did you get involved with Groovy ? A. I was looking for an Open Source project to get involved in. I have a long term interest in programming languages (my first paid job was as a compiler writer in 1971). I looked at Ruby and JRuby but it was too Perlish for my tastes and the JRuby project looked moribund. Google found me Groovy and I liked the feel of the language and the community was very lively so I stuck around. Q. Do you use Groovy at work ? A. Yes. If I have to mung XML I will always do it in Groovy. I also spend quite a bit of time building DSLs in Groovy. I think the return on investment in DSLs is huge if they are done properly. Q. Do you have a preferred technique for building DSLs (builders, metaprogramming, ... ) ? A. I like builders a lot. I think that the Builder concept is one of James Strachan's best ideas. I built a little DSL to allow people to specify arbitrary graphs - it took about an hour to develop and it's saved days in allowing us to specify complex graphs simply, clearly and reliably. I tend to override invokeMethod, etc. or use Categories rather than ExpandoMetaClass to do MetaPrograming magic. That's probably because to got into the habit before Graeme wrote ExpandoMetaClass. However I do like the fact that Categories allow me to limit the extent of the change to a single thread - they need to have less impact on performance, though. Q. Is there a specific feature you would like to see in a future version of Groovy ? A. I think Inner Classes need to be added. Other than that I don't see much urgent need for language extensions. Quite a lot of work has been done on making the run time system cleaner and that work needs to continue. The speed of the implementation has been improved in the last few months but there is more work needed there (especially with Categories). the big thing I'd like to see is the ability to not compile to class files but to execute the AST (Abstract Syntax Tree) directly. JRuby does this and it can be very useful in cases where you are generating code dynamically and executing it once or twice before discarding it (which is quite a common use). It would also help with the Groovy console. Thanks John! John's bio John Wilson has been a programmer, project manager, teacher, CTO and CEO. He's now CTO of an English engineering company and is enjoying working with a great crowd in the Groovy/Grails community.

April 1, 2008

by Andres Almiray

· 17,282 Views · 1 Like

SVNKit: Tame Subversion with Java!

SVNKitis an Open Source pure Java Subversion library. SVNKit literally brings Subversion, popular open source version control system, to the Java world. With SVNKit you can do the following: All standard Subversion operations: For instance, the following snipped checks out project from repository: File dstPath = new File("c:/svnkit"); SVNURL url = SVNURL. parseURIEncoded("http://svn.svnkit.com/repos/svnkit/branches/1.1.x/"); SVNClientManager cm = SVNClientManager.newInstance(); SVNUpdateClient uc = cm.getUpdateClient(); uc.doCheckout(url, dstPath, SVNRevision.UNDEFINED, SVNRevision.HEAD, true); Updates it to the latest revision: uc.doUpdate(dstPath, SVNRevision.HEAD, true); And finally commits local changes in "www" subdirectory if there are any: SVNCommitClient cc = cm.getCommitClient(); cc.doCommit(new File[] {new File(dstPath, "www")}, false, "message", false, true); SVNKit supports all standard Subversion operations and compatible with the latest version of Subversion. Access Subversion repository directly: Some applications will benefit from working with repository directly, without keeping working copy locally. Example below displays list of files in "www" directory. SVNURL url = SVNURL.parseURIEncoded("http://svn.svnkit.com/repos/svnkit/branches/1.1.x/"); SVNRepository repos = SVNRepositoryFactory.create(url); long headRevision = repos.getLatestRevision(); Collection entriesList = repos.getDir("www", headRevision, null, (Collection) null); for (Iterator entries = entriesList.iterator(); entries.hasNext();) { SVNDirEntry entry = (SVNDirEntry) entries.next(); System.out.println("entry: " + entry.getName()); System.out.println("last modified at revision: " + entry.getDate() + " by " + entry.getAuthor()); } Direct repository access API allows to perform operations like update, commit, diff and many other. Additionaly to the performance benefits of the direct access to repository, this API makes it possible to version arbitrary objects or object models within Subevrsion repository, not only files from the file system. Replace JNI Subversion bindings with SVNKit: Native Subversion provides Java interface that works with Subversion binaries through JNI. In case you already using it or would like to use as an option, you may also use SVNKit through exactly the same interface. This way you'll let your application dynamically switch between JNI and SVNKit implementation of the same API or let your application work on the platforms where there are no native Subversion binaries. For example: // pure Java implementation of the standard Subversion Java interface SVNClientInterface jniAPI = SVNClientImpl.newInstance(); byte[] contents = jniAPI.fileContent("http://svn.svnkit.com/repos/svnkit/branches/1.1.x/changelog.txt", Revision.HEAD); SVNKit is widely used in different applications, including IntelliJ IDEA, Eclipse Subversion integrations, SmartSVN, JDeveloper, bug tracking server side applications (e.g. Atlassian JIRA) and repository management and tracking tools (e.g. Atlassian FishEye) and many others. Where to get more information: Recently we've released SVNKit version 1.1.6 which is bugfix release. At http://svnkit.com/ you will find more information on that new version and, of course, downloads, documentation, source code example and articles explaining how to use SVNKit. In case of any questions you're welcome at our mailing list, or just contact us at [email protected] SVNKit is widely used in different applications, including IntelliJ IDEA, Eclipse Subversion integrations, SmartSVN, JDeveloper, bug tracking server side applications (e.g. Atlassian JIRA) and repository management and tracking tools (e.g. Atlassian FishEye) and many others. With best regards, TMate Software, http://svnkit.com/ - Java [Sub]Versioning Library!

February 26, 2008

by Alexander Kitaev

· 11,348 Views · 2 Likes

VisualVM: Free and Open Source Java Troubleshooter

Trying to troubleshoot Java? VisualVM is a great, free, open source tool.

February 21, 2008

by Geertjan Wielenga

· 69,570 Views