DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest AI/ML Topics

article thumbnail
A collection with billions of entries
There are a number of problems with having a large number of records in memory. One way around this is to use direct memory, but this is too low level for most developers. Is there a way to make this more friendly? Limitations of large numbers of objects The overhead per object is between 12 and 16 bytes for 64-bit JVMs. If the object is relatively small, this is significant and could be more than the data itself. The GC pause time increases with the number of objects. Pause times can be around one second per GB of objects. Collections and arrays only support two billion elements Huge collections One way to store more data and still follow object orientated principles is have wrappers for direct ByteBuffers. This can be tedious to write, but very efficient. What would be ideal is to have these wrappers generated automatically. Small JavaBean Example This is an example of JavaBean which would have far more overhead than actual data contained. interface MutableByte { public void setByte(byte b); public byte getByte(); } It is also small enough that I can create billions of these on my machine. This example creates a List with 16 billion elements. final long length = 16_000_000_000L; HugeArrayList hugeList = new HugeArrayBuilder() {{ allocationSize = 4 * 1024 * 1024; capacity = length; }.create(); List list = hugeList; assertEquals(0, list.size()); hugeList.setSize(length); // add a GC to see what the GC times are like. System.gc(); assertEquals(Integer.MAX_VALUE, list.size()); assertEquals(length, hugeList.longSize()); byte b = 0; for (MutableByte mb : list) mb.setByte(b++); b = 0; for (MutableByte mb : list) { byte b2 = mb.getByte(); byte expected = b++; if (b2 != expected) assertEquals(expected, b2); } From start to finish, the heap memory used is as follows. with -verbosegc 0 sec - 3100 KB used [GC 9671K->1520K(370496K), 0.0020330 secs] [Full GC 1520K->1407K(370496K), 0.0063500 secs] 10 sec - 3885 KB used 20 sec - 4428 KB used 30 sec - 4428 KB used ... deleted ... 1380 sec - 4475 KB used 1390 sec - 4476 KB used 1400 sec - 4476 KB used 1410 sec - 4476 KB used The only GC is one triggered explicitly. Without the System.gc(); no GC logs appear. After 20 sec, the increase in memory used is from logging how much memory was used. Conclusion The library is relatively slow. Each get or set takes about 40 ns which really adds up when there are so many calls to make. I plan to work on it so it is much faster. ;) On the upside, it wouldn't be possible to create 16 billion objects with the memory I have, nor could it be put in an ArrayList, so having it a little slow is still better than not working at all. From http://vanillajava.blogspot.com/2011/08/collection-with-billions-of-entries.html
August 10, 2011
by Peter Lawrey
· 17,358 Views
article thumbnail
Human Readable vs Machine Readable Formats
Most file/serialization formats can be broadly broking into two formats, Human Readable Text and Machine Readble Binary. The Human Readable formats have the advantage of being easily understood by a person reading them. Machine readable formats are easier/faster for a machine to encode/decode. There are formats which attempt to be a little of both. XML, JSon, CSV are examples of these. However these do not achieve close to the performance a binary format can achieve. Myth: Machine Readable Binary is always more compact than a Human Readable Binary can be more compact, however the obscurity of its format makes it difficult to ensure every byte counts. i.e. its usually hard enough getting something work. Making it compact as well is an added complication. However with Human Readable formats, determing how the format can be made more compact is more easily understood. As text: 38 bytes long, [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] As binary: 290 bytes long, ....sr..java.util.ArrayListx.....a....I..sizexp....w.....sr..java.lang.Long; .....#....J..valuexr..java.lang.Number...........xp........sq.~..........sq.~ ..........sq.~..........sq.~..........sq.~..........sq.~..........sq.~ ..........sq.~..........sq.~..........sq.~..........sq.~..........x Even though the first format is more compact, you can immedately see you could drop the [ ] and spaces after the ", " to make it more compact. With the binary formats, it is hard to know where to start. ComparingHumanReadableToBinaryMain.java List longs = new ArrayList(); for(long i=-1;i<=10;i++) longs.add(i); String asText = longs.toString(); byte[] bytes1 = asText.getBytes(); System.out.println("As text: "+ bytes1.length+" bytes long, "+asText); ByteArrayOutputStream baos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(baos); oos.writeObject(longs); oos.close(); byte[] bytes2 = baos.toByteArray(); System.out.println("As binary: "+bytes2.length+" bytes long, " +new String(bytes2, 0).replaceAll("[^\\p{Graph}]", ".")); Myth: Machine Readable Binary is always faster than a Human Readable Its assumed the cost of parsing data in a human readable format always makes it slower, however machine sreadbale formats have to deal with an issue human readbale formats takes for granted, that is byte endianness. For human readable formats the order of digits is fairly obvious, however for machine formats the byte endianess of the data might not match that the natrual byte order of the CPU, leading to a source of overhead (as it has to swap the bytes around) One example of this is using big-endian (e.g. TCP/Network byte order) on a little endian machine e.g. Windows/Linux Intel/AMD. A common class which has this issue is DataInputStream and DataOutputStream which re-arranges the byte order (even if the native byte order matches) For this reason, a fast human readable parse can be as fast or faster. In an earlier article I showed how a Human Readable format could be used to read/write integers 30% faster than using DataInput/DataOuput. Writing human readable data faster than binary. Myth: Using a Human Readable Format makes it easy to read Just using a human readable format doesn't mean it will be easier to read than a machine readable format. Reusing existing tools as much as possible makes human readable format preferrable. However, machine readable formats can come with tools which decode the data and make maintain it easier. If you have data which can only be managed with the use of specialist tools, being human readable is not much advantage. Images are a good example of where a machine readable format is the best option. It is hard to image editing or viewing an image without the need for a specialist tool. A practical human readable format would undoubtably lower the quality of the image. ;) ________/.- ,’_______`-. \ _________\ /`__________\’/ _________ /___’a___a`___\ _________|____,’(_)`.____ | _________\___( ._|_. )___ / __________\___ .__,’___ / __________.-`._______,’-.__ ________,’__,’___`-’___`.__`. _______/____/____V_____\___\_ _____,’____/_____o______\___`.__ ___,’_____|______o_______|_____`. __|_____,’|______o_______|`._____| ___`.__,’_.-\_____o______/-._`.__,’ __________/_`.___o____,’__\_ __.””-._,’_____`._:_,’_____`.,-””._ _/_,-._`_______)___(________’_,-.__\ (_(___`._____,’_____`.______,’___)_) _\_\____\__,’________`.____/.___/_/ On the other hand human readable formats can be almost as obscure. This is a piece of code written in a language I am not worthy of mentioning. ;) Its is descibed as "used to list all of the prime numbers between 1 and R" (!R)@&{&/x!/:2_!x}'!R Conclusion If you are designing a file format, start with a human readable one as its much easier to understand. If this is not compact enough, consider compressing it. If it is not fast enough concider making it a binary format, but make sure it really is faster to use such a format. If you are going to use a binary format make sure you have tools in place to supprot viewing (possibly editing) the data (which you would get for free with a text format) From http://vanillajava.blogspot.com/2011/07/human-readable-vs-machine-readble.html
July 13, 2011
by Peter Lawrey
· 21,549 Views · 5 Likes
article thumbnail
Developing Android Apps with NetBeans, Maven, and VirtualBox
I am an experienced Java developer who has used various IDEs and prefer NetBeans IDE over all others by a long shot. I am also very fond of Maven as the tool to simplify and automate nearly every aspect of the development of my Java project throughout its lifecycle. Recently, I started developing Android applications and naturally I looked for a Maven plugin that would manage my Android projects. Luckily I found the maven-android-plugin which worked like a charm and allowed me to use Maven for developing my Android projects. The Android Emulator from the Android SDK seemed unusably slow. Lucklily, I found a way to use an Android Virtual Machine for VirtualBox that worked nearly as fast as my native computer! This page documents my experiences. Tested Environment Dev machine: Ubuntu 11.04 Linux IDE: NetBeans VirtualBox: 4.0.8 r71778 Android SDK Revision 11, Add on XML Schema #1, Repository XML Schema #3 (from About in SDK and AVD Manager) Android Version: 2.2 Overview of Steps Download and install the Android SDK on your dev machine Attach an Android Device to dev machine Configure and load your device for development and other use Create an initial Android maven project Connect Android Device to Android SDK Debug Android app using NetBeans Graphical Debuger Download and Install Android SDK Download and install the Android SDK on your dev machine as described here. Make sure to set the following in dev machine ~/.bashrc file: export ANDROID_HOME=$HOME/android-sdk-linux_x86 #Change as needed export PATH="$ANDROID_HOME/tools:$ANDROID_HOME/platform-tools:$PATH" Attaching an Android Device to Dev Machine If you have an actual device that is usually always best. If not, you must use a virtual Android device which usually has various limitations (e.g. no GPS, Camera etc.). The Android SDK makes it easy to create a new Virtual Device but the resulting device is painfully slow in my experience and not usable. Do not bother with this. Instead, create a virtual Android device using VirtualBox as described in the following steps: Install virtual box and initial Android VM as described here: http://androidspin.com/2011/01/24/howto-install-android-x86-2-2-in-virtualbox/ http://geeknizer.com/how-to-run-google-android-in-virtualbox-vmware-on-netbooks/ Configure Android VM so it is connected bidirectionally with your dev machine over TCP as described here: http://stackoverflow.com/questions/61156/virtualbox-host-guest-network-setup I used the approach of configuring a HOST ONLY network adapater and a second NAT adapter on the Android VM within virtual box. Configuring your Android Device This section describes various things I did to setup a dev environment for my Android device: Root the device. I used Universal AndRoot Install ConnectBot so you have ssh and related network utilities Creating Initial Android Maven Application Create initial project using instructions here. I found it best to create stub project structure using the maven-archtype-plugin and the archtypes at https://github.com/akquinet/android-archetypes/wiki Connecting Android VM Device to Android SDK In order for your code to be deployed from NetBeans IDE to Android Device and in order for you to monitor your deployed app from the Dalvik Debug Monitor (ddms) you need to connect your android VM device to the android sdk over TCP as described in the following steps. On Android Device open the Terminal Emulator Type su to become root (your device must be rooted for this Type following commands in root shell: setprop service.adb.tcp.port 5555 stop adbd start adbd Type the following commands on dev machine shell. TODO: Note that IP address below is whatever is the ip address associated with the device (see ifconfig on linux for device vboxnet0) adb tcpip 5555 adb connect 192.168.0.101:5555 For details on above steps see: http://stackoverflow.com/questions/2604727/how-can-i-connect-to-android-with-adb-over-tcp Set up port forwarding as described here http://redkrieg.com/2010/10/11/adb-over-ssh-fun-with-port-forwards/ (this is where I am most fuzzy) Build your maven android project using Right-Click / Clean and Build Now for the acid test whether you can deploy your app to the device from NetBeans IDE! Right-click / Custom / Goal to show Run Maven dialog. Enter android:deploy in Goals field. Select Remember As button and enter android:deploy for its text field. If all is well, the app will deploy to the device and will show up in its "Applications" screen. Debugging Android App Using NetBeans Graphical Debugger Once you can build and deploy your app to the real or virtual Android device, here are the steps to debug the app using NetBeans debugger: On Device: Start the app (TODO: determine how to start app on device with JVM options so it can wait for debugger connection. This should be easy) On Dev Machine run Dalvik Debug Monitor (ddms) in background: $ANDROID_HOME/tools/ddms & Lookup your app in ddms and get its debug port. This is described here but does not address NetBeans specifically In NetBeans do: Debug / Attach Debugger and specify the port looked up in ddms in previous step. You may leave rest of the fields with defaults. Click OK
June 18, 2011
by Farrukh Najmi
· 173,456 Views
article thumbnail
Clustering Tomcat Servers with High Availability and Disaster Fallback
There has been a lot of buzz lately on high-availability and clustering. Most developers don't care and why should they? These features should be transparent to the application architecture and not something of concern to the developers of that application. But knowledge never hurts, so I emerged myself into the world of load balancing, heartbeats and virtual IP addresses. And you know what? Next time we need a infrastructure like this, I can at least sit down with the guys from the infrastructure department and at least know what the hell they are talking about. So what exactly is a high-availability clustered infrastructure (HACI, as I'll call it from now on) ? In essence, it should be a zero-downtime infrastructure (or at least perceived as one by the end user, which means never ever returning a default browser 404 page), capable of horizontal scaling when the need for it arises and without a single point of failure. It's the SLA writer's dream. A basic HACI setup looks like this: The users enters through a virtual IP address, assigned to one of the two load balancers. Only one of the load-balancers is active (the active master, LB1), the other one is there in the event LB1 fails ((LB2, a passive slave). The two load balancers are redundant, ie. having the exact same configuration. The load balancers redirect all traffic to the real servers. This can be done through round-robin assignment or through other means like sticky sessions, where the same user is redirected to the same server each and every time within a session. Servers can be added at any moment and configured on the load balancers. Ideally, the load balancer configuration is aware of the hardware specification and balances the load accordingly, but that's beyond the scope of this article (it involves adding weights). If all servers balanced by the load balancer fail, a backup server should be used to redirect all traffic coming from the load balancer. This can be a very lightweight server, which purpose is only to provide a sensible error page to the user (something like 'Sorry, we are performing maintenance'). Again, perception and immediate feedback to the user is key. You don't want to show the user a plain 404 page. Off course, if the backup server goes down too, you're in trouble (off course, by that time, warning bells should have gone off on every level in the hierarchy). So how to achieve this with as little effort as possible? If you want to try this out, I suggest you start by installing a virtual machine like VirtualBox or VMWare. This way you can try out the configuration yourself. In this example, I'll be load-balancing 3 Tomcat servers using sticky sessions using 2 load balancers in active-passive mode. I'm assuming all 3 Tomcat servers share the same hardware configuration, so they are all able to handle the same amount of traffic each. I'm also throwing in a backup server, in case all 3 Tomcat servers go down (serving a custom 503 page kindly informing the user of a catastrophic failure, instead of dropping the standard 404 bomb). You want to start off by assigning IP addresses to the servers. This will make your life a bit easier. We'll need 7 addresses: 3 for the tomcat server, 1 for the backup server, 2 for the loadbalancers and 1 virtual IP address to be shared between the load balancers (and which will be the entry point for your users). So our assignment will be: Virtual IP 10.0.5.99 www.haci.local LB1 10.0.5.100 lb1.haci.local #MASTER LB2 10.0.5.101 lb2.haci.local #SLAVE WEB1 10.0.5.102 web1.haci.local WEB2 10.0.5.103 web2.haci.local WEB3 10.0.5.104 web3.haci.local BACKUP 10.0.5.105 backup.haci.local Setting up the web servers is easy. You just install Tomcat on each server and create a simple JSP file to be served to users (make a small change, like the background color, on each server to distinguish the servers). I won't be covering session replication between the Tomcat servers, as it'll take me too far. If you want, you can configure the appropriate session replication and storage (using multicast or JDBC for example). The backup server I'm using is a basic LAMP server that returns a simple 503 page on every request it gets. The 503 error code is important, because it reflects the current state of the system: currently unavailable. For the loadbalancers I'll be using 2 applications: HAProxy and keepalived. HAProxy is going to handle load balancing, while keepalived will handle the failover between the two load balancers. First, we're going to configure HAProxy for both LB1 and LB2. Installing HAProxy is quite easy on an ubuntu system. Just do a sudo apt-get install haproxy and you're off. After the install, backup the current HAProxy config and start editing away. cp /etc/haproxy.cfg /etc/haproxy.cfg_orig cat /dev/null > /etc/haproxy.cfg vi /etc/haproxy.cfg The content of the config to reflect our setup should become something like this (same config on LB1 and LB2): global log 127.0.0.1 local0 log 127.0.0.1 local1 notice #log loghost local0 info maxconn 4096 #debug #quiet user haproxy group haproxy defaults log global mode http option httplog option dontlognull retries 3 redispatch maxconn 2000 contimeout 5000 clitimeout 50000 srvtimeout 50000 frontend http-in bind 10.0.5.99:80 default_backend servers backend servers mode http stats enable stats auth someuser:somepassword balance roundrobin cookie JSESSIONID prefix option httpclose option forwardfor option httpchk HEAD /check.txt HTTP/1.0 server web1 10.0.5.102:80 cookie haci_web1 check server web2 10.0.5.103:80 cookie haci_web2 check server web3 10.0.5.104:80 cookie haci_web3 check server webbackup 10.0.5.105:80 backup After this, enable HAProxy on both LB1 and LB2 by editing /etc/defaults/haproxy # Set ENABLED to 1 if you want the init script to start haproxy. ENABLED=1 # Add extra flags here. #EXTRAOPTS="-de -m 16" So far for the HAProxy configuration. We can't start it up yet, as LB1 and LB2 aren't listening yet on the virtual IP address. Next we'll configure the failover of the loadbalancers using keepalived. Installing it on Ubuntu is as easy as it was for HAProxy: sudo apt-get install keepalived. But its configuration is slightly different on both load balancers. First, we need to configure the both servers to be able to listen to the shared IP address. Add the following line to /etc/sysctl.conf: net.ipv4.ip_nonlocal_bind=1 And run sysctl -p Now, we configure keepalived so that LB1 is configured as the main load balancer and binds to the shared IP address, while LB2 is on standby, ready to take over whenever LB1 goes down. The configuration for LB1 looks like this (edit /etc/keepalived/keepalived.conf): vrrp_script chk_haproxy { # Requires keepalived-1.1.13 script "killall -0 haproxy" # cheaper than pidof interval 2 # check every 2 seconds weight 2 # add 2 points of prio if OK } vrrp_instance VI_1 { interface eth0 state MASTER virtual_router_id 51 priority 101 # 101 on master, 100 on backup virtual_ipaddress { 10.0.5.99 } track_script { chk_haproxy } } Start up keepalived and check whether it is listening to the virtual IP address. /etc/init.d/keepalived start ip addr sh eth0 It should return something like this, indicating it is listening to the virtual IP address 2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff inet 10.0.5.100/24 brd 10.0.5.255 scope global eth0 inet 10.0.5.99/32 scope global eth0 inet6 fe80::20c:29ff:fea5:5b93/64 scope link valid_lft forever preferred_lft forever Next, we configure LB2. The configuration is almost the same, exception for the priority. vrrp_script chk_haproxy { # Requires keepalived-1.1.13 script "killall -0 haproxy" # cheaper than pidof interval 2 # check every 2 seconds weight 2 # add 2 points of prio if OK } vrrp_instance VI_1 { interface eth0 state MASTER virtual_router_id 51 priority 100 # 101 on master, 100 on backup virtual_ipaddress { 10.0.5.99 } track_script { chk_haproxy } } Start up keepalived and check the network interface. /etc/init.d/keepalived start ip addr sh eth0 It should return something like this, indicating it is not listening to the virtual IP address 2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 00:0c:29:a5:5b:93 brd ff:ff:ff:ff:ff:ff inet 10.0.5.101/24 brd 10.0.5.255 scope global eth0 inet6 fe80::20c:29ff:fea5:5b93/64 scope link valid_lft forever preferred_lft forever Now, start up HAProxy on both LB1 and LB2. /etc/init.d/haproxy start Now you can issue requests to 10.0.5.99 (or www.haci.local), which will go to LB1, which in turn will load-balance the request to either WEB1, WEB2 and WEB3. You can test the load balancing by turning off WEB1 (or the server you're currently on). You can also the backup server by turning all main webservers (WEB1, WEB2 and WEB3). And you can test the loadbalancer failover by turning off LB1. At that point LB2 will kick in and act as the master, loadbalancing all requests. When you turn LB1 back on, it'll take over the master role once again. HAProxy allows you to add extra servers very easily, reloading the configuration without breaking existing sessions. See the HAProxy documentation for more info or on ServerFault. (http://serverfault.com/questions/165883/is-there-a-way-to-add-more-backend-server-to-haproxy-without-restarting-haproxy). Cheap and effective. While most enterprise shops have hardware load balancers, which also have these possibilities and more, if you're on a tight budget or need to simulate a HACI environment for development purposes (a lesson here: always simulate your production environment when you're testing during development), this might be the sane option. To finish, I'll quickly explain how to set up the backup server (a simple LAMP server). Create a vhost configuration on the apache for www.haci.local or any other domain pointing to the virtual IP address and set up mod_rewrite for it: RewriteEngine On RewriteCond %{REQUEST_URI} !\.(css|gif|ico|jpg|js|png|swf|txt)$ [NC] RewriteConf %{REQUEST_URI} !/503.php RewriteRule .* /503.php [L] Then create the 503.php file and add this to the top of it: Sorry, our servers are currently undergoing maintenance. Please check back with us in a while. Thank you for your patience. You can decorate the 503.php file any way you like. You can even use CSS, JavaScript and image files in the php file. Now, back to my IDE. I'm getting withdrawal symptoms.
March 11, 2011
by Lieven Doclo
· 57,973 Views
article thumbnail
Apache Solr: Get Started, Get Excited!
we've all seen them on various websites. crappy search utilities. they are a constant reminder that search is not something you should take lightly when building a website or application. search is not just google's game anymore. when a java library called lucene was introduced into the apache ecosystem, and then solr was built on top of that, open source developers began to wield some serious power when it came to customizing search features. in this article you'll be introduced to apache solr and a wealth of applications that have been built with it. the content is divided as follows: introduction setup solr applications summary 1. introduction apache solr is an open source search server. it is based on the full text search engine called apache lucene . so basically solr is an http wrapper around an inverted index provided by lucene. an inverted index could be seen as a list of words where each word-entry links to the documents it is contained in. that way getting all documents for the search query "dzone" is a simple 'get' operation. one advantage of solr in enterprise projects is that you don't need any java code, although java itself has to be installed. if you are unsure when to use solr and when lucene, these answers could help. if you need to build your solr index from websites, you should take a look into the open source crawler called apache nutch before creating your own solution. to be convinced that solr is actually used in a lot of enterprise projects, take a look at this amazing list of public projects powered by solr . if you encounter problems then the mailing list or stackoverflow will help you. to make the introduction complete i would like to mention my personal link list and the resources page which lists books, articles and more interesting material. 2. setup solr 2.1. installation as the very first step, you should follow the official tutorial which covers the basic aspects of any search use case: indexing - get the data of any form into solr. examples: json, xml, csv and sql-database. this step creates the inverted index - i.e. it links every term to its documents. querying - ask solr to return the most relevant documents for the users' query to follow the official tutorial you'll have to download java and the latest version of solr here . more information about installation is available at the official description . next you'll have to decide which web server you choose for solr. in the official tutorial, jetty is used, but you can also use tomcat. when you choose tomcat be sure you are setting the utf-8 encoding in the server.xml . i would also research the different versions of solr, which can be quite confusing for beginners: the current stable version is 1.4.1. use this if you need a stable search and don't need one of the latest features. the next stable version of solr will be 3.x the versions 1.5 and 2.x will be skipped in order to reach the same versioning as lucene. version 4.x is the latest development branch. solr 4.x handles advanced features like language detection via tika, spatial search , results grouping (group by field / collapsing), a new "user-facing" query parser ( edismax handler ), near real time indexing, huge fuzzy search performance improvements, sql join-a like feature and more. 2.2. indexing if you've followed the official tutorial you have pushed some xml files into the solr index. this process is called indexing or feeding. there are a lot more possibilities to get data into solr: using the data import handler (dih) is a really powerful language neutral option. it allows you to read from a sql database, from csv, xml files, rss feeds, emails, etc. without any java knowledge. dih handles full-imports and delta-imports. this is necessary when only a small amount of documents were added, updated or deleted. the http interface is used from the post tool, which you have already used in the official tutorial to index xml files. client libraries in different languages also exist. (e.g. for java (solrj) or python ). before indexing you'll have to decide which data fields should be searchable and how the fields should get indexed. for example, when you have a field with html in it, then you can strip irrelevant characters , tokenize the text into 'searchable terms', lower case the terms and finally stem the terms . in contrast, if you would have a field with text in it that should not be interpreted (e.g. urls) you shouldn't tokenize it and use the default field type string. please refer to the official documentation about field and field type definitions in the schema.xml file. when designing an index keep in mind the advice from mauricio : "the document is what you will search for. " for example, if you have tweets and you want to search for similar users, you'll need to setup a user index - created from the tweets. then every document is a user. if you want to search for tweets, then setup a tweet index; then every document is a tweet. of course, you can setup both indices with the multi index options of solr. please also note that there is a project called solr cell which lets you extract the relevant information out of several different document types with the help of tika. 2.3. querying for debugging it is very convenient to use the http interface with a browser to query solr and get back xml. use firefox and the xml will be displayed nicely: you can also use the velocity contribution , a cross-browser tool, which will be covered in more detail in the section about 'search application prototyping' . to query the index you can use the dismax handler or standard query handler . you can filter and sort the results: q=superman&fq=type:book&sort=price asc you can also do a lot more ; one other concept is boosting. in solr you can boost while indexing and while querying. to prefer the terms in the title write: q=title:superman^2 subject:superman when using the dismax request handler write: q=superman&qf=title^2 subject check out all the various query options like fuzzy search , spellcheck query input , facets , collapsing and suffix query support . 3. applications now i will list some interesting use cases for solr - in no particular order. to see how powerful and flexible this open source search server is. 3.1. drupal integration the drupal integration can be seen as generic use case to integrate solr into php projects. for the php integration you have the choice to either use the http interface for querying and retrieving xml or json. or to use the php solr client library . here is a screenshot of a typical faceted search in drupal : for more information about faceted search look into the wiki of solr . more php projects which integrates solr: open source typo3- solr module magento enterprise - solr module . the open source integration is out dated. oxid - solr module . no open source integration available. 3.2. hathi trust the hathi trust project is a nice example that proves solr's ability to search big digital libraries. to quote directly from the article : "... the index for our one million book index is over 200 gigabytes ... so we expect to end up with a two terabyte index for 10 million books" other examples for libraries: vufind - aims to replace opac internet archive national library of australia 3.3. auto suggestions mainly, there are two approaches to implement auto-suggestions (also called auto-completion) with solr: via facets or via ngramfilterfactory . to push it to the extreme you can use a lucene index entirely in ram. this approach is used in a large music shop in germany. live examples for auto suggestions: kaufda.de 3.4. spatial search applications when mentioning spatial search, people have geographical based applications in mind. with solr, this ordinary use case is attainable . some examples for this are : city search - city guides yellow pages kaufda.de spatial search can be useful in many different ways : for bioinformatics, fingerprints search, facial search, etc. (getting the fingerprint of a document is important for duplicate detection). the simplest approach is implemented in jetwick to reduce duplicate tweets, but this yields a performance of o(n) where n is the number of queried terms. this is okay for 10 or less terms, but it can get even better at o(1)! the idea is to use a special hash set to get all similar documents. this technique is called local sensitive hashing . read this nice paper about 'near similarity search and plagiarism analysis' for more information. 3.5. duckduckgo duckduckgo is made with open source and its "zero click" information is done with the help of solr using the dismax query handler: the index for that feature contains 18m documents and has a size of ~12gb. for this case had to tune solr: " i have two requirements that differ a bit from most sites with respect to solr: i generally only show one result, with sometimes a couple below if you click on them. therefore, it was really important that the first result is what people expected. false positives are really bad in 0-click, so i needed a way to not show anything if a match wasn't too relevant. i got around these by a) tweaking dismax and schema and b) adding my own relevancy filter on top that would re-order and not show anything in various situations. " all the rest is done with tuned open source products. to quote gabriel again: "the main results are a hybrid of a lot of things, including external apis, e.g. bing, wolframalpha, yahoo, my own indexes and negative indexes (spam removal), etc. there are a bunch of different types of data i'm working with. " check out the other cool features such as privacy or bang searches . 3.6. clustering support with carrot2 carrot2 is one of the "contributed plugins" of solr. with carrot2 you can support clustering : " clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. " see some research papers regarding clustering here . here is one visual example when applying clustering on the search "pannous" - our company : 3.7. near real time search solr isn't real time yet, but you can tune solr to the point where it becomes near real time, which means that the time ('real time latency') that a document takes to be searchable after it gets indexed is less than 60 seconds even if you need to update frequently. to make this work, you can setup two indices. one write-only index "w" for the indexer and one read-only index "r" for your application. index r refers to the same data directory of w, which has to be defined in the solrconfig.xml of r via: /pathto/indexw/data/ to make sure your users and the r index see the indexed documents of w, you have to trigger an empty commit every 60 seconds: wget -q http://localhost:port/solr/update?stream.body=%3ccommit/%3e -o /dev/null everytime such a commit is triggered a new searcher without any cache entries is created. this can harm performance for visitors hitting the empty cache directly after this commit, but you can fill the cache with static searches with the help of the newsearcher entry in your solrconfig.xml. additionally, the autowarmcount property needs to be tuned, which fills the cache with a newsearcher from old entries. also, take a look at the article 'scaling lucene and solr' , where experts explain in detail what to do with large indices (=> 'sharding') and what to do for high query volume (=> 'replicating'). 3.8. loggly = full text search in logs feeding log files into solr and searching them at near real-time shows that solr can handle massive amounts of data and queries the data quickly. i've setup a simple project where i'm doing similar things , but loggly has done a lot more to make the same task real-time and distributed. you'll need to keep the write index as small as possible otherwise commit time will increase too great. loggly creates a new solr index every 5 minutes and includes this when searching using the distributed capabilities of solr ! they are merging the cores to keep the number of indices small, but this is not as simple as it sounds. watch this video to get some details about their work. 3.9. solandra = solr + cassandra solandra combines solr and the distributed database cassandra , which was created by facebook for its inbox search and then open sourced. at the moment solandra is not intended for production use. there are still some bugs and the distributed limitations of solr apply to solandra too. tthe developers are working very hard to make solandra better. jetwick can now run via solandra just by changing the solrconfig.xml. solandra also has the advantages of being real-time (no optimize, no commit!) and distributed without any major setup involved. the same is true for solr cloud. 3.10. category browsing via facets solr provides facets , which make it easy to show the user some useful filter options like those shown in the "drupal integration" example. like i described earlier , it is even possible to browse through a deep category tree. the main advantage here is that the categories depend on the query. this way the user can further filter the search results with this category tree provided by you. here is an example where this feature is implemented for one of the biggest second hand stores in germany. a click on 'schauspieler' shows its sub-items: other shops: game-change 3.11. jetwick - open twitter search you may have noticed that twitter is using lucene under the hood . twitter has a very extreme use case: over 1,000 tweets per second, over 12,000 queries per second, but the real-time latency is under 10 seconds! however, the relevancy at that volume is often not that good in my opinion. twitter search often contains a lot of duplicates and noise. reducing this was one reason i created jetwick in my spare time. i'm mentioning jetwick here because it makes extreme use of facets which provides all the filters to the user. facets are used for the rss-alike feature (saved searches), the various filters like language and retweet-count on the left, and to get trending terms and links on the right: to make jetwick more scalable i'll need to decide which of the following distribution options to choose: use solr cloud with zookeeper use solandra move from solr to elasticsearch which is also based on apache lucene other examples with a lot of facets: cnet reviews - product reviews. electronics reviews, computer reviews & more. shopper.com - compare prices and shop for computers, cell phones, digital cameras & more. zappos - shoes and clothing. manta.com - find companies. connect with customers. 3.12. plaxo - online address management plaxo.com , which is now owned by comcast, hosts web addresses for more than 40 million people and offers smart search through the addresses - with the help of solr. plaxo is trying to get the latest 'social' information of your contacts through blog posts, tweets, etc. plaxo also tries to reduce duplicates . 3.13. replace fast or google search several users report that they have migrated from a commercial search solution like fast or google search appliance (gsa) to solr (or lucene). the reasons for that migration are different: fast drops linux support and google can make integration problems. the main reason for me is that solr isn't a black box —you can tweak the source code, maintain old versions and fix your bugs more quickly! 3.14. search application prototyping with the help of the already integrated velocity plugin and the data import handler it is possible to create an application prototype for your search within a few hours. the next version of solr makes the use of velocity easier. the gui is available via http://localhost:port/solr/browse if you are a ruby on rails user, you can take a look into flare. to learn more about search application prototyping, check out this video introduction and take a look at these slides. 3.15. solr as a whitelist imagine you are the new google and you have a lot of different types of data to display e.g. 'news', 'video', 'music', 'maps', 'shopping' and much more. some of those types can only be retrieved from some legacy systems and you only want to show the most appropriated types based on your business logic . e.g. a query which contains 'new york' should result in the selection of results from 'maps', but 'new yorker' should prefer results from the 'shopping' type. with solr you can set up such a whitelist-index that will help to decide which type is more important for the search query. for example if you get more or more relevant results for the 'shopping' type then you should prefer results from this type. without the whitelist-index - i.e. having all data in separate indices or systems, would make it nearly impossible to compare the relevancy. the whitelist-index can be used as illustrated in the next steps. 1. query the whitelist-index, 2. decide which data types to display, 3. query the sub-systems and 4. display results from the selected types only. 3.16. future solr is also useful for scientific applications, such as a dna search systems. i believe solr can also be used for completely different alphabets so that you can query nucleotide sequences - instead of words - to get the matching genes and determine which organism the sequence occurs in, something similar to blast . another idea you could harness would be to build a very personalized search. every user can drag and drop their websites of choice and query them afterwards. for example, often i only need stackoverflow, some wikis and some mailing lists with the expected results, but normal web search engines (google, bing, etc.) give me results that are too cluttered. my final idea for a future solr-based app could be a lucene/solr implementation of desktop search. solr's facets would be especially handy to quickly filter different sources (files, folders, bookmarks, man pages, ...). it would be a great way to wade through those extra messy desktops. 4. summary the next time you think about a problem, think about solr! even if you don't know java and even if you know nothing about search: solr should be in your toolbox. solr doesn't only offer professional full text search, it could also add valuable features to your application. some of them i covered in this article, but i'm sure there are still some exciting possibilities waiting for you!
January 25, 2011
by Peter Karussell
· 147,268 Views
article thumbnail
Setting mouse cursor position with WinAPI
Setting the mouse cursor position on a Windows machine with the help of .NET Framework shouldn't be that big of a problem. After all, there is the built-in Cursor class that lets you do that by executing a simple line of code: Cursor.Position = new System.Drawing.Point(0, 0); Of course, here 0 and 0 are the absolute coordinates for the mouse cursor on the screen. One thing to mention about this type of position setting is that Cursor requires a reference to System.Windows.Forms. And in some cases you don't want this extra reference. If that's the case, WinAPI is your solution. It requires some more work compared to the regular .NET way (class instance -> method call) but at the end you get more control than you would expect. When using WinAPI to set the cursor position, there are two ways you can go: mouse_event SendInput mouse_event is the very basic function that is only able to set the mouse coordinates. It was superseded and Microsoft recomends using SendInput instead. Nonetheless, it still works (although I cannot say for sure whether it will be working in future releases of Windows). So to start, I have a very basic class: class WINAPI_SUPERSEDED { [DllImport("user32.dll",SetLastError=true)] public static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint dwData, int dwExtraInfo); public enum MouseFlags { MOUSEEVENTF_ABSOLUTE = 0x8000, MOUSEEVENTF_LEFTDOWN = 0x0002, MOUSEEVENTF_LEFTUP = 0x0004, MOUSEEVENTF_MIDDLEDOWN = 0x0020, MOUSEEVENTF_MIDDLEUP = 0x0040, MOUSEEVENTF_MOVE = 0x0001, MOUSEEVENTF_RIGHTDOWN = 0x0008, MOUSEEVENTF_RIGHTUP = 0x0010, MOUSEEVENTF_WHEEL = 0x0800, MOUSEEVENTF_XDOWN = 0x0080, MOUSEEVENTF_XUP = 0x0100 } public enum DataFlags { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 } } So to set the cursor position to 0,0 I would use this: WINAPI_SUPERSEDED.mouse_event((int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_MOVE | (int)WINAPI_SUPERSEDED.MouseFlags.MOUSEEVENTF_ABSOLUTE, 0, 0, 0, 0); If you go through the documentation, you will notice that in fact, dx and dy are in no way direct coordinates but rather normalized values ranged between 0 and 65,535 - that is only when the MOUSEEVENTF_ABSOLUTE flag is present. Otherwise, the position will be adjusted according to the current mouse cursor position. This method doesn't return any value so I cannot be informed whether it was successful or not. GetLastError won't give much information either. In this case, SendInput comes to the rescue. Here is what I have defined as the main class: class WINAPI { [DllImport("kernel32.dll")] public static extern uint GetLastError(); public enum MouseData { XBUTTON1 = 0x0001, XBUTTON2 = 0x0002 } public enum MouseFlags { MOUSEEVENTF_ABSOLUTE = 0x8000, MOUSEEVENTF_HWHEEL = 0x01000, MOUSEEVENTF_MOVE = 0x0001, MOUSEEVENTF_MOVE_NOCOALESCE = 0x2000, MOUSEEVENTF_LEFTDOWN = 0x0002, MOUSEEVENTF_LEFTUP = 0x0004, MOUSEEVENTF_RIGHTDOWN = 0x0008, MOUSEEVENTF_RIGHTUP = 0x0010, MOUSEEVENTF_MIDDLEDOWN = 0x0020, MOUSEEVENTF_MIDDLEUP = 0x0040, MOUSEEVENTF_VIRTUALDESK = 0x4000, MOUSEEVENTF_WHEEL = 0x0800, MOUSEEVENTF_XDOWN = 0x0080, MOUSEEVENTF_XUP = 0x0100 } [DllImport("user32.dll", SetLastError=true)] public static extern uint SendInput(uint nInputs, ref INPUT pInputs, int cbSize); [StructLayout (LayoutKind.Explicit)] public struct INPUT { [FieldOffset(0)] public int type; [FieldOffset(4)] public MOUSEINPUT mi; } public struct MOUSEINPUT { public int dx; public int dy; public int mouseData; public int dwFlags; public int time; public int extraInfo; } } This class is a bit more complicated, but at the same time you have to understand that in some cases, SendInput is used for hardware and keyboard input as well. Of course, for experimentation purposes I removed those parts in the sample class. Here you have the same MouseFlags enum that will let you pass custom flags defining the mouse behavior. Notice, that SendInput has SetLastError set to true, therefore if something wrong happens via this method, the error will be easily obtained via GetLastError, that is implemented as a helper method in the same class. VERY IMPORTANT: When you define the INPUT struct, make sure you use LayoutKind.Explicit since when passed to an unamanaged call, a specific field layout is required - as you can see, every field is decorated with a FieldOffset attribute. Also taking about StructLayout, you don't have to set StructLayout.Sequential to MOUSEINPUT since it is setautomatically by CLR. When I want to call the method above, I can simply use this snippet: WINAPI.MOUSEINPUT mouseInput = new WINAPI.MOUSEINPUT(); mouseInput.dx = 100; mouseInput.dy = 10; mouseInput.dwFlags = (int)WINAPI.MouseFlags.MOUSEEVENTF_ABSOLUTE | (int)WINAPI.MouseFlags.MOUSEEVENTF_MOVE; WINAPI.INPUT input = new WINAPI.INPUT(); input.type = 0; input.mi = mouseInput; uint x = WINAPI.SendInput(1, ref input, Marshal.SizeOf(input)); Console.WriteLine(x); Console.WriteLine(WINAPI.GetLastError()); Console.ReadLine(); Notice that I have to call the unmanaged version of sizeof and pass the INPUT struct to it in order for the method to correctly execute. The regular C# sizeof won't cut it here. When I am defining the type of input, 0 is repesenting the INPUT_MOUSE flag, since I am only handling the mouse here. Of course, I can re-organize my method to accept a set of INPUT instances - the native call itself allows this by requesting an array of INPUT and the correct indication of the number of INPUT instances passed, but that is not required for testing purposes.
November 28, 2010
by Denzel D.
· 17,290 Views
article thumbnail
JMS Clustering by Example
It's amazing how the JBoss Team put together an easy way to do JMS Clustering, out of the box!!. I'll start with an easy example, creating a Queue named "MyClusteredQueue". In this example I'm using JBoss AS 5.1. and two computers connected on the same network, with these IP's: - Computer A: 192.168.0.143 - Computer B: 192.168.0.210 So, here are the steps: 1) Install the JBoss on both computers. We are going to use the "all" configuration for both computers. 2) We create our Queue on both servers. Go to $JBOSS_HOME/server/all/deploy/messaging/ and edit the destinations-service.xml file. Add the MyClusteredQueue before the last server tag. It looks like this: jboss.messaging:service=ServerPeer jboss.messaging:service=PostOffice true This is how you add a Queue to the JBoss, and the people how are familiar with this, the only new thing is to add the attribute "Clustered". This step must be set on both computers. At the end of the article you can find the files. 3) Write the MDB to consume the messages, and deploy it on the two computers. (I'm using an EJB 3 - MDB style). import java.net.InetAddress; import javax.ejb.ActivationConfigProperty; import javax.ejb.MessageDriven; import javax.jms.Message; import javax.jms.MessageListener; import javax.jms.ObjectMessage; import org.apache.log4j.Logger; /** * @author felipeg * */ @MessageDriven(activationConfig = { @ActivationConfigProperty(propertyName="destinationType", propertyValue="javax.jms.Queue"), @ActivationConfigProperty(propertyName="destination", propertyValue="queue/MyClusteredQueue") }) public class JMSClusterClientHandler implements MessageListener { Logger log = Logger.getLogger(JMSClusterClientHandler.class); @Override public void onMessage(Message message) { try{ if (message instanceof ObjectMessage) { InetAddress addr = InetAddress.getLocalHost(); log.info("########## Processing Host: " + addr.getHostName() + " ##########" ); ObjectMessage objMessage = (ObjectMessage) message; Object obj = objMessage.getObject(); log.info("Object received:" + obj.toString()); } } catch (Exception e) { e.printStackTrace(); } } } 4) Start the jboss with the following options: Computer A: $ cd $JBOSS_HOME/bin $ ./run.sh -c all -b 192.168.0.143 -Djboss.messaging.ServerPeerID=1 Computer B: $ cd $JBOSS_HOME/bin $ ./run.sh -c all -b 192.168.0.210 -Djboss.messaging.ServerPeerID=2 It is necesary to give an ID to each server and this is accomplished with this directive: -Djboss.messaging.ServerPeerID When you start the jboss on computer A, you should see the logs (server.log) telling you that there is one node ready and listening, and once you start the jboss on computer B, on the log will appear the two nodes, the two IP's ready to consume messages. 5) Now it's time to send a Message to the Queue. To accomplish this it's necessary to change the connection factory to "ClusteredConnectionFactory" (JMSDispatcher.java - See the code below). Also on the jndi.properties (if you are using the default InitialContext) file it's necessary to add the two computers ip's separated by comma to the java.naming.provider.url property. (In my case a create a Properties variable and I set all the necessary properties, JMSDispatcher.java - see the code below). java.naming.provider.url=192.168.0.143:1099,192.168.0.210:1099 The client that I wrote is a web application, that consist in one index.jsp page, which contains a form that prompts you for the name of the queue, the type of messaging (Queue or Topic), the server ip and port, how many times it will send the message and the actual message to be sent; also the web application has a Servlet (JMSClusteredClient.java - see code below) that receives the postback and helper class (JMSDispatcher.java - see code below) that sends the message to the jboss servers. You can to deploy it in any computer. In my case I deployed it on the Computer A. And you can access it through this URL: http://192.168.0.143:8080/JMSWeb/ (just modify the IP where the client war was deployed). If you notice (on the index.jsp - code below) I've already put some default values that reflects the name of the Queue, and the IP's of my two computers. Now, If you increment the number of times that the message will be sent (maybe a 10) and fill out the message box, and click "Send" you should see on the two servers some of the messages being consumed by the MDB. Here are the Files to create the client: index.jsp JMS Clustered - Test Client Server: QueueTopic Times:Message: Servlet: JMSClusteredClient.java public class JMSClusteredClient extends HttpServlet { private static final long serialVersionUID = 1L; /** * @see HttpServlet#service(HttpServletRequest request, HttpServletResponse response) */ protected void service(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { PrintWriter out = response.getWriter(); String topicqueue = request.getParameter("topicqueue"); String message = request.getParameter("message"); String server = request.getParameter("server"); String messageType = request.getParameter("messageType"); String times = request.getParameter("times"); int intTimes = Integer.parseInt(times); JMSDispatcher dispatcher = new JMSDispatcher(); dispatcher.setTopicQueueName(topicqueue); dispatcher.setServer(server); dispatcher.setMessageType(messageType); try { for(int count =1; count <= intTimes;count++){ dispatcher.sendMessage( count + " of " + times + " " + message); } out.println("Message [" + message + "] sent successfully to [" + topic + "] to the [" + server + "] server " + times + " times."); } catch (JMSException e) { e.printStackTrace(); out.println("Error:" + e.getMessage()); } catch (NamingException e) { out.println("Error:" + e.getMessage()); e.printStackTrace(); } finally{ out.close(); } } } A utility to send the messages: JMSDispatcher.java public class JMSDispatcher { /** * */ private static final long serialVersionUID = 7105145023422143880L; private static Logger log = Logger.getLogger(JMSDispatcher.class); private final String CONNECTION_FACTORY_CLUSTERED = "ClusteredConnectionFactory"; private final String CONNECTION_FACTORY = "ConnectionFactory"; private final String TOPIC = "TOPIC"; private final String QUEUE = "QUEUE"; private String topicQueueName; private String server; private String messageType; public void setTopicQueueName(String value){ this.topicQueueName = value; } public void setServer(String value){ this.server = value; } public void setMessageType(String value){ this.messageType = value; } public void sendMessage(Object objectMessage) throws JMSException, NamingException{ log.debug("##### Setting up a Queue/Topic Message: #####"); if (TOPIC.equals(messageType)){ sendTopicMessage(objectMessage); } else if (QUEUE.equals(messageType)){ sendQueueMessage(objectMessage); } log.debug("##### Publishing Message: Done #####"); } private void sendQueueMessage(Object objectMessage) throws JMSException, NamingException{ try{ InitialContext initialContext = getInitialContext(); QueueConnectionFactory qcf = (QueueConnectionFactory) initialContext.lookup(CONNECTION_FACTORY_CLUSTERED); QueueConnection queueConn = qcf.createQueueConnection(); Queue queue = (Queue) initialContext.lookup(topicQueueName); QueueSession queueSession = queueConn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE); queueConn.start(); QueueSender send = queueSession.createSender(queue); ObjectMessage om = queueSession.createObjectMessage((Serializable)objectMessage); setMessageProperties(om); log.debug("##### Publishing Message to a Queue: " + queueName + "#####"); send.send(om); send.close(); queueConn.stop(); queueSession.close(); queueConn.close(); }catch(MessageFormatException ex){ log.error("##### The MESSAGE is not Serializable ####"); throw ex; }catch(MessageNotWriteableException ex){ log.error("##### The MESSAGE is not Readable ####"); throw ex; }catch(JMSException ex){ log.error("##### JMS provider fails to set the object due to some internal error. ####"); throw ex; } } private void sendTopicMessage(Object objectMessage) throws JMSException, NamingException{ try{ InitialContext initialContext = getInitialContext(); TopicConnectionFactory tcf = (TopicConnectionFactory)initialContext.lookup(CONNECTION_FACTORY_CLUSTERED); TopicConnection topicConn = tcf.createTopicConnection(); Topic topic = (Topic) initialContext.lookup(topicQueueName); TopicSession topicSession = topicConn.createTopicSession(false,TopicSession.AUTO_ACKNOWLEDGE); topicConn.start(); TopicPublisher send = topicSession.createPublisher(topic); ObjectMessage om = topicSession.createObjectMessage(); om.setObject((Serializable)objectMessage); setMessageProperties(om); log.debug("##### Publishing Message to a Topic: " + topicName + "#####"); send.publish(om); send.close(); topicConn.stop(); topicSession.close(); topicConn.close(); }catch(MessageFormatException ex){ log.error("##### The MESSAGE is not Serializable ####"); throw ex; }catch(MessageNotWriteableException ex){ log.error("##### The MESSAGE is not Readable ####"); throw ex; }catch(JMSException ex){ log.error("##### JMS provider fails to set the object due to some internal error. ####"); throw ex; } } private InitialContext getInitialContext() throws NamingException{ Properties jboss = new Properties(); jboss.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory"); jboss.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces"); jboss.put("java.naming.provider.url", server); return new InitialContext(jboss); } } And the web.xml JMSWeb index.jsp JMSClusteredClient JMSClusteredClient com.blogspot.felipeg48.jms.web.JMSClusteredClient JMSClusteredClient /JMSClusteredClient Happy Clustering!!
May 26, 2010
by Felipe Gutierrez
· 16,718 Views
article thumbnail
Running Hazelcast on a 100 Node Amazon EC2 Cluster
The purpose of this article is to give you the details of our 100 node cluster demo. This demo is recorded and you can watch the 5 minute screencast Hazelcast is an open source clustering and highly scalable data distribution platform for Java. JVMs that are running Hazelcast will dynamically cluster and allow you to easily share and partition your application data across the cluster. Hazelcast is a peer-to-peer solution (there is no master node, every node is a peer) so there is no single point of failure. Communication among cluster members is always TCP/IP with Java NIO beauty. The default configuration comes with 1 backup so if a node fails, no data will be lost (you can specify the backup count). It is as simple as using java.util.{Map, Queue, Set, List}. Just add the hazelcast.jar into your classpath and start coding. When you download the Hazelcast, you will find a test.sh under bin directory. The test.sh runs an application which randomly makes 40% get, 40% put and 20% remove on a distributed map. In this demo the same test application will be used to see how it performs on 100 node cluster. Amazon EC2 and S3 An easy to use and scalable cloud environment was needed for demo so we decided to use Amazon EC2 for server instances (nodes) and S3 service to store demo application zip and configuration files. With its newly announced Java SDK, it is very simple to start/stop server instances and upload files to S3 programatically. Hazelcast AMI & Launcher The challenge here is that we are running an application on 100 nodes and dealing with each and every server in the cluster is a huge task. We don't want to ssh into every server and manually start the application. This part is automated by creating a special server image (AMI). The AMI contains Java Runtime and a launcher application we developed, which will download the demo application from Amazon S3, unzip it, and run the hazelcast/bin/test.sh in it. The Launcher is actually so generic that it can run any application; it doesn't care/know what test.sh contains. Deployer Deployment of the demo application is also automated so that we don't need to login into AWS Management Console and manually start instances. Deployer instantiates any number of Amazon EC2 servers with any AMI and also uploads the demo application zip file to S3. So the idea here is that, the Deployer will store the application into S3 and launch 100 EC2 instances with our image. The Launcher on each instance will download the application from S3 and run it. Demo Details. The smallest EC2 instances (m1.small) are used to run the demo. These are the virtual instances with CPU about 1.0 GHz. Also keep in mind that EC2 platform suffers from considerable amount of network latency. That's why we increased the thread count to 250 in our application. The following steps performed during the demo Download hazelcast-1.8.3.zip from www.hazelcast.com. Unzip the file and move the monitoring war file into tomcat6/webapps directory. Edit the test.sh under the bin directory: Add -Xmx1G -Xms1G Add -Dhazelcast.initial.wait.seconds=100 to make the cluster evenly partition on start so that migration can be avoided for better performance. Add t250 as an argument to the application to set thread count to 250. Remember the latency issue. Run the Deployer from IDE. Check from EC2 Management Console if 100 servers started. Start tomcat. Copy the public DNS name of one of the servers to connect to from monitoring tool. Go to http://localhost:8080/hazelcast-monitor-1.8.3/ (Hazelcast Monitoring Tool). Paste the address and connect to the cluster. Enjoy! Results You should always look for programatic ways of launching applications on the cloud. With these tools we were able to deploy and run the demo application on 100 servers in minutes. The entire Hazelcast cluster was making over 400,000 operations per second on the smallest EC2 instances. In our next demo we will experiment Hazelcast on large data set and even bigger cluster. Watch the screencast
April 16, 2010
by Fuad Malikov
· 62,656 Views · 1 Like
article thumbnail
Concurrent Programming in Groovy
It seems that the Groovy has a project for just about anything. That's one of the reasons why the language is so popular. The GPars library is an especially useful project in this new era of mult-core processors and concurrent programming. Formerly known as GParallelizer, GPars offers a framework for handling tasks concurrently and asynchronously while safe-guarding mutable values. DZone recently got an update on the project's latest news from project lead Václav Pech, and we've provided some examples of GPars concepts. GPars uses some of the best concepts from emerging languages and implements them for Groovy. Its actor support was inspired by the Actors library in Scala and the SafeVariable class in GPars was inspired by Agents in Clojure. The Groovy-based APIs in GPars are used to declare which parts of the code should be run concurrently. Objects can be enhanced with asynchronous methods to perform collections-based operations in parallel based on the fork/join model. GPars also has a Dataflow concurrency model that offers an alternative model that is inherently safe and robust due to algorithms that prevent having to deal with live-locks and race-conditions. The SafeVariable class is another technology in GPars that alleviates problems with concurrency by providing a non-blocking mt-safe reference to mutable state when Java libraries are integrated. Finally the Parallelizer, Asynchronizer, and Actors are some of the most interesting concepts in GPars. Actors Actors can be generated quickly to consume and send messages between each other even across distributed machines. You can build a messaging-based concurrency model with actors that are not limited by the number of threads. What was once only available to Scala developers, GPars now brings to Java and Groovy developers. Actors perform three different operations - send messages, receive messages and create new actors. New actors are created with the actor() method passing in the actor's body as a closure parameter. Inside the actor's body loop() is used to iterate, react() to receive messages, and reply() to send a message to the actor, which has sent the currently processed message. Here is how to create an actor that prints out all messages that it receives: import static groovyx.gpars.actor.Actors.* def console = actor { loop { react { println it } } } The loop() method ensures that the actor doesn't stop after processing the first message. Messages are sent using the send() method or the << operator. Here is an example of the sendAndWait () method in a message: actor << 'Message' actor.send 'Message' def reply1 = actor.sendAndWait('Message') def reply2 = actor.sendAndWait(10, TimeUnit.SECONDS, 'Message') def reply3 = actor.sendAndWait(10.seconds, 'Message') The sendAndWait() family blocks the caller until a reply from the actor becomes available. The reply is returned from sendAndWait() as a return value. For non-blocking message retrieval, calling the react() method, with or without a timeout parameter, from within the actor's code will consume the next message from the actor's inbox: println 'Waiting for a gift' react {gift -> if (myWife.likes gift) reply 'Thank you!' } Here is a more 'real world' example of an event-driven actor that receives two numeric messages, generates a sum, and sends the result to the console actor: import static groovyx.gpars.actor.Actors.* //not necessary, just showing that a single-threaded pool can still handle multiple actors defaultPooledActorGroup.resize 1 final def console = actor { loop { react { println 'Result: ' + it } } } final def calculator = actor { react {a -> react {b -> console.send(a + b) } } } calculator.send 2 calculator.send 3 calculator.join() Since Actors can share a relatively small thread pool, they bypass the threading limitations of the JVM and don't require excessive system resources even if an application consists of thousands of actors. There are some more sophisticated actor examples on the old GParallelizer wiki and there's also a nice article on the key concepts behind actors in erlang and scala. The documentation on GPars Actors can be found here. Asynchronizer A major feature of GPars is the Asynchronizer class, which runs tasks asynchronously in the background. It enables a Java Executor Service-based DSL on collections and closures. Inside the Asynchronizer.doParallel() blocks, asynchronous methods can be added to the closures. async() creates a variant of the supplied closure returning a future for the potential return value when invoked. callAsync() calls a closure in a separate thread supplying the given arguments and also returns a future for the potential value. Here is one example of Asynchronizer use: Asynchronizer.doParallel() { Closure longLastingCalculation = {calculate()} Closure fastCalculation = longLastingCalculation.async() //create a new closure, which starts the original closure on a thread pool Future result=fastCalculation() //returns almost immediately //do stuff while calculation performs … println result.get() } Parallelizer Finally, there's the Parallelizer, which is a concurrent collection processor. The common pattern to process collections takes elements sequentially, one at a time. This algorithm however, won't work well on multi-core hardware. The min() function on a dual-core chip can only leverage 50% of the computing power - 25% for a quad-core. Instead, GPars uses a tree-like structure for parallel processing. The Parallelizer class enables a ParallelArray(from JSR-166y)-based DSL on collections. Here is a use exapmle: doParallel { def selfPortraits = images.findAllParallel{it.contains me}.collectParallel {it.resize()} //a map-reduce functional style def smallestSelfPortrait = images.parallel.filter{it.contains me}.map{it.resize()}.min{it.sizeInMB} } Václav Pech told DZone that GPars currently has two new people joining the project - Jon Kerridge and Kevin Chalmers. The two developers are bringing their JCSP Groovy library with them into GPars. Pech said, "Apart from experimenting with the CSP concept, we will also enhance actor remoting and polish a couple of rough edges on the APIs. The documentation and especially the samples also deserve more attention." GPars' documentation is already quite robust. Pech says there quite a few issues queued up in their JIRA, but the previously listed issues remain the top priorities. Depending on the amount of time required to make progress with CSP, the GPars 1.0 release might happen in the summer says Pech. GPars is also developed by Alex Tkachman, a leading developer on the Groovy++ project. In an with Andres Almiray, Tkachman said that some of the work that comes out of Groovy++ might be assimilated into GPars, but no plans are in place yet since Groovy++ developers are still experimenting.
February 22, 2010
by Mitch Pronschinske
· 53,497 Views · 2 Likes
article thumbnail
Four Methods to Automate Development Environment Setup
There are at least four methods that can be used in different combinations to make the process of setting up a complete development environment a lot less painful.
February 16, 2010
by Mitch Pronschinske
· 31,705 Views
article thumbnail
Draw, Plot 2d Line In C# (csharp) - Bresenham's Line Algorithm
based on wikipedia public interface ISetPixel { void SetPixel(Point point); } public partial class Algorithms2D { public delegate void SetPixel(Point point); public static void Line(Point p0,Point p1,G plot) where G:ISetPixel { int x0=p0.X; int y0=p0.Y; int x1=p1.X; int y1=p1.Y; bool steep=abs(y1-y0)>abs(x1-x0); if (steep) { swap(ref x0,ref y0); swap(ref x1,ref y1); } if (x0>x1) { swap(ref x0,ref x1); swap(ref y0,ref y1); } int deltax=x1-x0; int deltay=abs(y1-y0); int error=-deltax/2; int ystep; int y=y0; if (y00) { y=y+ystep; error=error-deltax; } } } struct CSetPixel:ISetPixel { public CSetPixel(SetPixel setPixel) { this.setPixel=setPixel; } SetPixel setPixel; #region ISetPixel Members public void SetPixel(Point point) { setPixel(point); } #endregion } public static void Line(Point p0,Point p1,SetPixel plot) { Line(p0,p1,new CSetPixel(plot)); } private static int abs(int p) { return Math.Abs(p); } private static void swap(ref T x0,ref T y0) { T z=x0; x0=y0; y0=z; } } unit tests (c# 3.0): [TestFixture] public class Line { [Test] public void LineDiagonal() { List l = new List(); Algorithms2D.Line(new Point(0,0),new Point(3,3),z=>l.Add(z)); Assert.AreEqual(3, l.Count); Assert.AreEqual(new Point(0, 0), l[0]); Assert.AreEqual(new Point(1, 1), l[1]); Assert.AreEqual(new Point(2, 2), l[2]); } [Test] public void Line45() { List l = new List(); Algorithms2D.Line(new Point(0, 0),new Point(6, 3), z => l.Add(z)); Assert.AreEqual(6, l.Count); Assert.AreEqual(new Point(0, 0), l[0]); Assert.AreEqual(new Point(1, 0), l[1]); Assert.AreEqual(new Point(2, 1), l[2]); Assert.AreEqual(new Point(3, 1), l[3]); Assert.AreEqual(new Point(4, 2), l[4]); Assert.AreEqual(new Point(5, 2), l[5]); } }
April 26, 2007
by Snippets Manager
· 4,011 Views
article thumbnail
Algorithm For Calculating The Date Of Easter Sunday
/// /// Algorithm for calculating the date of Easter Sunday /// (Meeus/Jones/Butcher Gregorian algorithm) /// http://en.wikipedia.org/wiki/Computus#Meeus.2FJones.2FButcher_Gregorian_algorithm /// /// A valid Gregorian year /// Easter Sunday public static DateTime EasterDate(int year) { int Y = year; int a = Y % 19; int b = Y / 100; int c = Y % 100; int d = b / 4; int e = b % 4; int f = (b + 8) / 25; int g = (b - f + 1) / 3; int h = (19 * a + b - d - g + 15) % 30; int i = c / 4; int k = c % 4; int L = (32 + 2 * e + 2 * i - h - k) % 7; int m = (a + 11 * h + 22 * L) / 451; int month = (h + L - 7 * m + 114) / 31; int day = ((h + L - 7 * m + 114) % 31) + 1; DateTime dt = new DateTime(year, month, day); return dt; } Easter Monday = Easter Sunday + 1 Ascension Day = Easter Sunday + 39 Pentecost Sunday = Easter Sunday + 49 Pentecost Monday = Easter Sunday + 50
September 27, 2005
by Snippets Manager
· 14,956 Views
  • Previous
  • ...
  • 198
  • 199
  • 200
  • 201
  • 202
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×