DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest IoT Topics

article thumbnail
Java: Testing a Socket is Listening on All Network Interfaces/Wildcard Interface
I previously wrote a blog post describing how I’ve been trying to learn more about network sockets in which I created some server sockets and connected to them using netcat. The next step was to do the same thing in Java and I started out by writing a server socket which echoed any messages sent by the client: public class EchoServer { public static void main(String[] args) throws IOException { int port = 4444; ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x7f,0x00,0x00,0x01})); System.err.println("Started server on port " + port); while (true) { Socket clientSocket = serverSocket.accept(); System.err.println("Accepted connection from client: " + clientSocket.getRemoteSocketAddress() ); In in = new In (clientSocket); Out out = new Out(clientSocket); String s; while ((s = in.readLine()) != null) { out.println(s); } System.err.println("Closing connection with client: " + clientSocket.getInetAddress()); out.close(); in.close(); clientSocket.close(); } } } public final class In { private Scanner scanner; public In(java.net.Socket socket) { try { InputStream is = socket.getInputStream(); scanner = new Scanner(new BufferedInputStream(is), "UTF-8"); } catch (IOException ioe) { System.err.println("Could not open " + socket); } } public String readLine() { String line; try { line = scanner.nextLine(); } catch (Exception e) { line = null; } return line; } public void close() { scanner.close(); } } public class Out { private PrintWriter out; public Out(Socket socket) { try { out = new PrintWriter(socket.getOutputStream(), true); } catch (IOException ioe) { ioe.printStackTrace(); } } public void close() { out.close(); } public void println(Object x) { out.println(x); out.flush(); } } I ran the main method of the class and this creates a server socket on port 4444 listening on the 127.0.0.1 interface and we can connect to it using netcat like so: $ nc -v 127.0.0.1 4444 Connection to 127.0.0.1 4444 port [tcp/krb524] succeeded! hello hello The output in my IntelliJ console looked like this: Started server on port 4444 Accepted connection from client: /127.0.0.1:63222 Closing connection with client: /127.0.0.1 Using netcat is fine but what I actually wanted to do was write some test code which would check that I’d made sure the server socket on port 4444 was accessible via all interfaces i.e. bound to 0.0.0.0. There are actually some quite nice classes in Java which make this very easy to do and wiring those together I ended up with the following client code: public static void main(String[] args) throws IOException { Enumeration nets = NetworkInterface.getNetworkInterfaces(); for (NetworkInterface networkInterface : Collections.list(nets)) { for (InetAddress inetAddress : Collections.list(networkInterface.getInetAddresses())) { Socket socket = null; try { socket = new Socket(inetAddress, 4444); System.out.println(String.format("Connected using %s [%s]", networkInterface.getDisplayName(), inetAddress)); } catch (ConnectException ex) { System.out.println(String.format("Failed to connect using %s [%s]", networkInterface.getDisplayName(), inetAddress)); } finally { if (socket != null) { socket.close(); } } } } } } If we run the main method of that class we’ll see the following output (on my machine at least!): Failed to connect using en0 [/fe80:0:0:0:9afe:94ff:fe4f:ee50%4] Failed to connect using en0 [/192.168.1.89] Failed to connect using lo0 [/0:0:0:0:0:0:0:1] Failed to connect using lo0 [/fe80:0:0:0:0:0:0:1%1] Connected using lo0 [/127.0.0.1] Interestingly we can’t even connect via the loopback interface using IPv6 which is perhaps not that surprising in retrospect given we bound using an IPv4 address. If we tweak the second line of EchoServer from: ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x7f,0x00,0x00,0x01})); to ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x00,0x00,0x00,0x00})); And restart the server before re-running the client we can now connect through all interfaces: Connected using en0 [/fe80:0:0:0:9afe:94ff:fe4f:ee50%4] Connected using en0 [/192.168.1.89] Connected using lo0 [/0:0:0:0:0:0:0:1] Connected using lo0 [/fe80:0:0:0:0:0:0:1%1] Connected using lo0 [/127.0.0.1] We can then wrap the EchoClient code into our testing framework to assert that we can connect via all the interfaces.
July 17, 2013
by Mark Needham
· 12,879 Views
article thumbnail
Build an Arduino Motor/Stepper/Servo Shield – Part 1: Servos
this post starts a small (or larger?) series of tutorials using the arduino motor/stepper/servo shield with the frdm-kl25z board. that motor shield is probably one of the most versatile on the market, and features 2 servo and 4 motor connectors for dc or stepper motors. that makes it a great shield for any robotic project arduino motor stepper servo shield with frdm-kl25z the series starts with a tutorial how to drive two servo motors. and if this is not what you are expecting to do with this shield, then you can vote and tell me what you want to see instead on this motor shield . oem or original? the original arduino motor/stepper/servo shield is available from adaftruit industries and costs less than $20. i’m using a oem version, see this link . the functionality is the same, except that the oem version only runs with motors up to 16 vdc, while the original shield is for motors up to 25 vdc. motor stepper servo shield details the board has two stmicroelectronics l293d motor h-bridge ic’s which can drive up to 4 dc motors (or up to 2 stepper motors) with 0.6 a per bridge (1.2 a peak). the 74hct595n (my board has the sn74hc595 from texas instrument) is a shift register used for the h-bridges to reduce the number of pins needed (more about this in a next post). a terminal block with jumper is providing power to the dc/stepper motor. the 5 vdc for the servos is taken from the frdm board. the frdm-kl25z can only give a few hundred ma on the 5v arduino header. that works for small servos, but i recommend to cut the 5v supply to the servos and use a dedicated 5v (or 6v) for the servos. outline in this tutorial, i’m creating a project with codewarrior for mcu10.4 for the frdm-kl25z board, and then add support for two servo motors. processor expert components this tutorial uses added processor expert components which are not part of codewarrior distribution. the following other components are used: wait : allows waiting for a given time servo : high level driver for hobby servp motors make sure you have the latest and greatest components loaded from github . instructions how to download and install the additional components can be found here . creating codewarrior project to create a new project in codewarrior: file > new > bareboard project, give a project name specify the device to be used: mkl25z128 opensda as connection i/o support can be set to ‘no i/o’ processor expert as rapid application development option this creates the starting point for my project: new servo project created servo motor servo motors are used in rc (radio control) or (hobby) robotics. typical servo motor (hitec hs-303) the motor has 3 connectors: gnd (black) power (red), typically 5v, but can be 6v or even higher pwm (white or yellow), signal for position information the pwm signal typically has frequency of 50 hz (20 ms), with a duty (high duration) between 1 ms and 2 ms. the screenshot below shows such a 50 hz signal with 1.5 ms duty cycle (servo middle position): servo signal many servos go below 1 ms and beyond 2 ms. e.g. many hitec servos have a range of 0.9…2.1 ms. check the data sheet of your servos for details. if you do not have a data sheet, then you might just experiment with different values. with a pwm duty of 1 ms to 2 ms within a 20 ms period, this means that only 10% of the whole pwm duty are used. this means if you have a pwm resolution of only 8bits, then only 10% of 256 steps could be used. as such, an 8bit pwm signal does not give me a fine tuned servo positioning. the duration of the duty cycle (1..2 ms) is translated into a motor position. typically the servo has a built-in closed-loop control with a microcontroller and a potentiometer. i have found that it is not important to have an *exact* 50 hz pwm frequency. you need to experiment with your servo if it works as well with a lower or higher frequency, or with non-fixed frequency (e.g. if you do a software pwm). many servos build an average of the duty cycle, so you might need to send several pulses until the servo reacts to a changed value. servo processor expert component i’m using here my own ‘servo’ component which offers following capabilities: pwm configuration (duty and period) min/max and initialization values methods to change the duty cycle optional command line shell support: you can type in commands and control the servo. this is useful for testing or calibration. optional ‘timed’ moving, so you can move the servo faster or slower to the new position in an interrupt driven way of course it is possible to use servos without any special components. from the components view, i add the servo component. to add it to my project, i can double-click on it or use the ‘+’ icon in that view: servo component in components library view in case the processor expert views are not shown, use the menu processor expert > show views this will add a new ‘servo’ component to the project: servo component added but it shows errors as first the pwm and pin settings need to be configured. pwm configuration on the arduino motor/stepper/servo shield the two servo motor headers are connected to pwm1b and pwm1a (see schematic ): servo header on board (source: dk electronics shield schematic) following the signals, this ends up at following pins on the kl25z: servo 1 => pwm1b => arduino header d10 => frdm-kl25z d10 => kl25z pin 73 => ptd0/spi0_pcs0/ tpm0_ch0 servo 2 => pwm1a => arduino header d9 => frdm-kl25z d9 => kl25z pin 78 => adc0_se6b/ptd5/spi1_sck/uart2_tx/ tpm0_ch5 from the pin names on the kinets (tpm0_ch0 and tpm0_ch5) i can see that this would be the same timer (tpm0), but with different channel numbers (ch0 and ch5). for my first servo processor expert has created for me a ‘timerunit_ldd’ which i will be able to share (later more on this). the timerunit_ldd implements the ‘ l ogical d evice d river’ for my pwm: timerunit_ldd so i select the pwm component inside the servo component and configure it for tpm0_c0v and the pin ptd0/spi0_pcs0/tpm0_ch0 with low initial polarity. the period of 20 ms (50 hz) and starting pulse with of 1.5 ms (mid-point) should already be pre-configured: servo1 pwm configuration i recommend to give it a pin signal name (i used ‘servo1′) that i need to set the ‘initial polarity’ to low is a bug of processor expert in my view: the device supports an initial ‘high’ polarity, but somehow this is not implemented? what it means is that the polarity of the pwm signal is now inverted: a ‘high’ duty cycle will mean that the signal is low. we need to ‘revert’ the logic later in the servo component. because of the inverted pwm logic, i need to set the ‘inverted pwm’ attribute in the servo component: inverted pwm the other settings of the servo component we can keep ‘as is’ for now. the ‘min pos pwm’ and ‘max pos pwm’ define the range of the pwm duty cycle which we will use later for the servo position. adding second servo as with the first servo, i add the second servo from the components library view. as i already have a timerunit_ldd present in my system, processor expert asks me if i want to re-use the existing one or to create a new component: shared component dialog as explained above: i can use the same timer (just a different pin/channel), so i have my existing component selected and press ok. as above, i configure the timer channel and pin with initial polarity: servo2 pwm configuration and i should not forget to enable the inverted logic: inverted pwm for servo2 test application time to try things out. for this i create a simple demo application which changes the position of both servos. first i add the wait component to the project from the components library: added wait component as i have all my processor expert components configured, i can generate the code: generating processor expert code next i add a new header application.h file to my project. for this i select the ‘sources’ folder of my project and use the new > header file context menu to add my new header file: new application.h in that header file application.h i add a prototype for my application ‘run’ routine: added app_run prototype from the main() in processorexpert.c , i call that function (not to forget to include the header file): calling app_run from main the same way i add a new source file application.c: new application.c to test my servos, i’m using the setpos() method which accepts a 8bit (0 to 255) value which is the position. to slow things a bit, i’m waiting a few milliseconds between the different positions: #include "application.h" #include "wait1.h" #include "servo1.h" #include "servo2.h" void app_run(void) { uint16_t pos; for(;;) { for(pos=0;pos<=255;pos++) { servo1_setpos(pos); servo2_setpos(pos); wait1_waitms(50); } } } save all files, and we should be ready to try it out on the board. build, download and run that’s it! time to build the project (menu project > build project ) and to download it with the debugger (menu run > debug ) and to start the application. if everything is going right, then the two servos will slowly turn in one direction until the end position, and then return back to the starting position. summary using hobby servo motors with the frdm-kl25z, codewarrior, processor expert and the additional components plus the arduino/stepper/servo shield is very easy in my view. i hope this post is useful to start your own experiments with hobby servo motors to bring any robotic project to the next level. i have here on github a project which features what is explained in this post, but with a lot more components, bells and whistles
June 2, 2013
by Erich Styger
· 17,807 Views · 7 Likes
article thumbnail
Coalition or Council: Which One Are You?
I have been thinking about institutions that strive for change. Sometimes we call them communities or organizations, sometimes we call them alliances or parties. But whatever their nature, these institutions are usually led and managed by a small group of people. I see two kinds of leading groups: coalitions and councils. coalition A temporary alliance of distinct parties, persons, or states for joint action council A group elected or appointed as an advisory or legislative body Coalitions A coalition is a self-selecting team. The persons seek each other out because they want to be active agents for change, and by working together they can be more successful in achieving a common goal. In his change management books John Kotter referred to them as guiding coalitions. They are not elected. They are not appointed. They select each other because they want to. And they can even work undercover, because their goal is to influence, not to govern. The allied powers in World War II were a coalition. The Google founders were a coalition. The originators of the Stoos Network were a coalition. Councils A council is a group of representatives. These people also want to be active agents for change. But, their primary concern is to have buy-in from the larger group of people they are representing within the institute (community, organization, or party). The concept of democracy has led to many different versions of these councils. Sometimes we call them a government. Sometimes a committee. And everything has to be out in the open, because if it’s not, we call them cronies. Their goal is primarily to govern or advise the institute. The United Nations has a council. My former students society had a council. And many workplaces have management teams acting as councils. And you? If you have a group of people who all desire change, do you lead with a coalition or with a council? This is the big problem with some alliances and consortiums for change. They have directors who try to be both. It is a recipe for disaster. Maybe the best institutions have both: a coalition and a council. (image from Veni Markovski)
April 21, 2013
by Jurgen Appelo
· 7,039 Views
article thumbnail
Weekend Project: Send sensor data from Arduino to MongoDB
Arduino is an open-source electronics platform that can acknowledge and interact with its environment through a variety of sensor types. It’s great for hardware prototyping and one-off projects. I just got an Arduino Board from our friends at SendGrid, who also gave me a little tutorial in the art of Arduino hacking. Inspired by the tutorial and armed with this new board, I bought a passive infared (PIR) motion sensor from my local Radio Shack. Now I was ready to play; in particular, I wanted to be able to collect that continuous stream of hardware sensor data into a MongoDB database for logging, trend analysis, system event correlation, etc. To this end, I created the demo project “mongodb-motion”, which I’ve made public on Github. In the “mongodb-motion” Github repo, you will find an Arudino project that writes motion sensor data to a cloud MongoDB database at MongoLab and sends alerts via email based on certain criteria. I built this demo using Node.js and the MongoLab REST API. Below, I’ll go through exactly what hardware you need to make your own “mongodb-motion” project a success, and how the code actually works. What You Need The hardware used in this demo includes: an Arduino UNO R3 and a Parallax PIR motion sensor. How the Code Works You can use a variety of motion sensors with the Arduino. In this particular experiment, I used a PIR motion sensor. The PIR motion sensor behaves like a switch, with ‘down’ events emitted on motion detection and ‘up’ events a few seconds after motion ceases to be detected. On the receiving side, I used JohnnyFive, an appropriately named Node.js package that accepts sensor events and sends messages to the Arduino board. With the two ends set, I’ll move on to the project’s configuration file. In this demo, I’ve included a configuration file, config-sample.js, where credentials for the MongoLab REST API and for the email SMTP server can be added. In my case, I used the SendGrid SMTP service. The configuration file also has two callbacks that determine when an email is emitted, one for each type of event – “detect” and “ceased”. I’ve used this feature to automatically send an email alert if an event timestamp is between 7:00pm and 8:00am, ostensibly when my office should be motionless… I’m out there watching you, office! Once you’ve customized this config-sample.js file, be sure to rename it to config.js in order for it to be usable. If you inspect the project code, you’ll notice that the MongoLab REST API is called in the logMsg() function, using an https.request. Building this little demo has given me some new ideas for hardware hacking the cloud. I hope you give it a try too. Thanks to the Arduino, Node.js and Javascript communities, and special thanks to Rick Waldon for Johnny Five, SendGrid for the UNO board, and a big shout out to @swiftalphaone for the Waza tutorial.
April 3, 2013
by Ben Wen
· 17,782 Views
article thumbnail
Simulate Network Latency, Packet Loss, and Low Bandwidth on Mac OSX
Sometimes while testing you may want to be able to simulate network latency, or packet loss, or low bandwidth. I have done this with Linux and tc/netem as well as with Shunra on Windows, but I had never done it on Mac OSX. It turns out that Mac OSX includes ‘dummynet’ from FreeBSD which has the capability to do this WAN simulation. Here is a quick example: Inject 250ms latency and 10% packet loss on connections between my workstation and my development web server (10.0.0.1) Simulate maximum bandwidth of 1Mbps # Create 2 pipes and assigned traffic to and from our webserver to each: $ sudo ipfw add pipe 1 ip from any to 10.0.0.1 $ sudo ipfw add pipe 2 ip from 10.0.0.1 to any # Configure the pipes we just created: $ sudo ipfw pipe 1 config delay 250ms bw 1Mbit/s plr 0.1 $ sudo ipfw pipe 2 config delay 250ms bw 1Mbit/s plr 0.1 A quick test: $ ping 10.0.0.1 PING 10.0.0.1 (10.0.0.1): 56 data bytes 64 bytes from 10.0.0.1: icmp_seq=0 ttl=63 time=515.939 ms 64 bytes from 10.0.0.1: icmp_seq=1 ttl=63 time=519.864 ms 64 bytes from 10.0.0.1: icmp_seq=2 ttl=63 time=521.785 ms Request timeout for icmp_seq 3 64 bytes from 10.0.0.1: icmp_seq=4 ttl=63 time=524.461 ms Disable: $sudo ipfw list |grep pipe 01900 pipe 1 ip from any to 10.13.1.133 out 02000 pipe 2 ip from 10.13.1.133 to any in $ sudo ipfw delete 01900 $ sudo ipfw delete 02000 # or, flush all ipfw rules, not just our pipes $ sudo ipfw -q flush Notice that the round-trip on the ping is ~500ms. That is because we applied a 250ms latency to both pipes, incoming and outgoing traffic. Our example was very simple, but you can get quite complex since “pipes” are applied to traffic using standard ipfw firewall rules. For example, you could specify different latency based on port, host, network, etc. Packet loss is configured with the “plr” command. Valid values are 0 – 1. In our example above we used 0.1 which equals 10% packetloss. This is a very handy way for developers on Mac’s to test their applications in a variety of network environments. And you get it for FREE. On Windows you need to buy a commercial tool to achieve this (at least that was true the last time I looked, in 2008.)
December 15, 2012
by Joe Miller
· 17,019 Views
article thumbnail
CentOS Minimal Installation Network Configuration
By default CentOS minimal install does not come with pre-configured network, here’s how to make it work: $ ping google.com ping: unknown host google.com To fix this we’ll need to edit the set up for the ethernet. Let’s start with editing this file: $ vim /etc/sysconfig/network-scripts/ifcfg-eth0 IPADDR=x.x.x.x BOOTPROTO=none NETMASK=255.255.255.0 GATEWAY=y.y.y.y DNS1=y.y.y.y DNS2=y.y.y.y USERCTL=yes HWADDR='your mac address' where x.x.x.x is your static ip, and y.y.y.y is your router ip If you’re not sure what your mac address is, run this command $ ifconfig eth0 | grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}' now edit the networks config and make sure you added the line below: $ vi /etc/sysconfig/network Add the line: GATEWAY = y.y.y.y Now restart the network interface: $ /etc/init.d/networking restart Now ping the router: $ ping y.y.y.y Request timeout for icmp_seq 0 64 bytes from y.y.y.y: icmp_seq=1 ttl=56 time=1.792 ms Request timeout for icmp_seq 1 64 bytes from y.y.y.y: icmp_seq=3 ttl=56 time=1.790 ms 64 bytes from y.y.y.y: icmp_seq=4 ttl=56 time=1.762 ms Looks good, now let’s see if we can see anything outside. $ ping google.com PING google.com (173.194.67.138) 56(84) bytes of data. 64 bytes from wi-in-f138.1e100.net (173.194.67.138): icmp_seq=1 ttl=49 time=7.88 ms 64 bytes from wi-in-f138.1e100.net (173.194.67.138): icmp_seq=2 ttl=49 time=7.35 ms 64 bytes from wi-in-f138.1e100.net (173.194.67.138): icmp_seq=3 ttl=49 time=7.13 ms Now you can connect to the internet, and get all the packages you need.
November 26, 2012
by Kasia Gogolek
· 89,930 Views · 1 Like
article thumbnail
How to do a presentation in China? Some of my experiences
So the culture is different from Western culture we all know that! I am certainly not an expert on China but after living in China for almost 2 years knowing some language and working in a chinese company seeing presentations every week and also visiting over 30 western and chinese companies placed in China I think I have some insights about how you should organize your presentation in China. Since I recently went to Shanghai in order to to research exchange with Jiaotong University I was about to give a presentation to introduce my institute and me. So here you can find my rather uncommon presentation and some remarks, why some slides where designed in the way they are. http://www.rene-pickhardt.de/wp-content/uploads/2012/11/ApexLabIntroductionOfWeST.pdf Guanxi – your relations First of all I think it is really important to understand that in China everything is related to your relations (http://en.wikipedia.org/wiki/Guanxi). A chinese business card will always name a view of your best and strongest contacts. This is more important than your adress for example. If a conference starts people exchange namecards before they sit down and discuss. This principle of Guanxi is also reflected in the style presentations are made. Here are some basic rules: Show pictures of people you worked together with Show pictures of groups while you organized events Show pictures of the panels that run events Show your partners (for business not only clients but also people you are buying from or working together with in general) My way of respecting these principles: I first showed a group picture of our institute! I also showed for almost every project where I could get hold of it pictures of the people that are responsible for the project I did not only show the European research projects our university is in but listed all the different partners and showed logos of them Family The second thing is that in China the concept of family is very important. I would say as a rule of thumb if you want to make business with someone in china and you havent been introduced to their family things are not going like you might expect this. For this reason I have included some slides with a worldmap going further down to the place where I was born and where I studied and where my parents still leave! Localizing When I choosed a worldmap I did not only take one with Chinese language but I also took one where china was centered. In my contact data I also put chinese social networks. Remember Twitter, Facebook and many other sites are blocked in China. So if you really want to communicate with chinese people why not getting a QQ number or weibo account? Design of the slides You saw this on conferences many times. Chinese people just put a hack a lot of stuff on a slide. I strongly believe this is due to the fact that reading and recognizing Chinese characters is much faster than western characters. So if your presentation is in Chinese Language don’t be afraid to stuff your slides with information. I have seen many talks by Chinese people that where literally reading word by word what was written on the slides. Where in western countries this is considered bad practice in China this is all right. Language Speaking of Language: Of course if you know some chinese it shows respect if you at least try to include some chinese. I split my presentation in 2 parts. One which was in chinese and one that was in english. Have an interesting take away message So in my case I included the fact that we have PhD positions open and scholarships. That our institut is really international and the working language is english. Of course I also included some slides about my past and current research like Graphity and Typology During the presentation: In China it is not rude at all if ones cellphone rings and one has more important stuff to do. You as presenter should switch of your phone but you should not be disturbed or annoyed if people in the audience receive phone calls and go out of the room doing that business. This is very common in China. I am sure there are many more rules on how to hold a presentation in China and maybe I even made some mistakes in my presentation but at least I have the feeling that the reaction was quite positiv. So if you have questions, suggestions and feedback feel free to drop a line I am more than happy to discuss cultural topics!
November 11, 2012
by René Pickhardt
· 17,275 Views
article thumbnail
Smart Batching
How often have we all heard that “batching” will increase latency? As someone with a passion for low-latency systems this surprises me. In my experience when batching is done correctly, not only does it increase throughput, it can also reduce average latency and keep it consistent. Well then, how can batching magically reduce latency? It comes down to what algorithm and data structures are employed. In a distributed environment we are often having to batch up messages/events into network packets to achieve greater throughput. We also employ similar techniques in buffering writes to storage to reduce the number of IOPS. That storage could be a block device backed file-system or a relational database. Most IO devices can only handle a modest number of IO operations per second, so it is best to fill those operations efficiently. Many approaches to batching involve waiting for a timeout to occur and this will by its very nature increase latency. The batch can also get filled before the timeout occurs making the latency even more unpredictable. Figure 1. Figure 1. above depicts decoupling the access to an IO device, and therefore the contention for access to it, by introducing a queue like structure to stage the messages/events to be sent and a thread doing the batching for writing to the device. The Algorithm An approach to batching uses the following algorithm in Java pseudo code: public final class NetworkBatcher implements Runnable { private final NetworkFacade network; private final Queue queue; private final ByteBuffer buffer; public NetworkBatcher(final NetworkFacade network, final int maxPacketSize, final Queue queue) { this.network = network; buffer = ByteBuffer.allocate(maxPacketSize); this.queue = queue; } @Override public void run() { while (!Thread.currentThread().isInterrupted()) { while (null == queue.peek()) { employWaitStrategy(); // block, spin, yield, etc. } Message msg; while (null != (msg = queue.poll())) { if (msg.size() > buffer.remaining()) { sendBuffer(); } buffer.put(msg.getBytes()); } sendBuffer(); } } private void sendBuffer() { buffer.flip(); network.send(buffer); buffer.clear(); } } Basically, wait for data to become available and as soon as it is, send it right away. While sending a previous message or waiting on new messages, a burst of traffic may arrive which can all be sent in a batch, up to the size of the buffer, to the underlying resource. This approach can use ConcurrentLinkedQueue which provides low-latency and avoid locks. However it has an issue in not creating back pressure to stall producing/publishing threads if they are outpacing the batcher whereby the queue could grow out of control because it is unbounded. I’ve often had to wrap ConcurrentLinkedQueue to track its size and thus create back pressure. This size tracking can add 50% to the processing cost of using this queue in my experience. This algorithm respects the single writer principle and can often be employed when writing to a network or storage device, and thus avoid lock contention in third party API libraries. By avoiding the contention we avoid the J-Curve latency profile normally associated with contention on resources, due to the queuing effect on locks. With this algorithm, as load increases, latency stays constant until the underlying device is saturated with traffic resulting in a more "bathtub" profile than the J-Curve. Let’s take a worked example of handling 10 messages that arrive as a burst of traffic. In most systems traffic comes in bursts and is seldom uniformly spaced out in time. One approach will assume no batching and the threads write to device API directly as in Figure 1. above. The other will use a lock free data structure to collect the messages plus a single thread consuming messages in a loop as per the algorithm above. For the example let’s assume it takes 100µs to write a single buffer to the network device as a synchronous operation and have it acknowledged. The buffer will ideally be less than the MTU of the network in size when latency is critical. Many network sub-systems are asynchronous and support pipelining but we will make the above assumption to clarify the example. If the network operation is using a protocol like HTTP under REST or Web Services then this assumption matches the underlying implementation. Best (µs) Average (µs) Worst (µs) Packets Sent Serial 100 500 1,000 10 Smart Batching 100 150 200 1-2 The absolute lowest latency will be achieved if a message is sent from the thread originating the data directly to the resource, if the resource is un-contended. The table above shows what happens when contention occurs and a queuing effect kicks in. With the serial approach 10 individual packets will have to be sent and these typically need to queue on a lock managing access to the resource, therefore they get processed sequentially. The above figures assume the locking strategy works perfectly with no perceivable overhead which is unlikely in a real application. For the batching solution it is likely all 10 packets will be picked up in first batch if the concurrent queue is efficient, thus giving the best case latency scenario. In the worst case only one message is sent in the first batch with the other nine following in the next. Therefore in the worst case scenario one message has a latency of 100µs and the following 9 have a latency of 200µs thus giving a worst case average of 190µs which is significantly better than the serial approach. This is one good example when the simplest solution is just a bit too simple because of the contention. The batching solution helps achieve consistent low-latency under burst conditions and is best for throughput. It also has a nice effect across the network on the receiving end in that the receiver has to process fewer packets and therefore makes the communication more efficient both ends. Most hardware handles data in buffers up to a fixed size for efficiency. For a storage device this will typically be a 4KB block. For networks this will be the MTU and is typically 1500 bytes for Ethernet. When batching, it is best to understand the underlying hardware and write batches down in ideal buffer size to be optimally efficient. However keep in mind that some devices need to envelope the data, e.g. the Ethernet and IP headers for network packets so the buffer needs to allow for this. There will always be an increased latency from a thread switch and the cost of exchange via the data structure. However there are a number of very good non-blocking structures available using lock-free techniques. For the Disruptor this type of exchange can be achieved in as little as 50-100ns thus making the choice of taking the smart batching approach a no brainer for low-latency or high-throughput distributed systems. This technique can be employed for many problems and not just IO. The core of the Disruptor uses this technique to help rebalance the system when the publishers burst and outpace the EventProcessors. The algorithm can be seen inside the BatchEventProcessor. Note: For this algorithm to work the queueing structure must handle the contention better than the underlying resource. Many queue implementations are extremely poor at managing contention. Use science and measure before coming to a conclusion. Batching with the Disruptor The code below shows the same algorithm in action using the Disruptor's EventHandler mechanism. In my experience, this is a very effective technique for handling any IO device efficiently and keeping latency low when dealing with load or burst traffic. public final class NetworkBatchHandler implements EventHander { private final NetworkFacade network; private final ByteBuffer buffer; public NetworkBatchHandler(final NetworkFacade network, final int maxPacketSize) { this.network = network; buffer = ByteBuffer.allocate(maxPacketSize); } public void onEvent(Message msg, long sequence, boolean endOfBatch) throws Exception { if (msg.size() > buffer.remaining()) { sendBuffer(); } buffer.put(msg.getBytes()); if (endOfBatch) { sendBuffer(); } } private void sendBuffer() { buffer.flip(); network.send(buffer); buffer.clear(); } } The endOfBatch parameter greatly simplifies the handling of the batch compared to the double loop in the algorithm above. I have simplified the examples to illustrate the algorithm. Clearly error handling and other edge conditions need to be considered. Separation of IO from Work Processing There is another very good reason to separate the IO from the threads doing the work processing. Handing off the IO to another thread means the worker thread, or threads, can continue processing without blocking in a nice cache friendly manner. I've found this to be critical in achieving high-performance throughput. If the underlying IO device or resource becomes briefly saturated then the messages can be queued for the batcher thread allowing the work processing threads to continue. The batching thread then feeds the messages to the IO device in the most efficient way possible allowing the data structure to handle the burst and if full apply the necessary back pressure, thus providing a good separation of concerns in the workflow. Conclusion So there you have it. Smart Batching can be employed in concert with the appropriate data structures to achieve consistent low-latency and maximum throughput. From http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html
October 26, 2011
by Martin Thompson
· 11,380 Views · 1 Like
article thumbnail
Brute forcing a bin packing problem
Even a basic planning problem, such as bin packing, can be notoriously hard to solve and scale. One might consider the brute force algorithm. Let's take a look at how that algorithm works out on the cloud balance example of Drools Planner: Given a set of servers with different hardware (CPU, memory and network bandwidth) and given a set of processes with different hardware requirements, assign each process to 1 server and minimize the total cost of the active servers. The brute force algorithm is simple: try every combination between processes where each process is assigned to each server. For example, if we have 6 processes (P0, P1, P2, P3, P4, P5) and 2 servers (S0, S1), we'd try these solutions: P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S0 P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S1 P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S0 P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S1 ... P0->S1, P1->S1, P2->S1, P3->S1, P4->S1, P5->S1 On my machine, it takes 15ms to calculate the score of these 2^6 combinations. When I scale out to 9 processes and 3 servers, which are 3^9 combinations, it becomes 1582ms. So it scales like this: Notice that despite that the number of processes has not even doubled, the running time multiplied by 100! For comparison, I 've added the running time of the First Fit algorithm. And it gets worse: for 12 processes and 4 servers, which are 4^12 combinations, it take more than 17 minutes: What if we want to distribute 3000 processes over 1000 servers? With this kind of scalability, it will take too long. In fact, the brute force algorithm is useless. Luckily, Drools Planner implements several other optimization algorithms, which can handle such loads. If you want to know more about them, take a look at the Drools Planner manual or come to my talk at JUDCon London (31 Oct - 1 Nov). This article was originally posted on the Drools & jBPM blog.
September 26, 2011
by Geoffrey De Smet
· 9,938 Views
article thumbnail
Map Reduce and Stream Processing
Hadoop Map/Reduce model is very good in processing large amount of data in parallel. It provides a general partitioning mechanism (based on the key of the data) to distribute aggregation workload across different machines. Basically, map/reduce algorithm design is all about how to select the right key for the record at different stage of processing. However, "time dimension" has a very different characteristic compared to other dimensional attributes of data, especially when real-time data processing is concerned. It presents a different set of challenges to the batch oriented, Map/Reduce model. Real-time processing demands a very low latency of response, which means there isn't too much data accumulated at the "time" dimension for processing. Data collected from multiple sources may not have all arrived at the point of aggregation. In the standard model of Map/Reduce, the reduce phase cannot start until the map phase is completed. And all the intermediate data is persisted in the disk before download to the reducer. All these added to significant latency of the processing. Here is a more detail description of this high latency characteristic of Hadoop. Although Hadoop Map/Reduce is designed for batch-oriented work load, certain application, such as fraud detection, ad display, network monitoring requires real-time response for processing large amount of data, have started to looked at various way of tweaking Hadoop to fit in the more real-time processing environment. Here I try to look at some technique to perform low-latency parallel processing based on the Map/Reduce model. General stream processing model In this model, data are produced at various OLTP system, which update the transaction data store and also asynchronously send additional data for analytic processing. The analytic processing will write the output to a decision model, which will feed back information to the OLTP system for real-time decision making. Notice the "asynchronous nature" of the analytic processing which is decoupled from the OLTP system, this way the OLTP system won't be slow down waiting for the completion of the analytic processing. Nevetheless, we still need to perform the analytic processing ASAP, otherwise the decision model will not be very useful if it doesn't reflect the current picture of the world. What latency is tolerable is application specific. Micro-batch in Map/Reduce One approach is to cut the data into small batches based on time window (e.g. every hour) and submit the data collected in each batch to the Map Reduce job. Staging mechanism is needed such that the OLTP application can continue independent of the analytic processing. A job scheduler is used to regulate the producer and consumer so each of them can proceed independently. Continuous Map/Reduce Here lets imagine some possible modification of the Map/Reduce execution model to cater for real-time stream processing. I am not trying to worry about the backward compatibility of Hadoop which is the approach that Hadoop online prototype (HOP) is taking. Long running The first modification is to make the mapper and reducer long-running. Therefore, we cannot wait for the end of the map phase before starting the reduce phase as the map phase never ends. This implies the mapper push the data to the reducer once it complete its processing and let the reducer to sort the data. A downside of this approach is that it offers no opportunity to run the combine() function on the map side to reduce the bandwidth utilization. It also shift more workload to the reducer which now needs to do the sorting. Notice there is a tradeoff between latency and optimization. Optimization requires more data to be accumulated at the source (ie: the Mapper) so local consolidation (ie: combine) can be performed. Unfortunately, low latency requires the data to be sent ASAP so not much accumulation can be done. HOP suggest an adaptive flow control mechanism such that data is pushed out to reducer ASAP until the reducer is overloaded and push back (using some sort of flow control protocol). Then the mapper will buffer the processed message and perform combine() before it send to the reducer. This approach automatically shift back and forth the aggregation workload between the reducer and the mapper. Time Window: Slice and Range This is a "time slice" concept and a "time range" concept. "Slice" defines a time window where result is accumulated before the reduce processing is executed. This is also the minimum amount of data that the mapper should accumulate before sending to the reducer. "Range" defines the time window where results are aggregated. It can be a landmark window where it has a well-defined starting point, or a jumping window (consider a moving landmark scenario). It can also be a sliding window where is a fixed size window from the current time is aggregated. After receiving a specific time slice from every mapper, the reducer can start the aggregation processing and combine the result with the previous aggregation result. Slice can be dynamically adjusted based on the amount of data sent from the mapper. Incremental processing Notice that the reducer need to compute the aggregated slice value after receive all records of the same slice from all mappers. After that it calls the user-defined merge() function to merge the slice value with the range value. In case the range need to be refreshed (e.g. reaching a jumping window boundary), the init() functin will be called to get a refreshed range value. If the range value need to be updated (when certain slice value falls outside a sliding range), the unmerge() function will be invoked. Here is an example of how we keep tracked of the average hit rate (ie: total hits per hour) within a 24 hour sliding window with update happens per hour (ie: an one-hour slice). # Call at each hit record map(k1, hitRecord) { site = hitRecord.site # lookup the slice of the particular key slice = lookupSlice(site) if (slice.time - now > 60.minutes) { # Notify reducer whole slice of site is sent advance(site, slice) slice = lookupSlice(site) } emitIntermediate(site, slice, 1) } combine(site, slice, countList) { hitCount = 0 for count in countList { hitCount += count } # Send the message to the downstream node emitIntermediate(site, slice, hitCount) } # Called when reducer receive full slice from all mappers reduce(site, slice, countList) { hitCount = 0 for count in countList { hitCount += count } sv = SliceValue.new sv.hitCount = hitCount return sv } # Called at each jumping window boundary init(slice) { rangeValue = RangeValue.new rangeValue.hitCount = 0 return rangeValue } # Called after each reduce() merge(rangeValue, slice, sliceValue) { rangeValue.hitCount += sliceValue.hitCount } # Called when a slice fall out the sliding window unmerge(rangeValue, slice, sliceValue) { rangeValue.hitCount -= sliceValue.hitCount }
November 23, 2010
by Ricky Ho
· 17,240 Views · 1 Like
article thumbnail
Leader vs Ruler
When I was trying to search for "leaders vs. rulers" on Google, I found many references to governments, royalty, and the military, throughout history. But the strange thing is that none of the articles seemed to distinguish between leaders and rulers. As if leaders and rulers are the same kind of people. They are not. Leaders Last week I was reading the book Tribes, by Seth Godin. In his book Seth says that never in history has it been so easy for anyone to be a leader. These days, with the use of social media, each of us is able to attract our own followers. And on Twitter, this is exactly what we're doing (quite literally). Seth explains that a crowd becomes a tribe when it has a leader that the people are following out of their own free will. And the interesting thing is that people can follow different leaders for different causes. In software projects it is the same. Some people can take the lead on an architectural level, while some have the lead on a functional level. Still others may be the first ones to turn to when people need advice about tools or processes. A complex system does not need a single leader. In fact, I believe a cross-functional team functions best when it has multiple leaders, each with his own area(s) of interest. Rulers In social systems the rulers are of an entirely different breed. While leaders use the power of attraction to convince people what to do, rulers use the power of authority to tell people what to do. Ruling people's lives is the very purpose of the ruler's job. With ruling comes law-making, enforcement and sanctioning, also called the trias politica (legislature, executive, judiciary). Unfortunately, rulers have gotten a bit of a bad name over the centuries. (Much of it deserved, by the way.) But ruling isn't all that bad. Laws, enforcement and sanctions are necessary evils, and in many social systems rulers can peacefully co-exist with leaders. For example: in any football (or soccer) match you will find leaders (one in each team) and rulers (the referees). They all play their parts in making the game work for everyone. Are managers rulers? There's no doubt in my mind that managers are rulers. They are (usually) the only ones with the authority to hire and fire people, and to place them in (or remove them from) teams or departments. They are able to tell people what software to use, what clothes to wear, and how much to pay for a place at the parking lot. Are managers leaders? This is a more interesting question. Lots of management book have been trying hard to turn managers into leaders. The last one I read was Good to Great, by Jim Collins. In his book Jim listed a 5-level hierarchy: Level 5 Executive: Builds enduring greatness through a paradoxical blend of personal humility and professional will. Level 4 Effective Leader: Catalyzes commitment to and vigorous pursuit of a clear and compelling vision, stimulating higher performance standards. Level 3 Manager: Organizes people and resources toward the effective and efficient pursuit of pre-determined objectives. Level 2 Contributing Team Member: Contributes individual capabilities to the achievement of group objectives and works effectively with others in a group setting. Level 1 Highly Capable Individual: Makes productive contributions through talent, knowledge, skills, and good work habits. The problem I have with Jim's hierarchy is that it suggests a linear progression to "higher" levels (where a leader is on a "higher" level than a manager). This doesn't fit with my observations of how social networks operate. In a software project, or any other social network, there can be many leaders, each with his or her own goals and desires. Some are taking initiatives for better architectures, some are leading the way to better user interface design, and some are guiding their followers towards better customer service, better processes, better software tools, or better coffee. To be a leader is not the next step for managers It is the manager's job to give room to leaders There are thousands of leaders on Twitter, and they all have their own huge numbers of followers. But who are the managers of Twitter? Only Evan Williams, Biz Stone and Jack Dorsey are. It's their platform. It's their game. They are the referees, making the laws, enforcing them, and sanctioning, while thousands of leaders and tribes are running around trying to score. Sure, it's ok when managers are trying to be leaders. Nothing wrong with that. Evan, Biz and Jack have a large number of followers themselves too. But they don't have the largest tribes. Managers are on top of things, but they are not on top. Rulers don't need to have the largest tribes themselves. Being a great ruler is hard enough already. If you think you need to be a great leader too, you're just making it hard for yourself. Referees contribute to great football/soccer games by being great rulers. They don't attempt to lead. It's not their job. They are in charge, but they are not the ones with the biggest egos. In his presentation Step Back from Chaos Jonathan Whitty shows that managers are often not the hubs in a social network. It's the informal leaders in a network through which most of the communication flows. It's the managers' job to make sure that leadership is cultivated, and that the emerging leaders are following the rules. So, you can be a leader, or you can be a ruler. And if you're exceptionally talented, perhaps you can be both. Which one will you be?
April 28, 2009
by Jurgen Appelo
· 6,603 Views
  • Previous
  • ...
  • 61
  • 62
  • 63
  • 64
  • 65
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×