New Adventures In Real-Time: An Interview With Greg Bollella
The Java Real Time Specification is bringing Java to new places, including the unsuspecting PLC market. This year at JavaONE, Greg Bollella introduced the Blue Wonder system for industrial automation, built on top of Java Real Time. When you look at the details behind Blue Wonder, it's obvious that the Java Real Time specification is going to bring about a revival for the Java programming language, providing new opportunities in many industries from industrial automation and finance to military applications. I met up with Greg to talk about the real time specification in detail, the challenges in writing time critical applications and to see the infinite possibilities that it provides.
James Sugrue: Could you give us a background into the Java Real Time Specification?
Greg Bollella: Real time Java was JSR-1; we finished in 2000, and it became solid with the 1.0 release in 2001. I started JSR-1 while at IBM, and moved to Sun in 2000 to continue working on Real Time Java.
My academic background and PhD were in realtime scheduling. More precisely, it’s about how to support Real Time Computing within general purpose Operating Systems and VMs, although I wasn’t thinking of VMs when I was doing that PhD. I’ve been thinking about how you do real time within general purpose systems for the last 16 years. The general idea behind this is that there’s so much value in what we create on the general purpose side of the computing industry that can’t be completely shared with the realtime programming community. So my dissertation, real time Java, Blue Wonder and the Java Real Time System are all focused on bringing that value to the real time and embedded programmer.
JSR-1 has a new revision, JSR-282, that is fixing problems with the first release, adding some stuff and moving forward with the spec. There’s also JSR-302, which is a strict subset of JSR-282 that is crafted in such a way that one would be able to build an implementation of it that would be fairly easily certified under DO-178B for Safety Critical Systems. (DO-178B is the US certification protocol for software systems that would be used in safety critical systems such as flight controls for commercial airliners.) That’s really difficult stuff to get right. So it’s not that you couldn’t possibly take regular Java through that certification, it just would be unbelievably expensive and maybe not ever finished. So this is a subset with very strict programming guidelines. So that’s 302.
One of the things about the spec, JSR1 is that the spec states that you have to be compliant to Java first before you can be compliant to JSR1. So you have to be SE, ME or EE compliant first. So our implementation at Sun, The Sun Java Real Time System, is compliant to Java SE. We pass all the test suites that SE does and any program that runs on SE5 also runs on Java RTS – that’s an important piece of the puzzle.
Sugrue: Does having an SE 5 application running on Java RTS make it any more safe?
Bollella: The application is still functionally correct, but to get the real time behaviour you do have to do some coding. So if you have an application, say in finance, that people want to be real time, they can’t just throw the application into the Real Time System and have magic happen. You need to think about the architecture and which parts of the application you want to be real time, then change the thread types or re-architect it using the APIs from JSR1. There’s no magic unfortunately, but the changes are very straightforward and pretty easy to implement.
Sugrue: Is there a Reference Implementation available from Sun for this JSR?
Bollella: Not from Sun, we’re not the spec leads but there’s a reference implementation from a company called TimeSys. The TimeSys implementation is free. It's a good reference implementation but not optimized for performance and probably not a good choice for production.
There are product versions available from some companies – IBM has a product implementation of JSR1 and Sun does. There are a couple of small implementations of systems claiming to be Real Time Java – but they aren’t, as they are not compliant to regular Java or JSR1. They say they are compatible with JSR1 and Java but they officially haven’t passed the test suites. These are companies such as Aonix and Aicas. They claim things that they probably shouldn’t be claiming, but that’s the world.
They are interesting implementations – they don’t support the same kind of model that real JSR1 implementations like IBM’s and ours do, where you can have full blown, non-real time stuff hammering away and be isolated from the real time stuff, which still meets its deadlines. So you can have your standard Java app running and your real time app running and maintain realtime, safety critical stuff.
Sugrue: What are the main challenges in writing a real-time application in Java?
Bollella: We have to first talk about what real time means and talk about the predictability you get in Real Time Java. So we have our implementation of Java RTS and we run these predictability tests on it. What these tests primarily measure is this: you have a periodic process defined in JSR1, and you do everything you can to the machine to make this process wake up as soon as it possibly can at the start of each of its periods. Then you measure how much interference there is from garbage collection, file I/O, disk I/O and network operations – all those things might cause the logic of the periodic process to be delayed. That's the maximum interference.
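To get a feel for this kind of measurement, here is a minimal sketch in plain Java SE. It is not Sun's predictability test suite and it does not use the JSR1 API; it only records how late a loop wakes up relative to its ideal release times on an ordinary JVM. The class name `PeriodicLatencyProbe`, the `measure` method and the chosen period are all illustrative assumptions.

```java
import java.util.concurrent.locks.LockSupport;

/** Measures how late a periodic loop wakes up relative to its ideal release times. */
final class PeriodicLatencyProbe {

    /** Observed wake-up lateness over one run, in nanoseconds. */
    static final class Result {
        final int samples;
        final long minLatenessNanos;
        final long maxLatenessNanos;

        Result(int samples, long minLatenessNanos, long maxLatenessNanos) {
            this.samples = samples;
            this.minLatenessNanos = minLatenessNanos;
            this.maxLatenessNanos = maxLatenessNanos;
        }
    }

    static Result measure(long periodNanos, int iterations) {
        if (periodNanos <= 0 || iterations <= 0) {
            throw new IllegalArgumentException("period and iterations must be positive");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long start = System.nanoTime();
        for (int i = 1; i <= iterations; i++) {
            // Absolute release time, so one late wake-up does not shift later periods.
            long release = start + i * periodNanos;
            long remaining;
            while ((remaining = release - System.nanoTime()) > 0) {
                LockSupport.parkNanos(remaining); // may return early, hence the loop
            }
            // Lateness is the interference seen at this release.
            long lateness = System.nanoTime() - release;
            min = Math.min(min, lateness);
            max = Math.max(max, lateness);
        }
        return new Result(iterations, min, max);
    }

    public static void main(String[] args) {
        Result r = measure(1_000_000L, 200); // 1 ms period, 200 periods
        System.out.printf("samples=%d best=%d ns worst=%d ns%n",
                r.samples, r.minLatenessNanos, r.maxLatenessNanos);
    }
}
```

Running it while the machine is busy with I/O or allocation-heavy work gives a rough picture of the worst case on a non-real-time VM. The numbers will vary from machine to machine and are not comparable to the Java RTS figures quoted in this interview.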
There’s three environments in which you write code in JSR1. There’s the regular Java environment which is based on thread types from java.lang and behaves very like Java. There are two others, the Real Time Thread Context and the No Heap Thread Context.
Real Time Thread behaves semantically just like a java.lang thread – same programming model. You can actually take a java.lang.Thread and change its type to a real time thread and it will still run, so they are perfectly semantically equal. The real time thread gets its predictability because of the real time garbage collection system we have in the VM, and a number of other features like the real-time priorities (59 of them), access to Immortal Memory and Scoped Memory, Initialization Time Compilation, Async Events, etc.
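One way to picture "changing the thread type" is to keep the task logic separate from the code that creates the thread. The sketch below runs on plain Java SE and uses only `java.util.concurrent.ThreadFactory`. The class `SwappableThreadType` and its method are hypothetical names. On a JSR1 VM, the assumption is that a factory returning a `javax.realtime.RealtimeThread` (which extends `java.lang.Thread`) could be passed in instead; that variant is not compiled or tested here.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Runs the same task logic on whatever thread type a factory supplies. */
final class SwappableThreadType {

    /** Starts one worker from the factory, waits for it, and returns the final count. */
    static int runCounting(ThreadFactory factory, int ticks) {
        AtomicInteger counter = new AtomicInteger();
        // The logic knows nothing about the concrete thread class that runs it.
        Runnable logic = () -> {
            for (int i = 0; i < ticks; i++) {
                counter.incrementAndGet();
            }
        };
        Thread worker = factory.newThread(logic);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // Plain java.lang.Thread; a real-time factory would be swapped in here (assumption).
        ThreadFactory plainThreads = Thread::new;
        System.out.println(runCounting(plainThreads, 1000)); // prints 1000
    }
}
```

The only line that would change between the two environments is the factory, which is the point being made about the shared programming model.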
So with real time threads, in the RTT Context, in this periodic process testing we can make assertions that no matter what else is going on in the VM, that thread will get control within about 150 microseconds after the beginning of the period. So that’s our maximum worst case latency that we experience. The best case is 1 or 2 microseconds. But the worst case when you have a lot of GC running and I/O occurring, it gets pushed up to about 150 microseconds. With that we’re able to do some pretty interesting things.
In the No Heap area we do the same tests. No Heap Threads are not able to touch any objects on the heap, so you have special memory areas for them to allocate their objects in, and it’s a more restrictive environment – it’s harder to program, but we wouldn’t expect people to do complex things there, just very low level loops doing some sort of high frequency activity. And there we can get that maximum interference down to about 15 microseconds, so you can write some pretty high frequency Java code in the no heap thread, but the programming model is different and a bit tougher to get right.
One of the challenges in Real Time Java, if you are writing in the real time thread context, which a lot of our customers do, is balancing the behaviour of the garbage collector and the threads that are doing the time critical work. The collection algorithm we have allows you to set the priority of threads higher than that of the real time GC, so threads can actually interrupt the GC. The critical section between the threads and the GC is actually very small. It’s within that 150 microsecond window that I talked about. But if these time critical threads are going to continually interrupt the GC, you have to ensure there is some periodicity to them, in some way, shape or form, that allows the GC to get enough cycles to clean up the heap behind them. In typical real time programming you don’t use dynamic memory at all – usually you statically allocate and reuse memory locations. We encourage people to use dynamic allocation and let the GC do the work.
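A back-of-envelope way to reason about this balance is to add up the processor share taken by the time critical threads and see what is left over for the collector. The sketch below is only that arithmetic, assuming a single processor and periodic threads that all outrank the GC. It is not how Java RTS configures or schedules its collector, and the class `GcHeadroom` and the task figures are hypothetical.

```java
/** Estimates the CPU share left for the collector under higher-priority periodic work. */
final class GcHeadroom {

    /** A periodic task with a worst-case cost per release, both in microseconds. */
    static final class PeriodicTask {
        final double costMicros;
        final double periodMicros;

        PeriodicTask(double costMicros, double periodMicros) {
            if (costMicros < 0 || periodMicros <= 0) {
                throw new IllegalArgumentException("cost must be >= 0 and period > 0");
            }
            this.costMicros = costMicros;
            this.periodMicros = periodMicros;
        }
    }

    /** Sum of cost/period over all tasks: the fraction of one CPU they consume. */
    static double utilization(PeriodicTask... tasks) {
        double total = 0.0;
        for (PeriodicTask t : tasks) {
            total += t.costMicros / t.periodMicros;
        }
        return total;
    }

    /** Fraction of one CPU left over for the collector and everything else. */
    static double headroom(PeriodicTask... tasks) {
        return 1.0 - utilization(tasks);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 200 us every 1 ms, plus 500 us every 5 ms.
        double left = headroom(new PeriodicTask(200, 1000), new PeriodicTask(500, 5000));
        System.out.printf("share left for GC and other work: %.2f%n", left);
    }
}
```

If the leftover share is too small for the collector to keep up with the allocation rate, the heap fills behind the time critical threads, which is the failure mode described above.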
Sugrue: Apart from BlueWonder, what other Java Real Time demo systems have you built?
Bollella: Last year at JavaONE we did a controller for an ABB IRB 340 FlexPicker robot.
We did the whole feedback controller in real time Java on top of Solaris for this robot. What we did was take photographs of attendees, take the resulting JPEG, do some edge detection on it, turn it into a vector list – manipulated into a list that was understandable by the controller – and produce a trajectory for the robot, which picked up a pen and drew a picture of the person on a piece of paper. That was all done with Real Time Threads. We needed a 1,000 Hz process, a 1 millisecond period, to read the position of the motors, find out where they are now, figure out where we told them to go in the last ms, figure out what the delta is to that – we know where we want them to go in the next ms, and we use all that to figure out what voltage and amps to send off to the motors. We did that reliably at 1,000 Hz and drove the robot just fine.
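The shape of one such control step can be sketched in a few lines. The interview does not say which control law the demo used, so the proportional gain below, the toy motor model and the class `ControlLoopSketch` are all assumptions made purely for illustration. The loop here also runs as fast as it can; in a real controller each iteration would be released once per millisecond.

```java
/** Toy 1 kHz position loop: measure, compare with the next setpoint, command the motor. */
final class ControlLoopSketch {

    /** Hypothetical proportional gain in 1/second; not taken from the interview. */
    static final double KP = 500.0;

    /** Control period in seconds, matching the 1 millisecond period in the text. */
    static final double PERIOD_SECONDS = 0.001;

    /** One control step: turn the position error into a velocity command. */
    static double step(double measuredPosition, double nextSetpoint) {
        double delta = nextSetpoint - measuredPosition;
        return KP * delta;
    }

    /** Toy motor model (assumption): position integrates the command over one period. */
    static double simulateMotor(double position, double command) {
        return position + command * PERIOD_SECONDS;
    }

    /** Runs the loop for a number of periods and returns the final position. */
    static double run(double startPosition, double target, int steps) {
        double position = startPosition;
        for (int i = 0; i < steps; i++) {
            double command = step(position, target);
            position = simulateMotor(position, command);
        }
        return position;
    }

    public static void main(String[] args) {
        // With these values the error halves every period.
        System.out.println(run(0.0, 1.0, 40));
    }
}
```

With `KP * PERIOD_SECONDS` equal to 0.5, the remaining error halves on every period, so the toy motor settles on the target within a few tens of milliseconds of simulated time.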
Although there are challenges, you can do fairly demanding real time control systems in RT Java. The whole controller was done by a couple of graduate students and myself – it took less than a month and I didn’t contribute much to that.
The other advantage with this kind of robotics is that they also built a Java3D simulation of this robot, and the control code is exactly the same code and class file for the simulator as for the code driving the actual robot. So there’s no loss of fidelity – with simulators, a lot of the time you need to write the control code and compile it for the simulation host, and then to run on the target you need to recompile for the target. So we avoid the porting effort between simulation and target – the same set of bytecode can be used, and the timing is well controlled for both.
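A common way to get this property is to have the control code talk to an interface, with one implementation backed by the simulator and another by the real hardware driver. The sketch below shows that structure only. The `Axis` interface, the `Controller` class and the toy `SimulatedAxis` are hypothetical names, and nothing here reflects how the robot demo was actually organised.

```java
/** Same control class, different back ends: a simulator here, hardware elsewhere. */
final class SharedControlCode {

    /** Hypothetical abstraction over one motor axis. */
    interface Axis {
        double readPosition();
        void drive(double command);
    }

    /** Control logic compiled once; it never knows which Axis implementation it uses. */
    static final class Controller {
        private final Axis axis;
        private final double gain;

        Controller(Axis axis, double gain) {
            this.axis = axis;
            this.gain = gain;
        }

        /** One control tick: command the axis in proportion to the remaining error. */
        void tick(double setpoint) {
            axis.drive(gain * (setpoint - axis.readPosition()));
        }
    }

    /** Toy simulator (assumption): each command moves the axis by that amount. */
    static final class SimulatedAxis implements Axis {
        private double position;

        @Override
        public double readPosition() {
            return position;
        }

        @Override
        public void drive(double command) {
            position += command;
        }
    }

    public static void main(String[] args) {
        SimulatedAxis axis = new SimulatedAxis();
        Controller controller = new Controller(axis, 0.5);
        for (int i = 0; i < 40; i++) {
            controller.tick(2.0);
        }
        System.out.println(axis.readPosition());
    }
}
```

A hardware-backed `Axis` would wrap the device driver instead, and the `Controller` class file would be reused unchanged.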
Using Blue Wonder systems for control means you can do simulation and not have a loss in fidelity. There’s always the problem between the simulator itself and the actual robot – how well do you match simulator and the robot – but that problem exists no matter what. At least we take out half of the problem.
Sugrue: How did Sun get involved in Industrial Automation?
Bollella: There was this large industrial automation company that builds power generation plants. Think PLCs for the mid-tier control – so they have an HMI layer and an administration, accounting and training layer up top. The middle layer is a layer of PLCs doing the real time control and then there are the field devices. They wanted to replace those PLCs with something that was less expensive, more open and more modern. They actually took a competitive product that runs on a version of Linux that they got and hacked up themselves. They specced up some hardware and got it built by a hardware vendor. They built their control system for power plants on top of that – and it worked really well and they were happy with it. What they weren’t happy with, though, was that the VM was at its limit, they couldn’t do any non-realtime stuff on it, and they had to do all the integration work. They were specifying the hardware, doing operating system maintenance and putting all the pieces together, which they didn’t want to do. So they came to Sun and we said, “OK we’ll do this box for you and deliver it, install Solaris & JavaRTS and even install your program on it when you start shipping to the plants themselves.” So we’re moving forward on this – that’s how the Blue Wonder project was born.
It’s gotten a lot of interest from other people in the automation space, as well as in the automotive space. So we’re pretty happy about the reception that we’ve got.
Sugrue: Do you think it means that at some stage Sun will become a PLC vendor?
Bollella: I don’t know if you’d say we’re a PLC vendor, but what we’d like to do is produce systems that could be used in the same way that PLC’s are being used.
Think about this – you have a factory or power plant that has 200 to 1000 of these PLCs. Currently they are isolated separate units that don’t talk to one another. To fiddle with one of them you have to walk over to it. But if you imagine replacing those PLCs with Blue Wonder systems, every one of those is now just a Solaris box on the network.
And so there’s the opportunity for simple things like NFS – all of these PLCs can be writing log files to an NFS mounted directory on the higher level server. So management and everything else is exactly the same as for 1000 machines in an IT shop, because they’re all Solaris. It really makes a big difference to some of our customers – it’s opening up different worlds for them. A PLC is no longer just an isolated blob that has to be managed individually – it’s just a Solaris node on the network. All the advantages that come from that now accrue to that PLC.
Sugrue: You’re providing the hardware to this company. So it’s a kind of a one off product rather than working for a major automation vendor?
Bollella: Well I wouldn’t call it a one off project – we have a number of people interested in it. It has a part number and we intend to sell them to whoever wants to buy them.
Sugrue: Are they available for the general public or automation companies or is that in the future?
Bollella: Right now we have CE and FCC certification, so we can sell them to anybody who deploys them in the EU and a few other countries. We’re going to start working on UL certification for the US. We’re not worldwide yet, but we can take orders from anybody who wants to deploy in Europe.
Sugrue: How much would one of these cost?
Bollella: It depends on the size of the order – if you wanted to buy one I think it would be around $3000-$4000 for one unit, with volume discounts available. That price is without any Profibus cards included. Those cards are not inexpensive; they start at a bit under $1000 for one card from the company we’re buying them from (again, volume discounts apply).
If there’s customers that have a particular project in mind and they’re interested in Blue Wonder and would like to experiment with it, we can get some loaner machines for a few months.
In larger quantities we can drive the price of the system down.
Sugrue: Did you write your own Java drivers for the Profibus cards?
Bollella: We wrote a Solaris device driver that talks to the Profibus card. There’s a PC104+ expansion bus in there and that’s electrically the same as PCI, just a different form factor. So we wrote a PCI device driver and then wrote Java classes on top of that so you can write to the driver from Real Time Java code - that comes bundled with Blue Wonder.
Sugrue: Has the whole Blue Wonder project taken very long?
Bollella: That’s the interesting thing – what we’re trying to do is stay as close to off the shelf stuff as we possibly can, so that the real time Java on there is unmodified from our 2.0 release of the Sun Product version. The only two things we do different with Solaris are writing the new device driver for Profibus and we don’t install all the packages – we’re selective on the packages we install to keep the amount of disk and memory used down.
We’ve been able to build something to do more functionality than a PLC and include Java in there too.
Sugrue: Because it’s so easy in theory, is there anything stopping someone from taking your idea and the TimeSys implementation and producing their own version of Blue Wonder?
Bollella: The first thing is that they would have to get a real time Java implementation. Our RT Java is not open source, it’s a binary product. Java itself is open source so they could grab that and build an implementation of Real Time Java from that. Solaris is open source so they could take that. But it would be more work for them to do that.
They could partner with us, and get our implementation, or even partner with Aonix or IBM. It would be probably as straightforward for one of those companies to do it as it was for us. The thing is to get the JSR 1 implementation first.
Sugrue: Is this as far as Sun plan to go with Industrial Automation - providing these off the shelf PLC replacements and partnering with other companies?
Bollella: We do offer training and consulting for writing real time Java programs and engineering service kind of stuff. We’re helping a couple of customers port their existing systems to Blue Wonder/Java RTS. We’re helping some customers with greenfield implementations.
Sugrue: Industrial Automation is usually slow to take up new trends, and to change implementations- how did Java RTS change this?
Bollella: That’s really true – not only are they slow to move technologies, but there are also a lot of entrenched interests who don’t want new technology to get in. But I think that what people are starting to see is that even in the entrenched automation companies like Allen Bradley, there’s a lot of push from their customers to get more visibility from the regular IT side of the company onto the factory floor. There’s a real gap between those two right now. Like I said, when all of a sudden the controls on the floor for automation, packaging or machine tools are really Solaris nodes, the gap is bridged. There are no gateways between the layers or stuff like that. The CEO of a company could conceivably check the log from a PLC anywhere in the world (I doubt they’d want to do that exactly). That seamless integration is the driver, along with the need to interconnect machines from different manufacturers.
There’s an effort for Sun to become a member of MT Connect (Machine Tools Connect) – it’s an HTTP based protocol that allows you to access a machine tool, which is a web server now, by URL. We’re working with them to see how BlueWonder plays. Now not only is it a web server, but it’s also a Solaris node – you really get a seamless integration between the world as the rest of us know it and the factory floor, which until now has been completely separate.
Sugrue: Where did the name ‘Blue Wonder’ come from?
Bollella: It’s kind of a long story. All the projects that I’ve worked on I’ve named after bridges. I did my graduate work in North Carolina, where Fred Brooks (he wrote The Mythical Man Month) is on the faculty. Fred talks about how the cathedrals of Europe were interesting to the engineering community because they didn’t really know how to build them – it was all through trial and error. Yet there was a lot of customer demand for cathedrals so they kept trying – they fell down, they took a long time, they were really expensive.
The same thing applies to bridges. If you’ve read any books by Henry Petroski at Duke, To Engineer Is Human: The Role of Failure in Successful Design (1985), and things like that, he talks about the bridges in the UK. At one point when they moved from wooden truss bridges to iron bridges, after a while they collapsed a lot, and killed a lot of people. The little anecdote is that the Queen got so pissed about this, as she was losing subjects and tax dollars, that she went to the engineers academy to get them to fix this. The engineers didn’t know anything about metal fatigue, so they were having these catastrophic failures.
The same goes for building real time software systems, and especially distributed real time systems - we find them in places you wouldn’t expect. Every automobile that’s driving around less than 10 years old is really a distributed real time system. It just happens to have a second function as a car. Distributed Real-Time Systems are really hard to build and the engineering community doesn’t really know how to build them in a coherent, repeatable way. My claim is that real time and distributed real time systems are now occupying that same place in the engineering community that cathedrals did in the 1000s and bridges did in the 1800s and 1900s.
So I’ve had Mackinac, Golden Gate, Royal Gorge and a new one called Gateshead. So, Blue Wonder is a bridge in Dresden, Germany.
Sugrue: Best place to have an automation related project come from really! So, where else is the Java Real Time System going?
Bollella: We had some press releases on this stuff. Reuters is using Java RTS as the implementation for their next generation, market facing application. This is really important for Reuters as it generates quite a bit of revenue for them. I’ve been helping them architect their new systems – their old system was written in Ada and it wasn’t going to meet the new requirements. So we do offer that consultant help.
We’re also working with a couple of automotive manufacturers on using BlueWonder to prototype some really interesting new automobiles. So BlueWonder will be in these cars, doing dynamic control for the vehicle. That’ll be a different form factor – for prototyping we used BlueWonder as seen at JavaONE, but if and when we go to production with them we’ll go to a smaller unit with fewer I/Os and different features and get the price down quite a bit.
There’s a military integrator we’re working with – they’re rebuilding the software for this really large radar in Florida that tracks objects way up in geosynchronous orbit. They turn this thing on and the lights dim in Florida! It’s based on IBM computers from decades ago – they can only get their hardware parts on eBay now.
If you look at OpenESB and the SOA model, there are uses for being able to give some of the messages going through the system the chance to meet their quality of service requirements, to get to their destination at the right time.
So it’s quite a range of things that we’re involved with really – from industrial automation, to finance, to military and back into more standard technologies like ESB and SOA.
Sugrue: Have you got a big group working on Java RTS?
Bollella: No - the reason we’re not a big group is because what we do is really modifying system products. So we took Java and turned it into Real Time Java. We took Real Time Java and Solaris, wrapped them up in hardware and turned that into BlueWonder. If you looked at the total R&D that went into Blue Wonder, you’d have to include the R&D time put into Java and Solaris. So in some sense you can add the Java and Solaris teams to our group; they contributed a significant amount to the effort as well, and by that measure we’re a pretty big group.