DZone
The Latest Culture and Methodologies Topics

Compute Grids vs. Data Grids
In a nutshell, grid computing is a way to distribute your computations across multiple computers (nodes). However, even JMS does that, and JMS is not a grid computing product; it is a messaging API. To classify grid computing products correctly we have to split them into two categories: compute grids and data grids.

Compute grid

Compute grids allow you to take a computation, optionally split it into multiple parts, and execute them on different grid nodes in parallel. The obvious benefit is that your computation will run faster because it can use resources from all grid nodes in parallel. One of the most common design patterns for parallel execution is MapReduce. However, compute grids are useful even if you don't need to split your computation: they help you improve the overall scalability and fault tolerance of your system by offloading your computations onto the most available nodes.

Some of the "must have" compute grid features are:

Automatic deployment - allows for automatic deployment of classes and resources onto the grid without any extra steps from the user. This feature alone provides one of the largest productivity boosts in distributed systems. Users are usually able to simply execute a task from one grid node, and as task execution spreads through the grid, all classes and resources are deployed automatically.

Topology resolution - allows you to provision nodes based on any node characteristic or user-specific configuration. For example, you can decide to include only Linux nodes for execution, or to include only a certain group of nodes within a certain time window. You should also be able to choose all nodes with CPU load under, say, 50% that have more than 2 GB of available heap memory.

Collision resolution - allows users to control which jobs get executed, which jobs get rejected, how many jobs can be executed in parallel, the order of overall execution, etc.

Load balancing - allows you to properly balance your system load across the grid. The range of load balancing policies usually varies between products. Some of the most common ones are round robin, random, and adaptive. More advanced vendors also provide affinity load balancing, where grid jobs always end up on the same node based on the job's affinity key. This policy works well with the data grids described below.

Fail-over - grid jobs should automatically fail over onto other nodes in case of a node crash or some other job failure.

Checkpoints - long-running jobs should be able to periodically store their intermediate state. This is useful for fail-overs, when a failed job should be able to pick up its execution from the latest checkpoint rather than start from scratch.

Grid events - a querying mechanism for all grid events is essential. Any grid node should be able to query all events that happened on remote grid nodes during grid task execution.

Node metrics - a good compute grid solution should be able to provide dynamic grid metrics for all grid nodes. Metrics should include vital node statistics, from CPU load to average job execution time. This is especially useful for load balancing, when the system or the user needs to pick the least loaded node for execution.

Pluggability - in order to blend into any environment, a good compute grid should have well thought out pluggability points. For example, if running on top of JBoss, a compute grid should fully reuse JBoss communication and discovery protocols.

Data grid integration - it is important that compute grids are able to integrate natively with data grids, as quite often businesses will need both computational and data features working within the same application.

Some compute grid vendors:
- GridGain - professional open source
- JPPF - open source

Data grid

Data grids allow you to distribute your data across the grid. Most of us are used to the term distributed cache rather than data grid (data grid does sound more savvy though).
The main goal of a data grid is to serve as much data as possible from memory on every grid node and to ensure data coherency. Some of the important data grid features include:

Data replication - all data is fully replicated to all nodes in the grid. This strategy consumes the most resources; however, it is the most effective solution for read-mostly scenarios, as data is available everywhere for immediate access.

Data invalidation - in this scenario, nodes load data on demand. Whenever data changes on one of the nodes, the same data on all other nodes is purged (invalidated). The data is then loaded on demand the next time it is accessed.

Distributed transactions - transactions are required to ensure data coherency. Cache updates must work just like database updates: whenever an update fails, the whole transaction must be rolled back. Most data grids support various transaction policies, such as read committed, write committed, serializable, etc.

Data backups - useful for fail-over. Some data grid products provide the ability to assign backup nodes for the data. This way, whenever a node crashes, the data is immediately available from another node.

Data affinity/partitioning - data affinity allows you to split (partition) your whole data set into multiple subsets and assign every subset to a grid node. In the purest form, data is not replicated between nodes at all; every node is responsible only for its own subset of data. However, various data grid products may provide different flavors of data affinity, such as replication only to backup nodes. Data affinity is one of the more advanced features and is not provided by every vendor. To my knowledge, according to product websites, among commercial vendors Oracle Coherence and GemStone have it (there may be others). In the professional open source space you can take a look at the combination of GridGain with affinity load balancing and JBossCache.

Some data grid/cache vendors:
- Oracle Coherence - commercial
- GemStone - commercial
- GigaSpaces - commercial
- JBossCache - professional open source
- Ehcache - open source
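The data affinity and affinity load balancing ideas described above can be illustrated with a small, single-JVM sketch. It is only an illustration under stated assumptions: the class `AffinitySketch`, the method `nodeFor`, and the use of simple modulo hashing are hypothetical and are not taken from any of the products listed; real data grids may map keys to nodes quite differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the API of GridGain, Coherence, or any other product.
final class AffinitySketch {

    /** Maps an affinity key to a node index in the range [0, nodeCount). */
    static int nodeFor(Object affinityKey, int nodeCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(affinityKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodeCount = 4;

        // One in-memory map per simulated grid node; each node owns only its own subset.
        List<Map<String, String>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }

        // Partition the data set: every entry is stored on exactly one node.
        String[] keys = {"customer-1", "customer-2", "customer-3", "customer-42"};
        for (String key : keys) {
            nodes.get(nodeFor(key, nodeCount)).put(key, "data for " + key);
        }

        // Affinity load balancing: a job keyed by "customer-42" is routed to the
        // node that already holds that customer's data, so the lookup is local.
        String jobKey = "customer-42";
        int target = nodeFor(jobKey, nodeCount);
        System.out.println("job for " + jobKey + " runs on node " + target
                + ", local data: " + nodes.get(target).get(jobKey));
    }
}
```

One design note: with plain modulo hashing, changing the number of nodes remaps most keys, which is one reason a real product might choose a different partitioning scheme.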
July 31, 2008
by Dmitriy Setrakyan
· 28,317 Views · 3 Likes
Spring Batch - Hello World
This is an introductory tutorial to Spring Batch. It does not aim to provide a complete guide to the framework but rather to facilitate the first contact. Spring Batch is quite rich in functionality, and this is basically how I started learning it. Keep in mind that we will only be scratching the surface.

Before we start

All the examples will have the lofty task of printing "Hello World!", though in different ways. They were developed with Spring Batch 1.0. I'll provide a Maven 2 project and I'll run the examples with Maven, but of course Maven is not a requirement for working with Spring Batch.

Spring Batch in 2 Words

Fortunately, Spring Batch model objects have self-explanatory names. Let's try to enumerate the most important ones and link them together: a batch Job is composed of one or more Steps. A JobInstance represents a given Job, parametrized with a set of typed properties called JobParameters. Each run of a JobInstance is a JobExecution.

Imagine a job that reads entries from a database, generates an XML representation of them, and then does some clean-up. We have a Job composed of 2 steps: reading/writing and clean-up. If we parametrize this job by the date of the generated data, then our Friday the 13th job is a JobInstance. Each time we run this instance (again after a failure, for example) is a JobExecution. This model gives great flexibility regarding how jobs are launched and run.

This naturally brings us to launching jobs with their job parameters, which is the responsibility of JobLauncher. Finally, various objects in the framework require a JobRepository to store runtime information related to the batch execution. In fact, the Spring Batch domain model is much more elaborate, but this will suffice for our purpose. Well, it took more than 2 words and I feel compelled to make a joke about it, but I won't. So let's move to the next section.

Common Objects

For each job, we will use a separate XML context definition file. However, there are a number of common objects that we will need repeatedly. I will group them in an applicationContext.xml which will be imported from within the job definitions. Let's go through these common objects:

JobLauncher - JobLaunchers are responsible for starting a Job with a given set of job parameters. The provided implementation, SimpleJobLauncher, relies on a TaskExecutor to launch the jobs. If no specific TaskExecutor is set, then a SyncTaskExecutor is used.

JobRepository - We will use the SimpleJobRepository implementation, which requires a set of execution DAOs to store its information.

JobInstanceDao, JobExecutionDao, StepExecutionDao - These data access objects are used by SimpleJobRepository to store execution-related information. Two sets of implementations are provided by Spring Batch: Map based (in-memory) and Jdbc based. In a real application the Jdbc variants are more suitable, but we will use the simpler in-memory alternative in this example.

Here's our applicationContext.xml:

Hello World with Tasklets

A tasklet is an object containing any custom logic to be executed as a part of a job. Tasklets are built by implementing the Tasklet interface. Let's implement a simple tasklet that simply prints a message:

    public class PrintTasklet implements Tasklet {

        private String message;

        // Injected from the job's XML bean definition.
        public void setMessage(String message) {
            this.message = message;
        }

        // Called by the step; prints the configured message and reports completion.
        public ExitStatus execute() throws Exception {
            System.out.print(message);
            return ExitStatus.FINISHED;
        }
    }

Notice that the execute method returns an ExitStatus to indicate the status of the execution of the tasklet.

We will now define our first job in a simpleJob.xml application context. We will use the SimpleJob implementation, which executes all of its steps sequentially. In order to plug a tasklet into a job, we need a TaskletStep. I also added an abstract bean definition for tasklet steps in order to simplify the configuration:

Running the Job

Now we need something to kick-start the execution of our jobs. Spring Batch provides a convenient class to achieve that from the command line: CommandLineJobRunner. In its simplest form this class takes 2 arguments: the XML application context containing the job to launch and the bean id of that job. It naturally requires a JobLauncher to be configured in the application context.

Here's how to launch the job with Maven. Of course, it can be run with the java command directly (you need to specify the class path then):

    mvn exec:java -Dexec.mainClass=org.springframework.batch.core.launch.support.CommandLineJobRunner -Dexec.args="simpleJob.xml simpleJob"

Hopefully, your efforts will be rewarded with a "Hello World!" printed on the console. The source code can be downloaded here.

What's Next?

This is the first part of 3. In the next part we will improve on this example, while the third part will be dedicated to item-oriented steps and flat file readers and writers. Hope you find it useful.
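To make the Job / JobInstance / JobExecution vocabulary from the overview a little more concrete, here is a plain-Java sketch of how the three concepts relate. These classes are hypothetical stand-ins written for illustration; they are not the Spring Batch classes of the same names, and the job name, step names, and date parameter are assumptions based on the "Friday the 13th" example in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins that only mirror the vocabulary; NOT the Spring Batch API.
final class JobModelSketch {

    /** A job definition: a name plus an ordered list of step names. */
    static final class Job {
        final String name;
        final List<String> steps;

        Job(String name, List<String> steps) {
            this.name = name;
            this.steps = steps;
        }
    }

    /** One run (attempt) of a job instance. */
    static final class JobExecution {
        final int attempt;
        final boolean succeeded;

        JobExecution(int attempt, boolean succeeded) {
            this.attempt = attempt;
            this.succeeded = succeeded;
        }
    }

    /** A job bound to one concrete set of parameters; it may be run several times. */
    static final class JobInstance {
        final Job job;
        final Map<String, String> parameters;
        final List<JobExecution> executions = new ArrayList<>();

        JobInstance(Job job, Map<String, String> parameters) {
            this.job = job;
            this.parameters = parameters;
        }

        /** Records a new execution of this instance and returns it. */
        JobExecution run(boolean succeeds) {
            JobExecution execution = new JobExecution(executions.size() + 1, succeeds);
            executions.add(execution);
            return execution;
        }
    }

    public static void main(String[] args) {
        // The job from the text: read/write, then clean up.
        Job export = new Job("xmlExport", List.of("readWrite", "cleanUp"));

        // Parametrized by date, this becomes the "Friday the 13th" instance.
        JobInstance friday13 = new JobInstance(export, Map.of("date", "2008-06-13"));

        friday13.run(false); // first attempt fails
        friday13.run(true);  // the same instance is run again

        System.out.println(friday13.executions.size() + " executions of one job instance");
    }
}
```

The point of the sketch is only the cardinality: one Job, one JobInstance per parameter set, and one JobExecution per attempt at running that instance.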
May 23, 2008
by Tareq Abedrabbo
· 299,385 Views
Interview: Game Over for the JDK's Date and Time Classes
JSR 310 aims to modernize the date and calendar classes. The goal is to provide a more advanced and comprehensive model for date and time than those found in the Date and Calendar APIs. The JSR's leaders, Stephen Colebourne and Michael Nascimento, are presenting their work at JavaOne and give an overview below.

Firstly, please briefly introduce yourselves.

Michael Nascimento: I'm a senior technical consultant at Summa Technologies and the founder of the Genesis open source project. I have also served as an expert on a few JSRs, such as the Common Annotations for the Java Platform (JSR-250).

Stephen Colebourne: I am employed building travel e-commerce booking engines and am involved with many open source projects, such as Apache Commons and JodaTime. I am involved in this JSR because of my JodaTime project.

JodaTime? What's that?

Stephen: JodaTime provides a complete replacement of the date and time classes in the JDK:

    public boolean isAfterPayDay(DateTime datetime) {
        if (datetime.getMonthOfYear() == 2) { // February is month 2!!
            return datetime.getDayOfMonth() > 26;
        }
        return datetime.getDayOfMonth() > 28;
    }

    public Days daysToNewYear(LocalDate fromDate) {
        LocalDate newYear = fromDate.plusYears(1).withDayOfYear(1);
        return Days.daysBetween(fromDate, newYear);
    }

    public boolean isRentalOverdue(DateTime datetimeRented) {
        Period rentalPeriod = new Period().withDays(2).withHours(12);
        return datetimeRented.plus(rentalPeriod).isBeforeNow();
    }

    public String getBirthMonthText(LocalDate dateOfBirth) {
        return dateOfBirth.monthOfYear().getAsText(Locale.ENGLISH);
    }

What are the main things that are wrong with the current date and time classes?

Stephen: The existing classes are pretty bad, probably the worst APIs in the JDK. They're buggy, mutable, and cumbersome, and they tend not to be threadsafe.

Michael: The original date class comes from JDK 1.0. At the time, James Gosling tried to follow the related functions in C and didn't put much effort into designing them from scratch. For example, they can't be internationalized and only local timezones are supported.

Stephen: Right. The Gregorian calendar class is a direct port from C, including conventions such as "January = 0". So, if you enter the month "12", the month is January because the algorithm wraps around. The algorithm performs calculations like this that you don't expect. For example, with the Gregorian calendar class, getYear(), getMonth(), and getDay() are quick, while if you call combinations of getYear() and setYear() (and getMonth() and setMonth(), and so on), performance will be bad because lots of calculations are done unexpectedly. Politely put, one can describe these classes as exhibiting "unusual performance characteristics".

Why has it taken so long to fix these various problems?

Stephen: People have known of these problems for several years. Some attempts have been made to fix the Calendar class, but it only got worse. Fixing these issues once and for all has never been a high enough priority.

So why now and why you?

Stephen: I started JodaTime in 2000/2001 and gradually solved the standard date and time class problems, releasing it in 2003. My solution has been picked up across the board, from small applications to the largest advertising systems in the world. The point is that I wanted the solution to exist for a few years as JodaTime before heading into a JSR, so that all the issues would have been identified in preparation for the JSR.

In a nutshell, what does JodaTime offer me?

Michael: Firstly, a better quality API.

Stephen: Secondly, JodaTime supports a number of additional concepts. Firstly, "periods", for when you want to store the concept of 5 weeks and 3 days. Secondly, "intervals", so that you'll be able to store the interval between the start of JavaOne and its end, for example from Monday May 5, 9 a.m. to May 9, 3 p.m. Thirdly, an updated timezone implementation to make it easy to pick up timezone changes, which could even happen on an annual basis. Finally, handling of different calendar systems, such as Islamic and Coptic calendar systems, which don't exist in the standard JDK.

Michael: The third point is why I got interested in this JSR in the first place. I'm from Brazil, where the daylight saving rules change each year and there's always one or two weeks of chaos. I asked myself why things go wrong every year around this issue.

Stephen: Possibly we could offer a solution consisting of a JAR file with the latest set of rules, which you could then put on the classpath. However, sometimes you'd need both sets of rules at the same time. We're still thinking about these situations and ought to be able to come up with something.

By the way, where does the name "Joda" come from?

Stephen: "Joda" was a 4 letter domain name starting with "J" that was free in 2003. I simply typed random things beginning with "J" and found that that one was free...

Where is the JSR process now?

Michael: We are progressing it in an open manner. All discussions are on public mailing lists and wikis. All repositories are open and Issuezilla is open.

Stephen: We are using java.net to build a reference implementation and a testing kit in Subversion. People can go there and try it out. It is all "work in progress". The basic API is there. Right now, parsing needs to be finished and some loose ends need to be tidied up. Parsing, intervals, and multiple calendar systems are missing at the moment.

Can you say something about the JSR's timeline?

Stephen: We hope that we'll be in Java 7, but given that there's no date for it, there's no guarantee that we'll finish in time. We received a little bit of funding from the OpenJDK challenge to get to early draft review by August.

Michael: It's really important that people get involved. The last chance to influence design aspects is the early draft review, scheduled for August, which is coming up soon. Two previous attempts have been made at rewriting these classes and it's unlikely there'll be another one after ours. So it is really important to let your voice be heard, because the more feedback we get the better.

Stephen: There's been good quality feedback. We've had suggestions consisting of sample implementations of intervals, people pointing to different ISO specifications, and suggestions to expand into areas outside our scope. People should take a look at the algorithms too. Maybe someone could come up with better algorithms than those that we already have.

Michael: We've also been nominated for a JCP Program Award, probably because we're the main examples of individuals, rather than a company, leading a JSR. The results will be announced on Tuesday during JavaOne.

Will you present something around your JSR at JavaOne?

Michael: Our technical session on Thursday at 1.30 is completely full and there's a repeat session on Friday at the same time, that is, at 1.30. In the session, we will cover all the basic classes and show examples of the code and how to get started with it.

Stephen: There'll be a little bit of explanation around the design principles, with examples of how bad the current date and time classes are. There'll also be a small puzzler, asking participants to identify the number of bugs in an existing bit of JDK code...

Further Reading
- JSR 310
- JSR 310 Technical Sessions at JavaOne
- JSR 310 General Purpose Area for Anyone to Leave Messages
- JSR 310 Mailing List
- Stephen Colebourne's blog
- Michael Nascimento's blog
- JCP Program's Award Nominations
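The month wrap-around that Stephen describes in the interview can be reproduced with the standard JDK classes alone. The short sketch below is an illustration, not code from the interview; the class name `CalendarWrapDemo` and the helper `build` are made up for the example, and it relies on `GregorianCalendar` being lenient by default.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Illustrative only: demonstrates the zero-based month and lenient wrap-around in java.util.Calendar.
final class CalendarWrapDemo {

    /** Builds a calendar from year, zero-based month index, and day of month. */
    static Calendar build(int year, int month, int day) {
        return new GregorianCalendar(year, month, day);
    }

    public static void main(String[] args) {
        // Months are zero-based, so 12 is one past December. In the default
        // lenient mode the value silently rolls over into January of the next year.
        Calendar calendar = build(2008, 12, 1);

        System.out.println("year=" + calendar.get(Calendar.YEAR)
                + " monthIndex=" + calendar.get(Calendar.MONTH));
        // prints "year=2009 monthIndex=0"
    }
}
```

Calling `setLenient(false)` on the calendar makes such out-of-range values raise an `IllegalArgumentException` when the fields are next computed, but that is opt-in rather than the default.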
May 5, 2008
by Geertjan Wielenga
· 13,235 Views