Introducing the New Date and Time API for JDK 8
Date and time handling in Java is somewhat tricky when you are new to the language. The current time can be accessed via the static method System.currentTimeMillis(), which returns the time in milliseconds since January 1st, 1970. If you prefer to work with objects instead, you can use java.util.Date, a class whose methods are mostly deprecated in recent versions of Java. To work with time offsets, say to add one month to a date, there is java.util.GregorianCalendar. All in all, the classes described here are not very convenient to work with. Java 7 and below lack a good date and time API. The Joda Time library is a common drop-in if you need to work with dates and times.

With JSR 310 (Java Specification Request) this is about to change. JSR 310 adds a new date, time and calendar API to Java 8. The ThreeTen project provides a reference implementation of this new API and can already be used in current Java projects (I recommend, however, not doing this in production). As the README states:

    The API is currently considered usable and accurate, yet incomplete and subject to change. If you use this API you must be able to handle incompatible changes in later versions.

Building ThreeTen

Building the ThreeTen project is relatively easy. It requires both Git and Ant to be installed on your system.

    git clone git://github.com/ThreeTen/threeten.git
    cd threeten
    ant

This will first fetch the most recent version of ThreeTen and then start the build process using Ant. Note that building the library also requires either OpenJDK 1.6 or Oracle JDK 1.6.

JSR 310

The new API specifies a number of new classes, which are divided into the categories of continuous and human time. Continuous time is based on Unix time and is represented as a single incrementing number.
    Class    | Description
    Instant  | A point in time in nanoseconds from January 1st 1970
    Duration | An amount of time measured in nanoseconds

Human time is based on fields that we use in our daily lives, such as day, hour, minute and second. It is represented by a group of classes, some of which we will discuss in this article.

    Class                            | Description
    LocalDate                        | a date, without time of day, offset or zone
    LocalTime                        | the time of day, without date, offset or zone
    LocalDateTime                    | the date and time, without offset or zone
    OffsetDate                       | a date with an offset such as +02:00, without time of day or zone
    OffsetTime                       | the time of day with an offset such as +02:00, without date or zone
    OffsetDateTime                   | the date and time with an offset such as +02:00, without a zone
    ZonedDateTime                    | the date and time with a time zone and offset
    YearMonth                        | a year and month
    MonthDay                         | month and day
    Year/MonthOfYear/DayOfWeek/...   | classes for the important fields
    DateTimeFields                   | stores a map of field-value pairs which may be invalid
    Calendrical                      | access to the low-level API
    Period                           | a descriptive amount of time, such as "2 months and 3 days"

In addition to the above classes, three support classes have been implemented. The Clock class wraps the current time and date, ZoneOffset is a time offset from UTC, and ZoneId defines a time zone such as 'Australia/Brisbane'.

Using the API

Getting the current time

The current time is represented by the Clock class. The class is abstract, so you cannot create instances of it directly. The static method systemUTC() returns a clock based on your system clock and set to UTC.

    import javax.time.Clock;

    Clock clock = Clock.systemUTC();

To use the default time zone of your system there is also systemDefaultZone().

    Clock clock = Clock.systemDefaultZone();

The millis() method can then be used to access the current time in milliseconds from January 1st, 1970. This shows that the Clock class and all its subclasses are wrappers around System.currentTimeMillis().
    Clock clock = Clock.systemDefaultZone();
    long time = clock.millis();

Working with time zones

To work with time zones you need to import the ZoneId class. The class provides a method to get the default system time zone:

    import javax.time.ZoneId;
    import javax.time.Clock;

    ZoneId zone = ZoneId.systemDefault();
    Clock clock = Clock.system(zone);

As seen above, the ZoneId can then be used to get an instance of a Clock with that time zone. Other time zones can be accessed by their name, e.g.:

    ZoneId zone = ZoneId.of("Europe/Berlin");
    Clock clock = Clock.system(zone);

Getting human date and time

Working with a time represented in a single long variable is not what we wanted. We want to work with objects that represent human-readable time. The LocalDate, LocalTime and LocalDateTime classes do just that.

    import javax.time.LocalDate;

    // The now() method returns the current DateTime
    LocalDate date = LocalDate.now();
    System.out.printf("%s-%s-%s",
        date.getYear(), date.getMonthValue(), date.getDayOfMonth()
    );

Using LocalDate to print the current date

Doing calculations with times and dates

One of the most important features of JSR-310 is that you can do calculations with dates and times. The API makes it very easy to do that.

    import javax.time.LocalTime;
    import javax.time.Period;
    import static javax.time.calendrical.LocalPeriodUnit.HOURS;

    Period p = Period.of(5, HOURS);
    LocalTime time = LocalTime.now();
    LocalTime newTime;
    newTime = time.plus(5, HOURS);
    // or
    newTime = time.plusHours(5);
    // or
    newTime = time.plus(p);

Three ways of adding 5 hours to the current time

Each class that represents human time implements the AdjustableDateTime interface. The interface requires the plus and minus methods, which take a value and a PeriodUnit as arguments.

Conclusion

This article gave a (very) brief introduction to the new date and time API that will ship with Java 8. The API seems to be very consistent and well thought through and provides many ways to interact with dates and times.
Upon release of Java 8 the API will be moved from the javax.time package over to java.time, so there will be no conflict if you start using the current implementation.
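The snippets in this article target the pre-release javax.time packages. As a rough sketch of how the same "add five hours" calculation can look against the java.time API that shipped with Java 8, the example below uses ChronoUnit.HOURS and Duration where the article uses LocalPeriodUnit.HOURS and Period. The class name AddHours and its three helper methods are made up for illustration, and a fixed start time is used instead of LocalTime.now() so the result is reproducible.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class, not part of java.time.
class AddHours {
    // Amount plus unit: plus(long, TemporalUnit).
    static LocalTime viaUnit(LocalTime t) {
        return t.plus(5, ChronoUnit.HOURS);
    }

    // Convenience method for the most common unit.
    static LocalTime viaHelper(LocalTime t) {
        return t.plusHours(5);
    }

    // A reusable amount object: plus(TemporalAmount).
    static LocalTime viaAmount(LocalTime t) {
        return t.plus(Duration.ofHours(5));
    }

    public static void main(String[] args) {
        LocalTime start = LocalTime.of(10, 30);
        System.out.println(viaUnit(start));   // 15:30
        System.out.println(viaHelper(start)); // 15:30
        System.out.println(viaAmount(start)); // 15:30
        // LocalTime has no date, so adding hours wraps around midnight.
        System.out.println(viaHelper(LocalTime.of(22, 0))); // 03:00
    }
}
```

All three calls return a new LocalTime; the original value is never modified.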
September 25, 2012
by Fabian Becker
· 78,533 Views
Nested Data Structures, and non-1NF design in PostgreSQL
This has been adapted from an ongoing series currently running on my blog. It has been adapted to be more self-contained, and rely less on other blog entries. For more see http://ledgersmbdev.blogspot.com PostgreSQL provides a very advanced set of tools for doing data modelling in ways which drift back and forth across a relational and non-relational divide. While it is generally a good idea to make the database relational first, and add objects later, the principles of object-relational database design allow you to do a lot more with PostgreSQL than you can on many other database platforms. This article will discuss the use of non-first-normal-form designs, in particular the storage of arrays of tuples in columns to simulate a nested table. The possible uses and problems of such a design will be discussed in detail. One of the promises of object-relational modelling is the ability to address information modelling on complex and nested data structures. Nested data structures bring considerable richness to the database, which is lost in a pure, flat, relational model. Nested data structures can be used to model tuple constraints in ways that are impossible to do when looking at flat data structures, at least as long as those constraints are limited to the information in a single tuple. At the same time there are cases where they simplify things and cases where they complicate things. This is true both in the case of using these for storage and for interfacing with stored procedures. PostgreSQL allows for nested tuples to be stored in a database, and for arrays of tuples. Other ORDBMS's allow something similar (Informix, DB2, and Oracle all support nested tables). Nested tables in PostgreSQL provide a number of gotchas, and additionally exposing the data in them to relational queries takes some extra work. In this post we will look at modelling general ledger transactions using a nested table approach, and both the benefits and limitations of this approach. 
In general this trades one set of problems for another, and it is important to recognize the problems going in. The storage example came out of a brainstorming session I had with Marc Balmer of Micro Systems, though it is worth noting that this is not the solution they use in their products, nor is it the approach currently used by LedgerSMB.

Basic Table Structure

The basic data schema will end up looking like this:

    CREATE TABLE journal_type (
        id serial not null unique,
        label text primary key
    );

    CREATE TABLE account (
        id serial not null unique,
        control_code text primary key, -- account number
        description text
    );

    CREATE TYPE journal_line_type AS (
        account_id int,
        amount numeric
    );

    CREATE TABLE journal_entry (
        id serial not null unique,
        journal_type int references journal_type(id),
        source_document_id text, -- for example invoice number
        date_posted date not null,
        description text,
        line_items journal_line_type[],
        PRIMARY KEY (journal_type, source_document_id)
    );

This schema has a number of obvious gotchas and cannot, by itself, guarantee the sorts of things we want to do. However, using object-relational modelling we can fix these in ways that we cannot in a purely relational schema. The main problems are:

  • Since this is a double entry model, we need a constraint that says that the sum of the amounts of the lines must always equal zero. However, if we just add a sum() aggregate, we will end up with it summing every record in the db every time we do an insert, which is not what we want. We also want to make sure that no account_ids are null and no amounts are null.
  • It is not possible in the schema above to easily expose the journal line information to purely relational tools. We can use a VIEW to do this, though this produces yet more problems.
  • Referential integrity enforcement between the account lines and accounts cannot be done declaratively. We will have to create TRIGGERs to enforce this manually.
These problems are the price of solving the first one, which the relational model does not allow us to solve at all: we accept solutions that are a bit of a pain in exchange for having solutions at all.

Nested Table Constraints

If we simply had a tuple as a column, we could look inside the tuple with check constraints, something like check((column).subcolumn is not null). However, in this case we cannot do that because we need to aggregate over a set of tuples attached to the row. Instead we create a set of table methods for managing the constraints:

    CREATE OR REPLACE FUNCTION is_balanced(journal_entry)
    RETURNS BOOL LANGUAGE SQL AS $$
        SELECT sum(amount) = 0 FROM unnest($1.line_items);
    $$;

    CREATE OR REPLACE FUNCTION has_no_null_account_ids(journal_entry)
    RETURNS BOOL LANGUAGE SQL AS $$
        SELECT bool_and(account_id is not null) FROM unnest($1.line_items);
    $$;

    CREATE OR REPLACE FUNCTION has_no_null_amounts(journal_entry)
    RETURNS BOOL LANGUAGE SQL AS $$
        SELECT bool_and(amount is not null) FROM unnest($1.line_items);
    $$;

We can then create our constraints. Note that because the methods take the table's row type as their argument, the functions can only be defined after the table is created, and the constraints can only be added after the functions are defined. I have gone ahead and given these friendly names so that errors are easier for people (and machines) to process and handle.

    ALTER TABLE journal_entry
      ADD CONSTRAINT is_balanced
          CHECK ((journal_entry).is_balanced);

    ALTER TABLE journal_entry
      ADD CONSTRAINT has_no_null_account_ids
          CHECK ((journal_entry).has_no_null_account_ids);

    ALTER TABLE journal_entry
      ADD CONSTRAINT has_no_null_amounts
          CHECK ((journal_entry).has_no_null_amounts);

Now we have integrity constraints reaching into our nested data. So let's test this out.
    insert into journal_type (label) values ('General');

We will re-use the account data from the previous post:

    or_examples=# select * from account;
     id | control_code | description
    ----+--------------+-------------
      1 | 1500         | Inventory
      2 | 4500         | Sales
      3 | 5500         | Purchase
    (3 rows)

Let's try inserting a few meaningless transactions, some of which violate our constraints:

    insert into journal_entry
        (journal_type, source_document_id, date_posted, description, line_items)
    values
        (1, 'ref-10001', now()::date, 'This is a test',
         ARRAY[row(1, 100)::journal_line_type]);

    ERROR: new row for relation "journal_entry" violates check constraint "is_balanced"

So far so good.

    insert into journal_entry
        (journal_type, source_document_id, date_posted, description, line_items)
    values
        (1, 'ref-10001', now()::date, 'This is a test',
         ARRAY[row(1, 100)::journal_line_type,
               row(null, -100)::journal_line_type]);

    ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_account_ids"

Still good.

    insert into journal_entry
        (journal_type, source_document_id, date_posted, description, line_items)
    values
        (1, 'ref-10001', now()::date, 'This is a test',
         ARRAY[row(1, 100)::journal_line_type,
               row(2, -100)::journal_line_type,
               row(3, NULL)::journal_line_type]);

    ERROR: new row for relation "journal_entry" violates check constraint "has_no_null_amounts"

Great. All constraints are working properly. Let's try inserting a valid row:

    insert into journal_entry
        (journal_type, source_document_id, date_posted, description, line_items)
    values
        (1, 'ref-10001', now()::date, 'This is a test',
         ARRAY[row(1, 100)::journal_line_type,
               row(2, -100)::journal_line_type]);

And it works!
    or_examples=# select * from journal_entry;
     id | journal_type | source_document_id | date_posted |  description   |       line_items
    ----+--------------+--------------------+-------------+----------------+------------------------
      5 |            1 | ref-10001          | 2012-08-23  | This is a test | {"(1,100)","(2,-100)"}
    (1 row)

Break-Out Views

A second major problem with this schema is that if someone wants to create a report using a reporting tool that only really supports relational data, the financial data will be opaque and unavailable. This scenario is one of the reasons why I think it is generally important to push the relational model to its breaking point before looking at object-relational functions. Consequently, when using nested tables it is important to ensure that the data in them is available through a relational interface, in this case a view.

Here we may want to model debits and credits in a way which is re-usable, so we will start by creating two type methods:

    CREATE OR REPLACE FUNCTION debits(journal_line_type)
    RETURNS NUMERIC LANGUAGE SQL AS $$
        SELECT CASE WHEN $1.amount < 0 THEN $1.amount * -1 ELSE NULL END
    $$;

    CREATE OR REPLACE FUNCTION credits(journal_line_type)
    RETURNS NUMERIC LANGUAGE SQL AS $$
        SELECT CASE WHEN $1.amount > 0 THEN $1.amount ELSE NULL END
    $$;

Now we can use these as virtual columns anywhere a journal_line_type is used. The view definition itself is rather convoluted and this may impact performance. I am waiting for the LATERAL construct to become available, which will make this easier.

    CREATE VIEW journal_line_items AS
    SELECT id AS journal_entry_id, (li).*, (li).debits, (li).credits
      FROM (SELECT je.*, unnest(line_items) li FROM journal_entry je) j;

Remember that (li).debits and (li).credits get turned by the parser into debits(li) and credits(li), allowing for class.method notation here.
Testing this out:

    SELECT * FROM journal_line_items;

gives us

     journal_entry_id | account_id | amount | debits | credits
    ------------------+------------+--------+--------+---------
                    5 |          1 |    100 |        |     100
                    5 |          2 |   -100 |    100 |
                    6 |          1 |    200 |        |     200
                    6 |          3 |   -200 |    200 |

As you can see, this works. Now people with purely relational tools can access the information in the nested table. In general it is almost always worth creating break-out views of this sort where nested data is stored. However, it is important to note that with larger data sets this is insufficient, because indexing considerations make it hard to look up specific information on a row level. This may or may not be the end of the world depending on data set size.

Referential Integrity Controls

The final problem is that referential integrity is not a well-defined concept for nested data. For this reason, if we value referential integrity and foreign keys are involved, we must find ways of enforcing these. The simplest solution is a trigger which runs on insert, update, or delete, and manages another relation which can be used as a proxy for referential integrity checks. For example, we could:

    CREATE TABLE je_account (
        je_id int references journal_entry (id),
        account_id int references account(id),
        primary key (je_id, account_id)
    );

This will be a very narrow table and so should be quick to search. It may also be useful in determining which accounts to look at for transactions if we need to do that. This table could then be used to optimize queries. To maintain the table we need to recognize that a journal entry's line items will never be updated or deleted. This is due to the need to maintain clear audit controls and trails.
We may add other flags to the table to indicate transactions, but we can handle insert, update, and delete conditions with a trigger, namely:

    CREATE FUNCTION je_ri_management()
    RETURNS TRIGGER LANGUAGE PLPGSQL AS $$
    DECLARE accounts int[];
    BEGIN
        IF TG_OP ILIKE 'INSERT' THEN
            INSERT INTO je_account (je_id, account_id)
            SELECT NEW.id, account_id
              FROM unnest(NEW.line_items)
             GROUP BY account_id;
            RETURN NEW;
        ELSIF TG_OP ILIKE 'UPDATE' THEN
            IF NEW.line_items <> OLD.line_items THEN
                RAISE EXCEPTION 'Cannot change journal entry line items!';
            ELSE
                RETURN NEW;
            END IF;
        ELSIF TG_OP ILIKE 'DELETE' THEN
            RAISE EXCEPTION 'Cannot delete journal entries!';
        ELSE
            RAISE EXCEPTION 'Invalid TG_OP in trigger';
        END IF;
    END;
    $$;

Then we add the trigger with:

    CREATE TRIGGER je_breakout_for_ri
    AFTER INSERT OR UPDATE OR DELETE ON journal_entry
    FOR EACH ROW EXECUTE PROCEDURE je_ri_management();

The final invalid TG_OP branch could be omitted, but this is not a bad check to have. Let's try this out:

    insert into journal_entry
        (journal_type, source_document_id, date_posted, description, line_items)
    values
        (1, 'ref-10003', now()::date, 'This is a test',
         ARRAY[row(1, 200)::journal_line_type,
               row(3, -200)::journal_line_type]);

    or_examples=# select * from je_account;
     je_id | account_id
    -------+------------
        10 |          3
        10 |          1
    (2 rows)

In this way referential integrity can be enforced.

Solution 2.0: Refactoring the above to eliminate the view.

The above solution will work great for small businesses, but for larger businesses querying this data will become slow for certain kinds of reports. Storage here is tied to specific criteria, and indexing is somewhat problematic. There are ways we can address this, but they are not always optimal. At the same time our work is simplified because the actual accounting details are append-only. One solution to this is to refactor the above solution.
Instead of:

  • Main table
  • Relational view
  • Materialized view for referential integrity checking

we can have:

  • Main table, with tweaked storage for line items
  • Materialized view for RI checking and relational access

Unfortunately this sort of refactoring after the fact isn't simple. Typically you want to convert the journal_line_type type to a journal_line_type table, and inherit this in your materialized view table. You cannot simply drop and recreate it, since the column you are storing the data in depends on the structure. The solution is to rename the type and create a new one in its place. This must be done manually; there is currently no capability to copy a composite type's structure into a table. You will then need to create a cast and a cast function. Then, when you can afford the downtime, you will want to convert the table to the new type. It is quite possible that the downtime will be delayed and you will have an extended period where you are half-way through migrating the structure of your database. You can, however, decide to create a cast between the table and the type, perhaps an implicit one (though this is not inherited), and use this to centralize your logic. Unfortunately this leads to duplication-related complexity and in an ideal world would be avoided.

However, assuming that the downtime ends up being tolerable, the resulting structures can be more readily optimized for a variety of workloads. In this regard you would have a main table, most likely with line_items moved to extended storage, whose function is to model journal entries as journal entries and apply relevant constraints, and a second table which models journal entry lines as independent lines. This also simplifies some of the constraint issues on the first table, and makes the modelling easier because we only have to look into the nested storage where we are looking at subset constraints.
This section then provides a warning regarding the use of advanced ORDBMS functionality, namely that it is easy to get tunnel vision and create problems for the future. The complexity cost here is so high that the primary model should generally remain relational, with things like nested storage primarily used to create constraints that cannot be effectively modelled otherwise. However, this becomes a great deal more complicated where values may be updated or deleted. Here we have a relatively simple case regarding data writes, combined with complex constraints that cannot be effectively expressed in normalized, relational SQL. Therefore the standard maintenance concerns that counsel against duplicating information may give way to the fact that such duplication allows for richer constraints.

Now, if we had been aware of the problems going in, we would have chosen this structure all along. Our design would have been:

    CREATE TABLE journal_line (
        entry_id bigserial primary key, -- only possible key
        je_id int not null,
        account_id int,
        amount numeric
    );

After creating the journal entry table we'd:

    ALTER TABLE journal_line ADD FOREIGN KEY (je_id) REFERENCES journal_entry(id);

If we have to handle purging old data we can make that key ON DELETE CASCADE. The line items would then have been stored as this table's row type instead. We can then get rid of all constraints and their supporting functions other than the is_balanced one. Our debit and credit functions then also reference this type.
Our trigger then looks like:

    CREATE FUNCTION je_ri_management()
    RETURNS TRIGGER LANGUAGE PLPGSQL AS $$
    DECLARE accounts int[];
    BEGIN
        IF TG_OP ILIKE 'INSERT' THEN
            INSERT INTO journal_line (je_id, account_id, amount)
            SELECT NEW.id, account_id, amount
              FROM unnest(NEW.line_items);
            RETURN NEW;
        ELSIF TG_OP ILIKE 'UPDATE' THEN
            RAISE EXCEPTION 'Cannot change journal entry line items!';
        ELSIF TG_OP ILIKE 'DELETE' THEN
            RAISE EXCEPTION 'Cannot delete journal entries!';
        ELSE
            RAISE EXCEPTION 'Invalid TG_OP in trigger';
        END IF;
    END;
    $$;

Approval workflows can be handled with a separate status table with its own constraints. Deletions of old information (up to a specific snapshot) can be handled by a stored procedure which is unit tested and disables this trigger before purging data. This system has the advantage of having several small components which are all complete and easily understood, and it is made possible because the data is exclusively append-only.

As you can see from the above examples, nested data structures greatly complicate the data model and create problems with relational math that must be addressed if data logic is to remain meaningful. This is a complex field, and it adds a lot of complexity to storage. In general, these structures are best avoided in actual data storage except where this approach makes formerly insurmountable problems manageable. Moreover, they add complexity to optimization once data gets large. Thus while non-atomic fields make sense as an initial point of entry in some narrow cases, as a point of actual query they are very rarely the right approach. It is possible that, at some point, nested storage will be able to have its own indexes, foreign keys, etc., but I cannot imagine this being a high priority, so it isn't clear that this will ever happen. In general, it usually makes the most sense to simply store the data in a pseudo-normalized way, with any non-1NF designs being the initial point of entry in a linear write model.
Nested Data Structures as Interfaces

Nested data structures as interfaces to stored procedures are a little more manageable. The main difficulties are in application-side data construction and output parsing. Some languages handle this more easily than others. Upper-level construction and handling of these structures is relatively straightforward on the database side and poses none of these problems. However, they do cause additional complexity and this must be managed carefully.

The biggest issue when interfacing with an application is that ROW types are not usually automatically constructed by application-level frameworks, even if they have arrays. This leaves the programmer to choose between unstructured text arrays, which are fundamentally non-discoverable (and thus brittle), and arrays of tuples, which are discoverable but require a lot of additional application code to handle. At the same time, as a chicken-and-egg problem, frameworks will not add handling for this sort of problem unless people are already trying to do it.

So my general recommendation is to use nested data types sparingly everywhere in the database, only where the benefits clearly outweigh the complexity costs. Complexity costs are certainly lower at the interface level and there are many more cases where these techniques are net wins there, but that does not mean that they should be routinely used even there.
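To give a feel for the "additional application code" mentioned above, here is a minimal Java sketch of the application side: it checks that a set of lines balances and renders them in the text form PostgreSQL displays for a journal_line_type[] value, such as {"(1,100)","(2,-100)"}. The JournalLines and Line classes are hypothetical, not part of any framework or of LedgerSMB, and the sketch assumes non-null integer account ids and numeric amounts only; text fields or NULLs would need the quoting and escaping rules for composite and array literals.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical application-side helper, for illustration only.
class JournalLines {
    // Mirrors the two fields of the journal_line_type composite type.
    static final class Line {
        final int accountId;
        final BigDecimal amount;

        Line(int accountId, BigDecimal amount) {
            this.accountId = accountId;
            this.amount = amount;
        }
    }

    // Same rule as the is_balanced constraint: amounts must sum to zero.
    static boolean isBalanced(List<Line> lines) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Line line : lines) {
            sum = sum.add(line.amount);
        }
        return sum.signum() == 0;
    }

    // Renders the lines as an array-of-composite literal.
    // Each element is double-quoted because it contains a comma.
    static String toArrayLiteral(List<Line> lines) {
        return lines.stream()
                .map(l -> "\"(" + l.accountId + "," + l.amount.toPlainString() + ")\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        List<Line> lines = Arrays.asList(
                new Line(1, new BigDecimal("100")),
                new Line(2, new BigDecimal("-100")));
        System.out.println(isBalanced(lines));     // true
        System.out.println(toArrayLiteral(lines)); // {"(1,100)","(2,-100)"}
    }
}
```

A string built this way would typically be passed as a query parameter and cast to journal_line_type[] on the database side; the check constraints remain the authority on what is valid.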
September 25, 2012
by Chris Travers
· 20,814 Views
8 Common Code Violations in Java
Recently at work I did a code cleanup of an existing Java project. After that exercise, I could see a common set of code violations that occur again and again in the code. So I came up with a list of such common violations and shared it with my peers, so that awareness of them would help to improve code quality and maintainability. I'm sharing the list here with a bigger audience. The list is not in any particular order and is derived from the rules enforced by code quality tools such as CheckStyle, FindBugs and PMD. Here we go!

Format source code and organize imports in Eclipse:

Eclipse provides the option to auto-format the source code and organize the imports (thereby removing unused ones). You can use the following shortcut keys to invoke these functions.

  • Ctrl + Shift + F: formats the source code.
  • Ctrl + Shift + O: organizes the imports and removes the unused ones.

Instead of invoking these two functions manually, you can tell Eclipse to auto-format and auto-organize whenever you save a file. To do this, in Eclipse, go to Window -> Preferences -> Java -> Editor -> Save Actions, then enable Perform the selected actions on save and check Format source code + Organize imports.

Avoid multiple returns (exit points) in methods:

In your methods, make sure that you have only one exit point. Do not use return in more than one place in a method body. For example, the code below is NOT RECOMMENDED because it has more than one exit point (return statement).

    private boolean isEligible(int age){
      if(age > 18){
        return true;
      }else{
        return false;
      }
    }

The above code can be rewritten like this (of course, the code below can still be improved, but that'll come later).

    private boolean isEligible(int age){
      boolean result;
      if(age > 18){
        result = true;
      }else{
        result = false;
      }
      return result;
    }

Simplify if-else methods:

We write several utility methods that take a parameter, check for some conditions and return a value based on the condition. For example, consider the isEligible method that you just saw in the previous point.

    private boolean isEligible(int age){
      boolean result;
      if(age > 18){
        result = true;
      }else{
        result = false;
      }
      return result;
    }

The entire method can be rewritten as a single return statement, as below.

    private boolean isEligible(int age){
      return age > 18;
    }

Do not create new instances of Boolean, Integer or String:

Avoid creating new instances of Boolean, Integer, String etc. For example, instead of using new Boolean(true), use Boolean.valueOf(true). The latter statement has the same effect as the former but performs better, because it returns a shared instance instead of allocating a new object.

Use curly braces around block statements:

Never forget to use curly braces around block-level statements such as if, for and while. This reduces the ambiguity of your code and avoids the chance of introducing a new bug when you modify the block-level statement.

NOT RECOMMENDED

    if(age > 18)
      return true;
    else
      return false;

RECOMMENDED

    if(age > 18){
      return true;
    }else{
      return false;
    }

Mark method parameters as final, wherever applicable:

Always mark the method parameters as final wherever applicable. If you do so, you'll get a compiler error when you accidentally modify the value of the parameter. It also documents that the parameter is not meant to be reassigned.

RECOMMENDED

    private boolean isEligible(final int age){ ... }

Name public static final fields in UPPERCASE:

Always name public static final fields (also known as constants) in UPPERCASE. This lets you easily differentiate constant fields from local variables.

NOT RECOMMENDED

    public static final String testAccountNo = "12345678";

RECOMMENDED

    public static final String TEST_ACCOUNT_NO = "12345678";

Combine multiple if statements into one:

Wherever possible, try to combine multiple if statements into a single one. For example, the code below:

    if(age > 18){
      if( voted == false){
        // eligible to vote.
      }
    }

can be combined into a single if statement, as:

    if(age > 18 && !voted){
      // eligible to vote
    }

switch should have default:

Always add a default case to switch statements.

Avoid duplicate string literals, instead create a constant:

If you have to use a string in several places, avoid using it as a literal. Instead create a String constant and use it. For example, in the code below,

    private void someMethod(){
      logger.log("My Application" + e);
      ....
      ....
      logger.log("My Application" + f);
    }

the string literal "My Application" can be made a constant and used in the code.

    public static final String MY_APP = "My Application";

    private void someMethod(){
      logger.log(MY_APP + e);
      ....
      ....
      logger.log(MY_APP + f);
    }

Additional Resources:

  • A collection of Java best practices.
  • List of available Checkstyle checks.
  • List of PMD Rule sets
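Two of the rules above come without a code sample, so here is a small sketch of both. It shows that valueOf hands back shared instances (which is where the saving over new Boolean(...) or new Integer(...) comes from) and a switch with an explicit default case. The class name StyleRules and the describe method, including its day numbering, are invented for illustration.

```java
// Hypothetical demo class, for illustration only.
class StyleRules {
    // Boolean.valueOf returns one of the two shared constants.
    static boolean sharesCachedBoolean() {
        return Boolean.valueOf(true) == Boolean.TRUE;
    }

    // Integer.valueOf always caches values from -128 to 127,
    // so both calls return the same object for 100.
    static boolean sharesCachedInteger() {
        return Integer.valueOf(100) == Integer.valueOf(100);
    }

    // A switch with a default case, a final parameter and a single exit point.
    // Assumes days are numbered 1 (Monday) to 7 (Sunday).
    static String describe(final int dayOfWeek) {
        final String result;
        switch (dayOfWeek) {
            case 1:
            case 2:
            case 3:
            case 4:
            case 5:
                result = "weekday";
                break;
            case 6:
            case 7:
                result = "weekend";
                break;
            default:
                // Unexpected input is handled explicitly instead of silently ignored.
                result = "unknown";
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sharesCachedBoolean()); // true
        System.out.println(sharesCachedInteger()); // true
        System.out.println(describe(6));           // weekend
        System.out.println(describe(42));          // unknown
    }
}
```

The identity comparisons with == are used here only to make the sharing visible; ordinary code should still compare boxed values with equals.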
September 14, 2012
by Veera Sundar
· 45,963 Views · 1 Like
Perl in Node.js
Yes, Perl5 can be embedded in node.js! First of all, run npm install perl. (P.S. node-perl requires a perl5 binary built with -fPIC and -Duseshrplib.) This is a synchronous but useful embedding of Perl5 for node.js. If you want to try any version of perl, you must check out node-perl.

    #>git clone git://github.com/hideo55/node-perl.git
    #>cd node-perl
    #>node-waf configure
    #>node-waf build
    #>node-waf install

And then:

    var Perl = require('perl').Perl;
    var perl = new Perl();
    perl.Run({
        opts : ["-Mfeature=say","-e","say 'Hello world'"]
    }, function(out,err){
        console.log(out);
    });
    perl.Run({
        script : 'example.pl',
        args : ['foo', 'bar']
    });

If you opted for Perl5:

    var Perl = require('perl-simple').Perl;
    var perl = new Perl();
    var ret = perl.evaluate("reverse 'yoeman'");
    console.log(ret); // => nameoy

    var Perl = require('../index.js').Perl;
    var perl = new Perl();
    perl.use('LWP::UserAgent');
    var ua = perl.getClass('LWP::UserAgent').new();
    var res = ua.get('http://utf-8.jp/');
    console.log(res.as_string());

Happy hacking!
September 14, 2012
by Hemanth HM
· 17,051 Views · 1 Like
Erlang: tuples and lists
You can't seriously program in a language with only scalar types like numbers, strings and atoms. For this reason, now that we have a basic knowledge of Erlang's syntax and variables, we have to delve into two basic vector types: tuples and lists.

Both tuples and lists represent a collection of values; however, some rules of thumb (imho) for choosing between them are:
  • Tuples deal with heterogeneous values, while lists are homogeneous. A tuple is usually built as a sequence of values of different types, while all of the values of a list are of the same type. This struct versus array distinction also holds in Python.
  • Tuples and lists are pattern-matched differently (we'll see more of this when writing pattern matching code, of course).
  • Tuples have O(1) random access, while lists have O(N) random access, being built of cons cells.

In general, fixed-size structures are modelled as tuples, while sequences of N values (where N varies at runtime) are modelled as lists.

Tuples

Erlang tuples are similar in syntax to Python's:

    1> MyTuple = {number, 42}.
    {number,42}
    2> tuple_size(MyTuple).
    2
    3> element(1, MyTuple).
    number
    4> element(2, MyTuple).
    42

And of course they are immutable, like every other value:

    5> setelement(2, MyTuple, 43).
    {number,43}
    6> MyTuple.
    {number,42}

They can have any number of values:

    7> {true, 23, "Hello"}.
    {true,23,"Hello"}

And the empty tuple is:

    8> {}.
    {}

Lists

Lists are built as a sequence of cons cells (one of LISP's basic data structures; cons means construct). Each cons cell is composed of a value and a pointer to another cons cell, which may be empty. Thus the list [1, 2, 3] is composed of three cons cells:

    p1: [1, p2]
    p2: [2, p3]
    p3: [3, p_to_empty_list]

Lists can be built either as sequences or by specifying the cons cells directly. In the first case, values are separated by `,`, while in the second they are separated by `|`:

    1> []
    1> .
    []
    2> [1].
    [1]
    3> [1, 2].
    [1,2]
    4> [1 | [2]].
    [1,2]
    5> [1, 2, 3].
    [1,2,3]
    6> [1 | [2 | [3]]].
    [1,2,3]

Every function operating on lists is defined in terms of two primitives, head and tail, which return respectively the first element of the list and the rest of the list with that element removed. While in other languages these functions are provided as head/1 and tail/1 (car and cdr for friends), in Erlang they are usually expressed via pattern matching (the hd/1 and tl/1 BIFs exist as well); this means they are built into the language syntax. Our little exercise for today is to write these constructs as ordinary functions, to introduce how pattern matching works on lists.

head/1

Let's start with a simple case: an empty list. If you ask for the first element of such a list, our implementation should raise an error as there is no such element.

    #!/usr/bin/escript
    head([]) -> throw(error).
    main(_) -> head([]).

When executed, this indeed shows:

    escript: exception throw: error
      in function  erl_eval:local_func/5
      in call from escript:interpret/4
      in call from escript:start/1
      in call from init:start_it/1
      in call from init:start_em/1

What has happened? We can put literal values in the formal arguments of a function, and the body of the function will only be executed if the values match these literals. Of course this also means that when we execute:

    main(_) -> head([1]).

we get:

    escript: exception error: {function_clause,[{local,head,[[1]]}]}

since the function is only defined for [] as an argument. Let's add another clause to make it work for 1-element lists as well:

    #!/usr/bin/escript
    head([]) -> throw(error);
    head([Element]) -> Element.
    main(_) ->
        E = head([1]),
        io:format("Head: ~p~n", [E]).

Note that clauses are separated by `;` instead of `.` and are evaluated in sequence, so you should put the corner cases (or the base case of recursion) first. Here we didn't use a literal pattern, since we don't know what is in the list in general. We used a variable name, so that Element is bound to the value of the only element of the list.
Now we can extend the code further, so that it deals with multiple-value lists:

    #!/usr/bin/escript
    head([]) -> throw(error);
    head([Element]) -> Element;
    head([Element | _Tail]) -> Element.
    main(_) ->
        io:format("Head of [1]: ~p~n", [head([1])]),
        io:format("Head of [1, 2]: ~p~n", [head([1, 2])]).

We use _Tail instead of Tail to tell Erlang that we don't need the value of this argument, but that it must exist. In fact, we know that [1] is really [1 | []], so we can simplify this code a bit, as the third clause also matches the single-element list case:

    #!/usr/bin/escript
    head([]) -> throw(error);
    head([Element | _Tail]) -> Element.
    main(_) ->
        io:format("Head of [1]: ~p~n", [head([1])]),
        io:format("Head of [1, 2]: ~p~n", [head([1, 2])]).

You're not limited to pattern matching to act on lists: explore the lists module to see how member(), nth() or length() can be used to test an element's presence, read a single value or calculate the length of the list. I'm out of space for today, so I leave the similar tail/1 implementation as an exercise for the reader.

Conclusions

Tuples and lists are the basic Erlang data structures. Practise with them and with pattern matching in the shell to make sure that you know how to manipulate variables before we move from defining functions to making them collaborate.
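For readers more at home in Java, the cons-cell structure and the head and tail primitives from this article can be sketched as a small immutable class. This is an illustration in a different language, not a description of how Erlang implements lists; the ConsList class and all of its method names are invented for the example.

```java
import java.util.NoSuchElementException;

// Hypothetical immutable cons-cell list; every name here is illustrative.
final class ConsList<T> {
    // One shared empty list, playing the role of Erlang's [].
    private static final ConsList<?> EMPTY = new ConsList<Object>(null, null);

    private final T head;
    private final ConsList<T> tail;

    private ConsList(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <T> ConsList<T> empty() {
        return (ConsList<T>) EMPTY;
    }

    // Counterpart of [Head | Tail]: one new cell pointing at an existing list.
    static <T> ConsList<T> cons(T head, ConsList<T> tail) {
        return new ConsList<>(head, tail);
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    // Like the head([]) clause, asking for the head of an empty list is an error.
    T head() {
        if (isEmpty()) {
            throw new NoSuchElementException("head of empty list");
        }
        return head;
    }

    T nth(int index) {
        // 1-based walk from cell to cell, which is why random access is O(N).
        ConsList<T> cell = this;
        for (int i = 1; i < index; i++) {
            cell = cell.tail();
        }
        return cell.head();
    }

    ConsList<T> tail() {
        if (isEmpty()) {
            throw new NoSuchElementException("tail of empty list");
        }
        return tail;
    }

    public static void main(String[] args) {
        // Same shape as [1 | [2 | [3]]].
        ConsList<Integer> list = cons(1, cons(2, cons(3, ConsList.<Integer>empty())));
        System.out.println(list.head());        // prints 1
        System.out.println(list.tail().head()); // prints 2
        System.out.println(list.nth(3));        // prints 3
    }
}
```

The explicit isEmpty checks do by hand what the Erlang clauses do through pattern matching: the empty list is handled first, and every other list is split into its first cell and the rest.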
September 12, 2012
by Giorgio Sironi
· 21,210 Views
A Better Java Shell Script Wrapper
In many Java projects, you often see a wrapper shell script that invokes the java command with its custom application parameters. For example, $ANT_HOME/bin/ant, $GROOVY_HOME/bin/groovy, or even in our TimeMachine Scheduler you will see $TIMEMACHINE_HOME/bin/scheduler.sh.

Writing these wrapper scripts is boring and error prone. Most of the problems come from setting the correct classpath for the application. If you're working on an in-house project for a company, then you can get away with hardcoding paths and your environment vars. But for open source projects, folks have to make the wrapper more flexible and generic. Most of them even provide a .bat version of it. Windows DOS is really a brutal and limited terminal for scripting your project needs. For this reason, I often encourage others to use Cygwin as much as they can. It at least has a real bash shell to work with. Another common problem with these wrappers is that they can quickly get out of hand, leaving too many near-duplicate scripts littered everywhere in your project.

In this post, I will show you a Java wrapper script that I've written. It's simple to use and very flexible for running just about any Java program. Let's see how it's used first, and then I will print its content at the bottom of the post.

Introducing the run-java wrapper script

If you take a look at $TIMEMACHINE_HOME/bin/scheduler.sh, you will see that it in turn calls a run-java script that lives in the same directory.

    DIR=$(dirname $0)
    SCHEDULER_HOME=$DIR/..
    $DIR/run-java -Dscheduler.home="$SCHEDULER_HOME" timemachine.scheduler.tool.SchedulerServer "$@"

As you can see, our run-java can take -D options. Not only this, it can also take the -cp option as well! What's more, you can specify these options even after the main class! This makes run-java re-wrappable by other scripts, which can still add additional system properties and classpath entries.

For example, TimeMachine comes with the Groovy library, so instead of downloading its full distribution again, you can simply invoke groovy like this:

    $TIMEMACHINE_HOME/bin/run-java groovy.ui.GroovyMain test.groovy

You can use run-java from any directory you're in, so it's convenient. It will resolve its own directory and load any jars in the lib directory automatically. Now if you want Groovy to run with additional jars, you can use the -cp option like this:

    $TIMEMACHINE_HOME/bin/run-java -cp "$HOME/apps/my-app/lib/*" groovy.ui.GroovyMain test.groovy

Things often go wrong if you are not careful with the Java classpath, but with the run-java script you can perform a dry run first:

    RUN_JAVA_DRY=1 $TIMEMACHINE_HOME/bin/run-java -cp "$HOME/apps/my-app/lib/*" groovy.ui.GroovyMain test.groovy

You would run the above on a single line at a command prompt. It should print out your full java command with all options and arguments for you to inspect.

There are many more options to the script, which you can find out about by reading the comments in it. The current script will work on any Linux bash or on a Windows Cygwin terminal.

Using run-java during development with Maven

The examples above assume you are in a released project structure such as this:

    $TIMEMACHINE_HOME
      +- bin/run-java
      +- lib/*.jar

But what about during development? A frequent use case is that you want to be able to run your latest compiled classes under target/classes without having to package up or release the entire project. You can use our run-java in this scenario as well. First, simply add bin/run-java to your project, then run

    mvn compile dependency:copy-dependencies

which will copy all the dependency jar files into target/dependency. That's all. run-java will automatically detect these directories and create the correct classpath to run your main class.
If you use the Eclipse IDE for development, then your target/classes will always be up to date, and run-java can be a great gem to have in your project even for development.

Get the run-java wrapper script now

    #!/usr/bin/env bash
    #
    # Copyright 2012 Zemian Deng
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # A wrapper script that runs any Java6 application in a unix/cygwin env.
    #
    # This script is assumed to be located in an application's "bin" directory. It will
    # auto resolve any symbolic link and always run relative to this application
    # directory (which is one parent up from the script.) Therefore, this script can be
    # run anywhere in the file system and it will still reference this application
    # directory.
    #
    # This script will by default auto setup a Java classpath that picks up any "config"
    # and "lib" directories under the application directory. It will also add any
    # typical Maven project output directories such as "target/test-classes",
    # "target/classes", and "target/dependency" into the classpath. This can be disabled by
    # setting RUN_JAVA_NO_AUTOCP=1.
    #
    # If the "Default parameters" section below doesn't match the user's env, then the user
    # may override these variables in their terminal session or preset them in the shell's
    # profile startup script. The values of all paths should be in cygwin/unix form,
    # and this script will auto convert them into Windows paths where needed.
    #
    # User may customize the Java classpath by setting RUN_JAVA_CP, which will prefix the existing
    # classpath, or use the "-cp" option, which will postfix the existing classpath.
    #
    # Usage:
    #   run-java [java_opts] <java_main_class> [-cp /more/classpath] [-Dsysprop=value]
    #
    # Example:
    #   run-java example.Hello
    #   run-java example.Hello -Dname=World
    #   run-java org.junit.runner.JUnitCore example.HelloTest -cp "C:\apps\lib\junit4.8.2\*"
    #
    # Created by: Zemian Deng 03/09/2012

    # This run script dir (resolve to absolute path)
    SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)    # This dir is where this script lives.
    APP_DIR=$(cd "$SCRIPT_DIR/.." && pwd)        # Assume the application dir is one level up from script dir.

    # Default parameters
    JAVA_HOME=${JAVA_HOME:=/apps/jdk}            # This is the home directory of the Java development kit.
    RUN_JAVA_CP=${RUN_JAVA_CP:=$CLASSPATH}       # A classpath prefix before -classpath option, default to $CLASSPATH
    RUN_JAVA_OPTS=${RUN_JAVA_OPTS:=}             # Java options (-Xmx512m -XX:MaxPermSize=128m etc)
    RUN_JAVA_DEBUG=${RUN_JAVA_DEBUG:=}           # If not empty, print the full java command line before executing it.
    RUN_JAVA_NO_PARSE=${RUN_JAVA_NO_PARSE:=}     # If not empty, skip the auto parsing of -D and -cp options from script arguments.
    RUN_JAVA_NO_AUTOCP=${RUN_JAVA_NO_AUTOCP:=}   # If not empty, do not auto setup Java classpath
    RUN_JAVA_DRY=${RUN_JAVA_DRY:=}               # If not empty, do not exec Java command, but just print

    # OS specific support. $var _must_ be set to either true or false.
    CYGWIN=false;
    case "`uname`" in
      CYGWIN*) CYGWIN=true ;;
    esac

    # Define where the java executable is
    JAVA_CMD=java
    if [ -d "$JAVA_HOME" ]; then
        JAVA_CMD="$JAVA_HOME/bin/java"
    fi

    # Auto setup application's Java Classpath (only if they exist)
    if [ -z "$RUN_JAVA_NO_AUTOCP" ]; then
        if $CYGWIN; then
            # Provide Windows directory conversion
            JAVA_HOME_WIN=$(cygpath -aw "$JAVA_HOME")
            APP_DIR_WIN=$(cygpath -aw "$APP_DIR")

            if [ -d "$APP_DIR_WIN\config" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\config" ; fi
            if [ -d "$APP_DIR_WIN\target\test-classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\target\test-classes" ; fi
            if [ -d "$APP_DIR_WIN\target\classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\target\classes" ; fi
            if [ -d "$APP_DIR_WIN\target\dependency" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\target\dependency\*" ; fi
            if [ -d "$APP_DIR_WIN\lib" ]; then RUN_JAVA_CP="$RUN_JAVA_CP;$APP_DIR_WIN\lib\*" ; fi
        else
            if [ -d "$APP_DIR/config" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/config" ; fi
            if [ -d "$APP_DIR/target/test-classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/target/test-classes" ; fi
            if [ -d "$APP_DIR/target/classes" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/target/classes" ; fi
            if [ -d "$APP_DIR/target/dependency" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/target/dependency/*" ; fi
            if [ -d "$APP_DIR/lib" ]; then RUN_JAVA_CP="$RUN_JAVA_CP:$APP_DIR/lib/*" ; fi
        fi
    fi

    # Parse additional "-cp" and "-D" after the Java main class from script arguments.
    # This is done for convenience, so users do not have to export RUN_JAVA_CP and RUN_JAVA_OPTS
    # separately, but can pass them at the end of this run-java script instead.
    # This can be disabled by setting RUN_JAVA_NO_PARSE=1.
    if [ -z "$RUN_JAVA_NO_PARSE" ]; then
        # Prepare variables for parsing
        FOUND_CP=
        declare -a NEW_ARGS
        IDX=0

        # Parse all arguments and look for "-cp" and "-D"
        for ARG in "$@"; do
            if [[ -n $FOUND_CP ]]; then
                if [ "$OS" = "Windows_NT" ]; then
                    # Can't use cygpath here, because cygpath will auto expand "*", which we do not
                    # want. User will just have to use OS path when specifying "-cp" option.
                    #ARG=$(cygpath -w -a $ARG)
                    RUN_JAVA_CP="$RUN_JAVA_CP;$ARG"
                else
                    RUN_JAVA_CP="$RUN_JAVA_CP:$ARG"
                fi
                FOUND_CP=
            else
                case $ARG in
                '-cp')
                    FOUND_CP=1
                    ;;
                '-D'*)
                    RUN_JAVA_OPTS="$RUN_JAVA_OPTS $ARG"
                    ;;
                *)
                    NEW_ARGS[$IDX]="$ARG"
                    let IDX=$IDX+1
                    ;;
                esac
            fi
        done

        # Display full Java command.
        if [ -n "$RUN_JAVA_DEBUG" ] || [ -n "$RUN_JAVA_DRY" ]; then
            echo "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "${NEW_ARGS[@]}"
        fi

        # Run Java Main class using parsed variables
        if [ -z "$RUN_JAVA_DRY" ]; then
            "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "${NEW_ARGS[@]}"
        fi
    else
        # Display full Java command.
        if [ -n "$RUN_JAVA_DEBUG" ] || [ -n "$RUN_JAVA_DRY" ]; then
            echo "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "$@"
        fi

        # Run Java Main class
        if [ -z "$RUN_JAVA_DRY" ]; then
            "$JAVA_CMD" $RUN_JAVA_OPTS -cp "$RUN_JAVA_CP" "$@"
        fi
    fi
September 11, 2012
by Zemian Deng
· 14,719 Views
Getting Started: Apache Camel Using Groovy
According to its site, Apache Camel is a versatile open-source integration framework based on known Enterprise Integration Patterns. It might seem like a vague definition, but I want to tell you that this is a very productive Java library that can solve many typical IT problems! You can think of it as a very lightweight ESB framework with "batteries" included.

In every job I've had so far, folks have been writing their own solutions in one way or another to solve many common problems (or they would buy some very expensive enterprisey ESB server that takes months and months to learn, configure, and maintain). The things we commonly solve are integrating existing business services with glue code, processing data in a certain workflow manner, or moving and transforming data from one place to another. These are very typical needs in many IT environments. Apache Camel can be used in cases like these; not only that, but in a very productive and effective way!

In this article, I will show you how to get started with Apache Camel with just a few lines of Groovy script. You can certainly also start off with a full Java project to try out Camel, but I find Groovy gives you the shortest working example and learning curve.

Getting started with Apache Camel using Groovy

So let's begin. First let's see a hello world demo with Camel + Groovy.
    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*

    def camelContext = new DefaultCamelContext()
    camelContext.addRoutes(new RouteBuilder() {
        void configure() {
            from("timer://jdkTimer?period=3000")
                .to("log://camelLogger?level=INFO")
        }
    })
    camelContext.start()
    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Save the above into a file named helloCamel.groovy and then run it like this:

    $ groovy helloCamel.groovy
    388 [main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) is starting
    445 [main] INFO org.apache.camel.management.ManagementStrategyFactory - JMX enabled.
    447 [main] INFO org.apache.camel.management.DefaultManagementLifecycleStrategy - StatisticsLevel at All so enabling load performance statistics
    678 [main] INFO org.apache.camel.impl.converter.DefaultTypeConverter - Loaded 170 type converters
    882 [main] INFO org.apache.camel.impl.DefaultCamelContext - Route: route1 started and consuming from: Endpoint[timer://jdkTimer?period=3000]
    883 [main] INFO org.apache.camel.impl.DefaultCamelContext - Total 1 routes, of which 1 is started.
    887 [main] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) started in 0.496 seconds
    898 [Camel (camel-1) thread #1 - timer://jdkTimer] INFO camelLogger - Exchange[ExchangePattern:InOnly, BodyType:null, Body:[Body is null]]
    3884 [Camel (camel-1) thread #1 - timer://jdkTimer] INFO camelLogger - Exchange[ExchangePattern:InOnly, BodyType:null, Body:[Body is null]]
    6884 [Camel (camel-1) thread #1 - timer://jdkTimer] INFO camelLogger - Exchange[ExchangePattern:InOnly, BodyType:null, Body:[Body is null]]
    ...

The little script above is simple, but it presents a few key features of Camel Groovyness. The first and last sections of the helloCamel.groovy script are just Groovy features.
The @Grab annotation will automatically download the dependency jars you specify. We import Java packages so we can use their classes later. At the end we make sure Camel is shut down before the JVM exits, through the Java shutdown hook mechanism. The program will sit and wait until the user presses CTRL+C, just like a typical server process.

The middle section is where the Camel action is. You always begin by creating a Camel context (think of it as the server or manager for the process). Then you add a Camel route (think of it as a workflow or pipeflow) through which you would like to process data (Camel likes to call these data "messages"). The route consists of a "from" starting point (where data is generated) and one or more "to" points (where data is going to be processed). Camel calls these points "Endpoints". These endpoints can be expressed in a simple URI string format such as "timer://jdkTimer?period=3000". Here we generate a timer message every 3 seconds into the pipeflow, and then process it with a logger URI, which simply prints to console output.

After the Camel context is started, it begins processing data through the workflow, as you can observe from the output example above. Now try pressing CTRL+C to end the process. Notice how Camel shuts everything down very gracefully.
    7312 [Thread-2] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) is shutting down
    7312 [Thread-2] INFO org.apache.camel.impl.DefaultShutdownStrategy - Starting to graceful shutdown 1 routes (timeout 300 seconds)
    7317 [Camel (camel-1) thread #2 - ShutdownTask] INFO org.apache.camel.impl.DefaultShutdownStrategy - Route: route1 shutdown complete, was consuming from: Endpoint[timer://jdkTimer?period=3000]
    7317 [Thread-2] INFO org.apache.camel.impl.DefaultShutdownStrategy - Graceful shutdown of 1 routes completed in 0 seconds
    7321 [Thread-2] INFO org.apache.camel.impl.converter.DefaultTypeConverter - TypeConverterRegistry utilization[attempts=2, hits=2, misses=0, failures=0] mappings[total=170, misses=0]
    7322 [Thread-2] INFO org.apache.camel.impl.DefaultCamelContext - Apache Camel 2.10.0 (CamelContext: camel-1) is shutdown in 0.010 seconds. Uptime 7.053 seconds.

So that's our first taste of the Camel ride! However, we called this a "Hello World!" demo, and yet we haven't seen any greeting. You might also have noticed that the script above is mostly boilerplate setup code. No user logic has been added yet. Not even the part that logs the message! We are simply configuring the route.

Now let's modify the script a little bit so that we actually add our own user logic to process the timer message.

    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*

    def camelContext = new DefaultCamelContext()
    camelContext.addRoutes(new RouteBuilder() {
        void configure() {
            from("timer://jdkTimer?period=3000")
                .to("log://camelLogger?level=INFO")
                .process(new Processor() {
                    void process(Exchange exchange) {
                        println("Hello World!")
                    }
                })
        }
    })
    camelContext.start()
    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Notice how I can simply append the process code right after the to("log...") line.
I have added a "processor" code block to process the timer message. The logic is simple: we greet the world on each tick.

Making the Camel route more concise and practical

Now, do I have you at Hello yet? If not, then I hope you will be patient and continue to follow along for a few more practical features of Camel. First, if you were to put Camel to real use, I would recommend you set up your business logic separately from the workflow route definition. This is so that you can clearly express and see your entire route pipeflow at a glance. To do this, you move the "processor" into a service bean.

    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*
    import org.apache.camel.util.jndi.*

    class SystemInfoService {
        void run() {
            println("Hello World!")
        }
    }

    def jndiContext = new JndiContext();
    jndiContext.bind("systemInfoPoller", new SystemInfoService())

    def camelContext = new DefaultCamelContext(jndiContext)
    camelContext.addRoutes(new RouteBuilder() {
        void configure() {
            from("timer://jdkTimer?period=3000")
                .to("log://camelLogger?level=INFO")
                .to("bean://systemInfoPoller?method=run")
        }
    })
    camelContext.start()
    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Now, see how compact this workflow route has become? Camel's Java DSL, such as "from().to().to()", for defining routes is so clean and simple to use. You can even show this code snippet to your Business Analysts, and they would likely be able to verify your business flow easily! Wouldn't that alone be worth a million dollars?

How about another demo: FilePoller Processing

File polling is a very common and effective way to solve many business problems. If you have worked for commercial companies long enough, you might have written a file poller before.
A typical file poller picks up incoming files from a directory, processes the content, and then moves the file into an output directory. Let's make a Camel route to do just that.

    @Grab('org.apache.camel:camel-core:2.10.0')
    @Grab('org.slf4j:slf4j-simple:1.6.6')
    import org.apache.camel.*
    import org.apache.camel.impl.*
    import org.apache.camel.builder.*
    import org.apache.camel.util.jndi.*

    class UpperCaseTextService {
        String transform(String text) {
            return text.toUpperCase()
        }
    }

    def jndiContext = new JndiContext();
    jndiContext.bind("upperCaseTextService", new UpperCaseTextService())

    def dataDir = "/${System.properties['user.home']}/test/file-poller-demo"
    def camelContext = new DefaultCamelContext(jndiContext)
    camelContext.addRoutes(new RouteBuilder() {
        void configure() {
            from("file://${dataDir}/in")
                .to("log://camelLogger")
                .to("bean://upperCaseTextService?method=transform")
                .to("file://${dataDir}/out")
        }
    })
    camelContext.start()
    addShutdownHook{ camelContext.stop() }
    synchronized(this){ this.wait() }

Here you see I defined a route that polls a $HOME/test/file-poller-demo/in directory for text files. Once a file is found, the route logs it to the console and then passes it to a service that transforms the content text into upper case. After this, it sends the file into the $HOME/test/file-poller-demo/out directory. My goodness, reading the Camel route above probably expresses this just as effectively as my description. Do you see the benefits here?

What's the "batteries" included part?

If you've used Python before, you might have heard the phrase its community often uses: Python has "batteries" included. This means the interpreter comes with a rich set of libraries for most common programming needs. You can often write a Python program without having to download separate external libraries. I am making a similar analogy here with Apache Camel.
The Camel project comes with so many ready-to-use components that you can find one for just about any transport protocol that can carry data. These Camel "components" are the ones that support the different "Endpoint URIs" we have seen in our demos above. We have only shown you the timer, log, bean, and file components, but there are over 120 more. You will find jms, http, ftp, cxf, or tcp, just to name a few.

The Camel project also gives you the option of defining routes in a declarative xml format. The xml is just an extension of a Spring xml config with Camel's namespace handler added on top. Spring is optional in Camel, but you can use the two together in a very powerful way.
September 10, 2012
by Zemian Deng
· 15,628 Views · 1 Like
"Schemas" in CouchDB
schema noun (pl. schemata or schemas)
1 technical a representation of a plan or theory in the form of an outline or model: a schema of scientific reasoning.
2 Logic a syllogistic figure.
3 (in Kantian philosophy) a conception of what is common to all members of a class; a general or essential type or form.

CouchDB is a schema-less document store, but there are times when a schema is a good thing to have around, one way or another. So can you have your cake and eat it too? Below I'll take a high-level look at adding a kind of schema to an application, and at the benefits and drawbacks associated with this way of working.

What I describe below isn't for everyone. It goes against some of the core principles of CouchDB and makes your data much less human readable, but there are cases where that trade-off is worth making.

Schemas: WTF?!

It might seem a bit weird to add a schema to a schema-less database, but sometimes it is a very useful thing indeed. When you're dealing with large datasets, verbose object key names can be a problem (e.g. cost you money), so you end up stuck between a rock and a hard place: either make your data terse and hard to use, or be explicit and spend more on storage and network.

    {
      "shape": "triangle",
      "colour_label": "red",
      "opposite_length_in_mm": 767.12254256805875,
      "angle_in_radians": 1.5514293603308698,
      "adjacent_length_in_mm": 73.59881843627835
    }

What usually happens is some middle ground where a nice descriptive name like "angle_in_radians" gets reduced to "angle" or "rads". That's fine in that it reduces the storage and network required to deal with all that data.

    {
      "adj": 73.59881843627835,
      "shape": "triangle",
      "angle": 1.5514293603308698,
      "opp": 767.12254256805875,
      "colour": "red"
    }

However, by making this small change you move the description of the data out of your database and into some undefined place: higher-level code, documentation, shared knowledge, a whiteboard, a notebook, someone's head.
As your data becomes more terse you might rely on duck typing (deriving from the data itself what the data describes) to get data that quacks right in your application. That's fine so long as your data is sufficiently distinguishable from the other ducks on the pond; if I rely on pulling a triangle object from the database because it has an angle member, I might accidentally pull out a rhombus or an icosahedron.

To make sure you get the data you expect, you might add an explicit type field to each document (e.g. "type=goose" or "shape=triangle"), something which I've always felt was rather odd. This starts to add up on storage (remember you have a large dataset/flock of ducks) and, more importantly, it doesn't help with where the description of the data is held: you know that you have a goose but don't know what a goose is.

This last point is important, especially if you're working in a team of developers. Knowing what describing a shape as a triangle means is vital in producing consistent code that many people can work on. The straitjacket of a SQL schema looks pretty comfy sometimes.

Okay, I'll buy that a schema might be useful...

So how do you add a schema to a CouchDB database, something that is inherently schema-less? Can I get the best of both worlds? Here's a little trick that might help. First you define a document that is the schema for a particular type of data:

    {
      "_id": "datatype/triangle/v1",
      "fields": [
        "opposite_length_in_mm",
        "adjacent_length_in_mm",
        "angle_in_radians",
        "colour_label"
      ]
    }

Then you change your document structure to reference that "schema":

    {
      "datatype": "triangle/v1",
      "data": [
        879.07395066446952,
        84.607510245708468,
        1.4444230241122715,
        "red"
      ]
    }

Note that the schema is versioned and that the ordering of the data list is important here! I now know precisely what the data represents without having to store that description in the data itself.
This way of working has benefits beyond disk storage; you reduce wire traffic, and there is less for a client to parse before rendering it. This is especially useful if you're rendering into a browser-based visualisation: you don't need a complex set of objects to make a bar chart, just a list of x and y values. I can also share the data structure with colleagues and be reasonably confident that when I'm talking about a "v1 triangle" they'll know that the lengths are in millimeters and refer to the opposite and adjacent sides, and that the angle is in radians, hopefully reducing the chance of costly mistakes.

Isn't that error prone?

Yes and no. If you make a mistake in the ordering of your fields then, yes, you are going to have issues. This is reasonably easy to manage with some form of client verification (e.g. validation on a web form) and by generating the interface from the data (e.g. using the schema definition to build the GUI). If you're adding these data to the database by hand (e.g. via curl or Futon) then you aren't in the regime where this trick is useful; your dataset needs to be large for this to make sense.

Things still quack

What's particularly nice about this way of working is that I can still duck type the data, add additional fields to annotate it etc., since the schema isn't strictly enforced. Nothing stops me from having a triangle document like:

    {
      "datatype": "triangle/v1",
      "data": [
        879.07395066446952,
        84.607510245708468,
        1.4444230241122715,
        "red"
      ],
      "owner": "Simon",
      "location": "space"
    }

My views that deal with the data with a schema will still work (by ignoring these additional fields), my MVC framework will still render my pages, and I'll still have all the data I want in my database.
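On the client side, the versioned field list and the positional data array have to be paired up again before the values are usable by name. The following Java sketch shows one way that could look. It assumes the schema's "fields" and the document's "data" have already been parsed into lists by whatever JSON library is in use; the SchemaMapper class and its label method are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: pairs a schema's ordered field names with a document's positional data.
final class SchemaMapper {
    static Map<String, Object> label(List<String> fields, List<Object> data) {
        // Ordering carries the meaning, so a length mismatch means the wrong schema version.
        if (fields.size() != data.size()) {
            throw new IllegalArgumentException(
                "schema has " + fields.size() + " fields but document has " + data.size() + " values");
        }
        // LinkedHashMap keeps the field order defined by the schema document.
        Map<String, Object> labelled = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            labelled.put(fields.get(i), data.get(i));
        }
        return labelled;
    }

    public static void main(String[] args) {
        // The "fields" list of datatype/triangle/v1 and the "data" list of one triangle document.
        List<String> triangleV1 = Arrays.asList(
            "opposite_length_in_mm", "adjacent_length_in_mm", "angle_in_radians", "colour_label");
        List<Object> data = Arrays.<Object>asList(
            879.07395066446952, 84.607510245708468, 1.4444230241122715, "red");

        Map<String, Object> triangle = label(triangleV1, data);
        System.out.println(triangle.get("colour_label")); // prints red
    }
}
```

The length check is about the cheapest guard available against the ordering mistakes discussed above; it does not catch two fields being swapped, which is why generating the client interface from the schema remains the safer approach.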
Nesting

You could have a nested object structure like:

    { "datatype": "pattern/v1", "data": [ { "datatype": "triangle/v1", "data": [ 879.07395066446952, 84.607510245708468, 1.4444230241122715, "red" ], "owner": "Simon", "location": "space" }, { "datatype": "triangle/v1", "data": [ 879.07395066446952, 84.607510245708468, 1.4444230241122715, "blue" ], "owner": "Fred", "location": "space" }, { "datatype": "square/v1", "data": [ 10, "green" ] } ] }

But if you're going to have a schema you may as well reflect the nesting inside it, e.g. say that you have a list of triangles and a list of squares:

    { "_id": "datatype/pattern/v1", "fields": [ ["triangle/v1"], ["square/v1"] ] }

    { "datatype": "pattern/v1", "data": [ [ { "data": [ 879.07395066446952, 84.607510245708468, 1.4444230241122715, "red" ], "owner": "Simon", "location": "space" }, { "data": [ 879.07395066446952, 84.607510245708468, 1.4444230241122715, "blue" ], "owner": "Fred", "location": "space" } ], [ { "data": [ 10, "green" ] } ] ] }

Schema evolution

A nice feature of this way of working is that you can deal with schema evolution, i.e. changes to the format of your data:

    { "_id": "datatype/triangle/v2", "fields": [ "opposite_length_in_cm", "hypotenuse_length_in_cm", "angle_in_degrees", "colour_label" ] }

There are only so many ways you can represent the data. While sometimes you may have a major schema evolution, one where old data is completely unusable, often changes are just tweaks for consistency (say changing the units of a quantity) or extensions of the schema that add optional data. In either case you should be able to use data from multiple schema versions together by applying appropriate manipulations to the data. For example you could instantiate shape objects via a factory which knows how to create the right object for different schema versions.

Validation

The above does no validation of the data; the colour field in the input data could be set to a number instead of a string, the angle to something non-physical etc.
If you really needed validation you could do it with CouchDB's validation functions. If you go the fully validated route you'd want to define the schema in the design document (instead of as a normal doc) and use a CommonJS include to make sure that the validator in the app was doing the same thing as the schema. This ties you to a version of the design document (which is where the validators live), which may or may not be an issue. It will also considerably slow down the insertion rate as CouchDB has to do more work to add your data. Personally I prefer to put validation logic in the client making writes.

Views

If I were using this way of working I would want to have a view which returned all the schemas defined on the database. This then allows me to build objects appropriately. A view to return schema documents would look like:

    function(doc) {
      if (doc._id.slice(0, 'datatype'.length) == 'datatype') {
        emit(doc._id.slice('datatype/'.length, doc._id.length), doc.fields);
      }
    }

You can pull out documents that have a schema with a simple view like:

    function(doc) {
      if (doc.datatype) {
        emit(doc.datatype, doc.data);
      }
    }

This can be queried to find objects of a given shape using CouchDB's view slicing (e.g. ?startkey="square/v1"&endkey="square/v2") which returns data like:

    {"id":"datatype/square/v1","key":["square/v1",0],"value":["side_length_in_mm","colour_label"]},
    {"id":"f98ffe7e4cd91cbb0d904f9098499ca8","key":["square/v1",1],"value":[872.4342711412228,"green"]},
    {"id":"f98ffe7e4cd91cbb0d904f909849a218","key":["square/v1",1],"value":[370.29971491443905,"yellow"]},
    {"id":"f98ffe7e4cd91cbb0d904f909849acd0","key":["square/v1",1],"value":[8.799279300193753,"yellow"]}

You'll notice the name of the "schema" is the key and the values are held in value. This means I can parse the data into a set of appropriate objects with something like:

    var objects = [];
    function build(schema, data) {
      // Build the appropriate object for the schema...
    }
    // data is the array of rows returned by the view
    for (var i = 0; i < data.length; i++) {
      // build up the objects in a factory
      var row = data[i];
      var obj = build(row.key, row.value);
      objects.push(obj);
    }

If I wanted all versions of a shape, and used a vNUMERIC_COUNTER notation for versioning, the query would be ?startkey="square/v1"&endkey="square/vXXX", as digits sort before letters.

Taking it to the extreme

If you are really worried about data size you can take this technique to the extreme by encoding the data arrays as a byte string and using the schema documents to describe that byte array. This effectively turns your JSON structure into something not dissimilar to a protocol buffer, at the expense of human readability and view complexity. If you are particularly concerned with data size over the wire (for example you are writing an MMORPG) then this may be an acceptable trade off.

Reminder

This trick isn't suitable for every dataset. If you modify the data by hand it is prone to error. If you have a small dataset, or only ever send a small subset of the data to the client, it's massive overkill. But if you have a large dataset of machine generated data that needs to be frequently accessed over the WAN (think a monitoring app or game) then this is a nice way to reduce storage, network IO and browser render time. It's also worth reiterating that the schema is not enforced (you could have a square with 3 sides) and that adding strict schema enforcement with a validation function will considerably slow down the insert rate.
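The factory mentioned under schema evolution could look roughly like the sketch below, written in Java for a client that consumes the view rows. The Triangle and TriangleFactory classes are hypothetical, and the conversion assumes a right-angled triangle, which the opposite/adjacent/hypotenuse field names suggest but the article does not state. Both schema versions are normalised to millimetres and radians.

```java
// Hypothetical normalised shape: always millimetres and radians.
final class Triangle {
    final double oppositeMm;
    final double adjacentMm;
    final double angleRadians;
    final String colour;

    Triangle(double oppositeMm, double adjacentMm, double angleRadians, String colour) {
        this.oppositeMm = oppositeMm;
        this.adjacentMm = adjacentMm;
        this.angleRadians = angleRadians;
        this.colour = colour;
    }
}

// Hypothetical factory keyed on the "datatype" value of a document.
final class TriangleFactory {

    static Triangle build(String datatype, Object[] data) {
        if ("triangle/v1".equals(datatype)) {
            // v1: opposite (mm), adjacent (mm), angle (radians), colour
            return new Triangle(num(data[0]), num(data[1]), num(data[2]), (String) data[3]);
        }
        if ("triangle/v2".equals(datatype)) {
            // v2: opposite (cm), hypotenuse (cm), angle (degrees), colour
            double oppositeMm = num(data[0]) * 10.0;
            double hypotenuseMm = num(data[1]) * 10.0;
            // Assumption: right-angled triangle, so Pythagoras gives the adjacent side.
            double adjacentMm = Math.sqrt(hypotenuseMm * hypotenuseMm - oppositeMm * oppositeMm);
            return new Triangle(oppositeMm, adjacentMm, Math.toRadians(num(data[2])), (String) data[3]);
        }
        throw new IllegalArgumentException("unknown datatype: " + datatype);
    }

    private static double num(Object value) {
        return ((Number) value).doubleValue();
    }

    public static void main(String[] args) {
        Triangle t = build("triangle/v2", new Object[] {3.0, 5.0, 90.0, "red"});
        System.out.println(t.oppositeMm + " mm, " + t.adjacentMm + " mm, " + t.colour);
    }
}
```

Keeping the unit conversions in one place means the rest of the application only ever sees one representation, whichever schema version a document was written with.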
September 8, 2012
by Simon Metson
· 10,352 Views
Java 7: HashMap vs ConcurrentHashMap
As you may have seen from my past performance related articles and HashMap case studies, Java thread safety problems can bring down your Java EE application and the Java EE container fairly easily. One of the most common problems I have observed when troubleshooting Java EE performance problems is infinite looping triggered from the non-thread safe HashMap get() and put() operations. This problem has been known for several years, but recent production problems have forced me to revisit this issue one more time. This article will revisit this classic thread safety problem and demonstrate, using a simple Java program, the risk associated with incorrect usage of the plain old java.util.HashMap data structure in a concurrent threads context. This proof of concept exercise will attempt to achieve the following 3 goals:
  • Revisit and compare the Java program performance level between the non-thread safe and thread safe Map data structure implementations (HashMap, Hashtable, synchronized HashMap, ConcurrentHashMap)
  • Replicate and demonstrate the HashMap infinite looping problem using a simple Java program that everybody can compile, run and understand
  • Review the usage of the above Map data structures in a real-life and modern Java EE container implementation such as JBoss AS7
For more detail on the ConcurrentHashMap implementation strategy, I highly recommend the great article from Brian Goetz on this subject.
Tools and server specifications

As a starting point, find below the different tools and software used for the exercise:
  • Sun/Oracle JDK & JRE 1.7 64-bit
  • Eclipse Java EE IDE
  • Windows Process Explorer (CPU per Java Thread correlation)
  • JVM Thread Dump (stuck thread analysis and CPU per Thread correlation)
The following local computer was used for the problem replication process and performance measurements:
  • Intel(R) Core(TM) i5-2520M CPU @ 2.50Ghz (2 CPU cores, 4 logical cores)
  • 8 GB RAM
  • Windows 7 64-bit
* Results and performance of the Java program may vary depending on your workstation or server specifications.

Java program

In order to help us achieve the above goals, a simple Java program was created as per below:
  • The main Java program is HashMapInfiniteLoopSimulator.java
  • A worker Thread class WorkerThread.java was also created
The program performs the following:
  • Initialize different static Map data structures with an initial size of 2
  • Assign the chosen Map to the worker threads (you can choose between 4 Map implementations)
  • Create a certain number of worker threads (as per the header configuration). 3 worker threads were created for this proof of concept: NB_THREADS = 3;
  • Each of these worker threads has the same task: lookup and insert a new element in the assigned Map data structure using a random Integer element between 1 – 1 000 000
  • Each worker thread performs this task for a total of 500K iterations
  • The overall program performs 50 iterations in order to allow enough ramp up time for the HotSpot JVM
  • The concurrent threads context is achieved using the JDK ExecutorService
As you can see, the Java program task is fairly simple but complex enough to meet the following critical criteria:
  • Generate concurrency against a shared / static Map data structure
  • Use a mix of get() and put() operations in order to attempt to trigger internal locks and / or internal corruption (for the non-thread safe implementation)
  • Use a small Map initial size of 2, forcing the internal HashMap to trigger an internal rehash/resize
Finally, the following parameters can be modified at your convenience:

    ## Number of worker threads
    private static final int NB_THREADS = 3;

    ## Number of Java program iterations
    private static final int NB_TEST_ITERATIONS = 50;

    ## Map data structure assignment. You can choose between 4 structures
    // Plain old HashMap (since JDK 1.2)
    nonThreadSafeMap = new HashMap<String, Integer>(2);
    // Plain old Hashtable (since JDK 1.0)
    threadSafeMap1 = new Hashtable<String, Integer>(2);
    // Fully synchronized HashMap
    threadSafeMap2 = new HashMap<String, Integer>(2);
    threadSafeMap2 = Collections.synchronizedMap(threadSafeMap2);
    // ConcurrentHashMap (since JDK 1.5)
    threadSafeMap3 = new ConcurrentHashMap<String, Integer>(2);

    /*** Assign map at your convenience ****/
    assignedMapForTest = threadSafeMap3;

Now find below the source code of our sample program.
#### HashMapInfiniteLoopSimulator.java

    package org.ph.javaee.training4;

    import java.util.Collections;
    import java.util.Map;
    import java.util.HashMap;
    import java.util.Hashtable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    /**
     * HashMapInfiniteLoopSimulator
     * @author Pierre-Hugues Charbonneau
     *
     */
    public class HashMapInfiniteLoopSimulator {

        private static final int NB_THREADS = 3;
        private static final int NB_TEST_ITERATIONS = 50;

        private static Map<String, Integer> assignedMapForTest = null;
        private static Map<String, Integer> nonThreadSafeMap = null;
        private static Map<String, Integer> threadSafeMap1 = null;
        private static Map<String, Integer> threadSafeMap2 = null;
        private static Map<String, Integer> threadSafeMap3 = null;

        /**
         * Main program
         * @param args
         */
        public static void main(String[] args) {

            System.out.println("Infinite Looping HashMap Simulator");
            System.out.println("Author: Pierre-Hugues Charbonneau");
            System.out.println("http://javaeesupportpatterns.blogspot.com");

            for (int i = 0; i < NB_TEST_ITERATIONS; i++) {

                // Plain old HashMap (since JDK 1.2)
                nonThreadSafeMap = new HashMap<String, Integer>(2);
                // Plain old Hashtable (since JDK 1.0)
                threadSafeMap1 = new Hashtable<String, Integer>(2);
                // Fully synchronized HashMap
                threadSafeMap2 = new HashMap<String, Integer>(2);
                threadSafeMap2 = Collections.synchronizedMap(threadSafeMap2);
                // ConcurrentHashMap (since JDK 1.5)
                threadSafeMap3 = new ConcurrentHashMap<String, Integer>(2);

                /*** Assign map at your convenience ****/
                assignedMapForTest = threadSafeMap3;

                long timeBefore = System.currentTimeMillis();
                long timeAfter = 0;
                Float totalProcessingTime = null;

                ExecutorService executor = Executors.newFixedThreadPool(NB_THREADS);

                for (int j = 0; j < NB_THREADS; j++) {
                    /** Assign the Map at your convenience **/
                    Runnable worker = new WorkerThread(assignedMapForTest);
                    executor.execute(worker);
                }

                // This will make the executor accept no new tasks
                // and finish all existing tasks in the queue
                executor.shutdown();

                // Wait until all threads are finished
                while (!executor.isTerminated()) {
                }

                timeAfter = System.currentTimeMillis();
                totalProcessingTime = new Float((float) (timeAfter - timeBefore) / (float) 1000);

                System.out.println("All threads completed in " + totalProcessingTime + " seconds");
            }
        }
    }

#### WorkerThread.java

    package org.ph.javaee.training4;

    import java.util.Map;

    /**
     * WorkerThread
     *
     * @author Pierre-Hugues Charbonneau
     *
     */
    public class WorkerThread implements Runnable {

        private Map<String, Integer> map = null;

        public WorkerThread(Map<String, Integer> assignedMap) {
            this.map = assignedMap;
        }

        @Override
        public void run() {
            for (int i = 0; i < 500000; i++) {
                // Return 2 integers between 1-1000000 inclusive
                Integer newInteger1 = (int) Math.ceil(Math.random() * 1000000);
                Integer newInteger2 = (int) Math.ceil(Math.random() * 1000000);

                // 1. Attempt to retrieve a random Integer element
                Integer retrievedInteger = map.get(String.valueOf(newInteger1));

                // 2. Attempt to insert a random Integer element
                map.put(String.valueOf(newInteger2), newInteger2);
            }
        }
    }

Performance comparison between thread safe Map implementations

The first goal is to compare the performance level of our program when using different thread safe Map implementations:
  • Plain old Hashtable (since JDK 1.0)
  • Fully synchronized HashMap (via Collections.synchronizedMap())
  • ConcurrentHashMap (since JDK 1.5)
Find below the graphical results of the execution of the Java program for each iteration along with a sample of the program console output.
# Output when using ConcurrentHashMap

    Infinite Looping HashMap Simulator
    Author: Pierre-Hugues Charbonneau
    http://javaeesupportpatterns.blogspot.com
    All threads completed in 0.984 seconds
    All threads completed in 0.908 seconds
    All threads completed in 0.706 seconds
    All threads completed in 1.068 seconds
    All threads completed in 0.621 seconds
    All threads completed in 0.594 seconds
    All threads completed in 0.569 seconds
    All threads completed in 0.599 seconds
    ………………

As you can see, the ConcurrentHashMap is the clear winner here, taking on average only about half a second (after an initial ramp-up) for all 3 worker threads to concurrently read and insert data within a 500K looping statement against the assigned shared Map. Please note that no problem was found with the program execution, e.g. no hang situation. The performance boost is due to the improved ConcurrentHashMap design, such as the non-blocking get() operation. The performance level of the 2 other Map implementations was fairly similar, with a small advantage for the synchronized HashMap.

HashMap infinite looping problem replication

The next objective is to replicate the HashMap infinite looping problem observed so often in Java EE production environments. In order to do that, you simply need to assign the non-thread safe HashMap implementation as per the code snippet below:

    /*** Assign map at your convenience ****/
    assignedMapForTest = nonThreadSafeMap;

Running the program as is using the non-thread safe HashMap should lead to:
  • No output other than the program header
  • Significant CPU increase observed from the system
  • At some point the Java program will hang and you will be forced to kill the Java process

What happened?

In order to understand this situation and confirm the problem, we will perform a CPU per Thread analysis from the Windows OS using Process Explorer and JVM Thread Dump.

1 - Run the program again then quickly capture the thread per CPU data from Process Explorer as per below.
Under explore.exe you will need to right click over the javaw.exe and select properties. The threads tab will be displayed. We can see overall 4 threads using almost all the CPU of our system. 2 – Now you have to quickly capture a JVM Thread Dump using the JDK 1.7 jstack utility. For our example, we can see our 3 worker threads which seems busy/stuck performing get() and put() operations. ..\jdk1.7.0\bin>jstack 272 2012-08-29 14:07:26 Full thread dump Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode): "pool-1-thread-3" prio=6 tid=0x0000000006a3c000 nid=0x18a0 runnable [0x0000000007ebe000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.put(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) "pool-1-thread-2" prio=6 tid=0x0000000006a3b800 nid=0x6d4 runnable [0x000000000805f000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.get(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:29) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) "pool-1-thread-1" prio=6 tid=0x0000000006a3a800 nid=0x2bc runnable [0x0000000007d9e000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.put(Unknown Source) at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) .............. 3 – CPU per thread correlation It is now time to convert the Process Explorer thread ID DECIMAL format to HEXA format as per below. 
The HEXA value allows us to map and identify each thread as per below:

    ## TID: 1748 (nid=0X6D4)
    Thread name: pool-1-thread-2
    CPU @25.71%
    Task: Worker thread executing a HashMap.get() operation
    at java.util.HashMap.get(Unknown Source)
    at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:29)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

    ## TID: 700 (nid=0X2BC)
    Thread name: pool-1-thread-1
    CPU @23.55%
    Task: Worker thread executing a HashMap.put() operation
    at java.util.HashMap.put(Unknown Source)
    at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

    ## TID: 6304 (nid=0X18A0)
    Thread name: pool-1-thread-3
    CPU @12.02%
    Task: Worker thread executing a HashMap.put() operation
    at java.util.HashMap.put(Unknown Source)
    at org.ph.javaee.training4.WorkerThread.run(WorkerThread.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

    ## TID: 5944 (nid=0X1738)
    Thread name: main
    CPU @20.88%
    Task: Main Java program execution
    "main" prio=6 tid=0x0000000001e2b000 nid=0x1738 runnable [0x00000000029df000]
    java.lang.Thread.State: RUNNABLE
    at org.ph.javaee.training4.HashMapInfiniteLoopSimulator.main(HashMapInfiniteLoopSimulator.java:75)

As you can see, the above correlation and analysis is quite revealing. Our main Java program is in a hang state because our 3 worker threads are using a lot of CPU and not going anywhere. They may appear "stuck" performing HashMap get() & put() but in fact they are all involved in an infinite loop condition. This is exactly what we wanted to replicate.
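The decimal-to-hexadecimal conversion used in the correlation above can also be done with a few lines of Java instead of a calculator. This is a small convenience sketch; the class and method names are made up for illustration, and the thread IDs are the ones listed above.

```java
// Hypothetical helper for correlating Process Explorer TIDs with thread dump nid values.
final class ThreadIdHex {

    // Formats a decimal OS thread ID the way the thread dump prints nid (0x + lowercase hex).
    static String toNid(int decimalTid) {
        return "0x" + Integer.toHexString(decimalTid);
    }

    public static void main(String[] args) {
        int[] tids = {1748, 700, 6304, 5944};
        for (int tid : tids) {
            // Prints: 1748 -> nid=0x6d4, 700 -> nid=0x2bc, 6304 -> nid=0x18a0, 5944 -> nid=0x1738
            System.out.println(tid + " -> nid=" + toNid(tid));
        }
    }
}
```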
HashMap infinite looping deep dive Now let’s push the analysis one step further to better understand this looping condition. For this purpose, we added tracing code within the JDK 1.7 HashMap Java class itself in order to understand what is happening. Similar logging was added for the put() operation and also a trace indicating that the internal & automatic rehash/resize got triggered. The tracing added in get() and put() operations allows us to determine if the for() loop is dealing with circular dependency which would explain the infinite looping condition. #### HashMap.java get() operation public V get(Object key) { if (key == null) return getForNullKey(); int hash = hash(key.hashCode()); /*** P-H add-on- iteration counter ***/ int iterations = 1; for (Entry e = table[indexFor(hash, table.length)]; e != null; e = e.next) { /*** Circular dependency check ***/ Entry currentEntry = e; Entry nextEntry = e.next; Entry nextNextEntry = e.next != null?e.next.next:null; K currentKey = currentEntry.key; K nextNextKey = nextNextEntry != null?(nextNextEntry.key != null?nextNextEntry.key:null):null; System.out.println("HashMap.get() #Iterations : "+iterations++); if (currentKey != null && nextNextKey != null ) { if (currentKey == nextNextKey || currentKey.equals(nextNextKey)) System.out.println(" ** Circular Dependency detected! ["+currentEntry+"]["+nextEntry+"]"+"]["+nextNextEntry+"]"); } /***** END ***/ Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) return e.value; } return null; } HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.resize() in progress... HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 2 HashMap.resize() in progress... HashMap.resize() in progress... 
HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 2 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 ** Circular Dependency detected! [362565=362565][333326=333326]][362565=362565] HashMap.put() #Iterations : 2 ** Circular Dependency detected! [333326=333326][362565=362565]][333326=333326] HashMap.put() #Iterations : 1 HashMap.put() #Iterations : 1 HashMap.get() #Iterations : 1 HashMap.put() #Iterations : 1 ............................. HashMap.put() #Iterations : 56823 Again, the added logging was quite revealing. We can see that following a few internal HashMap.resize() the internal structure became affected, creating circular dependency conditions and triggering this infinite looping condition (#iterations increasing and increasing...) with no exit condition. It is also showing that the resize() / rehash operation is the most at risk of internal corruption, especially when using the default HashMap size of 16. This means that the initial size of the HashMap appears to be a big factor in the risk & problem replication. Finally, it is interesting to note that we were able to successfully run the test case with the non-thread safe HashMap by assigning an initial size setting at 1000000, preventing any resize at all. Find below the merged graph results: The HashMap was our top performer but only when preventing an internal resize. Again, this is definitely not a solution to the thread safe risk but just a way to demonstrate that the resize operation is the most at risk given the entire manipulation of the HashMap performed at that time. The ConcurrentHashMap, by far, is our overall winner by providing both fast performance and thread safety against that test case. 
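For comparison, here is a minimal sketch of the same get()/put() workload against a shared ConcurrentHashMap. It is not the author's test program: the class SharedMapDemo and its fill method are hypothetical, each worker writes its own key range so the final size is predictable, and awaitTermination replaces the busy-wait loop so the caller does not spin while waiting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: several workers reading and writing one shared concurrent map.
final class SharedMapDemo {

    // Runs `threads` workers, each inserting `keysPerThread` distinct keys; returns the final map size.
    static int fill(final ConcurrentMap<String, Integer> map, int threads, final int keysPerThread) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t * keysPerThread;
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < keysPerThread; i++) {
                        int value = offset + i;
                        map.get(String.valueOf(value));        // concurrent lookup
                        map.put(String.valueOf(value), value); // concurrent insert, forces resizes
                    }
                }
            });
        }
        executor.shutdown();
        try {
            // Block until the workers finish instead of spinning on isTerminated().
            if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
                executor.shutdownNow();
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Small initial capacity, as in the article, so the map has to grow under load.
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<String, Integer>(2);
        System.out.println("Final size: " + fill(map, 3, 10000));
    }
}
```

Because every key is distinct, the final size equals threads × keysPerThread when no update is lost, which gives a simple correctness check in addition to the absence of a hang.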
JBoss AS7 Map data structures usage

We will now conclude this article by looking at the different Map implementations within a modern Java EE container implementation such as JBoss AS 7.1.2. You can obtain the latest source code from the github master branch. Find below the report:
  • Total JBoss AS7.1.2 Java files (August 28, 2012 snapshot): 7302
  • Total Java classes using java.util.Hashtable: 72
  • Total Java classes using java.util.HashMap: 512
  • Total Java classes using synchronized HashMap: 18
  • Total Java classes using ConcurrentHashMap: 46
Hashtable references were found mainly within the test suite components and from naming and JNDI related implementations. This low usage is not a surprise here. References to java.util.HashMap were found in 512 Java classes. Again, this is not a surprise given how common this implementation has been over the last several years. However, it is important to mention that a good proportion of these usages were either local variables (not shared across threads), synchronized HashMaps or protected by manual synchronization, so they are "technically" thread safe and not exposed to the above infinite looping condition (pending/hidden bugs are still a reality given the complexity of Java concurrency programming…this case study involving Oracle Service Bus 11g is a perfect example). A low usage of synchronized HashMap was found, with only 18 Java classes from packages such as JMS, EJB3, RMI and clustering. Finally, find below a breakdown of the ConcurrentHashMap usage, which was our main interest here. As you will see below, this Map implementation is used by critical JBoss component layers such as the Web container, EJB3 implementation etc.

## JBoss Single Sign On
Used to manage internal SSO ID's involving concurrent Thread access
Total: 1

## JBoss Java EE & Web Container
Not surprising here since a lot of internal Map data structures are used to manage the http session objects, deployment registry, clustering & replication, statistics etc. with heavy concurrent Thread access.
Total: 11

## JBoss JNDI & Security Layer
Used by highly concurrent structures such as internal JNDI security management.
Total: 4

## JBoss domain & managed server management, rollout plans...
Total: 7

## JBoss EJB3
Used by data structures such as File Timer persistence store, application Exception, Entity Bean cache, serialization, passivation...
Total: 8

## JBoss kernel, Thread Pools & protocol management
Used by highly concurrent Map data structures involved in handling and dispatching/processing incoming requests such as HTTP.
Total: 3

## JBoss connectors such as JDBC/XA DataSources...
Total: 2

## Weld (reference implementation of JSR-299: Contexts and Dependency Injection for the Java EE platform)
Used in the context of ClassLoader and concurrent static Map data structures involving concurrent Thread access.
Total: 3

## JBoss Test Suite
Used in some integration testing test cases such as an internal Data Store, ClassLoader testing etc.
Total: 3

Final words

I hope this article has helped you revisit this classic problem and understand one of the common problems and risks associated with incorrect usage of the non-thread safe HashMap implementation. My main recommendation to you is to be careful when using a HashMap in a concurrent threads context. Unless you are a Java concurrency expert, I recommend that you use ConcurrentHashMap instead, which offers a very good balance between performance and thread safety. As usual, extra due diligence is always recommended, such as performing cycles of load & performance testing. This will allow you to detect thread safety and / or performance problems before you promote the solution to your client production environment. Please provide any comments and share your experience with ConcurrentHashMap or HashMap implementations and troubleshooting.
September 7, 2012
by Pierre - Hugues Charbonneau
· 154,869 Views · 5 Likes
OCA Java 7: The if and if-else Constructs
Editor's Note: This post is a free chapter from the Manning Publications book "OCA Java SE 7 programmer certification guide" by Mala Gupta.

In this article, I'll cover if and if-else constructs. We'll examine what happens when these constructs are used with and without curly braces {}. We'll also cover nested if and if-else constructs.

The if construct and its flavors

An if construct enables you to execute a set of statements in your code based on the result of a condition. This condition must always evaluate to a boolean or a Boolean value. You can specify a set of statements to execute when this condition evaluates to true or false. (In many Java books, you'll notice that the terms constructs and statements are used interchangeably.) Figure 1 shows multiple flavors of the if statement with their corresponding representations.

[Figure 1: Multiple flavors of if statement: if, if-else, and if-else-if-else]

In figure 1, condition1 and condition2 refer to a variable or an expression that must evaluate to a boolean or Boolean value. statement1, statement2, and statement3 refer to a single line of code or a code block. Because the Boolean wrapper class isn't covered in the OCA Java SE 7 Programmer I exam, we won't cover it here. We'll work with only the boolean data type.

Exam Tip: then isn't a keyword in Java and isn't supposed to be used with the if statement.

Let's look at the use of some flavors by first defining a set of variables: score, result, name, and file, as follows:

    int score = 100;
    String result = "";
    String name = "Lion";
    java.io.File file = new java.io.File("F");

Figure 2 shows the use of if, if-else, and if-else-if-else constructs and compares them by showing the code side by side.

[Figure 2: Multiple flavors of if statements implemented using code]

Let's quickly go through the code used in the above if, if-else, and if-else-if-else statements.
In the following example code, if the condition name.equals("Lion") evaluates to true, a value of 200 is assigned to the variable score:

    if (name.equals("Lion"))    #A
        score = 200;            #A

    #A Example of if construct

In the following example, if the condition name.equals("Lion") evaluates to true, a value of 200 is assigned to the variable score. If this condition evaluates to false, a value of 300 is assigned to the variable score:

    if (name.equals("Lion"))    #A
        score = 200;            #A
    else                        #A
        score = 300;            #A

    #A Example of if else construct

In the following example, if score is equal to 100, the variable result is assigned a value of A. If score is equal to 50, the variable result is assigned a value of B. If score is equal to 10, the variable result is assigned a value of C. If score doesn't match any of 100, 50, or 10, a value of F is assigned to the variable result. An if-else-if-else construct may use different conditions for all its if constructs:

    if (score == 100)           #A
        result = "A";
    else if (score == 50)       #B
        result = "B";
    else if (score == 10)       #C
        result = "C";
    else                        #D
        result = "F";

    #A Condition 1 -> score == 100
    #B Condition 2 -> score == 50
    #C Condition 3 -> score == 10
    #D If none of previous conditions evaluate to true, execute this else

Figure 3 shows the previous code.

[Figure 3: The execution of the if-else-if-else code]

Figure 3 makes multiple points clear:
  • The last else statement is part of the last if construct and not any of the if constructs before it.
  • The if-else-if-else is an if-else construct, where its else part defines another if construct.
A few other programming languages use single-keyword forms such as ElseIf (VB) or elsif (Perl, Ruby) to define if-else-if constructs. If you've programmed with any of these languages, note that Java has no such keyword: it always uses else followed by a separate if.
The following code is equivalent to the previous code:

    if (score == 100)
        result = "A";
    else
        if (score == 50)
            result = "B";
        else
            if (score == 10)
                result = "C";
            else
                result = "F";

Again, note that none of the previous if constructs use then to define the code to execute if a condition evaluates to true. As mentioned previously, unlike some other programming languages, then isn't a keyword in Java and isn't used with the if construct.

Exam Tip: The if-else-if-else is an if-else construct, where the else part defines another if construct.

The boolean expression used as a condition for an if construct can also include an assignment operation.

Missing else blocks

What happens if you don't define the else statements for an if construct? It's acceptable to define one course of action for an if construct, as follows (omitting the else part):

    boolean testValue = false;
    if (testValue == true)
        System.out.println("value is true");

But you can't define the else part for an if construct while skipping the if code block. The following code won't compile:

    boolean testValue = false;
    if (testValue == true)
    else                                        #A
        System.out.println("value is false");

    #A This won't compile

What follows is another interesting and bizarre piece of code:

    int score = 100;
    if((score=score+10) > 110);    #1

    #1 Missing then or else part

Line #1 is a valid line of code, even though it defines neither the then nor the else part of the if statement. In this case, the if condition is evaluated and that's it. The if construct doesn't define any code that should execute based on the result of this condition.

Note: if(testValue==true) is the same as using if(testValue). Similarly, if(testValue==false) is the same as using if(!testValue).

Implications of presence and absence of {} in if-else constructs

You can execute a single statement or a block of statements when the if condition evaluates to true or false. A block of statements is marked by enclosing single or multiple statements within a pair of curly braces ({}).
Examine the following code: String name = "Lion"; int score = 100; if (name.equals("Lion")) score = 200; What happens if you want to execute another line of code, if value of variable name is equal to Lion? Is the following code correct? String name = "Lion"; int score = 100; if (name.equals("Lion")) score = 200; name = "Larry"; #1 #1 Set name to Larry Exam Tip In the exam, watch out for code similar to the above mentioned if construct that uses misleading indentation. In the absence of a code block definition (marked with a pair of {}), only the statement following the if construct forms its part. What happens to the same code if you define an else part for your if construct as follows: String name = "Lion"; int score = 100; if (name.equals("Lion")) score = 200; name = "Larry"; #A else score = 129; #A This statement isn't part of the if construct In this case, the previous code won't compile. The compiler will report that the else part is defined without an if statement. If this leaves you confused, examine the following code, which is indented in order to emphasize the fact that line name = "Larry" isn't part of the else construct: String name = "Lion"; int score = 100; if (name.equals ("Lion")) score = 200; name = "Larry"; #A else #B score = 129; #A Right indentation to emphasize that this statement isn't part of the if construct #B else seems to be defined without a preceding if construct If you want to execute multiple statements for if construct, you should define them within a block of code. You can do so by defining all this code within curly braces ({}). 
To follow is an example: String name = "Lion"; int score = 100; if (name.equals("Lion")) { #A score = 200; #B name = "Larry"; #B } #C else score = 129; #A Start of code block #B Statements to execute if (name.equals("Lion")) evaluates to true #C End of code block Similarly, you may define multiple lines of code for the else part (incorrectly) as follows: String name = "Lion"; if (name.equals("Lion")) System.out.println("Lion"); else System.out.println("Not a Lion"); System.out.println("Again, not a Lion"); #1 #1 Not part of else construct. Will execute irrespective of the value of variable name The output of the above code is as follows: Lion Again, not a Lion Though code on line #1 seems to execute only if value of variable name matches with value Lion, this is not the case. It is indented incorrectly to trick you into believing that it is a part of the else block. The above code is same as the following code (with correct indentation): String name = "Lion"; if (name.equals("Lion")) System.out.println("Lion"); else System.out.println("Not a Lion"); System.out.println("Again, not a Lion"); #1 #1 Not part of else construct. Will execute irrespective of the value of variable name If you wish to execute the last two statements in the previous code, only if the if condition evaluates to false, you can do so by using {}: String name = "Lion"; if (name.equals("Lion")) System.out.println("Lion"); else { System.out.println("Not a Lion"); System.out.println("Again, not a Lion"); #1 } #1 Now part of else construct. 
Will execute only when if condition evaluates to false You can define another statement, construct or loop, to execute for an if condition, without using {}, as follows: String name = "Lion"; if (name.equals("Lion")) #A for (int i = 0; i < 3; ++i) #B System.out.println(i); #C #A if condition #B for loop is a single construct that will execute if name.equals("Lion") evaluates to true #C This code is part of the for loop defined at previous line System.out.println(i) is part of the for loop, and not an unrelated statement that follows the for loop. So this code is correct and gives the following output: 0 1 2 Appropriate vs. inappropriate expressions passed as arguments to an if statement The result of an expression used in an if construct must evaluate to a boolean or Boolean value. Given the following definition of variables: int score = 100; boolean allow = false; String name = "Lion"; Up next are examples of some of the valid expressions that can be passed on to an if construct. Note that using == is not a good practice to compare two String objects for equality. The correct way to compare two String objects is to use equals method from the String class. However, comparing two String values using == is a valid expression that returns a boolean value and may also be used in the exam: (score == 100) #A (name == "Lio") #B (score <= 100 || allow) #C (allow) #D #A Evaluates to true #B Evaluates to false #C Evaluates to true #D Evaluates to false Now comes the tricky part of passing an assignment operation to an if construct. What do you think is the output of the following code? boolean allow = false; if (allow = true) #A System.out.println("value is true"); else System.out.println("value is false"); #A This is assignment, not comparison You may think that because the value of the boolean variable allow is set to false, the previous code output's value is false. 
Revisit the code and notice that assignment operation allow = true assigns the value true to the boolean variable allow. Further, its result is also a boolean value, which makes it eligible to be passed on as an argument to the if construct, Although the previous code has no syntactical errors, it's a logical error-an error in the program logic. The correct code to compare a boolean variable with a boolean literal value should be defined as follows: boolean allow = false; if (allow == true) #A System.out.println("value is true"); else System.out.println("value is false"); #A This is comparison Exam Tip Watch out for the code in the exam that uses the assignment operator (=) to compare a boolean value in the if condition. It won't compare the boolean value; it'll assign a value to it. The correct operator to compare a boolean value is equality operator (==). Nested if constructs A nested if construct is an if construct defined within another if construct. Theoretically, you don't have a limit on the levels of nested if and if-else constructs. Whenever you come across nested if and if-else constructs, you need to be careful about determining the else part of an if statement. If this statement doesn't make a lot of sense, take a look at the following code and determine its output: int score = 110; if (score > 200) #1 if (score <400) #2 if (score > 300) System.out.println(1); else System.out.println(2); else #3 System.out.println(3); #3 #1 if (score>200) #2 if (score<400) #3 To which if does this else belongs? Based on the way the code is indented, you may believe that else at #3 belongs to the if defined at #1. But it belongs to the if defined at #2. 
To follow is the code with the correct indentation: int score = 110; if (score > 200) if (score <400) if (score > 300) System.out.println(1); else System.out.println(2); else #A System.out.println(3); #A #A This else belongs to the if with condition (score<400) Next, you need to understand how to do the following: How to define an else for an outer if, other than the one that it'll be assigned to by default How to determine to which if does an else belong in nested if constructs Both of these tasks are simple. Let's start with the first one. How to define an else for an outer if other than the one that it'll be assigned to by default The key point is to use curly braces, as follows: int score = 110; if (score > 200) { #1 if (score <400) if (score > 300) System.out.println(1); else System.out.println(2); } #2 else #3 System.out.println(3); #3 #1 Start if construct for score > 200 #2 End if construct for score > 200 #3 else for score > 200 The curly braces at #1 and #2 mark the start and the end of the if condition (score>200) defined at #1. Hence, the else at #3 that follows #2 belongs to the if defined at #1. How to determine to which if an else belongs in nested if constructs If code uses curly braces to mark the start and end of the territory of an if or else construct, it can be simple, as mentioned in the previous section, "How to define an else for an outer if than the one that it'll be assigned to by default." When the if constructs don't use curly braces, don't get confused by the code indentation. Try to match all if with their corresponding else in the following poorly indented code: if (score > 200) if (score <400) if (score > 300) System.out.println(1); else System.out.println(2); else System.out.println(3); Start working inside out, with the innermost if-else statement, matching else with its nearest unmatched if statement. Figure 4 shows how to match the if-else pairs for the previous code, marked with 1, 2, and 3. 
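The matching rule can also be verified by running both variants side by side. The sketch below uses hypothetical method names and returns a string instead of printing, so the effect of the braces is easy to compare.

```java
final class DanglingElseSketch {

    // No braces: the last else pairs with if (score < 400),
    // so nothing is assigned when score <= 200.
    static String withoutBraces(int score) {
        String out = "nothing";
        if (score > 200)
            if (score < 400)
                if (score > 300)
                    out = "1";
                else
                    out = "2";
            else
                out = "3";
        return out;
    }

    // Braces close the outer if early, so the last else pairs with if (score > 200).
    static String withBraces(int score) {
        String out = "nothing";
        if (score > 200) {
            if (score < 400)
                if (score > 300)
                    out = "1";
                else
                    out = "2";
        } else
            out = "3";
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withoutBraces(110) + " vs " + withBraces(110));
        System.out.println(withoutBraces(450) + " vs " + withBraces(450));
    }
}
```

With score set to 110 the unbraced version assigns nothing, while the braced version reaches the else and yields 3; with 450 the results swap.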
Figure 4 Matching if-else pairs for poorly indented code

Summary

We covered the different flavors of the if construct. You saw what happens when these constructs are used with and without curly braces {}. We also covered nested if and if-else constructs. The humble if-else construct can define virtually any set of simple or complicated conditions.

OCA Java SE 7 Programmer I Certification Guide
By Mala Gupta

In the OCA Java SE 7 programmer exam, you'll be asked how to define and control the flow in your code. In this article, based on chapter 4 of OCA Java SE 7 Programmer I Certification Guide, author Mala Gupta shows you how to use if, if-else, if-else-if-else, and nested if constructs, and the difference when these if constructs are used with and without curly braces {}.

Here are some other Manning titles you might be interested in: Unit Testing in Java by Lasse Koskela; Making Java Groovy by Kenneth Kousen; Play for Java by Nicolas Leroux and Sietse de Kaper
September 6, 2012
by Allen Coin
· 15,573 Views
Building A Simple API Proxy Server with PHP
These days I’m playing with Backbone and using a public API as a source. The web browser has one horrible feature: due to the cross-origin restriction, it doesn’t allow a page to fetch resources from a host other than its own. For example, if we have a server at localhost, we cannot perform an Ajax request to any host other than localhost. Nowadays there is a header to allow it: Access-Control-Allow-Origin. The problem is that the remote server must set this header. For example, I was playing with GitHub’s API, and GitHub doesn’t have this header. If the server is mine, it is pretty straightforward to add the header, but obviously I’m not the sysadmin of GitHub, so I cannot do it. What’s the solution? One possible solution is, for example, to create a proxy server at localhost with PHP. With PHP we can use any remote API with curl (I wrote about it here and here, for example). It’s not difficult, but I asked myself: can we create a dummy proxy server with PHP that handles any request to localhost and redirects it to the real server, instead of creating one proxy for each request? Let’s start. Probably there is an open source solution (tell me if you know one), but I’m on holidays and I want to code a little bit (I know, it looks insane, but that’s me). The idea is:

... $proxy->register('github', 'https://api.github.com'); ...

When I type http://localhost/github/users/gonzalo123, the request is proxied to https://api.github.com/users/gonzalo123. The request method is also important: if we send a POST request to localhost, we want a POST request to GitHub too. This time we’re not going to reinvent the wheel, so we will use Symfony components, and we will use Composer to start our project. We create a composer.json file with the dependencies:

{ "require": { "symfony/class-loader":"dev-master", "symfony/http-foundation":"dev-master" } }

Now we run php composer.phar install and we can start coding.
The script will look like this:

register('github', 'https://api.github.com');
$proxy->run();

foreach($proxy->getHeaders() as $header) {
    header($header);
}
echo $proxy->getContent();

As we can see, we can register as many servers as we want. In this example we only register GitHub. The application only has two classes: RestProxy, which extracts the information from the Request object, and CurlWrapper, through which it calls the real server.

request = $request;
        $this->curl = $curl;
    }

    public function register($name, $url)
    {
        $this->map[$name] = $url;
    }

    public function run()
    {
        foreach ($this->map as $name => $mapurl) {
            return $this->dispatch($name, $mapurl);
        }
    }

    private function dispatch($name, $mapurl)
    {
        $url = $this->request->getPathInfo();
        if (strpos($url, $name) == 1) {
            $url = $mapurl . str_replace("/{$name}", null, $url);
            $querystring = $this->request->getQueryString();

            // Request::getMethod() returns the HTTP verb in upper case
            switch ($this->request->getMethod()) {
                case 'GET':
                    $this->content = $this->curl->doGet($url, $querystring);
                    break;
                case 'POST':
                    $this->content = $this->curl->doPost($url, $querystring);
                    break;
                case 'DELETE':
                    $this->content = $this->curl->doDelete($url, $querystring);
                    break;
                case 'PUT':
                    $this->content = $this->curl->doPut($url, $querystring);
                    break;
            }
            $this->headers = $this->curl->getHeaders();
        }
    }

    public function getHeaders()
    {
        return $this->headers;
    }

    public function getContent()
    {
        return $this->content;
    }
}

The RestProxy receives two instances in the constructor via dependency injection (a CurlWrapper and a Request). This architecture helps a lot in the tests, because we can mock both instances, which was very helpful when building RestProxy. The RestProxy is registered on Packagist, so we can install it using the Composer installer. First install Composer:

curl -s https://getcomposer.org/installer | php

and create a new project:

php composer.phar create-project gonzalo123/rest-proxy proxy

If we are using PHP 5.4 (if not, what are you waiting for?)
we can run the built-in server:

cd proxy
php -S localhost:8888 -t www/

Now we only need to open a web browser and type: http://localhost:8888/github/users/gonzalo123

The library is very minimal (it’s enough for my experiment) and it doesn’t support authorization. Of course, the full code is available on GitHub.
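The core idea of the proxy, mapping a registered path prefix to a remote base URL, does not depend on PHP. As a rough illustration only, here is the same mapping sketched in Java. The class and method names are hypothetical and are not part of the rest-proxy library, and the sketch only resolves URLs; it does not forward any request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProxyRouteSketch {

    // Registered name -> remote base URL, in registration order.
    private final Map<String, String> map = new LinkedHashMap<>();

    void register(String name, String baseUrl) {
        map.put(name, baseUrl);
    }

    // Returns the remote URL for a local path such as "/github/users/x",
    // or null when no registered prefix matches.
    String resolve(String pathInfo) {
        for (Map.Entry<String, String> entry : map.entrySet()) {
            String prefix = "/" + entry.getKey();
            if (pathInfo.equals(prefix) || pathInfo.startsWith(prefix + "/")) {
                return entry.getValue() + pathInfo.substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ProxyRouteSketch routes = new ProxyRouteSketch();
        routes.register("github", "https://api.github.com");
        System.out.println(routes.resolve("/github/users/gonzalo123"));
    }
}
```

One deliberate difference from the listing above: this sketch keeps checking registered names until one matches, so it is an assumption about the intended behaviour when several servers are registered rather than a port of the PHP code.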
September 2, 2012
by Gonzalo Ayuso
· 20,272 Views
Using Spring Profiles and Java Configuration
My last blog introduced Spring 3.1’s profiles and explained both the business case for using them and demonstrated their use with Spring XML configuration files. It seems, however, that a good number of developers prefer using Spring’s Java based application configuration, so Spring have designed a way of using profiles with their existing @Configuration annotation. I’m going to demonstrate profiles and the @Configuration annotation using the Person class from my previous blog. This is a simple bean class whose properties vary depending upon which profile is active. public class Person { private final String firstName; private final String lastName; private final int age; public Person(String firstName, String lastName, int age) { this.firstName = firstName; this.lastName = lastName; this.age = age; } public String getFirstName() { return firstName; } public String getLastName() { return lastName; } public int getAge() { return age; } } Remember that the Guys at Spring recommend that Spring profiles should only be used when you need to load different types or sets of classes and that for setting properties you should continue using the PropertyPlaceholderConfigurer. The reason I’m breaking the rules is that I want to try to write the simplest code possible to demonstrate profiles and Java configuration. At the heart of using Spring profiles with Java configuration is Spring’s new @Profile annotation. The @Profile annotation is used attach a profile name to an @Configuration annotation. It takes a single parameter that can be used in two ways. Firstly to attach a single profile to an @Configuration annotation: @Profile("test1") and secondly, to attach multiple profiles: @Profile({ "test1", "test2" }) Again, I’m going to define two profiles “test1” and “test2” and associate each with a configuration file. 
Firstly “test1”: @Configuration @Profile("test1") public class Test1ProfileConfig { @Bean public Person employee() { return new Person("John", "Smith", 55); } } ...and then “test2”: @Configuration @Profile("test2") public class Test2ProfileConfig { @Bean public Person employee() { return new Person("Fred", "Williams", 22); } } In the code above, you can see that I'm creating a Person bean with an effective id of employee (this is from the method name) that returns differing property values in each profile. Also note that the @Profile is marked as: @Target(value=TYPE) ...which means that is can only be placed next to the @Configuration annotation. Having attached an @Profile to an @Configuration, the next thing to do is to activate your selected @Profile. This uses exactly the same principles and techniques that I described in my last blog and again, to my mind, the most useful activation technique is to use the "spring.profiles.active" system property. @Test public void testProfileActiveUsingSystemProperties() { System.setProperty("spring.profiles.active", "test1"); ApplicationContext ctx = new ClassPathXmlApplicationContext("profiles-config.xml"); Person person = ctx.getBean("employee", Person.class); String firstName = person.getFirstName(); assertEquals("John", firstName); } Obviously, you wouldn’t want to hard code things as I’ve done above and best practice usually means keeping the system properties configuration separate from your application. This gives you the option of using either a simple command line argument such as: -Dspring.profiles.active="test1" ...or by adding # Setting a property value spring.profiles.active=test1 to Tomcat’s catalina.properties So, that’s all there is to it: you create your Spring profiles by annotating an @Configuration with an @Profile annotation and then switching on the profile you want to use by setting the spring.profiles.active system property to your profile’s name. 
As usual, the Guys at Spring don’t confine you to using system properties to activate profiles; you can also do things programmatically. For example, the following code creates an AnnotationConfigApplicationContext and then uses an Environment object to activate the “test1” profile, before registering our @Configuration classes.

@Test
public void testAnnotationConfigApplicationContextThatWorks() {

  // Can register a list of config classes
  AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext();
  ctx.getEnvironment().setActiveProfiles("test1");
  ctx.register(Test1ProfileConfig.class, Test2ProfileConfig.class);
  ctx.refresh();

  Person person = ctx.getBean("employee", Person.class);
  String firstName = person.getFirstName();
  assertEquals("John", firstName);
}

This is all fine and good, but beware: you need to call AnnotationConfigApplicationContext’s methods in the right order. For example, if you register your @Configuration classes through the constructor before you specify your profile, the context has already been refreshed by the time the profile is set, and the explicit call to refresh() then throws an IllegalStateException.

@Test(expected = IllegalStateException.class)
public void testAnnotationConfigApplicationContextThatFails() {

  // Can register a list of config classes
  AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext(
      Test1ProfileConfig.class, Test2ProfileConfig.class);
  ctx.getEnvironment().setActiveProfiles("test1");
  ctx.refresh();

  Person person = ctx.getBean("employee", Person.class);
  String firstName = person.getFirstName();
  assertEquals("John", firstName);
}

Before closing today’s blog, the code below demonstrates the ability to attach multiple @Profiles to an @Configuration annotation.
@Configuration @Profile({ "test1", "test2" }) public class MulitpleProfileConfig { @Bean public Person tourDeFranceWinner() { return new Person("Bradley", "Wiggins", 32); } } @Test public void testMulipleAssignedProfilesUsingSystemProperties() { System.setProperty("spring.profiles.active", "test1"); ApplicationContext ctx = new ClassPathXmlApplicationContext("profiles-config.xml"); Person person = ctx.getBean("tourDeFranceWinner", Person.class); String firstName = person.getFirstName(); assertEquals("Bradley", firstName); System.setProperty("spring.profiles.active", "test2"); ctx = new ClassPathXmlApplicationContext("profiles-config.xml"); person = ctx.getBean("tourDeFranceWinner", Person.class); firstName = person.getFirstName(); assertEquals("Bradley", firstName); } In the code above, 2012 Tour De France winner Bradley Wiggins appears in both the “test1” and “test2” profiles.
August 30, 2012
by Roger Hughes
· 129,654 Views · 6 Likes
Performance Test: Groovy 2.0 vs. Java
At the end of July 2012, Groovy 2.0 was released with support for static type checking and some performance improvements through the use of JDK7 invokedynamic and type inference as a result of type information now available through static typing. I was interested in seeing some estimate as to how significant the performance improvements in Groovy 2.0 have turned out and how Groovy 2.0 would now compare to Java in terms of performance. In case the performance gap had become minor, or at least acceptable, in the meantime, it would certainly be time to take a serious look at Groovy. Groovy has been ready for production for a long time. So, let's see whether it can compare with Java in terms of performance. The only performance measurement I could find on the Internet was this little benchmark measurment on jlabgroovy. The measurement only consists of calculating Fibonacci numbers with and without the @CompileStatic annotation. That's it; i.e., it's certainly not very meaningful in striving to get an overall impression. I was only interested in obtaining some rough estimate of how Groovy compares to Java as far as performance is concerned. Java performance measurement included Alas, no measurement was included in this little benchmark as to how much time Java takes to calculate Fibonacci numbers. So I "ported" the Groovy code to Java (here it is) and repeated the measurements. All measurements were done on an Intel Core2 Duo CPU E8400 3.00 GHz using JDK7u6 running on Windows 7 with Service Pack 1. I used Eclipse Juno with the Groovy plugin using the Groovy compiler version 2.0.0.xx-20120703-1400-e42-RELEASE. 
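The author's Java port is linked rather than reproduced here. As a hedged sketch of what the four measured variants (static or instance method, ternary or if) might look like in Java, consider the following. The class name, method names, the Fibonacci convention, and the timing loop are all assumptions made for illustration, and a naive System.nanoTime() measurement like this one is only a rough indicator.

```java
final class FibonacciSketch {

    static int staticTernary(int n) {
        return n < 2 ? n : staticTernary(n - 1) + staticTernary(n - 2);
    }

    static int staticIf(int n) {
        if (n < 2) {
            return n;
        }
        return staticIf(n - 1) + staticIf(n - 2);
    }

    int instanceTernary(int n) {
        return n < 2 ? n : instanceTernary(n - 1) + instanceTernary(n - 2);
    }

    int instanceIf(int n) {
        if (n < 2) {
            return n;
        }
        return instanceIf(n - 1) + instanceIf(n - 2);
    }

    public static void main(String[] args) {
        int n = 35; // arbitrary input size chosen for this sketch
        long start = System.nanoTime();
        int result = staticTernary(n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("static ternary: fib(" + n + ") = " + result + " in " + elapsedMs + "ms");
    }
}
```

The other three variants can be timed the same way by swapping the call inside main.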
These are the figures I obtained without a warm-up phase:

                    Groovy 2.0 without   Groovy/Java          Groovy 2.0 with   Groovy/Java          Kotlin     Java
                    @CompileStatic       performance factor   @CompileStatic    performance factor   0.1.2580
static ternary      4352ms               4.7                  926ms             1.0                  1005ms     924ms
static if           4267ms               4.7                  911ms             0.9                  1828ms     917ms
instance ternary    4577ms               2.7                  1681ms            1.8                  994ms      917ms
instance if         4592ms               2.9                  1604ms            1.7                  1611ms     969ms

I also did measurements with warm-up phases of various lengths, with the conclusion that there is no benefit for either language, with or without @CompileStatic. Since the Fibonacci algorithm is so recursive, the warm-up phase seems to be "included" for any Fibonacci number that is not very small. We can see that the performance improvements due to static typing have made quite a difference. This little comparison hardly does the topic justice, though. My impression is that static typing in Groovy, in conjunction with type inference, has led to significant performance improvements, much as it made Groovy++ very strong. With @CompileStatic, Groovy is about 1-2 times slower than Java, and without it, it's about 3-5 times slower. Unhappily, the measurements of "instance ternary" and "instance if" are the slowest. Unless we want to create masterpieces in programming with static functions, the measurements for "static ternary" and "static if" are not that relevant for most code with the ambition to be object-oriented (based on instances).

Conclusion

When Groovy was about 10-20 times slower than Java (see benchmark table almost at the end of this article) it is questionable whether the @CompileStatic was used or not. This means to me that Groovy is ready for applications where performance has to be somewhat comparable to Java. Earlier, Groovy (or Ruby, Clojure, etc.) could only serve as a plus on your CV because of the performance impediment (at least here in Europe).
New JVM kid on the block: Kotlin

I added the figures for Kotlin as well (here is the code). Kotlin is a relatively new, statically typed, Java-compatible programming language for the JVM. Kotlin is more concise than Java, supporting type inference for variables, higher-order functions (closures), extension functions, mixins, first-class delegation, and more. In contrast to Groovy, it is geared more towards Scala, but it also integrates well with Java. Kotlin is still under development and has yet to be officially released, so the figures have to be taken with caution, as the guys at JetBrains are still working on code optimization. Ideally, Kotlin should be as fast as Java. The measurements were done with the current "official" release 0.1.2580.

And what about future performance improvements?

At the time when JDK 1.3 was the most recent JDK, I still earned my pay with Smalltalk development. At that time the performance of VisualWorks Smalltalk (now Cincom Smalltalk) and IBM VA for Smalltalk (now owned by Instantiations) was very good and comparable to Java. And Smalltalk is a dynamically typed language, like pre-Groovy 2.0 and Ruby, where the compiler cannot make use of type inference to do optimizations. Because of this, it always appeared strange to me that Groovy, Ruby, and other JVM-based dynamic languages had such a big performance penalty compared to Java when Smalltalk had not. From that point of view I think there's still room for Groovy performance improvements beyond @CompileStatic.
August 28, 2012
by Oliver Plohmann
· 49,873 Views · 1 Like
Convert Any Image to HTML5 Canvas
Even before talking about the technicalities of converting an image to a canvas element, check out the demo! DEMO: input any image URL there and hit the convert button! P.S.: It also accepts data URIs!

The code:

function draw() {
  // Get the canvas element and set the dimensions.
  var canvas = document.getElementById('canvas');
  canvas.height = window.innerHeight;
  canvas.width = window.innerWidth;

  // Get a 2D context.
  var ctx = canvas.getContext('2d');

  // Create a new image object to use as the pattern.
  var img = new Image();
  img.src = document.getElementById('url').value;
  img.onload = function(){
    // Create the pattern and don't repeat!
    var ptrn = ctx.createPattern(img,'no-repeat');
    ctx.fillStyle = ptrn;
    ctx.fillRect(0,0,canvas.width,canvas.height);
  }
}

The magic behind it: all the credit goes to createPattern():

nsIDOMCanvasPattern createPattern(in nsIDOMHTMLElement image, in DOMString repetition);

Elaborating:

context.createPattern(image,"repeat|repeat-x|repeat-y|no-repeat");

Hope this was useful. Anyway, there is loads of fun to be had with the canvas element. Happy hacking!

Edit 0: After the interesting question by @pinkham in the comment section, here is what the MDN page says: Although you can use images without CORS approval in your canvas, doing so taints the canvas. Provided that you have a server hosting images along with appropriate Access-Control-Allow-Origin header, you will be able to save those images to localStorage as if they were served from your domain.
var img = new Image,
    canvas = document.createElement("canvas"),
    ctx = canvas.getContext("2d"),
    src = "http://example.com/image"; // insert image url here

img.crossOrigin = "Anonymous";

img.onload = function() {
    canvas.width = img.width;
    canvas.height = img.height;
    ctx.drawImage( img, 0, 0 );
    localStorage.setItem( "savedImageData", canvas.toDataURL("image/png") );
}
img.src = src;
// make sure the load event fires for cached images too
if ( img.complete || img.complete === undefined ) {
    img.src = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///ywAAAAAAQABAAACAUwAOw==";
    img.src = src;
}
August 27, 2012
by Hemanth HM
· 21,406 Views
Handle the Middle of a XML Document with JAXB and StAX
Recently I have come across a lot of people asking how to read data from, or write data to, the middle of an XML document. In this post I will demonstrate how this can be done using JAXB with StAX. Note: JAXB (JSR-222) and StAX (JSR-173) implementations are included in the JDK/JRE since Java SE 6.

XML (input.xml)

We will be using a SOAP message as our sample XML. The outer portions of the XML document represent information relevant to the Web Service and the inner portions (lines 5-8) represent the data we want to convert to our domain model.

<?xml version="1.0" encoding="UTF-8"?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
    <S:Body>
        <ns0:findCustomerResponse xmlns:ns0="http://service.jaxws.blog/">
            <return id="123">
                <firstName>Jane</firstName>
                <lastName>Doe</lastName>
            </return>
        </ns0:findCustomerResponse>
    </S:Body>
</S:Envelope>

Java Model

Our Java model consists of a single domain class. The concepts in this example also apply to larger domain models.

package blog.stax.middle;

import javax.xml.bind.annotation.*;

@XmlAccessorType(XmlAccessType.FIELD)
public class Customer {

    @XmlAttribute
    int id;

    String firstName;

    String lastName;

}

Unmarshal Demo

To unmarshal from the middle of an XML document all we need to do is the following:
Create an XMLStreamReader from the XML input (line 12).
Advance the XMLStreamReader to the return element (lines 13-16).
Unmarshal an instance of Customer from the XMLStreamReader (line 20).

package blog.stax.middle;

import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class UnmarshalDemo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        StreamSource xml = new StreamSource("src/blog/stax/middle/input.xml");
        XMLStreamReader xsr = xif.createXMLStreamReader(xml);
        xsr.nextTag();
        while(!xsr.getLocalName().equals("return")) {
            xsr.nextTag();
        }

        JAXBContext jc = JAXBContext.newInstance(Customer.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        JAXBElement<Customer> jb = unmarshaller.unmarshal(xsr, Customer.class);
        xsr.close();

        Customer customer = jb.getValue();
        System.out.println(customer.id);
        System.out.println(customer.firstName);
        System.out.println(customer.lastName);
    }

}

Output

Below is the output from running the unmarshal demo.

123
Jane
Doe

Marshal Demo

To marshal to the middle of an XML document all we need to do is the following:
Create an XMLStreamWriter for the XML output (line 18).
Start the document and write the outer elements (lines 19-22).
Set the Marshaller.JAXB_FRAGMENT property on the Marshaller (line 26) to prevent the XML declaration from being written.
Marshal an instance of Customer to the XMLStreamWriter (line 27).
End the document; this will close any elements that have been opened (line 29).
package blog.stax.middle;

import javax.xml.bind.*;
import javax.xml.namespace.QName;
import javax.xml.stream.*;

public class MarshalDemo {

    public static void main(String[] args) throws Exception {
        Customer customer = new Customer();
        customer.id = 123;
        customer.firstName = "Jane";
        customer.lastName = "Doe";

        QName root = new QName("response");
        JAXBElement<Customer> je = new JAXBElement<Customer>(root, Customer.class, customer);
        XMLOutputFactory xof = XMLOutputFactory.newFactory();
        XMLStreamWriter xsw = xof.createXMLStreamWriter(System.out);
        xsw.writeStartDocument();
        xsw.writeStartElement("S", "Envelope", "http://schemas.xmlsoap.org/soap/envelope/");
        xsw.writeStartElement("S", "Body", "http://schemas.xmlsoap.org/soap/envelope/");
        xsw.writeStartElement("ns0", "findCustomerResponse", "http://service.jaxws.blog/");

        JAXBContext jc = JAXBContext.newInstance(Customer.class);
        Marshaller marshaller = jc.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
        marshaller.marshal(je, xsw);

        xsw.writeEndDocument();
        xsw.close();
    }

}

Output

Below is the output from running the marshal demo. Note that the output from running the demo code will appear on a single line; I have formatted the output here to make it easier to read. Jane Doe
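The "advance to the element" step can be tried on its own, without JAXB, using only the StAX classes in java.xml. The following is a minimal sketch under stated assumptions: the class name, the method name, and the inline sample document are invented for this illustration, and the method simply reads the id attribute once the named element is reached instead of unmarshalling a Customer.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class AdvanceToElementSketch {

    // Hypothetical sample shaped like the SOAP response discussed in the article.
    static final String SAMPLE =
        "<S:Envelope xmlns:S=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<S:Body>"
        + "<ns0:findCustomerResponse xmlns:ns0=\"http://service.jaxws.blog/\">"
        + "<return id=\"123\"><firstName>Jane</firstName><lastName>Doe</lastName></return>"
        + "</ns0:findCustomerResponse>"
        + "</S:Body>"
        + "</S:Envelope>";

    // Walks the stream until a start element with the given local name is found,
    // then returns its "id" attribute; returns null if the element never appears.
    static String idOf(String xml, String localName) {
        try {
            XMLStreamReader xsr =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            try {
                while (xsr.hasNext()) {
                    if (xsr.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(xsr.getLocalName())) {
                        return xsr.getAttributeValue(null, "id");
                    }
                }
                return null;
            } finally {
                xsr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(idOf(SAMPLE, "return"));
    }
}
```

Unlike the nextTag() loop in the unmarshal demo, this loop checks hasNext() on every step, so a document that lacks the target element ends with a null result rather than an exception at the end of the stream.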
August 27, 2012
by Blaise Doughan
· 20,177 Views
Groovy Closures Do Not Have Access to Private Methods in a Super Class
Recently I came upon a groovy oddity. (At least it is perceived by me to be an oddity). Closures in a groovy class do not have access to a private method if that method is defined in the superclass. This seems odd paired against the fact that regular methods in a super class can access private method defined in the super class Background Groovy closures have the same scope access to class member variables and methods as a regular groovy method. In other words, closures are bound to variables in the scope they are defined. See the codehaus link for the official documentation: http://groovy.codehaus.org/Closures This implies that a closure will play by the rules of Object Orientation in the Java language. However, I found that closures do not have access to private methods that are defined in a super class. The best way to demonstrate is through a short example from Grails. I have used TDD for the example. Note this is a dummy case with no business purpose. Later on I will offer a more reasonable scenario in the business context where I encountered this scenario. Simple Groovy Example Take a class that has a closure, a public method, and a private method. Then extend that class. Try and invoke the closure. We get an error. package closure.access class SendCheckService { def calendarService /** * Closure that invokes a private method */ def closureToSendCheck = { sendPersonalCheck() } def regularMethod() { sendPersonalCheck() } /* * Private method that we want to see executed */ private sendPersonalCheck() { println "Sending Personal Check" } } class FooService extends SendCheckService { } Here are some tests the demonstrate the error. 
    package closure.access

    import grails.test.*

    class SendCheckServiceTests extends GrailsUnitTestCase {
        def sendCheckService

        protected void setUp() {
            super.setUp()
            sendCheckService = new SendCheckService()
        }

        /**
         * Invocation of the closure from the super class prints out the message:
         * "Sending Personal Check"
         */
        void testClosureToSendCheck() {
            sendCheckService.closureToSendCheck()
        }

        /**
         * Invocation of the method from the super class prints out the message:
         * "Sending Personal Check"
         */
        void testRegularMethod() {
            sendCheckService.regularMethod()
        }

        /**
         * Invocation of the closure from the subclass yields an error:
         * groovy.lang.MissingMethodException: No signature of method: closure.access.FooService.sendPersonalCheck()
         * is applicable for argument types: () values: []
         * This essentially means it does not exist.
         */
        void testFoo_ClosureToSendCheck() {
            def fooService = new FooService()
            fooService.closureToSendCheck()
        }

        /**
         * Invocation of the method does not yield an error!
         */
        void testFoo_RegularMethod() {
            def fooService = new FooService()
            fooService.regularMethod()
        }
    }

All is Well

testClosureToSendCheck() executes fine: the closure can access the private method. This is as expected, because we have not extended the class yet. testRegularMethod() just demonstrates regular OO principles. A method is able to invoke private methods of its own class.

Not as Expected

testFoo_ClosureToSendCheck() works with a subclass of SendCheckService. It calls the same closure, yet we get served a MissingMethodException. testFoo_RegularMethod() is there for contrast with testFoo_ClosureToSendCheck(). I included it to show that closures ought to be able to access private methods, because regular methods can!

Why Even Try to Have a Closure Call a Private Method

This may be a double loop learning question any intelligent developer might ask. It questions why we need to even get into this mess. This is a valid point and should be explained.
Closures are a good fit when you want to reuse existing template logic. Let me give a simple business problem. Imagine we need to code a system that sends out various types of checks: personal checks and business checks. We have the stipulation that these two events must NEVER be done together: a personal check is sent in one request, and a business check in another. However, both need to follow the same logic. They must be sent on a business day (no holidays or weekends). Thus, we have a scenario where both need the same calendar logic, but they need it separately.

Duplicate Logic

We could just code the calendar logic twice. (Remember, sending business checks and personal checks together in one request cannot occur!)

    package closure.access

    class SendCheckService2 {
        def calendarService

        def triggerPersonalCheck() {
            if (calendarService.todayIsBusinessDay()) {
                sendPersonalCheck()
            } else {
                println "DO NOTHING"
            }
        }

        def triggerBusinessCheck() {
            if (calendarService.todayIsBusinessDay()) {
                sendBusinessCheck()
            } else {
                println "DO NOTHING"
            }
        }

        private sendPersonalCheck() { println "Sending Personal Check" }

        private sendBusinessCheck() { println "Sending Business Check" }
    }

Use the DRY Principle

To avoid this duplication and follow DRY, we can use closures. Create a method that accepts a closure, and pass it the code snippet to execute as a closure. As a result, the calendar logic is defined once but executed separately for each code snippet.
    package closure.access

    class SendCheckService3 {
        def calendarService

        def triggerPersonalCheck() {
            checkIfBusinessDayAndExecute(sendPersonalCheck)
        }

        def triggerBusinessCheck() {
            checkIfBusinessDayAndExecute(sendBusinessCheck)
        }

        def checkIfBusinessDayAndExecute(Closure closure) {
            if (calendarService.todayIsBusinessDay()) {
                closure()
            } else {
                println "DO NOTHING"
            }
        }

        /**
         * This is now a closure we can pass around
         */
        def sendPersonalCheck = { println "Sending Personal Check" }

        /**
         * This is now a closure we can pass around
         */
        def sendBusinessCheck = { println "Sending Business Check" }
    }

In the real-world scenario where I encountered the closure issue, my closure was trying to execute a private method in an abstract class. This is where I observed the problem.

JVM Thoughts

I honestly do not know the gory details behind why closures in superclasses cannot access private methods, but I have an idea. Groovy creates closures by compiling them as inner classes. When the closure runs against an instance of the subclass, the inner class does not have access to the superclass's private methods. A regular method has access because it is compiled as part of the class itself. The closure is not compiled that way (being an inner class), and this is why we see what looks like a violation of object-oriented behavior. If you have a better explanation or further knowledge of the details behind this issue, please describe them in the comments.

Thanks for taking the time to delve into this area. Thanks to Scott Risk for helping with the examples.
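For contrast, here is a minimal Java sketch of the same "define the calendar check once, pass in the action" idea. It is not Groovy and says nothing about how Groovy dispatches calls inside a closure. The names SendCheckSketch and FooSketch, the BooleanSupplier standing in for calendarService, and the in-memory log are all assumptions made for illustration. In Java, a method reference to a private method is bound in the class where it is written, so the calls below also work through a subclass instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Java analogue of SendCheckService3: the business-day check lives in one
// place and the action to run is passed in as a Runnable.
class SendCheckSketch {
    // Hypothetical stand-in for the article's calendarService.
    private final BooleanSupplier todayIsBusinessDay;
    // Records what happened so the demo can inspect it.
    final List<String> log = new ArrayList<>();

    SendCheckSketch(BooleanSupplier todayIsBusinessDay) {
        this.todayIsBusinessDay = todayIsBusinessDay;
    }

    void triggerPersonalCheck() {
        checkIfBusinessDayAndExecute(this::sendPersonalCheck);
    }

    void triggerBusinessCheck() {
        checkIfBusinessDayAndExecute(this::sendBusinessCheck);
    }

    // The shared template: one calendar check, any action.
    private void checkIfBusinessDayAndExecute(Runnable action) {
        if (todayIsBusinessDay.getAsBoolean()) {
            action.run();
        } else {
            log.add("DO NOTHING");
        }
    }

    private void sendPersonalCheck() { log.add("Sending Personal Check"); }

    private void sendBusinessCheck() { log.add("Sending Business Check"); }

    public static void main(String[] args) {
        // Use the subclass, mirroring FooService in the article.
        SendCheckSketch service = new FooSketch(() -> true);
        service.triggerPersonalCheck();
        service.triggerBusinessCheck();
        System.out.println(service.log);
    }
}

// Subclass with no members of its own, like FooService.
class FooSketch extends SendCheckSketch {
    FooSketch(BooleanSupplier todayIsBusinessDay) {
        super(todayIsBusinessDay);
    }
}
```

Passing `() -> false` as the supplier makes every trigger record "DO NOTHING" instead, which is the non-business-day branch of the template.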
August 26, 2012
by Nirav Assar
· 12,148 Views
Adding Hibernate Entity Level Filtering feature to Spring Data JPA Repository
Original Article: http://borislam.blogspot.hk/2012/07/adding-hibernate-entity-level-filter.html

Those who have used the data filtering features of Hibernate know that they are very powerful. You can define a set of filtering criteria on an entity class or a collection. Spring Data JPA is a very handy library, but it does not have filtering features. In this post, I will demonstrate how to add the Hibernate filter feature at the entity level. You can use this feature when you are using Hibernate EntityManager. You just define annotations on your repository interface to enable it.

Step 1. Define the filter at entity level as usual. Just use the Hibernate @FilterDef annotation.

    @Entity
    @Table(name = "STUDENT")
    @FilterDef(name="filterBySchoolAndClass", parameters={@ParamDef(name="school", type="string"),@ParamDef(name="class", type="integer")})
    public class Student extends GenericEntity implements Serializable {
        // add your properties ...
    }

Step 2. Define two custom annotations. These two annotations are to be used on your repository interfaces. Through them you can apply the Hibernate filter defined in step 1 to specific queries.

    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME)
    public @interface EntityFilter {
        FilterQuery[] filterQueries() default {};
    }

    @Retention(RetentionPolicy.RUNTIME)
    public @interface FilterQuery {
        String name() default "";
        String jpql() default "";
    }

Step 3. Add a method to your Spring Data JPA base repository. This method reads the annotations you defined (i.e. @EntityFilter and its @FilterQuery entries) and applies the Hibernate filter to the query by simply unwrapping the EntityManager. In this method you can specify both the parameters of your Hibernate filter and the parameters of your query. If you do not know how to add a custom method to your Spring Data JPA base repository, please see my previous article on how to customize your Spring Data JPA base repository for details.
You can see in the previous article that I intentionally expose the repository interface (i.e. the springDataRepositoryInterface property) in GenericRepositoryImpl. This small trick enables me to access the annotation on the repository interface easily.

    public List doQueryWithFilter(String filterName, String filterQueryName, Map inFilterParams, Map inQueryParams) {
        if (GenericRepository.class.isAssignableFrom(getSpringDataRepositoryInterface())) {
            Annotation entityFilterAnn = getSpringDataRepositoryInterface().getAnnotation(EntityFilter.class);
            if (entityFilterAnn != null) {
                EntityFilter entityFilter = (EntityFilter) entityFilterAnn;
                FilterQuery[] filterQuerys = entityFilter.filterQueries();
                for (FilterQuery fQuery : filterQuerys) {
                    if (StringUtils.equals(filterQueryName, fQuery.name())) {
                        String jpql = fQuery.jpql();
                        Filter filter = em.unwrap(Session.class).enableFilter(filterName);
                        // set filter parameters
                        for (Object key : inFilterParams.keySet()) {
                            String filterParamName = key.toString();
                            Object filterParamValue = inFilterParams.get(key);
                            filter.setParameter(filterParamName, filterParamValue);
                        }
                        // set query parameters
                        Query query = em.createQuery(jpql);
                        for (Object key : inQueryParams.keySet()) {
                            String queryParamName = key.toString();
                            Object queryParamValue = inQueryParams.get(key);
                            query.setParameter(queryParamName, queryParamValue);
                        }
                        return query.getResultList();
                    }
                }
            }
        }
        // no matching @FilterQuery was found
        return null;
    }

Last step: example usage

In your repository, use the @EntityFilter and @FilterQuery annotations to define which queries should have the Hibernate filter applied.
    @EntityFilter (
        filterQueries = {
            @FilterQuery(name="query1", jpql="SELECT s FROM Student s LEFT JOIN FETCH s.Subject where s.subject = :subject"),
            @FilterQuery(name="query2", jpql="SELECT s FROM Student s LEFT JOIN s.TeacherSubject where s.teacher = :teacher")
        }
    )
    public interface StudentRepository extends GenericRepository {
    }

In the service or business class that injects your repository, you can simply call the doQueryWithFilter() method to enable the filtering function.

    @Service
    public class StudentService {

        @Inject
        private StudentRepository studentRepository;

        // "class" is a reserved word in Java, so the third parameter is named className
        public List searchStudent(String subject, String school, String className) {
            List studentList;

            // Prepare parameters for query filter
            HashMap inFilterParams = new HashMap();
            inFilterParams.put("school", "Hong Kong Secondary School");
            inFilterParams.put("class", "S5");

            // Prepare parameters for query
            HashMap inParams = new HashMap();
            inParams.put("subject", "Physics");

            studentList = studentRepository.doQueryWithFilter(
                "filterBySchoolAndClass", "query1", inFilterParams, inParams);
            return studentList;
        }
    }
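The annotation lookup at the heart of doQueryWithFilter() can be tried in isolation. The sketch below uses only the JDK and leaves out Hibernate and the EntityManager entirely. It re-declares the two annotations from Step 2; the class name FilterQueryLookupSketch, the helper findJpql, the bare StudentRepository interface, and the simplified JPQL string are assumptions made for illustration, not part of the original code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Optional;

class FilterQueryLookupSketch {

    // Finds the JPQL string registered under queryName on the given
    // repository interface, or an empty Optional if there is none.
    static Optional<String> findJpql(Class<?> repositoryInterface, String queryName) {
        EntityFilter entityFilter = repositoryInterface.getAnnotation(EntityFilter.class);
        if (entityFilter == null) {
            return Optional.empty(); // interface carries no @EntityFilter
        }
        for (FilterQuery fq : entityFilter.filterQueries()) {
            if (fq.name().equals(queryName)) {
                return Optional.of(fq.jpql());
            }
        }
        return Optional.empty(); // no @FilterQuery with that name
    }

    public static void main(String[] args) {
        System.out.println(findJpql(StudentRepository.class, "query1").orElse("not found"));
        System.out.println(findJpql(StudentRepository.class, "query9").orElse("not found"));
    }
}

// Same shape as the annotations in Step 2. RUNTIME retention is what makes
// them visible to getAnnotation().
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface EntityFilter {
    FilterQuery[] filterQueries() default {};
}

@Retention(RetentionPolicy.RUNTIME)
@interface FilterQuery {
    String name() default "";
    String jpql() default "";
}

// Hypothetical repository interface with a simplified query.
@EntityFilter(filterQueries = {
    @FilterQuery(name = "query1", jpql = "SELECT s FROM Student s WHERE s.subject = :subject")
})
interface StudentRepository {
}
```

Returning an Optional here is a design choice of the sketch; the method in the article returns null when no matching query is found, so callers there need a null check.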
August 24, 2012
by Boris Lam
· 56,829 Views · 1 Like
Advanced Dependency Injection With Guice
The more I use dependency injection (DI) in my code, the more it alters the way I see both my design and implementation. Injection is so convenient and powerful that you end up wanting to make sure you use it as often as you can. And as it turns out, you can use it in many, many places.

Let’s cover briefly the most obvious scenarios where DI, and more specifically, Guice, are a good fit: objects created either at class loading time or very early in your application. These two aspects are covered by either direct injection or by providers, which allow you to start building some of your object graph before you can inject more objects. I won’t go into too much detail about these two use cases since they are explained in pretty much any Guice tutorial you can find on the net.

Once the injector has created your graph of objects, you are pretty much back to normal and instantiating your “runtime objects” (the objects you create during the lifetime of your application) the normal way, most likely with “new” or factories. However, you will quickly notice that while you need some runtime information to create these objects, other parts of them could be injected.

Let’s take the following example: we have a GeoService interface that provides various geolocation functions, such as telling you if two addresses are close to each other:

    public interface GeoService {
        /**
         * @return true if the two addresses are within @param{miles}
         * miles of each other.
         */
        boolean isNear(Address address1, Address address2, int miles);
    }

Then you have a Person class which uses this service and also needs a name and an address to be instantiated:

    public class Person {
        // Fields omitted

        public Person(String name, Address address, GeoService gs) {
            this.name = name;
            this.address = address;
            this.geoService = gs;
        }

        public boolean livesNear(Person otherPerson) {
            return geoService.isNear(address, otherPerson.getAddress(), 2 /* miles */);
        }
    }

Something odd should jump out at you right away with this class: while name and address are part of the identity of a Person object, the presence of the GeoService instance in it feels wrong. The service is a singleton that is created on start up, so it is a perfect candidate to be injected, but how can I create a Person object when some of its information is supplied by Guice and the rest by myself?

Guice gives you a very elegant and flexible way to implement this scenario with “assisted injection”. The first step is to define a factory for our objects that represents exactly how we want to create them:

    public interface PersonFactory {
        Person create(String name, Address address);
    }

Since only name and address participate in the identity of our Person objects, these are the only parameters we need to construct our objects. The other parameters should be supplied by Guice, so we modify our Person constructor to let Guice know:

    @Inject
    public Person(@Assisted String name, @Assisted Address address, GeoService geoService) {
        this.name = name;
        this.address = address;
        this.geoService = geoService;
    }

In this code, I have added an @Inject annotation on the constructor and an @Assisted annotation on each parameter that I will be providing. Guice will take care of injecting the rest.
Finally, we connect the factory to its objects when creating the module:

    Module module1 = new FactoryModuleBuilder()
        .build(PersonFactory.class);

The important part here is to realize that we will never instantiate PersonFactory: Guice will. From now on, all we need to do whenever we want to instantiate a Person object is to ask Guice to hand us a factory:

    @Inject
    private PersonFactory personFactory;

    // ...

    Person p = personFactory.create("Bob", new Address("1 Ocean st"));

If you want to find out more, take a look at the main documentation for assisted injection, which explains how to support overloaded constructors and also how to create different kinds of objects within the same factory.

Wrapping up

Let’s take a look at what we did. First, we started with a suspicious looking constructor:

    public Person(String name, Address address, GeoService s) {

This constructor is suspicious because it accepts parameters that do not participate in the identity of the object (you won’t use the GeoService parameter when calculating the hash code of a Person object). Instead, we replaced this constructor with a factory that only accepts identity fields:

    public interface PersonFactory {
        Person create(String name, Address address);
    }

and we let Guice’s assisted injection take care of creating a fully formed object for us.

This observation leads us to the Identity Constructor rule: If a constructor accepts parameters that are not used to define the identity of the objects, consider injecting these parameters.

Once you start looking at your objects with this rule in mind, you will be surprised to find out how many of them can benefit from assisted injection.
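To see what the factory binding saves you from writing, here is a hand-written equivalent in plain Java, with no Guice involved. ManualPersonFactory, the one-field Address, and the lambda-based GeoService are assumptions made for illustration. Guice produces its own factory implementation; this is only a conceptual sketch of the split between identity parameters (passed to create) and the injected service (held by the factory).

```java
import java.util.Objects;

class AssistedInjectionSketch {
    public static void main(String[] args) {
        // Toy GeoService: two addresses are "near" if the street matches.
        GeoService sameStreet = (a, b, miles) -> a.street.equals(b.street);
        PersonFactory factory = new ManualPersonFactory(sameStreet);

        Person bob = factory.create("Bob", new Address("1 Ocean st"));
        Person alice = factory.create("Alice", new Address("1 Ocean st"));
        System.out.println(bob.livesNear(alice));
    }
}

// Hypothetical minimal stand-in for the article's Address type.
final class Address {
    final String street;
    Address(String street) { this.street = street; }
}

interface GeoService {
    boolean isNear(Address address1, Address address2, int miles);
}

class Person {
    private final String name;
    private final Address address;
    private final GeoService geoService;

    Person(String name, Address address, GeoService geoService) {
        this.name = name;
        this.address = address;
        this.geoService = geoService;
    }

    String getName() { return name; }

    Address getAddress() { return address; }

    boolean livesNear(Person otherPerson) {
        return geoService.isNear(address, otherPerson.getAddress(), 2 /* miles */);
    }
}

// Callers only ever supply the identity fields.
interface PersonFactory {
    Person create(String name, Address address);
}

// The factory holds the service once and adds it to every Person it builds.
final class ManualPersonFactory implements PersonFactory {
    private final GeoService geoService;

    ManualPersonFactory(GeoService geoService) {
        this.geoService = Objects.requireNonNull(geoService);
    }

    @Override
    public Person create(String name, Address address) {
        return new Person(name, address, geoService);
    }
}
```

With assisted injection you keep only the PersonFactory interface and the annotated constructor; a class like ManualPersonFactory is the part you no longer write or maintain yourself.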
August 23, 2012
by Cedric Beust
· 36,595 Views · 2 Likes
How to Migrate Drupal to Azure Web Sites
DrupalCon Munich is next week, and I am lucky enough to be going. As part of preparing for the conference, I thought it would be worthwhile to see just how easy (or difficult) it would be to migrate an existing Drupal site to Windows Azure Web Sites. So, in this post, I’ll do just that. Fortunately, because Windows Azure Web Sites supports both PHP and MySQL, the migration process is relatively straightforward. And, because Drupal and PHP run on any platform, the process I’ll describe should work for moving Drupal to Windows Azure Web Sites regardless of what platform you are moving from.

Of course, Drupal installations can vary widely, so YMMV. I tested the instructions below on a relatively small (and simple) Drupal installation running on CentOS 5. (Unfortunately, I won’t be using Drush since it isn’t supported on Windows Azure Websites.) If you are considering moving a large and complex Drupal application, you may want to consider moving to Windows Azure Cloud Services (more information about that here: Migrating a Drupal Site from LAMP to Windows Azure).

Before getting started, it’s worth noting that Windows Azure Websites lets you run up to 10 Web Sites for free in a multitenant environment. And, you can seamlessly upgrade to private, reserved VM instances as your traffic grows. To sign up, try the Windows Azure 90-day free trial.

1. Create a Windows Azure Web Site and MySQL database

There is a step-by-step tutorial on http://www.windowsazure.com that walks you through creating a new website and a MySQL database, so I’ll refer you there to get started: Create a PHP-MySQL Windows Azure web site and deploy using Git. If you intend to use Git to publish your Drupal site, then go ahead and follow the instructions for setting up a Git repository. Make sure to follow the instructions in the Get remote MySQL connection information section as you will need that information later.
You can ignore the remainder of the tutorial for the purposes of deploying your Drupal site, but if you are new to Windows Azure Web Sites (and to Git), you might find the additional reading informative.

OK, now you have a new website with a MySQL database, you have your MySQL database connection information, and you have (optionally) created a remote Git repository and made note of the Git deployment instructions. Now you are ready to copy your database to MySQL in Windows Azure Web Sites.

2. Copy database to MySQL in Windows Azure Web Sites

I’m sure there is more than one way to copy your Drupal database, but I found the mysqldump tool to be effective and easy to use. To copy from a local machine to Windows Azure Web Sites, here’s the command I used:

    mysqldump -u local_username --password=local_password drupal | mysql -h remote_host -u remote_username --password=remote_password remote_db_name

You will, of course, have to provide the username and password for your existing Drupal database, and you will have to provide the hostname, username, password, and database name for the MySQL database you created in step 1. This information is available in the connection string that you should have noted in step 1. That is, you should have a connection string that looks something like this:

    Database=remote_db_name;Data Source=remote_host;User Id=remote_username;Password=remote_password

Depending on the size of your database, the copying process could take several minutes.

Now your Drupal database is live in Windows Azure Websites. Before you deploy your Drupal code, you need to modify it so it can connect to the new database.

3. Modify database connection info in settings.php

Here, you will again need your new database connection information. Open the /drupal/sites/default/settings.php file in your favorite text editor, and replace the values of ‘database’, ‘username’, ‘password’, and ‘host’ in the $databases array with the correct values for your new database.
When you are finished, you should have something similar to this:

    $databases = array (
      'default' => array (
        'default' => array (
          'database' => 'remote_db_name',
          'username' => 'remote_username',
          'password' => 'remote_password',
          'host' => 'remote_host',
          'port' => '',
          'driver' => 'mysql',
          'prefix' => '',
        ),
      ),
    );

Be sure to save the settings.php file, then you are ready to deploy.

4. Deploy Drupal code using Git or FTP

The last step is to deploy your code to Windows Azure Web Sites using Git or FTP.

If you are using FTP, you can get the FTP hostname and username from your website’s dashboard. Then, use your favorite FTP client to upload your Drupal files to the /site/wwwroot folder of the remote site.

If you are using Git, you need to set up a Git repository in Windows Azure Web Sites (steps for this are in the tutorial mentioned earlier). And, you will need Git installed on your local machine. Then, just follow the instructions provided after you created the repository.

One note about using Git here: depending on your Git settings and your .gitignore file (a hidden file and a sibling to the .git folder created in your local root directory after you executed git commit), some files in your Drupal application may be ignored. In my case, all the files in the sites directory were ignored. If this happens, you will want to edit the .gitignore file so that these files aren’t ignored and redeploy.

After you have deployed Drupal to Windows Azure Web Sites, you can continue to deploy updates via Git or FTP.

Related information

If you are looking for more information about Windows Azure Web Sites, these posts might be helpful:

  • Windows Azure Websites - A PHP Perspective
  • Windows Azure Websites, Web Roles, and VMs - When to use which
  • Configuring PHP in Windows Azure Websites with .user.ini Files

One last thing you might consider, depending on your site, is using the Windows Azure Integration Module to store and serve your site’s media files.
August 19, 2012
by Brian Swan
· 10,234 Views
8 Ways to Improve Your Java EE Production Support Skills
This article will provide you with 8 ways to improve your production support skills which may help you better enjoy your IT support job.
August 15, 2012
by Pierre - Hugues Charbonneau
· 32,550 Views · 2 Likes